Thomas Colthurst

Authored Publications
    Earth AI: Unlocking Geospatial Insights with Foundation Models and Cross-Modal Reasoning
    Aaron Bell
    Aviad Barzilai
    Roy Lee
    Gia Jung
    Charles Elliott
    Adam Boulanger
    Amr Helmy
    Jacob Bien
    Ruth Alcantara
    Nadav Sherman
    Hassler Thurston
    Yotam Gigi
    Bolous Jaber
    Vered Silverman
    Luke Barrington
    Tim Thelin
    Elad Aharoni
    Kartik Hegde
    Yuval Carny
    Shravya Shetty
    Yehonathan Refael
    Stone Jiang
    David Schottlander
    Juliet Rothenberg
    Luc Houriez
    Yochai Blau
    Joydeep Paul
    Yang Chen
    Yael Maguire
    Aviv Slobodkin
    Shlomi Pasternak
    Alex Ottenwess
    Jamie McPike
    Per Bjornsson
    Natalie Williams
    Reuven Sayag
    Thomas Turnbull
    Ali Ahmadalipour
    David Andre
    Amit Aides
    Ean Phing VanLee
    Niv Efron
    Monica Bharel
    arXiv preprint arXiv:2510.18318, https://doi.org/10.48550/arXiv.2510.18318 (2025)
    Geospatial data offers immense potential for understanding our planet. However, the sheer volume and diversity of this data along with its varied resolutions, timescales, and sparsity pose significant challenges for thorough analysis and interpretation. The emergence of Foundation Models (FMs) and Large Language Models (LLMs) offers an unprecedented opportunity to tackle some of this complexity, unlocking novel and profound insights into our planet. This paper introduces a comprehensive approach to developing Earth AI solutions, built upon foundation models across three key domains—Planet-scale Imagery, Population, and Environment—and an intelligent Gemini-powered reasoning engine. We present rigorous benchmarks showcasing the power and novel capabilities of our foundation models and validate that they provide complementary value to improve geospatial inference. We show that the synergy between these models unlocks superior predictive capabilities. To handle complex, multi-step queries, we developed a Gemini-powered agent that jointly reasons over our multiple foundation models along with large geospatial data sources and tools to unlock novel geospatial insights. On a new benchmark of real-world crisis scenarios, our agent demonstrates the ability to deliver critical and timely insights, effectively bridging the gap between raw geospatial data and actionable understanding.
    Genome-wide association studies (GWASs) examine the association between genotype and phenotype while adjusting for a set of covariates. Although the covariates may have non-linear or interactive effects, due to the challenge of specifying the model, GWASs often neglect such terms. Here we introduce DeepNull, a method that identifies and adjusts for non-linear and interactive covariate effects using a deep neural network. In analyses of simulated and real data, we demonstrate that DeepNull maintains tight control of the type I error while increasing statistical power by up to 20% in the presence of non-linear and interactive effects. Moreover, in the absence of such effects, DeepNull incurs no loss of power. When applied to 10 phenotypes from the UK Biobank (n = 370K), DeepNull discovered more hits (+6%) and loci (+7%), on average, than conventional association analyses, many of which are biologically plausible or have previously been reported. Finally, DeepNull improves upon linear modeling for phenotypic prediction (+23% on average).
    A universal SNP and small-indel variant caller using deep neural networks
    Scott Schwartz
    Dan Newburger
    Jojo Dijamco
    Nam Nguyen
    Pegah T. Afshar
    Sam S. Gross
    Lizzie Dorfman
    Mark A. DePristo
    Nature Biotechnology (2018)
    Despite rapid advances in sequencing technologies, accurately calling genetic variants present in an individual genome from billions of short, errorful sequence reads remains challenging. Here we show that a deep convolutional neural network can call genetic variation in aligned next-generation sequencing read data by learning statistical relationships between images of read pileups around putative variant and true genotype calls. The approach, called DeepVariant, outperforms existing state-of-the-art tools. The learned model generalizes across genome builds and mammalian species, allowing nonhuman sequencing projects to benefit from the wealth of human ground-truth data. We further show that DeepVariant can learn to call variants in a variety of sequencing technologies and experimental designs, including deep whole genomes from 10X Genomics and Ion Ampliseq exomes, highlighting the benefits of using more automated and generalizable techniques for variant calling.