Health & Bioscience

Research in health and biomedical sciences has a unique potential to improve peoples’ lives, and includes work ranging from basic science that aims to understand biology, to diagnosing individuals’ diseases, to epidemiological studies of whole populations. We recognize that our strengths in machine learning, large-scale computing, and human-computer interaction can help accelerate the progress of research in this space. By collaborating with world-class institutions and researchers and engaging in both early-stage research and late-stage work, we hope to help people live healthier, longer, and more productive lives.

Recent Publications

Preview abstract Importance: Interest in artificial intelligence (AI) has reached an all-time high, and health care leaders across the ecosystem are faced with questions about where, when, and how to deploy AI and how to understand its risks, problems, and possibilities. Observations: While AI as a concept has existed since the 1950s, all AI is not the same. Capabilities and risks of various kinds of AI differ markedly, and on examination 3 epochs of AI emerge. AI 1.0 includes symbolic AI, which attempts to encode human knowledge into computational rules, as well as probabilistic models. The era of AI 2.0 began with deep learning, in which models learn from examples labeled with ground truth. This era brought about many advances both in people’s daily lives and in health care. Deep learning models are task-specific, meaning they do one thing at a time, and they primarily focus on classification and prediction. AI 3.0 is the era of foundation models and generative AI. Models in AI 3.0 have fundamentally new (and potentially transformative) capabilities, as well as new kinds of risks, such as hallucinations. These models can do many different kinds of tasks without being retrained on a new dataset. For example, a simple text instruction will change the model’s behavior. Prompts such as “Write this note for a specialist consultant” and “Write this note for the patient’s mother” will produce markedly different content. Conclusions and Relevance: Foundation models and generative AI represent a major revolution in AI’s capabilities, ffering tremendous potential to improve care. Health care leaders are making decisions about AI today. While any heuristic omits details and loses nuance, the framework of AI 1.0, 2.0, and 3.0 may be helpful to decision-makers because each epoch has fundamentally different capabilities and risks. View details
Validation of a deep learning system for the detection of diabetic retinopathy in Indigenous Australians
Mark Chia
Fred Hersch
Pearse Keane
Angus Turner
British Journal of Ophthalmology, 108 (2024), pp. 268-273
Preview abstract Background/aims: Deep learning systems (DLSs) for diabetic retinopathy (DR) detection show promising results but can underperform in racial and ethnic minority groups, therefore external validation within these populations is critical for health equity. This study evaluates the performance of a DLS for DR detection among Indigenous Australians, an understudied ethnic group who suffer disproportionately from DR-related blindness. Methods: We performed a retrospective external validation study comparing the performance of a DLS against a retinal specialist for the detection of more-than-mild DR (mtmDR), vision-threatening DR (vtDR) and all-cause referable DR. The validation set consisted of 1682 consecutive, single-field, macula-centred retinal photographs from 864 patients with diabetes (mean age 54.9 years, 52.4% women) at an Indigenous primary care service in Perth, Australia. Three-person adjudication by a panel of specialists served as the reference standard. Results: For mtmDR detection, sensitivity of the DLS was superior to the retina specialist (98.0% (95% CI, 96.5 to 99.4) vs 87.1% (95% CI, 83.6 to 90.6), McNemar’s test p<0.001) with a small reduction in specificity (95.1% (95% CI, 93.6 to 96.4) vs 97.0% (95% CI, 95.9 to 98.0), p=0.006). For vtDR, the DLS’s sensitivity was again superior to the human grader (96.2% (95% CI, 93.4 to 98.6) vs 84.4% (95% CI, 79.7 to 89.2), p<0.001) with a slight drop in specificity (95.8% (95% CI, 94.6 to 96.9) vs 97.8% (95% CI, 96.9 to 98.6), p=0.002). For all-cause referable DR, there was a substantial increase in sensitivity (93.7% (95% CI, 91.8 to 95.5) vs 74.4% (95% CI, 71.1 to 77.5), p<0.001) and a smaller reduction in specificity (91.7% (95% CI, 90.0 to 93.3) vs 96.3% (95% CI, 95.2 to 97.4), p<0.001). Conclusion: The DLS showed improved sensitivity and similar specificity compared with a retina specialist for DR detection. This demonstrates its potential to support DR screening among Indigenous Australians, an underserved population with a high burden of diabetic eye disease. View details
Artificial Intelligence in Healthcare: A Perspective from Google
Lily Peng
Lisa Lehmann
Artificial Intelligence in Healthcare, Elsevier (2024)
Preview abstract Artificial Intelligence (AI) holds the promise of transforming healthcare by improving patient outcomes, increasing accessibility and efficiency, and decreasing the cost of care. Realizing this vision of a healthier world for everyone everywhere requires partnerships and trust between healthcare systems, clinicians, payers, technology companies, pharmaceutical companies, and governments to drive innovations in machine learning and artificial intelligence to patients. Google is one example of a technology company that is partnering with healthcare systems, clinicians, and researchers to develop technology solutions that will directly improve the lives of patients. In this chapter we share landmark trials of the use of AI in healthcare. We also describe the application of our novel system of organizing information to unify data in electronic health records (EHRs) and bring an integrated view of patient records to clinicians. We discuss our consumer focused innovation in dermatology to help guide search journeys for personalized information about skin conditions. Finally, we share a perspective on how to embed ethics and a concern for all patients into the development of AI. View details
Preview abstract Given a training data-set $\mathcal{S}$, and a reference data-set $\mathcal{T}$, we design a simple and efficient algorithm to reweigh the loss function such that the limiting distribution of the neural network weights that result from training on $\mathcal{S}$ approaches the limiting distribution that would have resulted by training on $\mathcal{T}$. Such reweighing can be used to correct for Train-Test distribution shift, when we don't have access to the labels of $\mathcal{T}$. It can also be used to perform (soft) multi-criteria optimization on neural nets, when we have access to the labels of $\mathcal{T}$, but $\mathcal{S}$ and $\mathcal{T}$ have few common points. As a motivating application, we train a graph neural net to recognize small molecule binders to MNK2 (a MAP Kinase, responsible for cell signaling) which are non-binders to MNK1 (a very similar protein), even in the absence of training data common to both data-sets. We are able to tune the reweighing parameters so that overall change in holdout loss is negligible, but the selectivity, i.e., the fraction of top 100 MNK2 binders that are MNK1 non-binders, increases from 54\% to 95\%, as a result of our reweighing. We expect the algorithm to be applicable in other settings as well, since we prove that when the metric entropy of the input data-sets is bounded, our random sampling based greedy algorithm outputs a close to optimal reweighing, i.e., the two invariant distributions of network weights will be provably close in total variation distance. View details
Unsupervised representation learning on high-dimensional clinical data improves genomic discovery and prediction
Babak Behsaz
Zachary Ryan Mccaw
Davin Hill
Robert Luben
Dongbing Lai
John Bates
Howard Yang
Tae-Hwi Schwantes-An
Yuchen Zhou
Anthony Khawaja
Andrew Carroll
Brian Hobbs
Michael Cho
Nature Genetics (2024)
Preview abstract Although high-dimensional clinical data (HDCD) are increasingly available in biobank-scale datasets, their use for genetic discovery remains challenging. Here we introduce an unsupervised deep learning model, Representation Learning for Genetic Discovery on Low-Dimensional Embeddings (REGLE), for discovering associations between genetic variants and HDCD. REGLE leverages variational autoencoders to compute nonlinear disentangled embeddings of HDCD, which become the inputs to genome-wide association studies (GWAS). REGLE can uncover features not captured by existing expert-defined features and enables the creation of accurate disease-specific polygenic risk scores (PRSs) in datasets with very few labeled data. We apply REGLE to perform GWAS on respiratory and circulatory HDCD—spirograms measuring lung function and photoplethysmograms measuring blood volume changes. REGLE replicates known loci while identifying others not previously detected. REGLE are predictive of overall survival, and PRSs constructed from REGLE loci improve disease prediction across multiple biobanks. Overall, REGLE contain clinically relevant information beyond that captured by existing expert-defined features, leading to improved genetic discovery and disease prediction. View details
Understanding metric-related pitfalls in image analysis validation
Annika Reinke
Lena Maier-Hein
Paul Jager
Shravya Shetty
Understanding Metrics Workgroup
Nature Methods (2024)
Preview abstract Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibility of metric-related knowledge: While taking into account the individual strengths, weaknesses, and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multi-stage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides the first reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. To facilitate comprehension, illustrations and specific examples accompany each pitfall. As a structured body of information accessible to researchers of all levels of expertise, this work enhances global comprehension of a key topic in image analysis validation. View details