David Steiner

Health/Pathology team. Integration of molecular, laboratory, and imaging data. Background in molecular biology and molecular diagnostic test development, including analytical and clinical validation.
Authored Publications
    Health AI Developer Foundations
    Atilla Kiraly
    Sebastien Baur
    Kenneth Philbrick
    Fereshteh Mahvar
    Liron Yatziv
    Tiffany Chen
    Bram Sterling
    Nick George
    Fayaz Jamil
    Jing Tang
    Kai Bailey
    Akshay Goel
    Abbi Ward
    Lin Yang
    Shravya Shetty
    Daniel Golden
    Tim Thelin
    Rory Pilgrim
    Can "John" Kirmizi
    arXiv (2024)
    Robust medical Machine Learning (ML) models have the potential to revolutionize healthcare by accelerating clinical research, improving workflows and outcomes, and producing novel insights or capabilities. Developing such ML models from scratch is cost prohibitive and requires substantial compute, data, and time (e.g., expert labeling). To address these challenges, we introduce Health AI Developer Foundations (HAI-DEF), a suite of pre-trained, domain-specific foundation models, tools, and recipes to accelerate building ML for health applications. The models cover various modalities and domains, including radiology (X-rays and computed tomography), histopathology, dermatological imaging, and audio. These models provide domain specific embeddings that facilitate AI development with less labeled data, shorter training times, and reduced computational costs compared to traditional approaches. In addition, we utilize a common interface and style across these models, and prioritize usability to enable developers to integrate HAI-DEF efficiently. We present model evaluations across various tasks and conclude with a discussion of their application and evaluation, covering the importance of ensuring efficacy, fairness, and equity. Finally, while HAI-DEF and specifically the foundation models lower the barrier to entry for ML in healthcare, we emphasize the importance of validation with problem- and population-specific data for each desired usage setting. This technical report will be updated over time as more modalities and features are added.
    Microscopic interpretation of histopathology images underlies many important diagnostic and treatment decisions. While advances in vision–language modeling raise new opportunities for analysis of such images, the gigapixel-scale size of whole slide images (WSIs) introduces unique challenges. Additionally, pathology reports simultaneously highlight key findings from small regions while also aggregating interpretation across multiple slides, often making it difficult to create robust image–text pairs. As such, pathology reports remain a largely untapped source of supervision in computational pathology, with most efforts relying on region-of-interest annotations or self-supervision at the patch-level. In this work, we develop a vision–language model based on the BLIP-2 framework using WSIs paired with curated text from pathology reports. This enables applications utilizing a shared image–text embedding space, such as text or image retrieval for finding cases of interest, as well as integration of the WSI encoder with a frozen large language model (LLM) for WSI-based generative text capabilities such as report generation or AI-in-the-loop interactions. We utilize a de-identified dataset of over 350,000 WSIs and diagnostic text pairs, spanning a wide range of diagnoses, procedure types, and tissue types. We present pathologist evaluation of text generation and text retrieval using WSI embeddings, as well as results for WSI classification and workflow prioritization (slide-level triaging). Model-generated text for WSIs was rated by pathologists as accurate, without clinically significant error or omission, for 78% of WSIs on average. This work demonstrates exciting potential capabilities for language-aligned WSI embeddings.
    Task-specific deep learning models in histopathology offer promising opportunities for improving diagnosis, clinical research, and precision medicine. However, development of such models is often limited by availability of high-quality data. Foundation models in histopathology that learn general representations across a wide range of tissue types, diagnoses, and magnifications offer the potential to reduce the data, compute, and technical expertise necessary to develop task-specific deep learning models with the required level of model performance. In this work, we describe the development and evaluation of foundation models for histopathology via self-supervised learning (SSL). We first establish a diverse set of benchmark tasks involving 17 unique tissue types and 12 unique cancer types and spanning different optimal magnifications and task types. Next, we use this benchmark to explore and evaluate histopathology-specific SSL methods followed by further evaluation on held out patch-level and weakly supervised tasks. We found that standard SSL methods thoughtfully applied to histopathology images are performant across our benchmark tasks and that domain-specific methodological improvements can further increase performance. Our findings reinforce the value of using domain-specific SSL methods in pathology, and establish a set of high quality foundation models to enable further research across diverse applications.
    Predicting lymph node metastasis from primary tumor histology and clinicopathologic factors in colorectal cancer using deep learning
    Fraser Tan
    Isabelle Flament-Auvigne
    Trissia Brown
    Markus Plass
    Robert Reihs
    Heimo Mueller
    Kurt Zatloukal
    Pema Richeson
    Lily Peng
    Craig Mermel
    Cameron Chen
    Saurabh Gombar
    Thomas Montine
    Jeanne Shen
    Nature Communications Medicine, 3 (2023), pp. 59
    Background: Presence of lymph node metastasis (LNM) influences prognosis and clinical decision-making in colorectal cancer. However, detection of LNM is variable and depends on a number of external factors. Deep learning has shown success in computational pathology, but has struggled to boost performance when combined with known predictors. Methods: Machine-learned features are created by clustering deep learning embeddings of small patches of tumor in colorectal cancer via k-means, and then selecting the top clusters that add predictive value to a logistic regression model when combined with known baseline clinicopathological variables. We then analyze performance of logistic regression models trained with and without these machine-learned features in combination with the baseline variables. Results: The machine-learned extracted features provide independent signal for the presence of LNM (AUROC: 0.638, 95% CI: [0.590, 0.683]). Furthermore, the machine-learned features add predictive value to the set of 6 clinicopathologic variables in an external validation set (likelihood ratio test, p < 0.00032; AUROC: 0.740, 95% CI: [0.701, 0.780]). A model incorporating these features can also further risk-stratify patients with and without identified metastasis (p < 0.001 for both stage II and stage III). Conclusion: This work demonstrates an effective approach to combine deep learning with established clinicopathologic factors in order to identify independently informative features associated with LNM. Further work building on these specific results may have important impact in prognostication and therapeutic decision making for LNM. Additionally, this general computational approach may prove useful in other contexts.
    Deep learning models for histologic grading of breast cancer and association with disease prognosis
    Trissia Brown
    Isabelle Flament
    Fraser Tan
    Yuannan Cai
    Kunal Nagpal
    Emad Rakha
    David J. Dabbs
    Niels Olson
    James H. Wren
    Elaine E. Thompson
    Erik Seetao
    Carrie Robinson
    Melissa Miao
    Fabien Beckers
    Lily Hao Yi Peng
    Craig Mermel
    Cameron Chen
    npj Breast Cancer (2022)
    Histologic grading of breast cancer involves review and scoring of three well-established morphologic features: mitotic count, nuclear pleomorphism, and tubule formation. Taken together, these features form the basis of the Nottingham Grading System which is used to inform breast cancer characterization and prognosis. In this study, we developed deep learning models to perform histologic scoring of all three components using digitized hematoxylin and eosin-stained slides containing invasive breast carcinoma. We then evaluated the prognostic potential of these models using an external test set and progression free interval as the primary outcome. The individual component models performed at or above published benchmarks for algorithm-based grading approaches and achieved high concordance rates in comparison to pathologist grading. Prognostic performance of histologic scoring provided by the deep learning-based grading was on par with that of pathologists performing review of matched slides. Additionally, by providing scores for each component feature, the deep-learning based approach provided the potential to identify the grading components contributing most to prognostic value. This may enable optimized prognostic models as well as opportunities to improve access to consistent grading and better understand the links between histologic features and clinical outcomes in breast cancer.
    Artificial intelligence for diagnosis and Gleason grading of prostate cancer: the PANDA challenge
    Wouter Bulten
    Kimmo Kartasalo
    Po-Hsuan Cameron Chen
    Peter Ström
    Hans Pinckaers
    Kunal Nagpal
    Yuannan Cai
    Hester van Boven
    Robert Vink
    Christina Hulsbergen-van de Kaa
    Jeroen van der Laak
    Mahul B. Amin
    Andrew J. Evans
    Theodorus van der Kwast
    Robert Allan
    Peter A. Humphrey
    Henrik Grönberg
    Hemamali Samaratunga
    Brett Delahunt
    Toyonori Tsuzuki
    Tomi Häkkinen
    Lars Egevad
    Maggie Demkin
    Sohier Dane
    Fraser Tan
    Masi Valkonen
    Lily Peng
    Craig H. Mermel
    Pekka Ruusuvuori
    Geert Litjens
    Martin Eklund
    the PANDA challenge consortium
    Nature Medicine, 28 (2022), pp. 154-163
    Artificial intelligence (AI) has shown promise for diagnosing prostate cancer in biopsies. However, results have been limited to individual studies, lacking validation in multinational settings. Competitions have been shown to be accelerators for medical imaging innovations, but their impact is hindered by lack of reproducibility and independent validation. With this in mind, we organized the PANDA challenge (the largest histopathology competition to date, joined by 1,290 developers) to catalyze development of reproducible AI algorithms for Gleason grading using 10,616 digitized prostate biopsies. We validated that a diverse set of submitted algorithms reached pathologist-level performance on independent cross-continental cohorts, fully blinded to the algorithm developers. On United States and European external validation sets, the algorithms achieved agreements of 0.862 (quadratically weighted κ, 95% confidence interval (CI), 0.840–0.884) and 0.868 (95% CI, 0.835–0.900) with expert uropathologists. Successful generalization across different patient populations, laboratories and reference standards, achieved by a variety of algorithmic approaches, warrants evaluating AI-based Gleason grading in prospective clinical trials.
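The agreement metric quoted in the abstract above, quadratically weighted κ, can be illustrated with a minimal sketch. This is not the challenge's evaluation code; the function name `quadratic_weighted_kappa` and the toy labels are hypothetical, and grades are assumed to be integers in `0..num_classes-1`.

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_classes):
    """Cohen's kappa with quadratic disagreement weights.

    Assumes integer grades in 0..num_classes-1 and at least two distinct
    grades overall (otherwise the expected disagreement is zero).
    """
    n = len(rater_a)
    # Observed confusion matrix between the two raters.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    # Marginal grade counts for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(num_classes))
              for j in range(num_classes)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            # Penalty grows with the squared distance between grades.
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            # Count expected under independence of the two raters.
            expected = hist_a[i] * hist_b[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical toy grades, for illustration only.
perfect = quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], num_classes=4)
reversed_grades = quadratic_weighted_kappa([0, 1, 2], [2, 1, 0], num_classes=3)
```

Because the weights are quadratic, a disagreement of two grades costs four times as much as a disagreement of one grade, which suits ordinal scales such as Gleason grade groups.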
    Predicting prostate cancer specific-mortality with artificial intelligence-based Gleason grading
    Kunal Nagpal
    Matthew Symonds
    Melissa Moran
    Markus Plass
    Robert Reihs
    Farah Nader
    Fraser Tan
    Yuannan Cai
    Trissia Brown
    Isabelle Flament
    Mahul Amin
    Martin Stumpe
    Heimo Muller
    Peter Regitnig
    Andreas Holzinger
    Lily Hao Yi Peng
    Cameron Chen
    Kurt Zatloukal
    Craig Mermel
    Communications Medicine (2021)
    Background. Gleason grading of prostate cancer is an important prognostic factor, but suffers from poor reproducibility, particularly among non-subspecialist pathologists. Although artificial intelligence (A.I.) tools have demonstrated Gleason grading on-par with expert pathologists, it remains an open question whether and to what extent A.I. grading translates to better prognostication. Methods. In this study, we developed a system to predict prostate cancer-specific mortality via A.I.-based Gleason grading and subsequently evaluated its ability to risk-stratify patients on an independent retrospective cohort of 2807 prostatectomy cases from a single European center with 5–25 years of follow-up (median: 13, interquartile range 9–17). Results. Here, we show that the A.I.’s risk scores produced a C-index of 0.84 (95% CI 0.80–0.87) for prostate cancer-specific mortality. Upon discretizing these risk scores into risk groups analogous to pathologist Grade Groups (GG), the A.I. has a C-index of 0.82 (95% CI 0.78–0.85). On the subset of cases with a GG provided in the original pathology report (n = 1517), the A.I.’s C-indices are 0.87 and 0.85 for continuous and discrete grading, respectively, compared to 0.79 (95% CI 0.71–0.86) for GG obtained from the reports. These represent improvements of 0.08 (95% CI 0.01–0.15) and 0.07 (95% CI 0.00–0.14), respectively. Conclusions. Our results suggest that A.I.-based Gleason grading can lead to effective risk stratification, and warrants further evaluation for improving disease management.
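The C-index reported in the abstract above measures how often a model ranks patients correctly by risk. The sketch below shows one common form (Harrell's concordance index) under simplifying assumptions; it is not the study's implementation, which may use a different estimator. The function name `concordance_index` and the toy data are hypothetical, and pairs with tied follow-up times are ignored.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times:       follow-up time per patient
    events:      1 if the event (e.g. death) was observed, 0 if censored
    risk_scores: model output, higher meaning higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had the event strictly
            # before patient j's last follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier event, higher risk: correct
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort, for illustration only.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
well_ranked = concordance_index(times, events, [0.9, 0.7, 0.5, 0.1])
partly_ranked = concordance_index(times, events, [0.9, 0.1, 0.5, 0.7])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported difference between 0.87 and 0.79 reflects a larger share of correctly ordered patient pairs.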
    Determining Breast Cancer Biomarker Status and Associated Morphological Features Using Deep Learning
    Paul Gamble
    Harry Wang
    Fraser Tan
    Melissa Moran
    Trissia Brown
    Isabelle Flament
    Emad A. Rakha
    Michael Toss
    David J. Dabbs
    Peter Regitnig
    Niels Olson
    James H. Wren
    Carrie Robinson
    Lily Peng
    Craig Mermel
    Cameron Chen
    Nature Communications Medicine (2021)
    Background: Breast cancer management depends on biomarkers including estrogen receptor, progesterone receptor, and human epidermal growth factor receptor 2 (ER/PR/HER2). Though existing scoring systems are widely used and well-validated, they can involve costly preparation and variable interpretation. Additionally, discordances between histology and expected biomarker findings can prompt repeat testing to address biological, interpretative, or technical reasons for unexpected results. Methods: We developed three independent deep learning systems (DLS) to directly predict ER/PR/HER2 status for both focal tissue regions (patches) and slides using hematoxylin-and-eosin-stained (H&E) images as input. Models were trained and evaluated using pathologist annotated slides from three data sources. Areas under the receiver operator characteristic curve (AUCs) were calculated for test sets at both a patch-level (>135 million patches, 181 slides) and slide-level (n = 3274 slides, 1249 cases, 37 sites). Interpretability analyses were performed using Testing with Concept Activation Vectors (TCAV), saliency analysis, and pathologist review of clustered patches. Results: The patch-level AUCs are 0.939 (95% CI 0.936–0.941), 0.938 (0.936–0.940), and 0.808 (0.802–0.813) for ER/PR/HER2, respectively. At the slide level, AUCs are 0.86 (95% CI 0.84–0.87), 0.75 (0.73–0.77), and 0.60 (0.56–0.64) for ER/PR/HER2, respectively. Interpretability analyses show known biomarker-histomorphology associations including associations of low-grade and lobular histology with ER/PR positivity, and increased inflammatory infiltrates with triple-negative staining. Conclusions: This study presents rapid breast cancer biomarker estimation from routine H&E slides and builds on prior advances by prioritizing interpretability of computationally learned features in the context of existing pathological knowledge.
    Interpretable Survival Prediction for Colorectal Cancer using Deep Learning
    Melissa Moran
    Markus Plass
    Robert Reihs
    Fraser Tan
    Isabelle Flament
    Trissia Brown
    Peter Regitnig
    Cameron Chen
    Apaar Sadhwani
    Bob MacDonald
    Benny Ayalew
    Lily Hao Yi Peng
    Heimo Mueller
    Zhaoyang Xu
    Martin Stumpe
    Kurt Zatloukal
    Craig Mermel
    npj Digital Medicine (2021)
    Deriving interpretable prognostic features from deep-learning-based prognostic histopathology models remains a challenge. In this study, we developed a deep learning system (DLS) for predicting disease-specific survival for stage II and III colorectal cancer using 3652 cases (27,300 slides). When evaluated on two validation datasets containing 1239 cases (9340 slides) and 738 cases (7140 slides), respectively, the DLS achieved a 5-year disease-specific survival AUC of 0.70 (95% CI: 0.66–0.73) and 0.69 (95% CI: 0.64–0.72), and added significant predictive value to a set of nine clinicopathologic features. To interpret the DLS, we explored the ability of different human-interpretable features to explain the variance in DLS scores. We observed that clinicopathologic features such as T-category, N-category, and grade explained a small fraction of the variance in DLS scores (R² = 18% in both validation sets). Next, we generated human-interpretable histologic features by clustering embeddings from a deep-learning-based image-similarity model and showed that they explained the majority of the variance (R² of 73–80%). Furthermore, the clustering-derived feature most strongly associated with high DLS scores was also highly prognostic in isolation. With a distinct visual appearance (poorly differentiated tumor cell clusters adjacent to adipose tissue), this feature was identified by annotators with 87.0–95.5% accuracy. Our approach can be used to explain predictions from a prognostic deep learning model and uncover potentially-novel prognostic features that can be reliably identified by people for future validation studies.
    Comparative analysis of machine learning approaches to classify tumor mutation burden in lung adenocarcinoma using histopathology images
    Apaar Sadhwani
    Huang-Wei Chang
    Ali Behrooz
    Trissia Brown
    Isabelle Flament
    Hardik Patel
    Robert Findlater
    Vanessa Velez
    Fraser Tan
    Kamilla Marta Tekiela
    Eunhee Yi
    Craig Mermel
    Debra Hanks
    Cameron Chen
    Kimary Kulig
    Cory Batenchuk
    Peter Cimermancic
    Scientific Reports (2021)
    Both histologic subtype and tumor mutation burden (TMB) represent important biomarkers in lung cancer, with implications for patient prognosis as well as treatment decisions. Typically, TMB is evaluated by comprehensive genomic profiling but this requires use of finite tissue specimens as well as costly and time consuming laboratory processes. Histologic subtype classification represents an established component of lung adenocarcinoma histopathology, but it can be a challenging task with substantial inter-pathologist variability. Here we developed a deep learning system to both classify histologic patterns in lung adenocarcinoma and predict TMB status using Hematoxylin and Eosin (H&E) stained whole slide images. We first trained a convolutional neural network to comprehensively infer histologic subtypes across whole slide images of lung cancer resection specimens. This model achieved a patch-level area under the receiver operating characteristic curve (AUROC) of 0.78-0.98 for the individual features on a test including TCGA slides and 50 external dataset slides. We then integrated the output of this model with clinico-demographic data to develop an interpretable model for TMB classification and evaluated the end-to-end system on 172 held out cases from TCGA, achieving an AUROC of 0.71 [95% CI 0.62-0.79]. Finally we also developed a weakly supervised model for TMB classification, finding that our histologic subtype-based approach achieves similar performance (AUROC of 0.72 95% CI XXX) to the weakly supervised approach. These results suggest interpretable approaches for molecular biomarker prediction based on established histologic patterns are feasible and comparable to more difficult to explain deep learning approaches.