Comparative analysis of machine learning approaches to classify tumor mutation burden in lung adenocarcinoma using histopathology images

Apaar Sadhwani
Huang-Wei Chang
Ali Behrooz
Trissia Brown
Isabelle Flament
Hardik Patel
Robert Findlater
Vanessa Velez
Fraser Tan
Kamilla Marta Tekiela
Eunhee Yi
Craig Mermel
Debra Hanks
Cameron Chen
Kimary Kulig
Cory Batenchuk
Peter Cimermancic
Scientific Reports (2021)

Abstract

Both histologic subtype and tumor mutation burden (TMB) are important biomarkers in lung cancer, with implications for patient prognosis as well as treatment decisions. Typically, TMB is evaluated by comprehensive genomic profiling, but this consumes finite tissue specimens and requires costly, time-consuming laboratory processes. Histologic subtype classification is an established component of lung adenocarcinoma histopathology, but it can be a challenging task with substantial inter-pathologist variability. Here we developed a deep learning system to both classify histologic patterns in lung adenocarcinoma and predict TMB status using hematoxylin and eosin (H&E) stained whole slide images. We first trained a convolutional neural network to comprehensively infer histologic subtypes across whole slide images of lung cancer resection specimens. This model achieved a patch-level area under the receiver operating characteristic curve (AUROC) of 0.78-0.98 for the individual features on a test set including TCGA slides and 50 slides from an external dataset. We then integrated the output of this model with clinico-demographic data to develop an interpretable model for TMB classification and evaluated the end-to-end system on 172 held-out cases from TCGA, achieving an AUROC of 0.71 [95% CI 0.62-0.79]. Finally, we also developed a weakly supervised model for TMB classification, finding that our histologic subtype-based approach achieves similar performance (AUROC of 0.72, 95% CI XXX) to the weakly supervised approach. These results suggest that interpretable approaches for molecular biomarker prediction based on established histologic patterns are feasible and perform comparably to harder-to-explain deep learning approaches.
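The headline metric above is an AUROC reported with a 95% confidence interval. The abstract does not say how the intervals were computed, so the following is only a minimal sketch of one common choice, a percentile bootstrap over cases. The function names `auroc` and `bootstrap_auroc_ci`, the resample count, and the seed are illustrative assumptions, not details taken from the paper.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a random positive case outscores a
    random negative case (Mann-Whitney formulation); ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile-bootstrap CI (assumed method).

    Cases are resampled with replacement; resamples that contain only
    one class are skipped because AUROC is undefined for them.
    """
    point = auroc(labels, scores)  # also validates that both classes exist
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lower, upper


if __name__ == "__main__":
    # Toy data only: 1 = TMB-high, 0 = TMB-low, with made-up model scores.
    y = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0]
    s = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.6, 0.3, 0.45]
    print(bootstrap_auroc_ci(y, s))
```

The pairwise comparison is quadratic in the number of cases, which is acceptable for a test set of a few hundred cases; a rank-based computation would be preferable for larger sets.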