Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging

Laura Anne Culp
Jan Freyberg
Basil Mustafa
Sebastien Baur
Simon Kornblith
Ting Chen
Patricia MacWilliams
Sara Mahdavi
Megan Zoë Walker
Aaron Loh
Cameron Chen
Scott Mayer McKinney
Jim Winkens
Zach William Beaver
Fiona Keleher Ryan
Mozziyar Etemadi
Umesh Telang
Lily Hao Yi Peng
Geoffrey Everest Hinton
Neil Houlsby
Mohammad Norouzi
Nature Biomedical Engineering (2023)

Abstract

Machine-learning models for medical tasks can match or surpass the performance of clinical experts. However, in settings differing from those of the training dataset, the performance of a model can deteriorate substantially. Here we report a representation-learning strategy for machine-learning models applied to medical-imaging tasks that mitigates such ‘out of distribution’ performance problem and that improves model robustness and training efficiency. The strategy, which we named REMEDIS (for ‘Robust and Efficient Medical Imaging with Self-supervision’), combines large-scale supervised transfer learning on natural images and intermediate contrastive self-supervised learning on medical images and requires minimal task-specific customization. We show the utility of REMEDIS in a range of diagnostic-imaging tasks covering six imaging domains and 15 test datasets, and by simulating three realistic out-of-distribution scenarios. REMEDIS improved in-distribution diagnostic accuracies up to 11.5% with respect to strong supervised baseline models, and in out-of-distribution settings required only 1–33% of the data for retraining to match the performance of supervised models retrained using all available data. REMEDIS may accelerate the development lifecycle of machine-learning models for medical imaging.