Deep learning versus human graders for classifying diabetic retinopathy severity in a nationwide screening program

Dr. Paisan Raumviboonsuk
Dr. Peranut Chotcomwongse
Rajiv Raman
Sonia Phene
Kornwipa Hemarat
Mongkol Tadarati
Sukhum Silpa-Archa
Jirawut Limwattanayingyong
Chetan Rao
Oscar Kuruvilla
Jesse Jung
Jeffrey Tan
Surapong Orprayoon
Chawawat Kangwanwongpaisan
Ramase Sukumalpaiboon
Chainarong Luengchaichawang
Jitumporn Fuangkaew
Pipat Kongsap
Lamyong Chualinpha
Sarawuth Saree
Srirut Kawinpanitan
Korntip Mitvongsa
Siriporn Lawanasakol
Chaiyasit Thepchatri
Lalita Wongpichedchai
Lily Peng
npj Digital Medicine, Nature Partner Journals (2019)

Abstract

Deep learning algorithms have been used to detect diabetic retinopathy (DR) with specialist-level accuracy. This study aims to validate one such algorithm on a large-scale clinical population and to compare its performance with that of human graders. A total of 25,326 gradable retinal images of patients with diabetes from the community-based, nationwide DR screening program in Thailand were analyzed for DR severity and referable diabetic macular edema (DME). Grades adjudicated by a panel of international retinal specialists served as the reference standard. Relative to human graders, the deep learning algorithm detected referable DR (moderate non-proliferative DR [NPDR] or worse) with significantly higher sensitivity (0.97 vs. 0.74, p < 0.001) and slightly lower specificity (0.96 vs. 0.98, p < 0.001). The algorithm also had higher sensitivity for each of the categories of severe NPDR or worse, proliferative DR (PDR), and DME (p < 0.001 for all comparisons). The quadratic-weighted kappa for grading DR severity levels was 0.85 for the algorithm and 0.78 for human graders (p < 0.001 for the difference). Across the DR severity levels used to determine referable disease, deep learning significantly reduced the false negative rate (by 23 percentage points) at the cost of a slightly higher false positive rate (by 2 percentage points). Deep learning algorithms may serve as a valuable tool for DR screening.
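The abstract reports sensitivity, specificity, false negative and false positive rates, and a quadratic-weighted kappa without defining them. The Python sketch that follows shows one standard way to compute these quantities; it is an illustration, not the study's analysis code. It assumes that DR severity is encoded as integers on the 5-level International Clinical Diabetic Retinopathy (ICDR) scale (0 = no DR through 4 = PDR) and that "referable" means grade 2 (moderate NPDR) or higher. The names `sensitivity_specificity`, `quadratic_weighted_kappa` and `REFERABLE_THRESHOLD` are hypothetical, and the toy grades are made up, not study data. The last lines check that the 23- and 2-point differences follow from the reported values, since false negative rate = 1 - sensitivity and false positive rate = 1 - specificity.

```python
"""Illustrative sketch of the screening metrics named in the abstract.

ASSUMPTIONS (not taken from the study's code): DR severity is an integer on
the 5-level ICDR scale (0 = no DR, 1 = mild NPDR, 2 = moderate NPDR,
3 = severe NPDR, 4 = PDR), and "referable DR" means grade >= 2.
All names here are hypothetical.
"""
from typing import List, Sequence, Tuple

REFERABLE_THRESHOLD = 2  # hypothetical: moderate NPDR or worse is referable


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Binarize 0-4 grades at the referral threshold; return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        truly_referable = t >= REFERABLE_THRESHOLD
        flagged = p >= REFERABLE_THRESHOLD
        if truly_referable and flagged:
            tp += 1  # referable case correctly flagged
        elif truly_referable:
            fn += 1  # referable case missed
        elif flagged:
            fp += 1  # non-referable case flagged
        else:
            tn += 1  # non-referable case correctly passed
    return tp / (tp + fn), tn / (tn + fp)


def quadratic_weighted_kappa(
    y_true: Sequence[int], y_pred: Sequence[int], num_classes: int = 5
) -> float:
    """Cohen's kappa with quadratic disagreement weights (i - j)^2 / (k - 1)^2."""
    k, n = num_classes, len(y_true)
    observed: List[List[int]] = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1  # rows: reference grade, columns: grader/algorithm grade
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += w * observed[i][j]
            # expected count if the two gradings were independent, given their marginals
            weighted_expected += w * row_totals[i] * col_totals[j] / n
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    # Made-up toy grades for illustration only; NOT data from the study.
    reference = [0, 0, 1, 2, 2, 3, 4, 1]
    grader = [0, 1, 1, 2, 1, 3, 4, 2]
    print(sensitivity_specificity(reference, grader))  # (0.75, 0.75)
    print(round(quadratic_weighted_kappa(reference, grader), 3))  # 0.882

    # Rates implied by the reported values: FNR = 1 - sensitivity, FPR = 1 - specificity.
    fnr_drop = (1 - 0.74) - (1 - 0.97)  # human FNR minus algorithm FNR
    fpr_rise = (1 - 0.96) - (1 - 0.98)  # algorithm FPR minus human FPR
    print(round(fnr_drop, 2), round(fpr_rise, 2))  # 0.23 0.02
```

Dividing the weights by (k - 1)^2 does not change kappa, because the same factor appears in the numerator and the denominator. For that reason the function should agree with scikit-learn's `cohen_kappa_score(y_true, y_pred, weights="quadratic")`, which uses unnormalized (i - j)^2 weights, as long as all grades 0 to 4 occur in the data.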