Abstract
The purpose of the study was to develop a deep residual learning algorithm to screen for glaucoma from fundus photography and to measure its diagnostic performance compared to Residents in Ophthalmology. A training dataset consisted of 1,364 color fundus photographs with glaucomatous indications and 1,768 color fundus photographs without glaucomatous features. A testing dataset consisted of 60 eyes of 60 glaucoma patients and 50 eyes of 50 normal subjects. Using the training dataset, a deep learning algorithm known as Deep Residual Learning for Image Recognition (ResNet) was developed to discriminate glaucoma, and its diagnostic accuracy was validated in the testing dataset using the area under the receiver operating characteristic curve (AROC). The presence of glaucoma in the testing dataset was also assessed by three Residents in Ophthalmology. The deep learning algorithm achieved significantly higher diagnostic performance than the Residents in Ophthalmology; with ResNet, the AROC from all testing data was 96.5% (95% confidence interval [CI]: 93.5 to 99.6%), while the AROCs obtained by the three Residents were between 72.6% and 91.2%.
Introduction
According to one estimate1, bilateral blindness was present in 9.4 million people with glaucoma in 2010, and this number was expected to rise to 11.2 million people in 2020. Glaucoma is an irreversible disease and the second most common cause of blindness worldwide1. Early diagnosis of glaucoma is therefore essential for preventing blindness. In glaucoma, morphological changes at the optic disc occur in typical patterns2. Evaluation of the optic nerve head (ONH) and retinal nerve fiber layer (RNFL) around the optic disc is very important for accurate and early diagnosis of glaucoma, since structural changes may precede measurable visual field (VF) loss3. With the development of imaging devices, such as optical coherence tomography (OCT)4, the Heidelberg Retina Tomograph (HRT, Heidelberg Engineering GmbH, Heidelberg, Germany) and scanning laser polarimetry (GDx: Carl Zeiss Meditec, Dublin, CA), it is possible to measure glaucomatous structural changes quantitatively and in great detail. However, a considerable limitation of these ‘high-tech’ imaging devices is that they are usually available only at specialist eye clinics or hospitals; consequently, people with glaucoma who have not visited these facilities can remain undiagnosed for many years. Furthermore, these imaging devices are usually unavailable in poorer nations.
The two-dimensional fundus photograph is a basic ophthalmological screening tool. In Japan and many other countries, the screening of ophthalmological diseases, including glaucoma, is based on expert interpretation of two-dimensional fundus photographs, and more high-tech imaging devices are generally not used. Thus, two-dimensional fundus photography remains a key instrument to prevent blindness through early detection of glaucoma. One of the problems with such fundus photography is that diagnosis is currently based on subjective judgement. Nonetheless, optic disc morphologic information quantified from fundus photographs is highly correlated with structural measurements obtained with HRT and GDx5.
The development of deep learning methods represents a revolutionary advance in image recognition research6. Deep learning methods are similar to artificial neural networks, which process information via interconnected neurons; however, deep learning methods have many ‘hidden layers’, which become computable in conjunction with a feature extractor. The feature extractor transforms raw data into a suitable feature vector, which can identify patterns in the input7. The purpose of the current study was to develop a deep residual learning algorithm to screen for glaucoma from fundus photographs, and to validate its diagnostic performance using an independent dataset.
A recent study suggested the usefulness of applying a deep learning method to diagnose glaucoma8,9; however, it used a simple convolutional neural network (CNN), whereas more powerful deep learning methods, such as the Deep Residual Learning for Image Recognition (ResNet)
Diagnosis by Residents in Ophthalmology
The fundus photographs of the testing dataset were reviewed by three Residents in Ophthalmology (A: first year in Ophthalmology residency, B: third year in Ophthalmology residency, C: fourth year in Ophthalmology residency). All of the fundus photographs were reviewed with other clinical information masked. Each Resident made a diagnosis independently of the other Residents.
Statistical analysis
Three-fold cross validation was performed: the training dataset was divided into three equally sized subsets, the deep learning algorithm was trained using two of the three arms, and diagnostic accuracy was then calculated in the remaining arm. The process was iterated three times so that each of the three arms was used as a validation dataset once.
Independent validation was carried out using the testing dataset. The deep learning algorithm was built using all data in the training dataset and the area under the receiver operating characteristic curve (AROC) was calculated. AROCs were also calculated separately for the non-highly myopic and highly myopic eyes (i.e., between the G and N groups and between the mG and mN groups). Sensitivity was calculated at a specificity of 95% in all of the analyses. AROCs were also obtained based on the diagnoses of glaucoma from the three Residents in Ophthalmology.
As a further comparison, AROC values were calculated using: (i) a CNN with 16 layers, similar to VGG1624, (ii) a support vector machine25 and (iii) a Random Forest26. The details of each method were as follows. CNN: 16 layers, similar to VGG16. Support vector machine: radial basis function kernel, penalty parameter = 1.0. Random Forest: number of trees = 10,000, criterion = Gini index, minimum number of samples required to split an internal node = 2, minimum number of samples required at a leaf node = 1.
All statistical analyses were carried out using the programming language Python (ver. 2.7.9, Python Software Foundation, Beaverton, US). AROCs were compared using DeLong’s method27. Benjamini’s method28 was used to correct P values for the problem of multiple testing.
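The two performance measures named above, the AROC and the sensitivity at a fixed specificity, can be illustrated with a minimal sketch. It is not the authors' code: the function names and the toy scores are hypothetical, and the AROC is computed in its Mann-Whitney form (the probability that a glaucomatous eye receives a higher score than a normal eye, with ties counted as one half).

```python
from typing import Sequence


def aroc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AROC as the fraction of (glaucoma, normal) pairs ranked correctly.

    Ties between a glaucoma score and a normal score count as one half.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def sensitivity_at_specificity(scores_pos: Sequence[float],
                               scores_neg: Sequence[float],
                               specificity: float = 0.95) -> float:
    """Best sensitivity among thresholds whose specificity meets the target.

    An eye is called glaucomatous when its score is strictly above the threshold.
    """
    best = 0.0
    for t in sorted(set(scores_pos) | set(scores_neg)):
        spec = sum(1 for n in scores_neg if n <= t) / len(scores_neg)
        if spec >= specificity:
            sens = sum(1 for p in scores_pos if p > t) / len(scores_pos)
            best = max(best, sens)
    return best


# Hypothetical classifier outputs (not data from the study).
glaucoma_scores = [0.9, 0.8, 0.7, 0.4]
normal_scores = [0.1, 0.2, 0.3, 0.6]

print(aroc(glaucoma_scores, normal_scores))                        # 0.9375
print(sensitivity_at_specificity(glaucoma_scores, normal_scores))  # 0.75
```

With only four normal eyes in the toy data, a specificity of at least 95% requires every normal eye to fall at or below the threshold, which is why one glaucomatous eye (score 0.4) is missed.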
Results
The testing dataset consisted of (i) 33 eyes of 33 non-highly myopic glaucoma patients (G group), (ii) 28 eyes of 28 highly myopic glaucoma patients (mG group), (iii) 27 eyes of 27 non-highly myopic normative subjects (N group) and (iv) 22 eyes of 22 highly myopic normative subjects (mN group). Demographic data of the subjects are summarized in Table 1.
The AROC values obtained with the internal verification are shown in Table 2. The diagnostic accuracy varied between 94.2 and 96.0%.
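The internal verification refers to the three-fold cross validation of the training dataset. A minimal sketch of such a split is given below; it assumes a simple random partition of the 3,132 training photographs (1,364 + 1,768). Whether the authors stratified by diagnosis or grouped by patient is not stated, and the function name and seed are hypothetical.

```python
import random


def three_fold_splits(n_items, n_folds=3, seed=0):
    """Yield (train_indices, validation_indices) once per fold."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds arms of (near) equal size.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        validation = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, validation


# 1,364 glaucomatous + 1,768 non-glaucomatous photographs = 3,132 images.
for train, validation in three_fold_splits(1364 + 1768):
    print(len(train), len(validation))  # prints "2088 1044" three times
```

Each photograph appears in exactly one validation arm, so every image contributes to one of the three accuracy estimates.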
Figure 1 shows the structure of the deep learning algorithm (ResNet) with various parameters (Table 3). Figure 2 shows the receiver operating characteristic curve obtained with all data in the testing dataset. The AROC with ResNet was 96.5 (95% confidence interval [CI]: 93.5 to 99.6)%. The AROCs obtained from the three Residents in Ophthalmology are also shown; their AROCs were A: 72.6 (95% CI: 64.1 to 81.1), B: 87.7 (95% CI: 82.3 to 93.2) and C: 91.2 (95% CI: 85.9 to 96.5)%. The AROC with ResNet was significantly larger than the AROCs of the Residents in Ophthalmology A and B (both p < 0.001, DeLong’s method with adjustment for multiple comparisons). There was not a significant difference between the AROC of ResNet and the AROC of the Resident in Ophthalmology C (p = 0.077).
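The adjustment for multiple comparisons cites the Benjamini and Hochberg procedure28. A minimal sketch of that procedure follows; the function name is hypothetical and the input p values are invented for illustration, not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values from three pairwise AROC comparisons.
raw = [0.01, 0.04, 0.03]
print([round(p, 4) for p in benjamini_hochberg(raw)])  # [0.03, 0.04, 0.04]
```

Each raw p value is multiplied by the number of tests divided by its rank, and the running minimum keeps a smaller raw p value from receiving a larger adjusted value than a bigger one.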
Figure 3 shows the receiver operating characteristic curve obtained with the G and N groups in the testing dataset. The AROC with ResNet was 97.1 (95% confidence interval [CI]: 93.3 to 100.0)%. The AROCs obtained from the three Residents in Ophthalmology are also shown; the AROCs of A: 77.4 (95% CI: 67.0 to 87.9)% and B: 84.9 (95% CI: 76.9 to 92.8)% differed from that of ResNet (p values: < 0.001 and 0.0013, respectively), but that of C: 93.7 (95% CI: 86.8 to 99.8)% did not (p = 0.15). The AROC with ResNet was significantly larger than the AROCs of the Residents in Ophthalmology A and B (p = 0.0014 and 0.0026, respectively). There was not a significant difference between the AROC of ResNet and the AROC of the Resident in Ophthalmology C (p = 0.29).
Figure 4 shows the receiver operating characteristic curve obtained with the mG and mN groups in the testing dataset. The AROC with ResNet was 96.4 (95% CI: 92.0 to 100.0)%. The AROCs obtained from the three Residents in Ophthalmology are also shown; A: 66.6 (95% CI: 53.4 to 79.7), B: 91.2 (95% CI: 83.9 to 98.3), C: 88.8 (95% CI: 80.3 to 97.3)%. The AROC with ResNet was significantly larger than the AROC of the Resident in Ophthalmology A (p < 0.001). There was not a significant difference between the AROC with ResNet and the AROCs of the Residents in Ophthalmology B and C (p = 0.10 and 0.072, respectively).
The AROC values associated with the other algorithms are shown in Table 4; these varied between 66.6 (95% CI: 53.4 to 79.7)% with the Support Vector Machine and 91.2 (95% CI: 83.5 to 99.0)% with a CNN with 16 layers, similar to VGG16.
Discussion
A deep residual learning algorithm to screen for glaucoma from fundus photographs was developed using a training dataset that consisted of 1,364 eyes with open angle glaucoma and 1,768 eyes of normative subjects. The diagnostic performance of this algorithm was validated using independent testing datasets. The AROC of the deep residual learning algorithm was 96.5% with all eyes, 97.1% between the G and N groups, and 96.4% between the mG and mN groups. These AROC values tended to be significantly larger than those from Residents in Ophthalmology. We also investigated the diagnostic performance of other deep learning models and machine learning methods; however, these algorithms resulted in much lower AROC values (see Table 4). The scientific merit of the current results is that they show that ResNet10, a modern and powerful deep learning method, enabled an accurate diagnosis of glaucoma from fundus photographs. Diagnosing glaucoma in highly myopic eyes is a challenging task, because their morphology differs from that of non-highly myopic eyes15,16; however, the current results suggested that the constructed algorithm had a high diagnostic power in such eyes.
Deep learning methods to diagnose disease from fundus photographs have been reported previously. Gulshan et al. developed a CNN, trained with 128,175 fundus photographs, to detect diabetic retinopathy29. The AROC of this algorithm was 99%. Takahashi et al. applied the GoogLeNet model to the same diagnostic problem and achieved 81% accuracy30. The task of glaucoma detection may be more challenging than the diagnosis of diabetic retinopathy, since diagnosis of diabetic retinopathy is based on abnormal retinal features, such as hemorrhage, microaneurysm and exudates, whereas the diagnosis of glaucoma relies on the estimation of subtle changes in the shape of the optic disc. Such alterations are better assessed using stereo-photographs, but two-dimensional fundus photographs were used in the current study, making the task even more challenging. Nonetheless, the deep residual learning model achieved very good discrimination, with an AROC between 96.4 and 97.1%. It is difficult to directly compare the diagnostic performance of the current deep learning algorithm with those in recent studies8,9, since the diagnosis of glaucoma depends on many factors, including the stage of glaucoma and refractive status of the eye. In the current study, the ResNet algorithm was trained using a much smaller number of eyes (1,364 color fundus photographs with glaucomatous indications and 1,768 color fundus photographs without glaucomatous features) than in previous studies (approximately 120,000 and 40,000 fundus photographs); however, its diagnostic performance was equal to, or superior to, that of Residents in Ophthalmology, both in non-highly myopic and highly myopic eyes. It should be noted that high myopia was the top reason for false negative classification in a previous study8, despite the large size of the training dataset (approximately 40,000 fundus photographs). In contrast, our results suggested that an accurate diagnosis can be obtained in highly myopic eyes.
The merits of applying machine learning methods to diagnose glaucoma have been widely reported. We previously applied the Random Forests method to diagnose glaucoma based on OCT measurements; the AROC to discriminate glaucomatous eyes (from early to advanced glaucoma cases) from normative eyes was 98.5%31. We also reported that the AROC of this approach was 93.0% when discriminating between early stage glaucoma patients and normative eyes32. Following the great successes of deep learning methods for discrimination tasks in various fields, the application of these methods has just begun in the field of glaucoma. Indeed, we have very recently reported the merit of applying deep learning methods to predict visual field sensitivity from OCT33. Although a more detailed investigation of glaucomatous retinal damage can be made using OCT than using fundus photography, the potential impact of the current algorithm as a screening tool cannot be overstated; fundus photography is commonly used at screening centers, opticians and internal medicine clinics.
The ResNet deep network (18 convolutional layers were used in the current study) enables the extraction of complex and detailed features from images. To avoid the model identifying features from other parts of the retina, outside the optic disc, images were cropped before training the model. This also reduced the training time. However, other glaucomatous findings might be observed outside the optic disc on fundus photographs, such as nerve fiber layer defects and optic disc hemorrhage. A future study should investigate whether considering this information improves diagnostic accuracy. Furthermore, it is possible that a deep learning algorithm trained on stereoscopic fundus photographs or OCT images offers better discrimination than the one built here on two-dimensional fundus photographs. The principal purpose of the current study was to build an automated screening tool for glaucoma with high discriminatory power, which could be used in the majority of screening facilities. The significant disadvantage of a screening tool based on stereoscopic fundus photography or OCT is that its use would be very limited, since these technologies are available in only a limited number of ophthalmological facilities, while two-dimensional fundus photography is much more widely available. Nonetheless, this does not deny the value of a screening tool based on OCT or stereoscopic fundus photography, and a future study should investigate this possibility. A limitation of the current study concerns photograph selection: photographs with features that could interfere with an expert diagnosis of glaucoma were omitted. Excluding these images means further testing of the algorithm is essential to measure its performance as a screening tool in a “real world” setting. However, identifying low quality images is usually much easier than automatically screening for glaucoma.
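The defining idea of a residual network10 is the shortcut connection: each block outputs F(x) + x, so its layers only have to learn a correction to the identity. The sketch below illustrates this with small dense layers in plain Python. It is an assumption-laden toy, not the 18-layer convolutional network used in the study; all function names and weights are hypothetical, and convolutions and batch normalization are omitted.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def linear(weights, v):
    """Multiply a square weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is linear -> ReLU -> linear."""
    hidden = relu(linear(w1, x))
    fx = linear(w2, hidden)
    # Shortcut connection: add the block input to the learned residual.
    return relu([f + xi for f, xi in zip(fx, x)])


zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

# With all-zero weights F(x) = 0, so the input passes through the shortcut.
print(residual_block([1.0, 2.0], zeros, zeros))        # [1.0, 2.0]
# With identity weights F(x) = x for non-negative x, so the output is 2x.
print(residual_block([1.0, 2.0], identity, identity))  # [2.0, 4.0]
```

The zero-weight case shows why deep stacks of such blocks remain trainable: a block that has learned nothing still passes its input forward unchanged.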
In conclusion, a deep residual learning algorithm was developed to automatically screen for glaucoma in fundus photographs. The algorithm had a high diagnostic ability in non-highly myopic and highly myopic eyes.
References
Quigley, H. A. & Broman, A. T. The number of people with glaucoma worldwide in 2010 and 2020. Br J Ophthalmol 90, 262–267 (2006).
Hitchings, R. A. & Spaeth, G. L. The optic disc in glaucoma. I: Classification. Br J Ophthalmol 60, 778–785 (1976).
Quigley, H. A., Katz, J., Derick, R. J., Gilbert, D. & Sommer, A. An evaluation of optic disc and nerve fiber layer examinations in monitoring progression of early glaucoma damage. Ophthalmology 99, 19–28 (1992).
Huang, D. et al. Optical coherence tomography. Science 254, 1178–1181 (1991).
Saito, H., Tsutsumi, T., Iwase, A., Tomidokoro, A. & Araie, M. Correlation of disc morphology quantified on stereophotographs to results by Heidelberg Retina Tomograph II, GDx variable corneal compensation, and visual field tests. Ophthalmology 117, 282–289 (2010).
Hinton, G. E., Osindero, S. & Teh, Y. W. A fast learning algorithm for deep belief nets. Neural Comput 18, 1527–1554 (2006).
Chawla, N. V., Bowyer, K. W., Hall, L. O. & Kegelmeyer, W. P. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16, 321–357 (2002).
Ting, D. S. W. et al. Development and Validation of a Deep Learning System for Diabetic Retinopathy and Related Eye Diseases Using Retinal Images From Multiethnic Populations With Diabetes. JAMA 318, 2211–2223 (2017).
Li, Z. et al. Efficacy of a Deep Learning System for Detecting Glaucomatous Optic Neuropathy Based on Color Fundus Photographs. Ophthalmology (2018).
He, K., Zhang, X., Ren, S. & Sun, J. Deep Residual Learning for Image Recognition. arXiv:1512.03385 (2015).
Mitchell, P., Hourihan, F., Sandbach, J. & Wang, J. J. The relationship between glaucoma and myopia: the Blue Mountains Eye Study. Ophthalmology 106, 2010–2015 (1999).
Suzuki, Y. et al. Risk factors for open-angle glaucoma in a Japanese population: the Tajimi Study. Ophthalmology 113, 1613–1617 (2006).
Xu, L., Wang, Y., Wang, S. & Jonas, J. B. High myopia and glaucoma susceptibility: the Beijing Eye Study. Ophthalmology 114, 216–220 (2007).
Perera, S. A. et al. Refractive error, axial dimensions, and primary open-angle glaucoma: the Singapore Malay Eye Study. Archives of ophthalmology 128, 900–905 (2010).
How, A. C. et al. Population prevalence of tilted and torted optic discs among an adult Chinese population in Singapore: the Tanjong Pagar Study. Archives of ophthalmology 127, 894–899 (2009).
Samarawickrama, C. et al. Myopia-related optic disc and retinal changes in adolescent children from singapore. Ophthalmology 118, 2050–2057 (2011).
Rudnicka, A. R., Owen, C. G., Nightingale, C. M., Cook, D. G. & Whincup, P. H. Ethnic differences in the prevalence of myopia and ocular biometry in 10- and 11-year-old children: the Child Heart and Health Study in England (CHASE). Investigative ophthalmology & visual science 51, 6270–6276 (2010).
Sawada, A., Tomidokoro, A., Araie, M., Iwase, A. & Yamamoto, T. Refractive errors in an elderly Japanese population: the Tajimi study. Ophthalmology 115, 363–370 e363 (2008).
Iwase, A. et al. The prevalence of primary open-angle glaucoma in Japanese: the Tajimi Study. Ophthalmology 111, 1641–1648 (2004).
Song, W. et al. Prevalence of glaucoma in a rural northern china adult population: a population-based survey in kailu county, inner mongolia. Ophthalmology 118, 1982–1988 (2011).
Liang, Y. B. et al. Prevalence of primary open angle glaucoma in a rural adult Chinese population: the Handan eye study. Invest Ophthalmol Vis Sci 52, 8250–8257 (2011).
Japan Glaucoma Society, http://www.ryokunaisho.jp/english/guidelines.html.
Duda, R. O. & Hart, P. E. Use of the Hough Transformation to Detect Lines and Curves in Pictures. Comm. ACM 15, 11–15 (1972).
Simonyan, K. & Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556 (2014).
Cristianini, N. & Shawe-Taylor, J. An introduction to support vector machines and other kernel-based learning methods. (Cambridge University Press, 2000).
Breiman, L. Random Forests. Machine Learning 45, 5–32 (2001).
DeLong, E. R., DeLong, D. M. & Clarke-Pearson, D. L. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44, 837–845 (1988).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B 57, 289–300 (1995).
Gulshan, V. et al. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. JAMA 316, 2402–2410 (2016).
Takahashi, H., Tampo, H., Arai, Y., Inoue, Y. & Kawashima, H. Applying artificial intelligence to disease staging: Deep learning for improved staging of diabetic retinopathy. PLoS One 12, e0179790 (2017).
Yoshida, T. et al. Discriminating between glaucoma and normal eyes using optical coherence tomography and the ‘Random Forests’ classifier. PLoS One 9, e106117 (2014).
Asaoka, R. et al. Validating the Usefulness of the “Random Forests” Classifier to Diagnose Early Glaucoma With Optical Coherence Tomography. Am J Ophthalmol 174, 95–103 (2017).
Asaoka, R., Murata, H., Iwase, A. & Araie, M. Detecting Preperimetric Glaucoma with Standard Automated Perimetry Using a Deep Learning Classifier. Ophthalmology 123, 1974–1980 (2016).
Acknowledgements
Supported by Grant 17K11418 from the Ministry of Education, Culture, Sports, Science and Technology of Japan; the Translational Research program, Strategic PRomotion for practical application of INnovative medical Technology (TR-SPRINT), from the Japan Agency for Medical Research and Development (AMED); and Japan Science and Technology Agency (JST) CREST JPMJCR1304.
Author information
Contributions
M.T., Y.F. and R.A. prepared the material. M.T. and R.A. wrote the main manuscript text and prepared the figures. N.S., M.T., K.M., Y.F., M.M., H.M. and R.A. reviewed the manuscript.
Ethics declarations
Competing Interests
Mr Naoto Shibata, Dr Masaki Tanito, Mr Keita Mitsuhashi, Dr Hiroshi Murata, Dr Ryo Asaoka reported that they are coinventors on a patent for the deep learning system used in this study (Tokugan 2017-196870); potential conflicts of interests are managed according to institutional policies of the University of Tokyo.
Additional information
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Shibata, N., Tanito, M., Mitsuhashi, K. et al. Development of a deep residual learning algorithm to screen for glaucoma from fundus photography. Sci Rep 8, 14665 (2018). https://doi.org/10.1038/s41598-018-33013-w