Middle East African Journal of Ophthalmology

2021  |  Volume: 28  |  Issue: 2  |  Page: 81-86

Validation of artificial intelligence algorithm in the detection and staging of diabetic retinopathy through fundus photography: An automated tool for detection and grading of diabetic retinopathy

Bhargavi Pawar1, Suneetha N Lobo1, Mary Joseph1, Sangeetha Jegannathan1, Hariprasad Jayraj2
1 Department of Ophthalmology, St. Johns Medical College, Bengaluru, Karnataka, India
2 Department of Ophthalmology, LEBEN CARE Technologies Pvt. Ltd, Singapore

Correspondence Address:
Dr. Bhargavi Pawar
Department of Ophthalmology, St. Johns Medical College, Bengaluru, Karnataka


PURPOSE: Diabetic retinopathy (DR) is one of the leading causes of vision loss globally, and early detection plays a significant role in the prognosis. Several studies have examined single-field fundus photography and artificial intelligence (AI) in DR screening using standardized data sets in urban outpatient settings. This study was carried out to validate an AI algorithm in the detection of DR severity using fundus photography in a real-time rural setting. METHODS: This cross-sectional study was carried out among 138 patients who underwent routine ophthalmic examination, irrespective of their diabetic status. The participants underwent single-field color fundus photography using a nonmydriatic fundus camera. The images acquired were processed by the AI algorithm for image quality and for the presence and referability of DR. The results were graded by four ophthalmologists, and interobserver variability between the four observers was calculated. RESULTS: Of the 138 patients, 26 (18.84%) had some stage of DR, represented by 47 images (17.03%) positive for signs of DR. All 26 patients were in moderate or severe stage. About 6.5% of the images were considered not gradable due to poor optical quality. The average agreement between pairs of the four graders was 95.16% for referable DR (RDR). The AI showed 100% sensitivity in detecting DR, while the specificity for RDR was 91.47%. CONCLUSION: AI has shown excellent sensitivity and specificity in RDR detection, at par with the performance of individual ophthalmologists, and is an invaluable tool for DR screening.

How to cite this article:
Pawar B, Lobo SN, Joseph M, Jegannathan S, Jayraj H. Validation of artificial intelligence algorithm in the detection and staging of diabetic retinopathy through fundus photography: An automated tool for detection and grading of diabetic retinopathy.Middle East Afr J Ophthalmol 2021;28:81-86

Available from: http://www.meajo.org/text.asp?2021/28/2/81/326667



India has emerged as the diabetic capital of the world, with currently over 40 million diabetics in the country.[1] Although several recent advances in the management of type 2 diabetes mellitus have significantly reduced mortality, complications and comorbidities such as diabetic retinopathy (DR) have risen significantly in recent years. DR continues to be one of the leading causes of vision loss globally. In addition to an increasing incidence, lack of awareness and poor access to early screening are the factors responsible for making DR a global epidemic.[1] The All India Ophthalmological Society Diabetic Retinopathy Eye Screening Study, done in 2014, estimated the prevalence of DR to be 21.7% in the Indian urban setting.[2] It is estimated that the number of people with DR will grow from 126.6 million in 2011 to 191.0 million by 2030.[3] Low- and middle-income countries face the biggest challenge from diabetes, while in Africa, two-thirds of diabetics still remain undiagnosed.

All diabetic patients are recommended to undergo annual or more frequent retinal screening to enable early detection and intervention.[4] However, periodic retinal screening is often associated with various challenges, ranging from a paucity of ophthalmologists in remote and rural areas to limited access to technology in resource-limited settings. Faced with this challenge, many innovative solutions have been proposed to build better access to DR screening, and one such innovation is the use of artificial intelligence (AI). AI can address the current barriers to early screening, i.e., the scarcity of ophthalmologists and trained resources for reading fundus images, and accessibility to screening in rural and low-resource areas.[4] The literature includes several studies incorporating AI in the screening of DR; however, those studies were largely restricted to standard data sets and urban outpatient settings. Implementing innovations like AI in larger population settings, such as rural areas in real time, is necessary to overcome the feasibility challenges that prevail in delivering ophthalmic care.


This study was carried out to validate an AI algorithm for the detection of DR and to compare the findings with the clinical evaluation of DR.


Study setting and participants

This cross-sectional study was carried out by the Department of Ophthalmology in the rural community eye clinic attached to our tertiary teaching institution over a period of 3 months, between November 2017 and January 2018. Patients with a clear view of the retina in at least one eye were considered for the study. Patients with bilateral cataracts were excluded.

Sample size and sampling technique

All the patients who presented to the rural center of the department, on the outskirts of Bengaluru, for routine eye examination during the study period were selected for the study. A total of 156 patients were selected by convenience sampling during the course of this study. Of these, 138 patients were selected for fundus imaging; 16 patients were excluded due to bilateral dense cataract obstructing a clear view of the retina, and 2 patients were excluded due to age <18 years. In this study, 114 patients (82.6%) underwent pupillary dilatation and 24 patients (17.4%) did not.

Ethical approval and consent

Institutional ethics committee approval was taken, and the study was conducted in accordance with the Declaration of Helsinki. Implied consent for fundoscopy and fundus photography was considered as the patients had presented themselves for routine ophthalmological assessment at the eye clinic.

Data collection

Fundus images were taken using the Intucam Prime, a low-cost portable fundus camera (mydriatic and nonmydriatic) that provides color fundus images with a 40° field of view. Images were acquired by a technician with 6 months of experience in operating the device. For the purpose of this study, single-field, posterior pole-centered fundus imaging was considered for DR screening, based on imaging recommendations.[5],[6] The technician was allowed to take more than one image per eye to obtain the best possible quality wherever necessary.

For the purpose of this study, two images were selected from those taken for each patient, one per eye, chosen as the images the technician perceived to be of the best quality. A total of 276 images of good photographic quality were selected for the study. Images for each patient were stored in a folder in a numerical sequence from 1 to 138.

The selected images were uploaded to a cloud platform for DR analysis, and the AI output regarding the presence or absence of DR (of any grade) along with referable DR (RDR) (indicating DR stage moderate nonproliferative diabetic retinopathy [NPDR] or worse, i.e., those that needed referral to a tertiary center) was derived.

The AI algorithm is a deep learning algorithm programmed to detect and stage DR, similar to other deep learning algorithms.[4] AI results were not communicated to the patient, and the patient's journey in the eye examination was not affected by the process.

The images were independently graded by four ophthalmologists using an online annotation tool, without access to the patient or clinical data. The International Clinical Diabetic Retinopathy (ICDR) Classification Scale [Table 1][7] was followed for disease severity staging, as this helped the programmed algorithm classify disease as referable or nonreferable (compared with other standard classification systems such as the Early Treatment Diabetic Retinopathy Study [ETDRS] scale). Sample image classification with lesion markings is shown in [Table 1].{Table 1}

Operational definitions

RDR was defined as moderate NPDR or worse and/or the presence of macular edema. No DR and mild NPDR were considered not referable. Sight-threatening DR was defined as severe NPDR or worse, including proliferative diabetic retinopathy (PDR). Images of suboptimal quality for DR grading were considered not gradable.

As only one field (posterior pole) was considered in this study, severe NPDR was graded without applying the 4-2-1 rule: the presence of more than 20 intraretinal hemorrhages, venous beading, or intraretinal microvascular abnormality in the single posterior pole-centered image was considered severe NPDR.
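The operational definitions above can be sketched as a small helper over the ICDR severity scale. This is an illustrative sketch, not the study's software; the grade names and function names are assumptions for demonstration.

```python
# ICDR grades, ordered from least to most severe.
ICDR_GRADES = ["no DR", "mild NPDR", "moderate NPDR", "severe NPDR", "PDR"]

def is_referable(grade: str, macular_edema: bool = False) -> bool:
    """Referable DR: moderate NPDR or worse, and/or macular edema."""
    return macular_edema or ICDR_GRADES.index(grade) >= ICDR_GRADES.index("moderate NPDR")

def is_sight_threatening(grade: str) -> bool:
    """Sight-threatening DR: severe NPDR or worse (including PDR)."""
    return ICDR_GRADES.index(grade) >= ICDR_GRADES.index("severe NPDR")
```

Under these definitions, mild NPDR without macular edema is the only DR-positive grade that remains nonreferable.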

The most frequent grade among the four ophthalmologists was considered the ground truth for each image. In the event of a split decision where two ophthalmologists each agreed on a grade, the higher grade was considered ground truth. There were no instances where each ophthalmologist assigned a different grade for the same image. Ground truth grades thus derived were compared with the AI's output for the presence, staging, and referability of DR, based on the markings by the AI algorithm [Figure 1].{Figure 1}
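The ground-truth rule described above (majority vote, with ties resolved toward the higher grade) can be expressed as a short function. This is a hypothetical sketch of the rule, not the study's actual tooling; grades are assumed to be encoded 0 (no DR) through 4 (PDR).

```python
from collections import Counter

def ground_truth(grades: list[int]) -> int:
    """Ground truth = most frequent grade among the graders.
    On a balanced split (e.g., 2-2 between two grades), the higher
    (more severe) grade wins, per the study's tie-breaking rule."""
    counts = Counter(grades)
    top = max(counts.values())
    return max(g for g, c in counts.items() if c == top)
```

For example, grades [1, 1, 3, 3] resolve to 3, while [2, 2, 2, 1] resolve to 2 by simple majority.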

Interobserver reliability, ground truth, and artificial intelligence performance

The single-field fundus images were graded independently by four ophthalmologists. The images were accessed through an online grading tool using a computer and were graded using the ICDR scale. The results were collated and compared between the ophthalmologists to assess the intergrader agreement and to derive the "ground truth" for each image, defined as the most common grade assigned for the image, or the consensus grading. During the process, the graders did not discuss the images to reach a consensus.

AI performance was then evaluated in comparison with the ground truth.

Statistical analysis

Data were analyzed using Microsoft Excel 2016 (Microsoft Corp.) and MedCalc Statistical Software version 19.0.3 (MedCalc Software bvba, Ostend, Belgium). Interobserver agreement was calculated using the intraclass correlation coefficient (ICC) between all four graders and Cohen's kappa (κ) between pairs of graders. Cohen's kappa was classified as per guidelines[8] as <0.20 (poor), 0.21–0.40 (fair), 0.41–0.60 (moderate), 0.61–0.80 (good), and 0.81–1.00 (very good).

Classification of the ICC was done using the reference guideline by Koo and Li[9] as follows: <0.50 (poor); 0.50–0.75 (moderate); 0.75–0.90 (good); >0.90 (excellent).
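The pairwise agreement statistic and the interpretation bands above can be sketched as follows. This is a minimal illustration of Cohen's kappa for two graders over the same images (not the MedCalc computation used in the study); it assumes at least two categories occur, so that chance agreement is below 1.

```python
def cohens_kappa(a: list[int], b: list[int]) -> float:
    """Cohen's kappa between two graders over the same set of images."""
    assert len(a) == len(b)
    n = len(a)
    categories = set(a) | set(b)
    po = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    pe = sum((a.count(c) / n) * (b.count(c) / n)        # chance agreement
             for c in categories)
    return (po - pe) / (1 - pe)

def kappa_label(k: float) -> str:
    """Interpretation bands as used in the study."""
    if k <= 0.20:
        return "poor"
    if k <= 0.40:
        return "fair"
    if k <= 0.60:
        return "moderate"
    if k <= 0.80:
        return "good"
    return "very good"
```

With these bands, a kappa of 0.9174 (the study's ICC-level score) would fall in the "very good" range.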

Artificial intelligence performance in comparison with the ground truth

The images were processed through the AI deep learning algorithm for three parameters:

Image quality assessment – gradable or not gradable
DR detection – yes (ICDR severity mild or worse) or no (no DR)
RDR detection – yes (ICDR severity moderate or worse) or no (no DR or mild NPDR).

The results were compared at the image level with the ground truth grades derived from the ophthalmologists' grading data. For this, the sensitivity, specificity, positive predictive value, negative predictive value, and receiver operating characteristic curve were calculated for DR detection and RDR detection. The AI identified all 18 poor-quality images as "ungradable."
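The image-level metrics named above follow directly from the confusion matrix between the AI output and the ground truth. The sketch below is illustrative only (labels and function names are assumptions, not the study's code) and assumes both positive and negative cases occur, so that no denominator is zero.

```python
def screening_metrics(truth: list[bool], pred: list[bool]) -> dict:
    """Sensitivity, specificity, PPV, and NPV from paired image-level labels."""
    tp = sum(t and p for t, p in zip(truth, pred))          # true positives
    tn = sum(not t and not p for t, p in zip(truth, pred))  # true negatives
    fp = sum(not t and p for t, p in zip(truth, pred))      # false positives
    fn = sum(t and not p for t, p in zip(truth, pred))      # false negatives
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }
```

A sensitivity of 100%, as reported for the AI, corresponds to zero false negatives in this table, while a specificity of 91.47% reflects the remaining false positives.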


The study was carried out among 138 participants, of whom 84 (60.8%) were diabetic. The age and sex distribution of the study participants is given in [Table 2]. For the 276 images from the 138 patients, the agreement distribution between the four graders is given in [Table 3]: all four graders assigning the same grade was considered 100% agreement, three graders agreeing 75%, and two graders agreeing 50%. There were no instances where all four graders assigned different grades to a given image.{Table 2}{Table 3}
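The agreement convention above (each agreeing grader contributing 25 percentage points out of four graders) can be sketched in a few lines; this is an illustrative helper, not the study's analysis code.

```python
from collections import Counter

def agreement_pct(grades: list[int]) -> int:
    """Percent agreement for one image, per the study's convention:
    4/4 graders agreeing = 100%, 3/4 = 75%, 2/4 = 50%."""
    largest_bloc = Counter(grades).most_common(1)[0][1]
    return largest_bloc * 25
```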

In the 15 instances where there was a split but balanced agreement, with two graders each assigning a different DR grade, the higher grade was considered the ground truth DR grade for the image.

Ground truth grades/gold standard of the ophthalmologists' evaluation

The prevalence of DR in this study was 18.84%, seen among 26 participants out of the 138 evaluated, represented by 44 images (15.94%) positive for signs of DR. Twenty patients were considered to have RDR (DR severity moderate or worse), and 18 images out of the 276 were considered not gradable due to image quality falling short of providing diagnostic clarity [Table 4].{Table 4}

The ICC between the four graders was 0.9174, showing excellent consistency, and Cohen's kappa, comparing agreement between pairs of graders, showed "very good" agreement [Table 5].{Table 5}

The average sensitivity and specificity among the graders, in comparison with the ground truth grades, were 93.18% and 97.77% for any DR and 89.63% and 98.84% for RDR [Table 6]. Results showed that, for any grade of DR (mild or greater), the sensitivity was 100%, while the specificity was 59.24% and the area under the curve (AUC) was 0.796.{Table 6}

The AUC for individual ophthalmologists varied from 0.90 to 0.97 [Table 7] and [Graph 1].{Table 7}[INLINE:1]

The artificial intelligence performance

The AI algorithm falsely detected DR in 86 images that were otherwise classified as no DR under the ground truth/clinicians' observation. Most of these images had artifacts, reflections, or innocuous pigmentary changes that the AI marked as cotton wool spots or dot hemorrhages.

When considering RDR detection (grade of moderate NPDR or higher), the AI detected all images with RDR; therefore, the sensitivity was 100% and the specificity was 91.47% for RDR, with an AUC of 0.957.

Thus, the AI has performed at par with the ophthalmologists in detecting RDR.


AI is finding varied applications in the field of medical science, from screening for disease to assistance in robotic surgery. Screening for DR is the need of the hour, from rural, inaccessible areas to the elite urban health-care institute, due to the overwhelming increase in the prevalence of comorbidities of diabetes mellitus. When used as a screening tool in a primary or tertiary care center, AI minimizes the dependence on manpower, especially the ophthalmologist, thereby limiting the patient load for each clinician and improving clinical productivity in an already overwhelmed health-care system with a high patient-to-doctor ratio.

Screening for DR can easily be performed by a technician in rural areas such as the one chosen in this study, or at the general physician's, endocrinologist's, or nephrologist's clinic. In a prototype rural area such as the one where this algorithm was validated, there is tremendous scope for its application: tertiary medical care is less easily accessible, and referral by means of a screening tool will be well received among the local and medical communities. It holds relevance in rural and underdeveloped tribal areas of developing nations, where diabetes remains undetected due to lack of awareness and lack of accessible health care.

This AI model has been trained to detect and refer all retinopathy classified as moderate NPDR or worse as per the ICDR standards. The high sensitivity of the AI (>95%) makes it a good screening tool for referable disease. It is trained to identify microaneurysms, hemorrhages, hard exudates, cotton wool spots, and neovascularization. The AI performs automated annotation of lesions and is an excellent tool for patient education and also for accurate monitoring of disease progress at follow-up visits. In nonretinopathy fundus and mild NPDR, however, the algorithm can overdiagnose or underdiagnose retinopathy, as it can label artifacts and pigmentary changes as diabetes-related changes or miss an occasional microaneurysm that a trained clinical eye would detect. This is an area which needs to be improved upon, but as mild NPDR qualifies as nonreferable according to our study, it does not affect the validation of the tool.

In conditions which mimic DR, or conditions with similar fundoscopic features (vein occlusion, wet age-related macular degeneration, laser scars), the algorithm will annotate the lesions and hemorrhages and classify the patient as RDR, as it has not been trained to differentiate clinical conditions with similar findings based on the photographic lesions detected. Hence the lower specificity (59.24%) due to the high number of false positives. Although this may appear to be a shortfall of the algorithm, the clinical implications are not of concern, as most of these conditions would require an ophthalmic evaluation nonetheless.


The benefits of using AI in medical science are manifold, from a rural outreach screening tool to robotic surgery at tertiary care. Although the present study does not suggest that the gold-standard ophthalmologist's examination should be replaced or substituted, it is recommended that AI be used as a valuable screening tool and clinical aid to screen large populations and to help in the early detection, referral, and treatment of DR, with the goal of limiting morbidity due to the disease. Since there has been no evidence in the existing literature regarding the validation of AI software in a real-time setting, the present study serves as a pilot study for such validation.

Financial support and sponsorship


Conflicts of interest

There are no conflicts of interest.


1Mohan V, Sandeep S, Deepa R, Shah B, Varghese C. Epidemiology of type 2 diabetes: Indian scenario. Indian J Med Res 2007;125:217-30.
2Gadkari SS, Maskati QB, Nayak BK. Prevalence of diabetic retinopathy in India: The All India Ophthalmological Society Diabetic Retinopathy Eye Screening Study 2014. Indian J Ophthalmol 2016;64:38-44.
3Zheng Y, He M, Congdon N. The worldwide epidemic of diabetic retinopathy. Indian J Ophthalmol 2012;60:428-31.
4Ting DS, Cheung CY, Lim G, Tan GS, Quang ND, Gan A, et al. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA 2017;318:2211-23.
5Williams GA, Scott IU, Haller JA, Maguire AM, Marcus D, McDonald HR. Single-field fundus photography for diabetic retinopathy screening. Ophthalmology 2004;111:1055-62.
6Solanki K, Ramachandra C, Bhat S, Bhaskaranand M, Nittala MG, Sadda SR. EyeArt: Automated, high-throughput, image analysis for diabetic retinopathy screening. Invest Ophthalmol Vis Sci 2015;56:1429.
7American Academy of Ophthalmology. International Clinical Diabetic Retinopathy Disease Severity Scale, Detailed Table; 2010.
8Sun S. Meta-analysis of Cohen's kappa. Health Serv Outcomes Res Methodol 2011;11:145-63.
9Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med 2016;15:155-63.