Introduction

Chronic kidney disease (CKD) is a leading cause of cardiovascular disease and of non-communicable disease mortality1,2,3. CKD prevalence is increasing rapidly owing to an aging global population and the rising prevalence of hypertension and diabetes, two major causes of CKD4,28,29,30. In addition, even in localized kidney disease, early-stage nephron loss may induce changes in the systemic milieu that could lead to retinal vascular pathology. This possibility is supported by investigations showing that macrophage activation is promoted even in early-stage kidney disease, inducing systemic cholesterol accumulation through altered cholesterol efflux31.

Fourth, to the best of our knowledge, there have been two prior studies related to the current topic. Sabanayagam et al. developed a retina-based deep-learning algorithm from a cross-sectional study that could diagnose CKD, but the prediction of future CKD events was not evaluated17. Zhang et al. also proposed a deep-learning algorithm to predict CKD32. However, the total number of CKD cases in their validation sets was low (80 cases in the internal and 66 cases in the external test set), with a maximum follow-up of 6 years. Moreover, risk stratification in that study was less distinct. Notably, in our exploratory analysis, the predictive power of the Reti-CKD score was >99% for both the UK Biobank and the Korean Diabetic Cohort, a significant improvement over previous investigations32.

Interestingly, in the subgroup analyses, the ability of Reti-CKD to predict CKD was relatively greater among patients without diabetes than among those with diabetes. Several explanations for this finding are possible. First, CKD-related findings on retinal photographs are likely to be more common among diabetic patients because of diabetic retinopathy, whereas retinal changes are comparatively less common in non-diabetic patients33. This disproportion in retinal abnormalities between groups may have resulted in greater CKD prediction power for Reti-CKD in people without diabetes: in a group in which most people have normal retinal features, individuals with abnormal retinal changes may have a higher likelihood of concomitant kidney disease than those without retinal pathology. Second, in patients with diabetes, risk factors other than retinal abnormalities, such as blood glucose level or diabetes duration, may serve as powerful surrogates for kidney function status34,35. The presence of these other factors could have resulted in a relatively lower predictive capability of the Reti-CKD score. Nonetheless, although the predictive impact of Reti-CKD may differ slightly according to diabetes status, the effectiveness of Reti-CKD as a predictive marker was significant in both patient populations.

The major strength of this study is the development of a deep-learning algorithm using a separate dataset from a Korean health screening center and the validation of a new CKD risk score in two different cohorts (internal validation in the UK Biobank and external validation in the Korean Diabetic Cohort) with a sufficient number of incident CKD cases. However, this study had several limitations. First, the primary outcome in the UK Biobank was defined according to claim codes from inpatient and general practice claim records. Owing to human error and subjectivity, CKD may have been misdiagnosed in the claims data. Nonetheless, CKD incidence differed significantly across the four Reti-CKD score groups, which makes this possibility less likely to explain our findings: misdiagnosed cases would be expected to be randomly distributed across the four risk groups, which would attenuate rather than produce between-group differences. Sensitivity analyses also supported the association between the Reti-CKD score and CKD incidence, further reducing the likelihood that such bias played a major role. Second, external validation was performed only in the Korean Diabetic Cohort. Further validation of the Reti-CKD score in various disease populations and across different ethnicities is needed.

In conclusion, we derived and validated a non-invasive CKD risk stratification tool, the Reti-CKD score. In people with preserved kidney function, the Reti-CKD score was more effective at predicting CKD incidence than the conventional blood test-based method. Because access to retinal photography is increasing at the community and primary care levels, the Reti-CKD score has the potential to be adopted as a practical screening tool for primary CKD prevention.

Methods

Ethics statement

This study was conducted in accordance with the Declaration of Helsinki and approved by the institutional review board of Severance Hospital, Yonsei University Health System (4-2021-1174). Owing to the retrospective nature and use of de-identified data, the requirement for informed consent was waived for the Korean datasets. Written informed consent was obtained from the UK Biobank participants. UK Biobank resources were used under the application number 68428.

Study population

In phase 1, data from a health screening center were used to develop the deep-learning algorithm (Fig. 3). A development set of 158,216 retinal photographs was used to train the algorithm. These retinal photographs were obtained from 79,108 adults who had participated in health screening programs at a health screening center affiliated with Severance Hospital, South Korea.

Fig. 3: Study flow chart for derivation and validation of Reti-CKD score.

In phase 1, health screening center data were used to develop the deep-learning algorithm. In phase 2, data from longitudinal cohorts were used to derive and validate the Reti-CKD score. The Reti-CKD score was derived from a Cox model fitted in the UK Biobank cohort, and its performance was subsequently validated in the UK Biobank and the Korean Diabetic Cohort.

In phase 2, data from longitudinal cohorts were used to derive and validate the Reti-CKD score. The Reti-CKD score was derived in the UK Biobank cohort (n = 30,477), and its performance was subsequently validated in the UK Biobank and the Korean Diabetic Cohort (n = 5014). The UK Biobank is a community-based prospective longitudinal study. The Korean Diabetic Cohort contains clinical data from patients with type 2 diabetes who were treated at Severance Hospital or Gangnam Severance Hospital from April 2011 to August 2018.

Participants were eligible for this validation if retinal photographs and the clinical information needed to calculate eGFR were available. Participants with prevalent CKD, eGFR <90 mL/min/1.73 m2, or albuminuria (defined as a urine albumin-to-creatinine ratio >30 mg/g in the UK Biobank and albumin ≥trace on dipstick urinalysis in the Korean Diabetic Cohort) were excluded. A detailed flowchart of the study population is shown in Supplementary Fig. 2.
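The inclusion and exclusion rules above can be written as a single per-participant check. The sketch below is illustrative only: the function name, argument names and the dipstick label strings (`"negative"`, `"trace"`, `"1+"`, …) are hypothetical, and the urine albumin-to-creatinine ratio is assumed to be in mg/g creatinine.

```python
from typing import Optional

# Hypothetical ordinal labels for dipstick albumin results (lowest to highest).
DIPSTICK_ORDER = ["negative", "trace", "1+", "2+", "3+", "4+"]


def is_eligible(egfr: Optional[float], has_retinal_photo: bool, prevalent_ckd: bool,
                acr_mg_g: Optional[float] = None, dipstick: Optional[str] = None) -> bool:
    """Apply the validation-cohort inclusion/exclusion rules to one participant."""
    if egfr is None or not has_retinal_photo:
        return False  # eGFR cannot be calculated or no retinal photograph
    if prevalent_ckd or egfr < 90:
        return False  # prevalent CKD or eGFR <90 mL/min/1.73 m2
    if acr_mg_g is not None and acr_mg_g > 30:
        return False  # UK Biobank albuminuria: ACR >30 mg/g
    if dipstick is not None and DIPSTICK_ORDER.index(dipstick) >= DIPSTICK_ORDER.index("trace"):
        return False  # Korean Diabetic Cohort albuminuria: dipstick albumin >= trace
    return True
```

Each cohort supplies only the albuminuria measure it actually recorded, so the same function covers both definitions.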

Retinal photography

In phase 1, retinal photographs in the development set were taken using three different retinal cameras: the AFP-210 nonmydriatic auto retinal camera (NIDEK Corporation, Aichi, Japan), the TRC-NW8 nonmydriatic retinal camera (Topcon Corporation, Tokyo, Japan), and the Nonmyd A-D (Kowa Co. Ltd., Shizuoka, Japan). In the development set, all participants underwent retinal photography and blood tests on the same day.

In phase 2, retinal photographs were taken at baseline using TRC-NW8 (Topcon Corporation, Tokyo, Japan) and KOWA VX-20 (Kowa, Chofu Factory, Japan) in the Korean Diabetic Cohort and Topcon 3D OCT-1000 Mark II (Topcon Corporation, Tokyo, Japan) in the UK Biobank.

Development of deep-learning algorithm using retinal photographs: phase 1

In phase 1, on the premise that retinal microvascular signatures associated with CKD could be indicative of future CKD risk, a deep-learning algorithm was trained (Fig. 3). The model inputs were retinal photographs. The ground truth was the absence versus presence of CKD; CKD was defined as eGFR <60 mL/min/1.73 m2 or albuminuria and coded as a binary variable (no CKD versus CKD).

The deep-learning model was based on ConvNeXt. During training, the image of each eye was input separately with its corresponding label. During evaluation, the left-eye and right-eye images were each assigned a probability score, and the average of these two scores was taken as the output for the examination, a process similar to ensemble learning with multiple models. This approach performed better than evaluating each image individually. Our model design was almost identical to ConvNeXt, except that the last fully connected layer was changed to output a single logit. This logit was converted to a probability with a sigmoid function, and the model was trained to minimize the loss between the prediction and the target. We trained the model using the AdamW optimizer with a learning rate of 0.0002 and a cosine learning rate schedule for 50 epochs (Supplementary Fig. 3). For data augmentation, we used mixup, CutMix, RandAugment, a contrast enhancement module, and random cropping. We also adopted focal loss and an exponential moving average, and used input images of 384 × 384 pixels. The cross-sectional performance of the deep-learning algorithm's prediction score in an internal validation set is provided in Supplementary Table 9.

Once the deep-learning algorithm was trained, the probability of CKD presence could be calculated. This probability ranges from zero to one, with a higher value indicating a higher probability of having CKD. This "deep-learning-derived retina-CKD probability" was designed to quantify the strength of association between retinal microvascular signs and the presence of CKD.

The deep-learning-derived retina-CKD probability alone, without other clinical factors, was tested for its ability to stratify future CKD risk in two longitudinal cohorts, the UK Biobank and the Korean Diabetic Cohort (Supplementary Table 10). Further, the deep-learning-derived retina-CKD probability was evaluated using Harrell's C-statistic to assess its incremental value over the eGFR model for CKD prediction (Supplementary Table 11).
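The two output-side steps described above, converting the single logit to a probability and averaging the two eyes into one examination-level score, can be sketched in NumPy, together with a binary focal loss. This is an illustration under stated assumptions, not the training code: the ConvNeXt backbone is omitted, the function names are hypothetical, and the focal-loss parameters (`gamma=2.0`, `alpha=0.25`, the defaults from the original focal-loss formulation) are assumptions because the values used are not reported here.

```python
import numpy as np


def sigmoid(logit):
    """Convert the single logit of the final fully connected layer to a probability."""
    return 1.0 / (1.0 + np.exp(-logit))


def exam_probability(left_logit, right_logit):
    """Average the per-eye probabilities into one examination-level CKD probability."""
    return (sigmoid(left_logit) + sigmoid(right_logit)) / 2.0


def binary_focal_loss(logits, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss; gamma and alpha are illustrative, not reported values."""
    p = np.clip(sigmoid(np.asarray(logits, dtype=float)), eps, 1 - eps)
    y = np.asarray(targets, dtype=float)
    p_t = np.where(y == 1, p, 1 - p)              # probability given to the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)  # class-balancing weight
    # (1 - p_t) ** gamma down-weights examples the model already classifies well.
    return float(np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t)))
```

With `gamma=0` the focal loss reduces to an `alpha`-weighted binary cross-entropy. Increasing `gamma` shrinks the contribution of confidently correct images, which is the usual motivation for focal loss when CKD-positive images are relatively rare.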

Development and validation of Reti-CKD score: phase 2

In phase 2, the predictive ability of the deep-learning-derived retina-CKD probability was further enhanced by integrating clinical factors (Fig. 3). The Reti-CKD score was derived using a Cox proportional hazards model in the UK Biobank. After fitting the Cox proportional hazards model to the UK Biobank cohort, we obtained the coefficients of the covariates. The 5-year baseline survival probability in the UK Biobank was 0.9980896, and the Reti-CKD score was modeled as the failure probability at 5 years. The Cox model included age, sex, hypertension, diabetes, and the deep-learning-derived retina-CKD probability. These clinical factors were chosen to derive a parsimonious model because they can be obtained from questionnaires without additional invasive measures such as blood tests (Supplementary Table 12).
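A Cox model yields a t-year failure probability of the form 1 − S0(t)^exp(lp), where lp is the linear predictor Σβx. The sketch below applies this to the reported 5-year baseline survival of 0.9980896. The coefficient values and covariate names are hypothetical placeholders; the fitted coefficients are in Supplementary Table 12 and are not reproduced here. Whether S0 was estimated at covariate values of zero or at covariate means is also an assumption, so centering is exposed through `mean_lp`.

```python
import math

# Reported 5-year baseline survival in the UK Biobank derivation cohort.
BASELINE_SURVIVAL_5Y = 0.9980896

# HYPOTHETICAL coefficients and covariate names, for illustration only.
HYPOTHETICAL_COEFS = {
    "age": 0.05,             # per year
    "male": 0.2,             # 1 = male, 0 = female (assumed coding)
    "hypertension": 0.5,     # 1 = present
    "diabetes": 0.7,         # 1 = present
    "dl_probability": 2.0,   # deep-learning-derived retina-CKD probability (0-1)
}


def reti_ckd_score(covariates, coefs=HYPOTHETICAL_COEFS,
                   s0=BASELINE_SURVIVAL_5Y, mean_lp=0.0):
    """5-year failure probability: 1 - S0(5) ** exp(lp - mean_lp).

    Set mean_lp to the cohort's mean linear predictor if S0 was estimated at
    covariate means; keep 0.0 if S0 refers to all covariates equal to zero.
    """
    lp = sum(coefs[name] * covariates[name] for name in coefs)
    return 1.0 - s0 ** math.exp(lp - mean_lp)


example = {"age": 60, "male": 1, "hypertension": 1, "diabetes": 0, "dl_probability": 0.3}
score = reti_ckd_score(example)
```

Because the score is a monotone transformation of the linear predictor, the choice of centering changes the absolute probabilities but not the ranking of participants, and therefore not the quartile grouping.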

For validation of the Reti-CKD risk score, four CKD risk groups were defined by Reti-CKD score quartiles (1st, 2nd, 3rd, and 4th quartile) within each cohort (the UK Biobank and the Korean Diabetic Cohort). For comparison, a conventional eGFR-based CKD risk score (i.e., eGFR-CKD score) was also derived using a Cox proportional hazards model in the UK Biobank (Supplementary Table 12). The performance of the Reti-CKD and eGFR-CKD scores in predicting CKD events was assessed in both the UK Biobank and the Korean Diabetic Cohort.
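Because the quartile cut-offs are computed within each cohort, the same function is applied separately to the UK Biobank and the Korean Diabetic Cohort scores. The sketch below is a minimal NumPy version. The tie rule, where a score equal to a cut-off goes to the lower group, is an assumption of the sketch.

```python
import numpy as np


def quartile_groups(scores):
    """Assign each score to risk group 1-4 using cohort-specific quartile cut-offs."""
    scores = np.asarray(scores, dtype=float)
    cuts = np.quantile(scores, [0.25, 0.5, 0.75])  # 25th, 50th, 75th percentiles
    # side="left": a score exactly equal to a cut-off falls into the lower group.
    return np.searchsorted(cuts, scores, side="left") + 1


uk_groups = quartile_groups([0.001, 0.004, 0.002, 0.010, 0.003, 0.020, 0.006, 0.008])
```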

CKD incidence definition

In the UK Biobank, CKD incidence was defined according to the tenth revision of the International Statistical Classification of Diseases and Related Health Problems (ICD-10) codes and the Office of Population Censuses and Surveys Classification of Interventions and Procedures version 4 (OPCS-4) codes in primary care, hospital inpatient, and death register records (Supplementary Table 13). The follow-up period began on the date of the first assessment and ended at death, CKD diagnosis, or the end of follow-up, whichever occurred first.
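The "whichever occurred first" rule converts each participant's dates into a follow-up time and an event indicator for survival analysis. The sketch below shows one way to do this. The function name is hypothetical, follow-up is expressed in years of 365.25 days, and the choice to count CKD as the event when it shares a date with death is an assumption of the sketch.

```python
from datetime import date
from typing import Optional, Tuple


def follow_up(start: date, end_of_follow_up: date,
              ckd_date: Optional[date] = None,
              death_date: Optional[date] = None) -> Tuple[float, int]:
    """Return (follow-up years, CKD event indicator), stopping at the earliest date."""
    candidates = [(end_of_follow_up, 0)]
    if death_date is not None:
        candidates.append((death_date, 0))  # death censors the observation
    if ckd_date is not None:
        candidates.append((ckd_date, 1))    # CKD diagnosis is the event
    # Earliest date wins; on a tie, the CKD event (indicator 1) is preferred.
    stop, event = min(candidates, key=lambda c: (c[0], -c[1]))
    return (stop - start).days / 365.25, event


years, event = follow_up(date(2010, 1, 1), date(2021, 2, 28), ckd_date=date(2015, 1, 1))
```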

In the Korean Diabetic Cohort, CKD incidence was defined as two consecutive eGFR values of <60 mL/min/1.73 m2, the first of which was used as the index date. The follow-up period for each patient began when their retinal photographs were taken and ended on the date of CKD diagnosis or the last creatinine measurement.
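The two-consecutive-measurement definition can be applied to each patient's eGFR history as sketched below. The function name and the (date, eGFR) tuple layout are hypothetical. If no index date is found, follow-up ends at the last creatinine measurement.

```python
from datetime import date
from typing import List, Optional, Tuple


def ckd_index_date(measurements: List[Tuple[date, float]],
                   threshold: float = 60.0) -> Optional[date]:
    """Date of the first of two consecutive eGFR values below threshold, else None."""
    ordered = sorted(measurements)  # chronological order by measurement date
    for (d1, e1), (_, e2) in zip(ordered, ordered[1:]):
        if e1 < threshold and e2 < threshold:
            return d1  # the first of the two low values is the index date
    return None


history = [(date(2012, 1, 1), 75.0), (date(2013, 1, 1), 58.0), (date(2014, 1, 1), 62.0),
           (date(2015, 1, 1), 55.0), (date(2016, 1, 1), 50.0)]
index = ckd_index_date(history)
```

In this example the isolated low value in 2013 does not qualify, because the next measurement recovers above 60 mL/min/1.73 m2. Only the 2015–2016 pair meets the definition.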

Definition of covariates

The eGFR was calculated from serum creatinine using the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation36. In the UK Biobank, diabetes history was determined by ICD-10 codes in any primary care or hospital inpatient data. Hypertension was defined as the use of antihypertensive medication recorded in any primary care data, hospital inpatient data, or self-reported medical history records. Similarly, in the Korean Diabetic Cohort, hypertension was defined as the use of antihypertensive medication according to outpatient and inpatient prescription data.
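For reference, a minimal sketch of a CKD-EPI creatinine calculation follows. It assumes the 2009 creatinine equation. Reference 36 is not reproduced here, and the 2021 race-free refit uses different coefficients, so the exact version used is an assumption. Serum creatinine must be in mg/dL (divide µmol/L by 88.4).

```python
def ckd_epi_2009(scr_mg_dl: float, age: float, female: bool, black: bool = False) -> float:
    """eGFR (mL/min/1.73 m2) from the 2009 CKD-EPI creatinine equation (assumed version)."""
    kappa = 0.7 if female else 0.9      # sex-specific creatinine knot (mg/dL)
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = 141.0 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.209 * 0.993 ** age
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


# UK Biobank reports creatinine in umol/L; convert to mg/dL first.
egfr_example = ckd_epi_2009(80.0 / 88.4, age=55, female=False)
```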

Saliency maps

To explain how the deep-learning model works, we generated saliency maps. We used guided backpropagation, which uses the gradients of the class probability with respect to each image pixel, to show how individual pixels affect the model's predictions37. To obtain a more robust and clearer visualization, we also applied the SmoothGrad technique, which averages gradients computed from copies of the image with added random noise38.
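SmoothGrad wraps any pixel-gradient method. It adds Gaussian noise to the input several times and averages the resulting gradient maps. The NumPy sketch below takes the gradient function as an argument. In the study this role would be played by guided-backpropagation gradients from the trained network, while here a toy linear-logistic "model" stands in for it. The sample count and noise level (15% of the image's value range) are illustrative assumptions, not the settings used.

```python
import numpy as np
from typing import Callable


def smoothgrad(grad_fn: Callable[[np.ndarray], np.ndarray], image: np.ndarray,
               n_samples: int = 50, noise_level: float = 0.15, seed: int = 0) -> np.ndarray:
    """Average input gradients over noisy copies of the image (SmoothGrad)."""
    rng = np.random.default_rng(seed)
    # Noise standard deviation as a fraction of the image's value range.
    sigma = noise_level * (image.max() - image.min())
    total = np.zeros_like(image, dtype=float)
    for _ in range(n_samples):
        noisy = image + rng.normal(0.0, sigma, size=image.shape)
        total += grad_fn(noisy)
    return total / n_samples


def logistic_input_gradient(w: np.ndarray) -> Callable[[np.ndarray], np.ndarray]:
    """Toy stand-in model: gradient of sigmoid(sum(w * x)) with respect to x."""
    def grad(x: np.ndarray) -> np.ndarray:
        p = 1.0 / (1.0 + np.exp(-np.sum(w * x)))
        return p * (1.0 - p) * w
    return grad


weights = np.random.default_rng(1).normal(size=(8, 8))
toy_image = np.random.default_rng(2).random((8, 8))
saliency = smoothgrad(logistic_input_gradient(weights), toy_image)
```

For a deep network the single-image gradient is often visually noisy. Averaging over perturbed inputs suppresses pixel-level fluctuations while keeping regions that consistently drive the prediction.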

Statistical analysis

Python 3.7 was used to develop the deep-learning algorithm, and Stata version 16.1 (StataCorp, TX, USA) and R version 5.0.3 (R Foundation, Vienna, Austria) were used for survival analysis and model performance assessment. Statistical significance was set at P < 0.05. Descriptive statistics were provided for all datasets, including the health screening data, the UK Biobank, and the Korean Diabetic Cohort.

In the UK Biobank, each participant was followed for up to 11.6 years from the date of the initial visit to the last follow-up date (February 28, 2021) or the date of CKD diagnosis. In the Korean Diabetic Cohort, each patient was followed for up to 14.0 years from the date of the initial visit to the last follow-up date (February 28, 2022) or the date of CKD diagnosis. The cumulative incidence of CKD across Reti-CKD score quartiles was estimated using the Kaplan–Meier method, and HRs were estimated using Cox proportional hazards models. The eGFR-adjusted model included baseline eGFR as a covariate.
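The Kaplan–Meier cumulative incidence is 1 − S(t), where S(t) is the product over event times of (1 − events / number at risk). It is computed separately within each Reti-CKD quartile. The function below is a minimal NumPy sketch for illustration. The analyses themselves were run in Stata and R.

```python
import numpy as np


def kaplan_meier_cumulative_incidence(times, events):
    """Return distinct event times and the cumulative incidence 1 - S(t) at each."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    survival = 1.0
    incidence = []
    for t in event_times:
        at_risk = np.sum(times >= t)                  # still under follow-up just before t
        d = np.sum((times == t) & (events == 1))      # CKD events at time t
        survival *= 1.0 - d / at_risk
        incidence.append(1.0 - survival)
    return event_times, np.array(incidence)


# Usage: one curve per quartile group, e.g. for group q
# t_q, ci_q = kaplan_meier_cumulative_incidence(times[groups == q], events[groups == q])
```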

The prognostic value of the Reti-CKD and eGFR-CKD scores for predicting CKD incidence was assessed using Harrell's C-statistic and the net reclassification improvement (NRI)39,40. To obtain 95% CIs, we used a non-parametric bootstrap procedure with 1000 samples.
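Harrell's C is the proportion of comparable pairs in which the participant who developed CKD earlier had the higher predicted risk. A pair is comparable when participant i has an event before participant j's follow-up time. The sketch below pairs a direct O(n²) implementation with a percentile bootstrap over resampled participants. Tied follow-up times are treated as non-comparable, and the percentile method is assumed for the interval. Both are simplifications of this sketch, not necessarily the conventions of the Stata/R routines used.

```python
import numpy as np


def harrell_c(times, events, risk):
    """Harrell's C: share of comparable pairs where the earlier event has higher risk."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if events[i] != 1:
            continue
        later = times > times[i]  # j is comparable if still event-free after i's event
        comparable += int(np.sum(later))
        concordant += np.sum(risk[i] > risk[later]) + 0.5 * np.sum(risk[i] == risk[later])
    return float(concordant / comparable) if comparable else float("nan")


def bootstrap_ci(times, events, risk, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for Harrell's C, resampling participants with replacement."""
    rng = np.random.default_rng(seed)
    times, events, risk = map(np.asarray, (times, events, risk))
    n = len(times)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        stats.append(harrell_c(times[idx], events[idx], risk[idx]))
    lo, hi = np.nanquantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)
```

Because the Reti-CKD score is a failure probability, a higher score means higher risk, matching the direction assumed in `harrell_c`.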

For sensitivity analyses, we first evaluated the performance of Reti-CKD in the entire population, including participants with prevalent CKD. Second, we repeated the survival analysis in participants with and without underlying diabetes or hypertension to evaluate predictability across different CKD etiologies. Third, the analysis was restricted to participants identified as Caucasian in the UK Biobank. Fourth, a landmark analysis was conducted in both cohorts after excluding participants with a follow-up period of <1 year. Finally, the analysis was repeated using eGFR calculated from the CKD-EPI creatinine-cystatin C equation41.