Background

Acute kidney injury (AKI) is associated with a higher risk of chronic kidney disease (CKD), end-stage renal disease (ESRD), and long-term adverse cardiovascular effects [1, 2]. Because effective treatments for impaired kidney function are lacking, the best strategy in clinical practice is to identify AKI as early as possible, reverse its cause, and, where possible, mitigate its sequelae. Over the past decades, several serum creatinine (SCr)-based classification systems have been proposed to define AKI [3]. Serum creatinine has traditionally served as a surrogate for kidney function, despite its limitations as a diagnostic surrogate for AKI [4]. These limitations include the lack of steady-state conditions in critically ill patients and variability in the determinants of SCr (rate of production, apparent volume of distribution, and rate of elimination). Therefore, there is an unmet need for other objective measures to help detect AKI in a timely manner. Several biomarkers have been proposed for the early prediction or risk assessment of AKI, including kidney tubular damage markers (e.g., neutrophil gelatinase-associated lipocalin (NGAL), kidney injury molecule-1 (KIM-1), liver-type fatty acid-binding protein (L-FABP)) [5,6,7,8,9], inflammation markers (e.g., interleukin-18 (IL-18)) [6, 10, 11], and stress markers (e.g., tissue inhibitor of metalloproteinases-2 and insulin-like growth factor-binding protein-7 (TIMP-2 × IGFBP-7)). The Acute Disease Quality Initiative (ADQI) expert group suggests that routine clinical assessments should be combined with stress, damage, and functional biomarkers to stratify risk, discriminate etiologies, assess severity, plan management, and predict the duration and recovery of AKI [12]. In addition, previous meta-analyses including patients with various clinical scenarios have suggested that these biomarkers hold promise as practical tools in the early prediction of AKI [5, 13,14,15,16,17].
However, few studies have compared the diagnostic accuracy of these AKI biomarkers, and systematic assessments of the quality of evidence, which can provide updated information for clinical guidelines, are lacking. Therefore, the aim of this study was to compare the reported predictive accuracy of AKI biomarkers in various clinical settings and appraise the quality of evidence using a pairwise meta-analysis. The findings of this study may be used to update guidelines and recommendations.

Methods

Search strategy and selection criteria

We conducted this pairwise meta-analysis according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [18] and used Cochrane methods [19]. We prospectively submitted the systematic review protocol for registration on PROSPERO [CRD42020207883].

Data sources and search strategy

The primary outcome was incident AKI. Electronic searches were performed on PubMed (Ovid), Medline, Embase, and the Cochrane Library from inception to August 15, 2022 (Additional file 1: Appendix). We screened references by title and abstract and included related studies for further analysis. Reference lists of related studies, systematic reviews, and meta-analyses were manually examined to identify any further publications relevant to our analysis. Both abstracts and full papers were selected for quality assessment and data synthesis.

Inclusion and exclusion criteria

The inclusion criteria were as follows: (1) clinical studies that included participants over 18 years of age and of any ethnic origin or sex; (2) studies that reported candidate AKI biomarkers including NGAL, KIM-1, L-FABP, IL-18, and TIMP-2 × IGFBP-7; and (3) studies that assessed the occurrence of incident AKI. The exclusion criteria were as follows: (1) studies including patients who had previously received dialysis; (2) studies including pregnant or lactating patients; (3) letters, conference or case reports; and (4) studies that lacked data on sensitivity or specificity of biomarkers to predict the occurrence of AKI. Only regular full papers were selected for quality assessment and data synthesis. We contacted the authors of abstracts for further detailed information, if available.

Study selection and data extraction

Six investigators (Heng-Chih Pan, Terry Ting-Yu Chiou, Chih-Chung Shiao, Che-Hsiung Wu, Hugo You-Hsien Lin, and Ming-Jen Chan) independently reviewed the search results and identified eligible studies. Any resulting discrepancies were resolved by discussion with a seventh investigator (Vin-Cent Wu). All relevant data were independently extracted from the included studies by eight investigators (Heng-Chih Pan, Chih-Chung Shiao, Terry Ting-Yu Chiou, Yih-Ting Chen, Chun-Te Huang, Ya-Fei Yang, Shu-Chen Yu, and Zi-Ming Chen) using a standardized form. Extracted data included study characteristics (lead author, publication year, population setting, biomarkers, study endpoint, sample size, events, timing of measurements) and participants’ baseline data (mean age (years), gender (%), comorbidities, severity of illness). When available, odds ratios and 95% confidence intervals (CIs) from cohort or case-control studies were extracted. Other parameters determined a priori included the type of intensive care unit (ICU) setting (surgical/mixed or medical), the criteria used to diagnose AKI and severe AKI, cohort size, and the presence of sepsis. Any disagreements were resolved by discussion with two investigators (Heng-Chih Pan and Vin-Cent Wu).

Quality assessment

The Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool was used to assess the quality of each included study [20, 21]. The following 4 domains were assessed: patient selection, index test, reference standard, and flow and timing. Any disagreements in the quality assessment were resolved by discussion and consensus [15].

Pre-specified subgroup analysis

We hypothesized that the following factors could substantially influence the outcomes observed across studies: clinical setting (ICU/non-ICU), patient population (surgical versus mixed/medical), whether the studies included only patients with sepsis, and the AKI criteria used (risk, injury, failure, loss, ESRD (RIFLE); Acute Kidney Injury Network (AKIN); Kidney Disease: Improving Global Outcomes (KDIGO)).

Data synthesis and statistical analysis

A 2 by 2 table reporting the numbers of patients with true positive, false positive, true negative, and false negative findings at the cutoff point given by each included study was used to generate the sensitivity, specificity, and diagnostic odds ratio (DOR) for that study. The sensitivity, specificity, and DOR of all included studies were combined using a bivariate model. DOR was defined as the endpoint of primary interest in this study because it combines the strengths of sensitivity and specificity with the advantage of accuracy as a single indicator [22]. Sensitivity and specificity were defined as the endpoints of secondary interest. The diagnostic performance for AKI among the 12 different biomarkers was compared using a bivariate model in which the type of biomarker was treated as a categorical covariate. Hierarchical summary receiver operating characteristic curves (HSROCs), which account for the threshold effect [23], were used to illustrate the overall diagnostic performance of each biomarker. The analysis was further stratified by the following pre-specified subgroups: surgical versus mixed/medical patients, ICU/non-ICU patients, sepsis/non-sepsis patients, and different AKI criteria (RIFLE/AKIN/KDIGO). In the subgroup analysis, biomarkers reported in only 1 study could not be compared and were therefore excluded. Potential publication bias was assessed visually using funnel plots. A two-sided P value < 0.05 was considered statistically significant. The bivariate model was fitted using SAS version 9.4 (SAS Institute, Cary, NC) with the “METADAS” macro (version 1.3), which is recommended by the Cochrane Diagnostic Test Accuracy Working Group. The HSROC analysis and funnel plots were produced using R software version 3.6.3 with the “meta4diag” package (version 2.0.8), which is based on Bayesian inference.
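
As an illustration of the per-study step described above, the following minimal Python sketch derives sensitivity, specificity, and the DOR with an approximate 95% CI from a single 2 by 2 table. The counts are hypothetical, the function name `diagnostic_summary` is an illustrative choice, and the 0.5 continuity correction for zero cells is an assumed convention; the sketch does not reproduce the bivariate model fitted with METADAS.

```python
import math


def diagnostic_summary(tp, fp, fn, tn, correction=0.5):
    """Sensitivity, specificity, and DOR (with 95% CI) from one 2 by 2 table."""
    # Assumed convention: add a continuity correction only if a cell is zero,
    # so that the odds ratio and its standard error stay defined.
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))

    sensitivity = tp / (tp + fn)  # proportion of AKI patients who test positive
    specificity = tn / (tn + fp)  # proportion of non-AKI patients who test negative
    dor = (tp * tn) / (fp * fn)   # odds of a positive test in AKI vs. non-AKI

    # Standard error of log(DOR), then a Wald-type 95% confidence interval.
    se_log_dor = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    ci_low = math.exp(math.log(dor) - 1.96 * se_log_dor)
    ci_high = math.exp(math.log(dor) + 1.96 * se_log_dor)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "dor": dor,
        "dor_ci95": (ci_low, ci_high),
    }


# Hypothetical counts, not taken from any included study.
summary = diagnostic_summary(tp=40, fp=30, fn=10, tn=120)
print(summary)
```

With these hypothetical counts, sensitivity and specificity are both 0.8 and the DOR is 16, which shows how the DOR condenses the two measures into a single indicator.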

Results

Search results and study characteristics

The study selection process is summarized in Additional file 1: Appendix. A total of 23,882 articles were identified through the electronic search, and after excluding duplicate and non-relevant articles, the titles and abstracts of the remaining 1803 articles were screened. A total of 242 studies were eligible for full-text review, of which 110 studies including 38,725 patients reported data on the occurrence of AKI with any one of the biomarkers of interest and were included in the meta-analysis [24,25,26,27,28,140].

In critically ill or surgical patients, the potential benefits of reducing kidney injury-related complications may outweigh the loss caused by over-monitoring the patient, such as related length of stay. Appropriate biomarkers should improve the detection rate of AKI with high sensitivity and good negative predictive value, thus enabling timely initiation of preventive strategies for AKI [141]. Previous investigations have reported that TIMP-2 × IGFBP-7 was a good biomarker to identify patients who will develop AKI and reduce the need for renal replacement therapy [136, 137, 142]. As demonstrated in the present study, NGAL/Cr, L-FABP/Cr, and TIMP-2 × IGFBP-7: custom seemed to have good predictive performance in the setting of critically ill patients, while NGAL/Cr and KIM-1 were the best biomarkers in surgical patients (Tables 4, 5).

In non-critically ill or medical patients, patient stratification for the risk of AKI should be applied to the entire hospital population before any scheduled elective intervention. In order to minimize unnecessary impacts due to these scheduled treatments, the specificity should outweigh the sensitivity [141]. In our study, the clinical performance of TIMP-2 × IGFBP-7 with a cutoff value of 2 was significantly better than that of TIMP-2 × IGFBP-7 with a cutoff value of 0.3 in the medical patients. Urinary NGAL, KIM-1, and serum NGAL seemed to be the best biomarkers in the setting of non-critically ill patients and medical patients (Tables 4, 5).
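
The trade-off between a sensitive (rule-out) and a specific (rule-in) test can be illustrated with a short Python sketch that converts sensitivity, specificity, and prevalence into predictive values. The operating points and the 10% prevalence are hypothetical and are not estimates from this meta-analysis; `predictive_values` is an illustrative name.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    # Expected proportions of the tested population in each 2 by 2 cell.
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)

    ppv = tp / (tp + fp)  # probability of AKI given a positive test
    npv = tn / (tn + fn)  # probability of no AKI given a negative test
    return ppv, npv


# Hypothetical operating points at an assumed AKI prevalence of 10%.
rule_out = predictive_values(sensitivity=0.90, specificity=0.50, prevalence=0.10)
rule_in = predictive_values(sensitivity=0.50, specificity=0.95, prevalence=0.10)

print("sensitive test (PPV, NPV):", rule_out)
print("specific test  (PPV, NPV):", rule_in)
```

Under these assumptions, the sensitive operating point yields the higher negative predictive value, whereas the specific operating point yields the higher positive predictive value, which is the reason for favoring specificity before elective interventions.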

However, the sensitivity and specificity in the enrolled studies were heterogeneous because they depended on the clinical circumstances and the threshold effects of the biomarkers. After accounting for the potential threshold effects and the correlation between sensitivity and specificity, HSROC analysis supported the good predictive performance of L-FABP/Cr and the NGAL series (Fig. 1A). The diagnostic criteria applied for AKI also differed between the enrolled studies. The subgroup analysis demonstrated that the relative diagnostic accuracy of the AKI biomarkers remained consistent in the studies using current standard AKI criteria (RIFLE/AKIN/KDIGO) (Table 6). The NGAL series seemed to have the best predictive performance for AKI, especially in the high-quality studies and in the studies conducted in high-income countries. Other biomarkers outperformed the NGAL series only in low- or moderate-quality studies or in the studies conducted in middle- or low-income countries (Additional file 1: Tables S2-S3). Sensitivity analysis also demonstrated the good predictive performance of serum NGAL, urinary NGAL, and TIMP-2 × IGFBP-7: custom for early onset AKI (AKI developed within 48 h) and severe AKI (stage 2–3 or renal replacement therapy) (Additional file 1: Tables S4-S6). These findings enhance the robustness of the study results.
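
The threshold effect mentioned above can be sketched in a few lines of Python: when the same biomarker values are dichotomized at a higher cutoff, sensitivity falls and specificity rises. The biomarker values below are invented for illustration only; the cutoffs 0.3 and 2 are borrowed from the TIMP-2 × IGFBP-7 thresholds discussed in the text, and `sens_spec_at_cutoff` is an illustrative name.

```python
def sens_spec_at_cutoff(values_aki, values_no_aki, cutoff):
    """Sensitivity and specificity when values >= cutoff count as test-positive."""
    true_positives = sum(v >= cutoff for v in values_aki)
    true_negatives = sum(v < cutoff for v in values_no_aki)
    return true_positives / len(values_aki), true_negatives / len(values_no_aki)


# Invented biomarker values for patients with and without AKI.
aki = [0.2, 0.4, 0.8, 1.5, 2.5, 3.0]
no_aki = [0.05, 0.1, 0.2, 0.25, 0.5, 2.1]

low = sens_spec_at_cutoff(aki, no_aki, cutoff=0.3)
high = sens_spec_at_cutoff(aki, no_aki, cutoff=2.0)

print("cutoff 0.3 (sensitivity, specificity):", low)
print("cutoff 2.0 (sensitivity, specificity):", high)
```

Because studies that report different cutoffs sit at different points along this trade-off, pooling sensitivity and specificity separately can be misleading, which is the motivation for summarizing them jointly with an HSROC curve.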

Although the damage and stress biomarkers in this study had good predictive performance, unlike troponin in acute coronary syndrome, none of the reported biomarkers is completely specific for AKI. Previous studies have reported that NGAL, IL-18, and KIM-1 may be elevated in the setting of sepsis and CKD [143,144,145,146]. Of note, these biomarkers can be used to recruit more homogeneous patient populations when implementing a clinical trial [147]. Biomarkers that identify and characterize AKI sub-types are necessary and may enable individualized, timely, etiology-based management of AKI. In addition, considering the complex and multifactorial etiology of AKI, a panel of multiple biomarkers including stress, injury, and kidney reserve biomarkers could provide better discrimination for AKI. Furthermore, more kidney tissue-specific markers may help localize and quantify the severity of AKI and provide a deeper understanding of the pathophysiology of AKI. These biomarkers may offer opportunities for personalized management of AKI and support the call for a refinement of the existing AKI criteria.

Strengths and limitations

The strength of our analysis is the extensive literature search of related studies. We used standard Cochrane protocols and included the largest cumulative study sample size to date in comparison with previous reports. The strength of our meta-analysis also lies in the comprehensive data search with subgroup analyses across several clinical scenarios. We used the GRADE approach to rate the certainty of evidence [148].

Besides limitations in the meta-analysis, there were several limitations in the individual studies. First, most studies had a small sample size, and this contributed to the high heterogeneity of the meta-analysis. Second, our funnel meta-regression and Cochrane Collaboration tool analysis showed significant publication bias (Additional file 1: Appendix). Third, in some scenarios, the limited number of enrolled studies, such as trials focusing on sepsis, made subgroup analysis difficult. Of note, these new biomarkers are most effective in conditions where the time of renal insult is known, for instance, after cardiac surgery or coronary angiography, compared to situations where the onset of kidney injury is less clear, for instance, in sepsis. To ensure the robustness of the findings, we did not emphasize the diagnostic accuracy of biomarkers extracted from fewer than three articles. Fourth, we did not perform further analyses to assess the additional predictive value of SCr levels. Most of the included studies did not measure SCr levels alongside biomarkers to predict AKI. In the literature, SCr has poor predictive performance for AKI because it rises late and cannot accurately estimate the timing of injury [118, 127]. Traditionally, the diagnosis of AKI is based on a rise in serum creatinine, so it is difficult for creatinine to serve two roles at once, as both the reference standard that defines AKI and a predictor of it. Furthermore, the use of SCr as a comparator has several limitations that restrict the full interpretation of biomarker performance. For example, SCr may be elevated in pre-renal azotemia, which does not reflect true renal tissue damage, so biomarkers may not be elevated. On the other hand, in the setting of true renal injury with fluid overload, biomarkers may be elevated while SCr remains unchanged, which may lead to underestimation of the predictive performance of biomarkers [149, 150].
Fifth, the kits used for specific biomarker analysis varied among the studies, so it was difficult to determine the optimal cutoff value of each biomarker to predict AKI. Sixth, the occurrence of AKI was diagnosed according to several different criteria in the enrolled studies. However, the KDIGO classification, which has been proposed to provide a uniform definition of AKI by essentially combining the RIFLE and AKIN criteria, was the most commonly used. Finally, this variation in the definition of AKI between the studies may have unduly influenced the pooled effect estimates. Nonetheless, our conclusions were drawn from studies with different study designs and different clinical scenarios. Further research efforts are certainly needed for the pursuit of better precision medicine, especially with regard to the use of multiple biomarkers. It could be more fruitful to investigate whether different etiologies of AKI (pre-renal versus renal versus obstructive, cardiogenic shock, hypovolemic shock, sepsis-related, etc.) affect the predictive accuracy of biomarkers, and to evaluate whether the efficacy of biomarkers is affected by the severity of AKI. These issues can be incorporated into the design of future randomized controlled trials to evaluate the optimal biomarkers for different clinical settings in order to improve the timely diagnosis of AKI. Moreover, further investigations to improve the diagnosis of AKI and the management of its underlying mechanisms may help to mitigate the current high mortality rate of patients with AKI.

Conclusion

Based on our pairwise meta-analysis of biomarkers to predict AKI, the NGAL series had the best diagnostic accuracy for the prediction of AKI, regardless of whether or not they were adjusted by urinary creatinine, especially in medical patients. However, the predictive performance of urinary NGAL was limited in surgical patients, and NGAL/Cr seemed to be the best biomarker in these patients. All of the biomarkers had similar predictive performance in critically ill patients. Future pragmatic clinical trials are warranted to evaluate the real-world predictive accuracy of AKI biomarkers.