Introduction

Coronary artery disease, particularly acute coronary syndrome (ACS), is responsible for approximately one-third of all deaths in adults over 35 years of age. Percutaneous coronary intervention (PCI) is currently the most widely used treatment for ACS. Acute kidney injury (AKI) is a serious non-cardiovascular complication in patients with ACS: approximately 12.8% of patients develop AKI as a major post-PCI complication, with an attributed mortality rate of 20.2% during or after hospitalization [1, 2]. A growing body of evidence indicates that AKI is significantly associated with an increased risk of long-term adverse events such as repeated coronary revascularization, myocardial infarction, and stroke [3, 4].

To prevent contrast-induced AKI (CI-AKI), physicians can implement preventive measures such as regulating contrast volume and osmolarity, pre-procedural statin intake, and pre- and post-procedural hydration [1, 5]. Identifying PCI-related patient risks allows physicians to tailor strategies to each individual’s risk profile, leading to fewer complications and improved clinical outcomes after a PCI procedure [1, 6]. Prediction models, such as the NCDR-AKI risk model, have been developed to assess the risk of CI-AKI before PCI, with a c-statistic of 0.71 [7]. Traditional statistical models may not include all possible interactions when there are numerous candidate variables, and ignoring these interactions reduces the model’s accuracy [1, 8]. Machine learning (ML)-based models do not depend on assumptions about the variables involved or their relationship with the outcome. Instead, they capture complex relationships in a data-driven manner, including nonlinearity and interactions that may be difficult to identify otherwise. These models have been used for the prediction of outcomes in cardiovascular medicine [9,10,11].

This study aims to evaluate novel ML-based models to more accurately predict the risk of PCI-induced AKI in ACS patients and subsequently reduce the risk of long-term complications. The efficacy of ML-based models will be compared with traditional stepwise selection models, and the study will investigate whether machine learning-based models can sufficiently reduce the variables needed for disease prognosis prediction.

Methods

Study design

We retrospectively reviewed all patients with ACS [ST-elevation myocardial infarction (STEMI), non-STEMI, and unstable angina (UA)] who underwent PCI at Tehran Heart Center between 2015 and 2020. The ethics committee of Tehran Heart Center approved this study (IR.TUMS.MEDICINE.REC.1402.178). Informed consent was waived because of the retrospective design of this study.

Variable definitions and outcome

Pre-procedural variables used were: gender, age, left ventricular ejection fraction (LVEF), atrial fibrillation (AF), fasting plasma glucose (FPG), triglycerides (TG), total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), drug history (lipid-lowering, anti-diabetes, anti-hypertension, anti-arrhythmia, and anti-thrombotic), hematocrit, body mass index (BMI), estimated glomerular filtration rate (eGFR), creatinine (Cr), type of diabetes management, past medical histories (cardiac, renal, previous PCI, previous CABG), and CAD risk factors.

Procedural variables were: non-ST elevation myocardial infarction (NSTEMI) in coronary angiography (CAG), acute MI in CAG, treated vessel, procedure result, stenosis, stent diameter, stent length, stent inflation pressure, post-procedural complications (arrhythmia, cardiopulmonary resuscitation (CPR), aborted cardiac arrest, and procedure-induced shock).

AKI, the primary outcome in this study, was defined according to the Acute Kidney Injury Network (AKIN) criteria as an absolute increase of ≥ 0.3 mg/dL or a relative increase of ≥ 50% in serum creatinine after the procedure [12].
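The creatinine-based definition above can be illustrated with a minimal sketch. The function name and argument names are hypothetical and are not taken from the study's code. The time window over which the change is assessed is not stated in this section, so it is not modeled here.

```python
def meets_aki_creatinine_criterion(baseline_cr, post_cr):
    """Check the outcome definition given in the text.

    Both arguments are serum creatinine values in mg/dL (hypothetical names).
    Returns True for an absolute rise of at least 0.3 mg/dL or a relative
    rise of at least 50% over baseline.
    """
    if baseline_cr <= 0:
        raise ValueError("baseline creatinine must be positive")

    absolute_rise = post_cr - baseline_cr
    relative_rise = absolute_rise / baseline_cr

    # Small tolerance so that boundary cases such as 1.0 -> 1.3 mg/dL are not
    # lost to floating-point rounding.
    eps = 1e-9
    return absolute_rise >= 0.3 - eps or relative_rise >= 0.5 - eps


# Illustrative values only, not patient data.
example_absolute = meets_aki_creatinine_criterion(1.0, 1.3)   # absolute rise of 0.3
example_relative = meets_aki_creatinine_criterion(0.5, 0.75)  # relative rise of 50%
example_neither = meets_aki_creatinine_criterion(2.0, 2.2)    # neither criterion met
```

Either criterion alone is sufficient, which is why a patient with a low baseline can qualify through the relative rise even when the absolute rise is below 0.3 mg/dL.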

Data cleaning

First, patients with missing follow-up data were removed, and missing values in the other features were imputed with the median for numerical features and the mode for categorical ones. Features with more than 40% missing data were removed from the models. Patients with end-stage renal disease (ESRD) (eGFR < 15 mL/min) were then excluded, as were individuals with implausible creatinine values (Cr < 0.3 mg/dL or Cr > 4.0 mg/dL). The LabelEncoder class from the scikit-learn library was used to convert categorical variables into numerical codes.
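The cleaning steps above might look roughly like the following sketch. It assumes pandas is used for the tabular data, which the text does not state, and the column names (`followup_available`, `egfr`, `cr`, `sex`) are hypothetical. Features with too many missing values are dropped before imputation here; this ordering is an assumption, since the text does not specify it.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def clean_cohort(df, max_missing=0.40):
    """Apply the exclusion, imputation, and encoding steps described in the text."""
    # 1. Remove patients without follow-up data (hypothetical boolean flag column).
    df = df[df["followup_available"]].drop(columns="followup_available")

    # 2. Drop features with more than 40% missing values.
    missing_fraction = df.isna().mean()
    df = df.loc[:, missing_fraction <= max_missing].copy()

    # 3. Impute: median for numerical features, mode for categorical ones.
    for col in df.columns:
        if not df[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].median())
        else:
            df[col] = df[col].fillna(df[col].mode().iloc[0])

    # 4. Exclude ESRD (eGFR < 15) and implausible creatinine (< 0.3 or > 4.0 mg/dL).
    keep = (df["egfr"] >= 15) & df["cr"].between(0.3, 4.0)
    df = df[keep].reset_index(drop=True)

    # 5. Convert categorical variables to integer codes with LabelEncoder.
    for col in df.columns:
        if not pd.api.types.is_numeric_dtype(df[col]):
            df[col] = LabelEncoder().fit_transform(df[col])
    return df


# Toy data for illustration only.
raw = pd.DataFrame({
    "followup_available": [True, True, False, True, True],
    "egfr": [80.0, 10.0, 70.0, None, 60.0],
    "cr": [1.0, 3.0, 1.1, 0.9, 5.0],
    "sex": ["M", "F", "M", None, "M"],
    "mostly_missing": [None, None, None, 1.0, None],
})
cleaned = clean_cohort(raw)
```

In this sketch the imputation statistics are computed on the whole table for brevity. Computing them on the training split only would be a stricter alternative, and the text does not say which was done.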

Train/test split and feature selection

We randomly assigned each patient to the train (80%) or test (20%) dataset using stratified splitting. Five-fold cross-validation was used for feature selection and hyperparameter tuning. To find the most important variables among the large number of procedural and post-procedural features, and to reduce the complexity of our models, we first trained a random forest (RF) model on these features from the training dataset as a feature selector. We selected the top 15 features based on the feature importance given by this model. This cutoff was chosen so that the selected features together accounted for 80% of the total feature importance. The selected features were used as the procedural features to train the main models in this study. Moreover, SHapley Additive exPlanations (SHAP), a game-theoretic feature analysis technique, was also used [38].
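The stratified split and the importance-based cutoff described above can be sketched as follows. This is an illustration on synthetic data generated with scikit-learn, not the study's pipeline. The helper name `select_by_cumulative_importance`, the number of trees, and the random seeds are assumptions, and the five-fold cross-validation step is omitted for brevity.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def select_by_cumulative_importance(importances, target=0.80):
    """Return indices of the top-ranked features whose importances sum to >= target."""
    importances = np.asarray(importances, dtype=float)
    order = np.argsort(importances)[::-1]  # most important first
    cumulative = np.cumsum(importances[order]) / importances.sum()
    # First position at which the running share reaches the target.
    n_keep = int(np.searchsorted(cumulative, target)) + 1
    return order[:min(n_keep, len(order))]


# Synthetic, imbalanced data standing in for the procedural features.
X, y = make_classification(
    n_samples=500, n_features=30, n_informative=8,
    weights=[0.87, 0.13], random_state=0,
)

# 80/20 split that preserves the outcome rate in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0
)

# Random forest used only as a feature selector, fitted on the training data.
selector = RandomForestClassifier(n_estimators=200, random_state=0)
selector.fit(X_train, y_train)

selected = select_by_cumulative_importance(selector.feature_importances_, 0.80)
X_train_selected = X_train[:, selected]
X_test_selected = X_test[:, selected]
```

Fitting the selector on the training split only keeps the test set unseen. The number of features retained depends on how concentrated the importances are, so a count of 15 is specific to the study's data rather than a property of the rule.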

In one study, Lasso and SHAP methods identified ST-elevation MI, eGFR, age, preprocedural hemoglobin, non-ST-elevation MI/unstable angina, heart failure at admission, and cardiogenic shock as the pertinent predictors of AKI risk after PCI [8]. In contrast, Ma et al. used the SHAP method and reported 11 important predictors of CI-nephropathy after PCI: uric acid, peripheral vascular disease, cystatin C, creatine kinase-MB, hemoglobin, N-terminal pro-brain natriuretic peptide, age, diabetes, systemic immune-inflammatory index, total protein, and low-density lipoprotein [28]. In addition, age, serum creatinine level, and LVEF were among the top 20 ranked variables for CI-AKI risk stratification after acute MI using the Boruta ML algorithm [1].

Given the potential importance of AKI as an adverse event after PCI, models such as those investigated in this study could have clinical applications in predicting post-PCI AKI in patients with ACS, after further confirmation in larger studies. Using easy-to-obtain pre-procedural and procedural variables, these ML-based models provided acceptable predictions. Prediction ability was similar between models with and without procedural variables. This is important because intra-procedural features depend on the skill of the team performing PCI, which makes them subjective and obscures the inherent risk of the patient [18, 39]. Individualized risk stratification in patients undergoing PCI can lead to better prevention of AKI after the procedure. LVEF, age, and FPG were the main predictors of AKI, and all three are easy to measure in patients with ACS admitted to PCI units. Clinicians could use these models to predict AKI and thereby provide better care for those at higher risk. Such models could be used regionally or even internationally once they have been assessed in different settings and populations.

Several limitations of our research need to be mentioned. First, the single-center nature of our study could affect our findings. Second, potential confounding variables that were not incorporated may have influenced the results. Third, electrocardiogram data and follow-up laboratory data were not available in this databank. Fourth, missing data were handled by imputing the median for continuous variables and the mode for categorical ones, which might not have been the optimal approach; however, model-based prediction of missing values was not feasible because the dataset was not large enough. Moreover, since we tuned the classification threshold to optimize sensitivity (recall), we were not able to assess the calibration of our models, and the predicted probabilities were used only to identify the optimal threshold. In addition, because our data were imbalanced and we tuned our models for better prediction of AKI based on AUC using five-fold cross-validation, specificities were relatively low compared with AUCs and sensitivities. This is a limitation of our study; however, for this type of adverse event, higher sensitivity is generally favored over higher specificity, since the clinician’s aim is not to miss any patient at potentially high risk of AKI. The threshold in our study was adjusted for higher sensitivity, whereas in other clinical settings it could be tuned for higher specificity. Finally, despite using five-fold cross-validation in our training cohort and evaluating the models on an unseen test cohort, the lack of external validation might threaten the generalizability of our findings and models.
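The trade-off between sensitivity and specificity discussed above can be made concrete with a small sketch. The exact rule used to pick the threshold is not given in the text, so the rule below (the highest threshold that still reaches a target sensitivity) is an assumption, as are the function names and the toy probabilities.

```python
def sensitivity_specificity(y_true, y_prob, threshold):
    """Compute sensitivity and specificity when predicting positive for p >= threshold."""
    tp = fn = tn = fp = 0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    # Requires at least one positive and one negative case.
    return tp / (tp + fn), tn / (tn + fp)


def highest_threshold_with_sensitivity(y_true, y_prob, target_sensitivity=0.90):
    """Return the largest observed probability that, used as a threshold, meets the target."""
    # Sensitivity never decreases as the threshold is lowered, so the first
    # candidate that meets the target (scanning downward) is the largest one.
    for threshold in sorted(set(y_prob), reverse=True):
        sensitivity, _ = sensitivity_specificity(y_true, y_prob, threshold)
        if sensitivity >= target_sensitivity:
            return threshold
    raise ValueError("no threshold reaches the target sensitivity")


# Toy predicted probabilities for illustration only (1 = AKI).
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.15, 0.20, 0.30, 0.45, 0.40, 0.60, 0.70, 0.90]

tuned = highest_threshold_with_sensitivity(y_true, y_prob, 0.90)
tuned_metrics = sensitivity_specificity(y_true, y_prob, tuned)
default_metrics = sensitivity_specificity(y_true, y_prob, 0.5)
```

On these toy values, lowering the threshold from 0.5 to the tuned value captures every positive case at the cost of one additional false positive, which is the pattern of lower specificity described in the paragraph above.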

Conclusion

In conclusion, ML models based on the RF, LR, CB, MLP, and NB algorithms showed acceptable predictive performance for the risk of AKI following PCI, with RF and CB providing the greatest discrimination. The most important features for AKI prediction were also identified, and LVEF demonstrated the largest coefficient in all prediction models. Therefore, ML models, particularly the RF model, may improve the accuracy of AKI prediction in patients undergoing PCI, which has significant implications for clinical decision-making and for management aimed at preventing AKI. However, further studies are needed to validate the findings of the present study.