Introduction

Hyperglycaemic crisis is one of the most serious acute metabolic complications of diabetes and includes three subtypes: diabetic ketoacidosis (DKA), hyperosmolar hyperglycaemic state (HHS), and mixed syndrome (combined DKA-HHS) [1]. Inpatients with DKA or recurrent DKA are all at high risk for all-cause mortality [2]. Among diabetic patients, 10% of deaths are caused by confirmed or possible DKA or coma [3]. HHS is common in elderly patients with diabetes mellitus. Despite a relatively low incidence, the mortality of hospitalized patients with HHS can reach 10–20% [4, 5]. Alarmingly, the 30-day mortality in patients with combined features of DKA and HHS is approximately 2.7 times higher than that in patients with isolated hyperglycaemic crisis [6]. In addition to a high risk of short-term mortality, patients who experience a hyperglycaemic crisis episode have a higher long-term mortality after hospital discharge [1, 7]. At present, because the pathogenesis of hyperglycaemic crisis is poorly understood and its treatment is complex, there is a lack of powerful indicators for evaluating the risk of mortality in these patients [6]. Predicting the risk of mortality and providing a personalized analysis of risk factors at initial diagnosis may help physicians make sound clinical judgements and select the most appropriate treatment strategy.

Much effort has been put into developing models to predict the risk of mortality for patients with hyperglycaemic crisis. Traditionally, linear models, such as the logistic regression model and the Cox proportional hazards model, have been used to develop such prediction models [8,9,10,11,12]. Nevertheless, modern high-dimensional and incomplete medical data present a challenge to traditional statistical models, and the low precision of linear models impedes patient-level use. Lacking adequate prediction tools, physicians mainly rely on subjective judgement, which is prone to errors and biases. Previous studies have applied machine learning to establish models for predicting the clinical outcomes of patients with diabetic complications and achieved promising results [

Model explanation

The Shapley additive explanations (SHAP) algorithm was applied to the calibrated model to explain its predictions. SHAP is one of the most popular model-agnostic algorithms for interpreting black-box model predictions [28, 29]. It yields SHAP values, which provide an interpretation of individual predictions. Given a set of feature values, a SHAP value represents how much a single feature value, in its interaction with the other feature values, contributes to the difference between the actual prediction and the average prediction. Therefore, the mean prediction of the model plus the sum of the SHAP values for all features equals the predicted result. Importantly, the SHAP value for a feature is not obtained in isolation but through interaction with other features, which makes it different from a feature weight in a traditional generalized linear model.
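The additive property described above can be illustrated with a minimal sketch that computes exact Shapley values by brute force for a two-feature toy model. Everything in it is an assumption made for illustration: the function `toy_risk`, the background rows and the patient values are hypothetical, and the brute-force enumeration stands in for the optimized estimators that a SHAP library would use on a real model.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for predict(x); absent features are averaged over background rows."""
    n = len(x)

    def value(subset):
        # Mean prediction when only the features in `subset` are fixed to x's values.
        total = 0.0
        for row in background:
            mixed = [x[i] if i in subset else row[i] for i in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi, value(set())


def toy_risk(row):
    # Hypothetical score with an interaction term; not the study's model.
    age, glucose = row
    return 0.1 * age + 0.2 * glucose + 0.05 * age * glucose


background = [[0, 0], [1, 0], [0, 1], [1, 1]]  # made-up reference patients
patient = [2, 3]                                # made-up feature values

phi, baseline = shapley_values(toy_risk, patient, background)
print("baseline:", baseline, "SHAP values:", phi)
print("baseline + sum:", baseline + sum(phi), "prediction:", toy_risk(patient))
```

Because of the interaction term, each feature's SHAP value depends on the value of the other feature, yet the baseline plus the two SHAP values still reproduces the prediction for this patient.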

To verify the rationality of the interpretation of the model predictions acquired by the SHAP algorithm, we first utilized the SHAP algorithm to obtain and visualize the overall effect of features on the predictions, that is, the contribution and relative importance ranking of each feature to the predictions. For further validation and comparison, we divided the patients in the training set into a survival group and a nonsurvival group according to their clinical outcomes and analysed the differences in features between the groups by statistical methods. In addition, based on proving the rationality of the explanations obtained by the SHAP algorithm, we further linearly mapped SHAP values to the probability of increased or decreased mortality and proposed a personalized mortality risk factor analysis method specific to patients with hyperglycaemic crisis, which visualized the contribution of each feature to the prediction in probability.
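The overall ranking of features can be illustrated with a short sketch. It assumes that the global importance of a feature is its mean absolute SHAP value across patients, which is a common convention but is not stated explicitly in the text. The function name `rank_features`, the feature names and the SHAP values are hypothetical.

```python
def rank_features(shap_rows, names):
    """Rank features by mean absolute SHAP value across patients (assumed convention)."""
    n_patients = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n_patients
        for j, name in enumerate(names)
    }
    # Largest mean |SHAP| first.
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Made-up SHAP values: one row per patient, one column per feature.
names = ["age", "blood_glucose", "hba1c"]
shap_rows = [
    [0.30, 0.10, -0.05],
    [-0.20, 0.05, 0.10],
    [0.40, -0.15, -0.02],
]

ranking = rank_features(shap_rows, names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Taking absolute values before averaging means that a feature counts as important whether it pushes predictions towards death or towards survival.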

Results

For model development, 257 patients with hyperglycaemic crisis from two hospitals were enrolled. The baseline characteristics of these patients are shown in Table 1. In the training set, the median age was 56 years (IQR 40.3–70.0), and 152 (59.1%) patients were male. Death occurred in 31 (12.1%) patients within the study period. To evaluate external validity, the models were applied to the external test set, comprising 80 patients with hyperglycaemic crisis from two hospitals that were independent of the training set. The receiver operating characteristic curves and AUCs of the five models in the test set are shown in Fig. 1A (AUC = 1 indicates perfect prediction; AUC = 0.5 indicates random prediction). The other five evaluation metrics (accuracy, sensitivity, specificity, NPV and PPV) for the five models are presented in Table 2. Overall, the LightGBM model performed best among the five prediction models, with an AUC of 0.89 (95% CI 0.77, 0.97). The corresponding accuracy was 0.83 (0.74, 0.90), sensitivity was 0.74 (0.47, 0.94), specificity was 0.85 (0.76, 0.93), PPV was 0.52 (0.31, 0.74), and NPV was 0.94 (0.87, 0.99). Therefore, the LightGBM model was selected as the best predictive model. Its predicted probabilities were calibrated to bring them close to the observed probabilities. The calibration plot indicated good agreement between the predicted and observed probabilities of the LightGBM model, with a curve close to the 45° line, and the Brier score was 0.10 (0.05, 0.17) (Fig. 1B).
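As an illustration of how the metrics reported above are defined, the following sketch computes the AUC, the threshold-based metrics and the Brier score from predicted probabilities. The labels and probabilities are made up and are not study data, the function names are hypothetical, and the 0.5 classification threshold is an assumption because the text does not report the threshold that was used.

```python
def auc(y_true, y_prob):
    """AUC as the probability that a random positive scores above a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity, specificity, PPV and NPV at an assumed threshold."""
    pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for y, d in zip(y_true, pred) if y == 1 and d == 1)
    tn = sum(1 for y, d in zip(y_true, pred) if y == 0 and d == 0)
    fp = sum(1 for y, d in zip(y_true, pred) if y == 0 and d == 1)
    fn = sum(1 for y, d in zip(y_true, pred) if y == 1 and d == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Made-up outcomes (1 = death) and predicted probabilities.
y_true = [1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.2, 0.1, 0.3]

model_auc = auc(y_true, y_prob)
metrics = threshold_metrics(y_true, y_prob)
brier = brier_score(y_true, y_prob)
constant_auc = auc(y_true, [0.5] * len(y_true))  # uninformative score
print(model_auc, metrics, brier, constant_auc)
```

An uninformative constant score yields an AUC of 0.5 under this definition, which is the reference level against which discrimination is judged.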

Table 1 Baseline characteristics of patients with hyperglycaemic crisis in the training set and test set
Fig. 1 Discrimination and calibration performance of the models. A Receiver operating characteristic curves for the LR, SVM, RF, LightGBM, and DNN models. B Calibration curve for the LightGBM model

Table 2 The values of the evaluation metrics of the models in the test set

The contribution of each of the 41 features in the calibrated LightGBM model is shown in Fig. 2. The features were ranked by their relative importance to mortality prediction according to the SHAP values of the model predictions. Unsurprisingly, age was ranked as the most important feature for the prediction model, followed by blood glucose and blood urea nitrogen. Taking the effect of age on the prediction as an example, older age was associated with a higher predicted risk of death, whereas younger age drove the predictions towards survival. A similar explanation can be applied to other features, and most of the feature interpretations were consistent with clinical experience and previous evidence. Of note, features can drive the prediction in either direction (increasing or decreasing the mortality prediction) in our explainable prediction model, which differs from previous mortality risk scoring systems based on generalized linear models, in which features can only drive the mortality prediction in a single direction. As shown in Table 3, statistical analysis revealed that the 9 most important features for the LightGBM model differed significantly between the survival group and the nonsurvival group in the training set (P < 0.05). Compared with the survival group, age, blood glucose, serum creatinine, blood urea nitrogen, cystatin C, effective serum osmolality, CK-MB, alanine aminotransferase, serum sodium, pH, HCO3− and cardiac troponin I were significantly higher in the nonsurvival group (P < 0.05). However, the hemoglobin A1c level was, surprisingly, significantly lower in the nonsurvival group than in the survival group (P < 0.05). An increasing trend of β-hydroxybutyrate was unexpectedly observed in the survival group (P = 0.045).
Therefore, the traditional statistical test results and the model interpretation results corroborated each other, which supported the rationality and accuracy of the feature interpretations acquired by the SHAP algorithm. Based on this evidence, we mapped SHAP values to probabilities and proposed a personalized risk factor analysis tool for explaining the mortality prediction for a particular patient with hyperglycaemic crisis. The tool uses a scale from 0 to 1 and visualizes the contribution of each feature to the prediction as a probability. We applied the personalized risk factor analysis method to one patient who died and one who survived during the follow-up period in the test set (Fig. 3). The deceased patient was an 88-year-old female with a history of septic shock and acute kidney injury. The model predicted that this patient's risk of mortality was 0.623. Advanced age (88 years) drove a 0.58 increase in the risk of mortality, while relatively low hemoglobin A1c reduced the risk of mortality by 0.32. A similar explanation can be applied to other features. The prediction was driven by the 41 features used for model training. The sum of the SHAP values for all features plus the baseline risk equals the predicted risk of mortality. The baseline risk (E[f(X)]) was obtained by calculating the average predicted risk of mortality among all patients in the training set. Thus, the SHAP algorithm made our model explainable both in terms of the relative importance of individual features for the survival of patients with hyperglycaemic crisis and in terms of feature contributions at the patient level.
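One possible way to express per-patient contributions on a probability scale is sketched below. It assumes that the SHAP values are available in log-odds units and rescales them proportionally so that they sum to the difference between the predicted and the baseline probability. This is only one plausible reading of a linear mapping and is not necessarily the procedure used in this study. The function name `to_probability_scale` and all numeric values are hypothetical and do not correspond to the patients in Fig. 3.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def to_probability_scale(base_logit, shap_logit):
    """Rescale log-odds SHAP values so they sum to (predicted - baseline) probability."""
    total = sum(shap_logit.values())
    base_p = sigmoid(base_logit)
    pred_p = sigmoid(base_logit + total)
    # Proportional (linear) rescaling; an assumed mapping, not the study's stated method.
    scale = (pred_p - base_p) / total if total != 0 else 0.0
    contributions = {name: value * scale for name, value in shap_logit.items()}
    return base_p, pred_p, contributions


# Made-up baseline and SHAP values in log-odds units for one hypothetical patient.
base_logit = -2.0
shap_logit = {"age": 1.8, "hba1c": -0.9, "blood_urea_nitrogen": 1.2}

base_p, pred_p, contributions = to_probability_scale(base_logit, shap_logit)
print(f"baseline risk: {base_p:.3f}, predicted risk: {pred_p:.3f}")
for name, delta in contributions.items():
    print(f"{name}: {delta:+.3f}")
```

With this rescaling, the baseline risk plus the per-feature contributions reproduces the predicted risk, and the sign of each contribution shows whether the feature raises or lowers the predicted mortality.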

Fig. 2 The impact of the input features on predictions. Each dot represents the effect of a feature on the prediction for one patient. The redder the dot, the higher the feature value; the bluer the dot, the lower the feature value. Dots to the left on the x-axis represent feature values that decrease the mortality prediction, and dots to the right represent feature values that increase it

Table 3 Baseline characteristics of patients with hyperglycaemic crisis in the training set by clinical outcomes
Fig. 3 Examples of personalized risk factors. A An example of personalized risk factor analysis for a patient in the test set (clinical outcome was death). B An example of personalized risk factor analysis for a patient in the test set (actual clinical outcome was survival)

Discussion

Experiencing a hyperglycaemic crisis is associated with an increased short- and long-term risk of mortality [1, 6, 7]. However, due to the complex pathogenesis of hyperglycaemic crisis, the available international guidelines for its diagnosis and treatment are not consistent [4, 30]. In addition, there is a lack of strong indicators to assess the risk of mortality in patients with hyperglycaemic crisis. Therefore, developing more effective methods to predict the risk of mortality and to create individualized risk and benefit evaluations for patients with hyperglycaemic crisis at initial diagnosis is particularly important for identifying the best therapeutic strategies and improving prognosis.

Here, we developed an explainable risk prediction model that provides predictions and individualized risk factor assessment of the 3-year mortality of patients with hyperglycaemic crisis after admission. In the model building process, we compared five representative machine learning algorithms (LR, SVM, RF, LightGBM, and DNN) to obtain the best prediction model. The LightGBM model performed the best of the five models evaluated in an external test set, with an AUC of 0.89. We further calibrated the LightGBM model to obtain a more reliable model. The SHAP algorithm was used to interpret the calibrated LightGBM model and to determine how each feature drives its predictions. After verifying the effectiveness of this analytical method by comparison with the statistical test results, we further proposed a personalized mortality risk factor assessment method specific to patients with hyperglycaemic crisis. In the interpretation obtained by the SHAP algorithm, the influence of each feature on the predictions is not isolated but arises through interaction with other features; this follows from the way SHAP values are calculated and makes them different from feature weights in a traditional generalized linear model. Thus, the developed explainable model can not only predict mortality but also provide a personalized risk factor assessment tool. Such an explainable model is a more useful tool than the currently implemented scoring systems based on generalized linear models.

Most of the prediction tools constructed in past studies are based on generalized linear models, such as logistic regression models and Cox proportional hazards models [10, 11, 31]. However, the rapid development of information technology has brought high-dimensional and nonlinear data, which challenge traditional generalized linear models. Machine learning provides a powerful and novel method to extract information from complex medical data and develop more accurate predictions. However, many machine learning models are black boxes. That is, we can only observe the input of the model and the output of the predictions. It is difficult to understand the details of how machine learning models analyse data and make decisions, which limits the application of the models at the individual level. A representative score called PHD was developed by Huang et al. [10] based on a generalized linear model; it can be used to predict 30-day mortality risk and to classify risk and disposition in patients with hyperglycaemic crisis. Because the variables we selected differed from those of the PHD score, and because our model predicted the 3-year mortality of patients with hyperglycaemic crisis after admission, we could not directly compare our model with the PHD score. However, an external validation study revealed that the AUC of the PHD score ranged from 0.357 to 0.727 [9]. In comparison, the AUC of the models developed in this study ranged from 0.63 to 0.89 in an external validation dataset. In addition, the developed LightGBM model outperformed the conventional logistic regression model constructed in our study in the external test data (Fig. 1A, AUC of 0.89 vs. 0.63). We thus consider our model superior to traditional methods.

In addition, we used the SHAP algorithm to explain the black-box model to quantify and visualize the features that drive the predictions so that it not only had better prediction ability but also had transparency similar to that of the simple linear model. The tools established in this study combined the advantages of the complex machine learning model and simple linear model, solving the problems of insufficient prediction ability of the generalized linear model and black box nature of the machine learning model. The model we developed provided explanations of the risk factors that drive the model prediction, both in terms of the importance of individual features to the overall mortality prediction and contribution at the patient level. A comprehensible model allows clinicians to combine the predictions with their expertise to facilitate decision-making and assist clinicians in interventions [32, 33].

The effect of most features on the prediction is consistent with clinical experience and previous evidence. For example, advanced age, metabolic disorders, and impaired renal and cardiac function predict nonsurvival. Advanced age was the most important risk factor for mortality. There is substantial evidence that physical function and resistance decrease with age, which increases the risk of mortality [12, 34, 35]. Severe metabolic disorders (elevated levels of blood glucose, effective serum osmolality, and serum sodium) may lead to confusion and even coma, which is associated with an increased risk of mortality [5, 36]. Likewise, there is consistent evidence that impaired renal function (elevated levels of blood urea nitrogen, cystatin C, and serum creatinine) and impaired cardiac function (elevated levels of cardiac troponin I) increase the risk of mortality [1, 34, 37,38,39]. In addition, reduced levels of HbA1c drove the prediction towards nonsurvival, which was contrary to expectations. One reason for this counterintuitive finding might be that patients in the survival group had significantly better renal function than those in the nonsurvival group, and there is evidence that patients with chronic renal failure generally have a lower red blood cell (RBC) survival rate [40]. In addition, after treatment with erythropoietin, the newly generated RBCs lead to a further decrease in HbA1c [41]. An increasing trend of β-hydroxybutyrate was unexpectedly found in the survival group, which suggests that it may be a protective factor for patients with hyperglycaemic crisis. Previous evidence suggests that blood β-hydroxybutyrate can reduce renal ischemia and reperfusion injury by increasing the upstream regulator forkhead transcription factor O3 and reducing caspase-1 and pro-inflammatory cytokines, thereby reducing cell death [42, 43].

Age was ranked as the most important feature for the model, followed by features related to metabolic disorders and cardiac and renal dysfunction. A recent study that used a machine learning approach to examine the impact of acute hyperglycaemic crisis episodes on survival in individuals with diabetic foot ulcer also revealed that individual characteristics evaluated by the Charlson Comorbidity Index (CCI) and acute organ injury played a vital role in disease prognosis [44]. The nine most important features for the prediction were significantly different between the survival group and the nonsurvival group in the training set. Therefore, the effect of features on the predictions is consistent with the traditional statistical test results. Importantly, the developed explainable model can provide the relative importance of individual features both for patient survival overall and at the patient level, which makes it superior to traditional statistical tests that can only test for significant differences between groups.

Admittedly, there are some limitations in our study. First, although multicentre data were used, due to the low incidence of hyperglycaemic crisis, the amount of data was relatively small, which may lead to bias in the model. Second, to enable the models to obtain more comprehensive information and to improve the performance of the tree-based models, our models contained up to 41 features. However, due to the limitations of data acquisition, the number of variables selected for the study was still limited. Third, the SHAP algorithm cannot address model bias, and the influence of features on the predictions does not imply a causal relationship. Finally, although the model is explainable, some features, such as age, cannot be modified by physicians. However, these insights into the relationship between features and predictions may guide our search for causality.

Conclusions

In summary, we developed an explainable machine learning model for predicting 3-year mortality and providing individualized risk factor assessment for patients with hyperglycaemic crisis, both during hospitalization and after discharge, and the model was externally validated in an independent dataset. The interpretation of the model revealed that more attention should be given in the treatment of hyperglycaemic crisis to variables related to metabolism and to renal and cardiac function, which played an important role in the model's mortality predictions. Transparent and explainable model predictions would help gain the trust of clinicians and facilitate decision-making by allowing physicians to evaluate whether the decision-making process of the model is consistent with scientific evidence and clinical experience. However, before this kind of tool is used in the clinic, it needs to be verified in prospective studies.