Abstract
Objective
To evaluate random survival forest (RSF) and Cox regression models for predicting mortality in patients with hemorrhagic stroke (HS) in the intensive care unit (ICU).
Methods
In the training set, the optimal models were selected using five-fold cross-validation and grid search. In the test set, the models were validated using the bootstrap method. Discrimination was assessed by the area under the curve (AUC) and calibration by the Brier score (BS); positive predictive value (PPV), negative predictive value (NPV), and F1 score were also compared.
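The metrics listed above can be illustrated with a minimal sketch in plain Python. It assumes a binary outcome (death by a fixed horizon, ignoring censoring) and an assumed classification threshold of 0.5 for PPV, NPV, and F1. Survival analyses usually need censoring-aware versions of these metrics (for example, a time-dependent AUC), which this sketch does not implement; the function names are hypothetical.

```python
def auc(y_true, y_score):
    """Rank-based (Mann-Whitney) AUC: probability that a randomly chosen
    death receives a higher risk score than a randomly chosen survivor."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    # Each (death, survivor) pair scores 1 if ranked correctly, 0.5 if tied.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def threshold_metrics(y_true, y_prob, threshold=0.5):
    """PPV, NPV and F1 after labelling risk >= threshold as predicted death."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_death = p >= threshold
        if predicted_death and y == 1:
            tp += 1
        elif predicted_death:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    nan = float("nan")
    ppv = tp / (tp + fp) if tp + fp else nan
    npv = tn / (tn + fn) if tn + fn else nan
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else nan
    return ppv, npv, f1
```

For outcomes [1, 1, 0, 0] and predicted risks [0.8, 0.3, 0.5, 0.1], this sketch returns an AUC of 0.75, a Brier score of about 0.198, and a PPV, NPV, and F1 of 0.5 each.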
Results
A total of 2,990 HS patients were included. For 7-day mortality, the mean AUCs of RSF and Cox regression were 0.875 and 0.761, and the mean BSs were 0.083 and 0.108, respectively. For 28-day mortality, the mean AUCs of RSF and Cox regression were 0.794 and 0.649, and the mean BSs were 0.129 and 0.174, respectively. For 7-day mortality, the mean AUCs of RSF, Cox regression, and the conventional scores were 0.875 (RSF), 0.761 (Cox), 0.736 (SAPS II), 0.723 (OASIS), 0.632 (SIRS), and 0.596 (SOFA).
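The mean AUCs reported above come from bootstrap validation on the test set. The sketch below shows one common way to obtain such a mean: resample the test set with replacement, recompute the AUC on each resample, and average. The number of resamples, the seed, the percentile interval, and the function names are assumptions, not details taken from the study.

```python
import random


def auc(y_true, y_score):
    """Rank-based (Mann-Whitney) AUC for a binary outcome; ties count 1/2."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_mean_auc(y_true, y_score, n_boot=1000, seed=0):
    """Mean AUC and 2.5th/97.5th percentile interval over bootstrap resamples."""
    n = len(y_true)
    if not 0 < sum(y_true) < n:
        raise ValueError("test set must contain both deaths and survivors")
    rng = random.Random(seed)  # fixed seed so results are reproducible
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # draw n patients with replacement
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # AUC is undefined if a resample holds only one class
            aucs.append(auc(ys, [y_score[i] for i in idx]))
    aucs.sort()
    mean = sum(aucs) / n_boot
    interval = (aucs[int(0.025 * n_boot)], aucs[int(0.975 * n_boot) - 1])
    return mean, interval
```

The simple percentile interval is only one choice; other bootstrap interval methods exist, and the same resampling loop can be reused for the Brier score.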
Conclusions
RSF outperformed Cox regression in predicting 7-day and 28-day mortality and may provide a better clinical reference. Creatinine, temperature, anion gap, and sodium were important variables in both models.
Introduction
Stroke is a leading cause of death and long-term disability worldwide [1]. Data from the 2019 Global Burden of Disease (GBD) Study [2] show that stroke remains the second leading cause of death (11.6% of deaths) and the third leading cause of disability (5.7% of total disability-adjusted life years) worldwide. Hemorrhagic stroke (HS) accounts for 37.6% of all strokes; of the approximately 5.5 million stroke deaths each year, about half are due to HS. The risk of death is higher after HS than after ischemic stroke (IS) [3], with a 30-day mortality of 13-61% [4]. In recent years, an increasing number of stroke patients have been admitted to the intensive care unit (ICU) for neurological monitoring or management of complications, and 10-30% of them are critically ill [5]. Identifying and managing high-risk patients is therefore important for optimizing the allocation of medical resources.
Predicting adverse outcomes is a prerequisite for risk stratification, and risk scores are helpful tools for such prediction. Many investigators have developed disease risk scoring systems. Traditional scoring systems commonly used in clinical practice include the Acute Physiology and Chronic Health Evaluation (APACHE II) [6], the Sequential Organ Failure Assessment (SOFA) [7], the Oxford Acute Severity of Illness Score (OASIS) [8], and the Simplified Acute Physiology Score (SAPS II) [9], each of which combines various variables with its own point assignment scheme [10]. However, these traditional scores were designed for broad patient populations, and their effectiveness in predicting the prognosis of specific diseases is not always satisfactory [11, 12], which limits their application in HS. Many researchers have therefore tried to construct predictive tools specifically for HS. Ho et al. and Smith et al. [13, 14] built logistic regression models to predict death from HS in the ICU and stratified patients' risk by calculating risk scores. However, with the increasing number of clinical examinations and diagnostic items, clinical data are often high-dimensional, highly correlated, and nonlinear [15], which restricts the applicability of traditional clinical modeling methods such as logistic and Cox regression [16]. To compensate for the shortcomings of these traditional methods, machine learning algorithms have emerged in the era of big data [17]. Lin et al. and Trevisi et al. [18, 19] used common machine learning algorithms, such as support vector machines, random forests, and neural networks, to predict poor functional outcomes in hospitalized HS patients. However, these studies considered only the probability of survival without incorporating the time dimension, which often makes model predictions imprecise [19, 20].
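To see why the time dimension and censoring matter, consider a minimal Kaplan-Meier estimator sketch (plain Python, not taken from the study). A patient whose follow-up ends without death (right-censored) stays in the risk set until leaving observation, rather than being counted as a survivor or discarded, as a fixed-horizon classifier would have to do. The function name and toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient (e.g. days in the ICU)
    events: 1 if death was observed at that time, 0 if censored
    Returns [(t, S(t))] at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time; deaths are counted before
        # censored patients leave the risk set (the usual convention).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk  # conditional survival at time t
            curve.append((t, surv))
        n_at_risk -= removed  # deaths and censored patients both leave
    return curve
```

With times [2, 3, 3, 5, 7, 10] and events [1, 0, 1, 1, 0, 0], the estimated survival falls to about 0.83, 0.67, and 0.44 at times 2, 3, and 5, whereas simply counting 3 deaths among 6 patients would suggest 50% survival.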
Random survival forest (RSF) is an extension of the random forest algorithm to survival analysis. It can handle complex right-censored survival data and capture interactions between variables, and it has been applied to pancreatic cancer [21], sepsis [22], and breast cancer [23].
[…] [45], and Luo et al. showed a steep linear relationship between reduced blood creatinine levels and increased risk of in-hospital and 1-year mortality in patients with intracranial hemorrhage when blood creatinine values were < 1.9 mg/dL [46]. We found a 1.3-fold increase in the risk of in-hospital death in HS patients for each range of temperature change, which was consistent with previous studies [47]. Iglesias-Rey et al. conducted a retrospective study of 887 patients with non-traumatic cerebral hemorrhage and found that patients with hypertensive cerebral hemorrhage had the highest body temperature and the greatest increase in body temperature within 24 h. Among patients with hypertensive cerebral hemorrhage, those who developed hyperthermia had a 5.3-fold increased risk of poor outcome at 3 months; moreover, the amount of edema within 24 h was positively correlated with body temperature [48]. The anion gap reflects the acid-base balance of body fluids and plays an important role in identifying metabolic acidosis [49]. Previous studies have shown that the anion gap is an important short- and long-term prognostic marker in patients with IS [50]; however, its role in patients with HS has been less studied. Shen et al. found that, in HS patients, scores on the Mini-Mental State Examination, the GCS, and other indicators of neurological and cognitive function decreased as the anion gap at admission increased [51]. A meta-analysis showed that high sodium intake was positively associated with stroke risk, with a 23% increase in stroke risk for every 86 mmol/d increase in sodium intake [52].
Wang et al., who included 64,909 patients with non-traumatic HS in the United States, found that patients with spontaneous cerebral hemorrhage and abnormal serum sodium had a 1.11-fold higher risk of 30-day readmission than patients with normal serum sodium [30].
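Two quantities in the preceding discussion can be made concrete with simple arithmetic. The conventional serum anion gap is Na+ - (Cl- + HCO3-), and some laboratories also add K+; which convention applies to the anion gap values used in this study is not stated here, so the sketch leaves potassium optional. The other helpers convert sodium in mmol to grams of sodium and to grams of salt (NaCl), showing that the 86 mmol/d increment in the meta-analysis corresponds to roughly 5 g of salt per day. The function names are hypothetical.

```python
NA_MOLAR_MASS_G_PER_MOL = 22.99    # sodium
NACL_MOLAR_MASS_G_PER_MOL = 58.44  # sodium chloride


def anion_gap(sodium, chloride, bicarbonate, potassium=None):
    """Serum anion gap in mmol/L: Na+ - (Cl- + HCO3-), optionally + K+.

    All inputs are serum concentrations in mmol/L (equivalently mEq/L
    for these monovalent ions). Reference ranges vary by laboratory.
    """
    gap = sodium - (chloride + bicarbonate)
    if potassium is not None:  # only if the lab convention includes potassium
        gap += potassium
    return gap


def sodium_mmol_to_grams(mmol):
    """Mass of elemental sodium, in grams."""
    return mmol * NA_MOLAR_MASS_G_PER_MOL / 1000


def sodium_mmol_to_salt_grams(mmol):
    """Grams of NaCl (1 mol Na per mol NaCl) that supply this much sodium."""
    return mmol * NACL_MOLAR_MASS_G_PER_MOL / 1000
```

For example, sodium 140, chloride 104, and bicarbonate 24 mmol/L give an anion gap of 12 mmol/L (16 mmol/L if potassium of 4.0 mmol/L is added), and 86 mmol of sodium is about 1.98 g of sodium, or about 5.0 g of salt.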
This study has several strengths. We not only compared the predictive performance of RSF and Cox regression but also compared both models with traditional clinical scoring systems. In addition, we identified the variables that strongly influenced death in the ICU and ranked them by importance, which may guide further practical applications. However, this study also has several limitations. First, MIMIC-IV is a single-center database, which may limit the generalizability of the results to patients in other centers; external validation with clinical data from multiple centers is therefore needed. Second, because of the structure of the MIMIC-IV database, some important indicators, such as bilirubin, lactate, and albumin, could not be included in the analysis owing to a large proportion of missing values. Finally, only demographic information, laboratory indicators, and comorbidity information were included in this study; other important information, such as medications and imaging findings, was not included, which may have limited the predictive performance of the models.
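Random survival forest software commonly reports permutation-based variable importance, which is one way to produce a ranking like the one mentioned above. The minimal, model-agnostic sketch below permutes one feature at a time and records how much a higher-is-better score drops. The function names, the toy model, and the use of AUC as the score are assumptions; the exact importance procedure used for each model in this study is not described here.

```python
import random


def auc(y_true, y_score):
    """Rank-based (Mann-Whitney) AUC for a binary outcome; ties count 1/2."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, X, y, score, n_repeats=20, seed=0):
    """Mean drop in score when each feature column is randomly permuted.

    predict: maps a list of feature rows (lists) to risk scores (hypothetical model)
    score:   higher-is-better metric, e.g. an AUC or concordance index
    """
    rng = random.Random(seed)
    baseline = score(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - score(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model ignores gets an importance of exactly zero, while a feature the model relies on shows a positive mean drop; sorting the returned values gives a ranking.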
Conclusion
We constructed RSF and Cox models based on the survival data of patients with HS in the ICU. RSF outperformed Cox regression in predicting both 7-day and 28-day mortality, and creatinine, temperature, anion gap, and sodium ranked among the top 10 most important variables in both models. RSF may provide new insights for clinical decision-making in HS patients.
Data Availability
MIMIC-IV is a publicly available database. All data used in this study are available at https://physionet.org/about/database/.
Abbreviations
- AUC: Area under the curve
- BUN: Blood urea nitrogen
- CCI: Charlson comorbidity index
- CPD: Chronic pulmonary disease
- DBP: Diastolic blood pressure
- GCS: Glasgow coma scale
- HR: Hazard ratio
- HS: Hemorrhagic stroke
- INR: International normalized ratio
- LASSO: Least absolute shrinkage and selection operator
- LOS: Length of stay
- MBP: Mean blood pressure
- NPV: Negative predictive value
- OASIS: Oxford acute severity of illness score
- PPV: Positive predictive value
- PT: Prothrombin time
- PTT: Partial thromboplastin time
- PVD: Peripheral vascular disease
- ROC: Receiver operating characteristic
- RR: Respiratory rate
- RSF: Random survival forest
- SAPS II: Simplified acute physiology score II
- SBP: Systolic blood pressure
- SIRS: Systemic inflammatory response syndrome score
- SOFA: Sequential organ failure assessment
- SpO2: Peripheral capillary oxygen saturation
- WBC: White blood cell
References
Thayabaranathan T, Kim J, Cadilhac DA, Thrift AG, Donnan GA, Howard G, Howard VJ, Rothwell PM, Feigin V, Norrving B, et al. Global stroke statistics 2022. Int J Stroke. 2022.
Feigin VL, Stark BA, Johnson CO, Roth GA, Bisignano C, Abady GG, Abbasifard M, Abbasi-Kangevari M, Abd-Allah F, Abedi V, et al. Global, regional, and national burden of stroke and its risk factors, 1990–2019: a systematic analysis for the global burden of Disease Study 2019. Lancet Neurol. 2021;20(10):795–820.
Akyea RK, Georgiopoulos G, Iyen B, Kai J, Qureshi N, Ntaios G. Comparison of risk of Serious Cardiovascular events after hemorrhagic versus ischemic stroke: a Population-Based study. Thromb Haemost. 2022;122(11):1921–31.
van Asch CJ, Luitse MJ, Rinkel GJ, van der Tweel I, Algra A, Klijn CJ. Incidence, case fatality, and functional outcome of intracerebral haemorrhage over time, according to age, sex, and ethnic origin: a systematic review and meta-analysis. Lancet Neurol. 2010;9(2):167–76.
Carval T, Garret C, Guillon B, Lascarrou JB, Martin M, Lemarie J, Dupeyrat J, Seguin A, Zambon O, Reignier J, et al. Outcomes of patients admitted to the ICU for acute stroke: a retrospective cohort. BMC Anesthesiol. 2022;22(1):235.
Knaus WA, Draper EA, Wagner DP, Zimmerman JE. APACHE II: a severity of disease classification system. Crit Care Med. 1985;13(10):818–29.
Ferreira FL, Bota DP, Bross A, Melot C, Vincent JL. Serial evaluation of the SOFA score to predict outcome in critically ill patients. JAMA. 2001;286(14):1754–8.
Johnson AE, Kramer AA, Clifford GD. A new severity of illness scale using a subset of Acute Physiology and Chronic Health evaluation data elements shows comparable predictive accuracy. Crit Care Med. 2013;41(7):1711–8.
Le Gall JR, Lemeshow S, Saulnier F. A new simplified Acute Physiology score (SAPS II) based on a European/North American multicenter study. JAMA. 1993;270(24):2957–63.
Rahmatinejad Z, Hoseini B, Rahmatinejad F, Abu-Hanna A, Bergquist R, Pourmand A, Miri M, Eslami S. Internal validation of the Predictive performance of Models based on three ED and ICU Scoring Systems to predict Inhospital Mortality for Intensive Care Patients referred from the Emergency Department. Biomed Res Int. 2022;2022:3964063.
Afrash MR, Mirbagheri E, Mashoufi M, Kazemi-Arpanahi H. Optimizing prognostic factors of five-year survival in gastric cancer patients using feature selection techniques with machine learning algorithms: a comparative study. BMC Med Inform Decis Mak. 2023;23(1):54.
Deng Y, Liu S, Wang Z, Wang Y, Jiang Y, Liu B. Explainable time-series deep learning models for the prediction of mortality, prolonged length of stay and 30-day readmission in intensive care patients. Front Med. 2022;9:933037.
Ho WM, Lin JR, Wang HH, Liou CW, Chang KC, Lee JD, Peng TY, Yang JT, Chang YJ, Chang CH, et al. Prediction of in-hospital stroke mortality in critical care unit. SpringerPlus. 2016;5(1):1051.
Smith EE, Shobha N, Dai D, Olson DM, Reeves MJ, Saver JL, Hernandez AF, Peterson ED, Fonarow GC, Schwamm LH. A risk score for in-hospital death in patients admitted with ischemic or hemorrhagic stroke. J Am Heart Assoc. 2013;2(1):e005207.
Van Calster B, Wynants L. Machine learning in Medicine. N Engl J Med. 2019;380(26):2588.
Steele AJ, Denaxas SC, Shah AD, Hemingway H, Luscombe NM. Machine learning models in electronic health records can outperform conventional survival models for predicting patient mortality in coronary artery disease. PLoS ONE. 2018;13(8):e0202344.
Sabetian G, Azimi A, Kazemi A, Hoseini B, Asmarian N, Khaloo V, Zand F, Masjedi M, Shahriarirad R, Shahriarirad S. Prediction of patients with COVID-19 requiring Intensive Care: a cross-sectional study based on machine-learning Approach from Iran. Indian J Crit Care Med. 2022;26(6):688–95.
Lin CH, Hsu KC, Johnson KR, Fann YC, Tsai CH, Sun Y, Lien LM, Chang WL, Chen PL, Lin CL, et al. Evaluation of machine learning methods to stroke outcome prediction using a nationwide disease registry. Comput Methods Programs Biomed. 2020;190:105381.
Trevisi G, Caccavella VM, Scerrati A, Signorelli F, Salamone GG, Orsini K, Fasciani C, D’Arrigo S, Auricchio AM, D’Onofrio G, et al. Machine learning model prediction of 6-month functional outcome in elderly patients with intracerebral hemorrhage. Neurosurg Rev. 2022;45(4):2857–67.
Jiang L, Zhou L, Yong W, Cui J, Geng W, Chen H, Zou J, Chen Y, Yin X, Chen YC. A deep learning-based model for prediction of hemorrhagic transformation after stroke. Brain Pathol. 2023;33(2):e13023.
Lee KS, Jang JY, Yu YD, Heo JS, Han HS, Yoon YS, Kang CM, Hwang HK, Kang S. Usefulness of artificial intelligence for predicting recurrence following surgery for pancreatic cancer: retrospective cohort study. Int J Surg. 2021;93:106050.
Zhang L, Huang T, Xu F, Li S, Zheng S, Lyu J, Yin H. Prediction of prognosis in elderly patients with sepsis based on machine learning (random survival forest). BMC Emerg Med. 2022;22(1):26.
**ao J, Mo M, Wang Z, Zhou C, Shen J, Yuan J, He Y, Zheng Y. The application and comparison of machine learning models for the prediction of breast Cancer prognosis: Retrospective Cohort Study. JMIR Med Inform. 2022;10(2):e33440.
Bao L, Wang YT, Zhuang JL, Liu AJ, Dong YJ, Chu B, Chen XH, Lu MQ, Shi L, Gao S, et al. Machine learning-based overall survival prediction of Elderly patients with multiple myeloma from Multicentre Real-Life Data. Front Oncol. 2022;12:922039.
Grendas LN, Chiapella L, Rodante DE, Daray FM. Comparison of traditional model-based statistical methods with machine learning for the prediction of suicide behaviour. J Psychiatr Res. 2021;145:85–91.
Pei W, Wang C, Liao H, Chen X, Wei Y, Huang X, Liang X, Bao H, Su D, ** G. MRI-based random survival forest model improves prediction of progression-free survival to induction chemotherapy plus concurrent Chemoradiotherapy in Locoregionally Advanced nasopharyngeal carcinoma. BMC Cancer. 2022;22(1):739.
MIMIC-IV (version 1.0). https://mimic.mit.edu/docs/iv/.
Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ. 2015;350:g7594.
Carlsson M, Wilsgaard T, Johnsen SH, Johnsen LH, Lochen ML, Njolstad I, Mathiesen EB. Long-term survival, causes of death, and Trends in 5-Year Mortality after Intracerebral Hemorrhage: the Tromso Study. Stroke. 2021;52(12):3883–90.
Wang Y, Wang J, Chen S, Li B, Lu X, Li J. Different changing patterns for Stroke Subtype Mortality Attributable to High Sodium Intake in China during 1990 to 2019. Stroke. 2023;54(4):1078–87.
Allison PD. Multiple imputation for missing data - A cautionary tale. Sociol Method Res. 2000;28(3):301–9.
Yosefian I, Farkhani EM, Baneshi MR. Application of Random Forest Survival Models to increase generalizability of decision trees: a Case Study in Acute myocardial infarction. Comput Math Methods Med. 2015;2015:576413.
Mogensen UB, Ishwaran H, Gerds TA. Evaluating Random forests for Survival Analysis using Prediction Error Curves. J Stat Softw. 2012;50(11):1–23.
Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS. Random Survival Forests. Wiley StatsRef: Statistics Reference Online. 2019.
Wang X, Gong G, Li N, Qiu S. Detection analysis of epileptic EEG using a Novel Random Forest Model Combined with Grid Search optimization. Front Hum Neurosci. 2019;13:52.
Ishwaran H. Variable importance in binary regression trees and forests. Electron J Stat. 2007;1:519–37.
Kay R. Goodness of fit methods for the proportional hazards regression model: a review. Rev Epidemiol Sante Publique. 1984;32(3–4):185–98.
Grunkemeier GL, Wu Y. Bootstrap resampling methods: something for nothing? Ann Thorac Surg. 2004;77(4):1142–4.
Murphy AH. A New Decomposition of the Brier score - formulation and interpretation. Mon Weather Rev. 1986;114(12):2671–3.
Tang H, ** Z, Deng J, She Y, Zhong Y, Sun W, Ren Y, Cao N, Chen C. Development and validation of a deep learning model to predict the survival of patients in ICU. J Am Med Inform Assoc. 2022;29(9):1567–76.
Hadanny A, Shouval R, Wu J, Gale CP, Unger R, Zahger D, Gottlieb S, Matetzky S, Goldenberg I, Beigel R, et al. Machine learning-based prediction of 1-year mortality for acute coronary syndrome. J Cardiol. 2022;79(3):342–51.
Qiu X, Gao J, Yang J, Hu J, Hu W, Kong L, Lu JJ. A comparison study of machine learning (Random Survival Forest) and Classic Statistic (Cox Proportional Hazards) for Predicting Progression in High-Grade Glioma after Proton and Carbon Ion Radiotherapy. Front Oncol. 2020;10:551420.
Lin CH, Kuo YW, Huang YC, Lee M, Huang YW, Kuo CF, Lee JD. Development and validation of a Novel score for Predicting Long-Term Mortality after an Acute Ischemic Stroke. Int J Environ Res Public Health. 2023;20(4).
Huang T, Huang L, Yang R, Li S, He N, Feng A, Li L, Lyu J. Machine learning models for predicting survival in patients with ampullary adenocarcinoma. Asia Pac J Oncol Nurs. 2022;9(12):100141.
Pickering JW, Frampton CM, Walker RJ, Shaw GM, Endre ZH. Four hour creatinine clearance is better than plasma creatinine for monitoring renal function in critically ill patients. Crit Care. 2012;16(3):R107.
Luo H, Yang X, Chen K, Lan S, Liao G, Xu J. Blood creatinine and urea nitrogen at ICU admission and the risk of in-hospital death and 1-year mortality in patients with intracranial hemorrhage. Front Cardiovasc Med. 2022;9:967614.
Liddle LJ, Dirks CA, Almekhlafi M, Colbourne F. An ambiguous role for fever in worsening Outcome after Intracerebral Hemorrhage. Transl Stroke Res. 2023;14(2):123–36.
Iglesias-Rey R, Rodriguez-Yanez M, Arias S, Santamaria M, Rodriguez-Castro E, Lopez-Dequidt I, Hervella P, Sobrino T, Campos F, Castillo J. Inflammation, edema and poor outcome are associated with hyperthermia in hypertensive intracerebral hemorrhages. Eur J Neurol. 2018;25(9):1161–8.
Kraut JA, Nagami GT. The serum anion gap in the evaluation of acid-base disorders: what are its limitations and can its effectiveness be improved? Clin J Am Soc Nephrol. 2013;8(11):2018–24.
Liu X, Feng Y, Zhu X, Shi Y, Lin M, Song X, Tu J, Yuan E. Serum anion gap at admission predicts all-cause mortality in critically ill patients with cerebral infarction: evidence from the MIMIC-III database. Biomarkers. 2020;25(8):725–32.
Shen J, Li DL, Yang ZS, Zhang YZ, Li ZY. Anion gap predicts the long-term neurological and cognitive outcomes of spontaneous intracerebral hemorrhage. Eur Rev Med Pharmacol Sci. 2022;26(9):3230–6.
Strazzullo P, D’Elia L, Kandala NB, Cappuccio FP. Salt intake, stroke, and cardiovascular disease: meta-analysis of prospective studies. BMJ. 2009;339:b4567.
Acknowledgements
Not applicable.
Funding
This study received financial support from the National Key Research and Development Program of China (No. 2018YFC1311700).
Author information
Contributions
YW conceived the study, performed the statistical analyses, and wrote the first manuscript draft. BL critically revised the manuscript. YD assisted with the study design and data analysis. YT and MZ assisted with manuscript editing. YJ contributed to data interpretation and manuscript revision. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
The Institutional Review Boards of the Beth Israel Deaconess Medical Center and the Massachusetts Institute of Technology approved the research use of MIMIC-IV by researchers who have completed their training course. The author completed the Collaborative Institutional Training Initiative data research training required to obtain database access (Record ID: 52310626).
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Wang, Y., Deng, Y., Tan, Y. et al. A comparison of random survival forest and Cox regression for prediction of mortality in patients with hemorrhagic stroke. BMC Med Inform Decis Mak 23, 215 (2023). https://doi.org/10.1186/s12911-023-02293-2
DOI: https://doi.org/10.1186/s12911-023-02293-2