Introduction

Ischemic stroke is a major cause of death and disability globally

Stroke is one of the leading causes of morbidity and mortality worldwide, and its risk factors are multifaceted, including cardiovascular diseases, diabetes, hyperlipidaemia, and unhealthy lifestyles [1, 2]. Of the three main types of stroke (ischemic stroke, haemorrhagic stroke, and transient ischemic attack), ischemic stroke (IS) accounts for approximately 87% of all cases [3]. In China, stroke has become the leading cause of years of life lost, with mortality rising from 106 per 100,000 persons in 1990 to 149 per 100,000 persons in 2017 [4]. A Chinese national report showed that the age-standardised prevalence of stroke reached 1114.8 per 100,000 persons in 2013, imposing an enormous burden on the healthcare system [5]. In middle-income countries, only 10% to 20% of stroke patients reach the hospital within 3 h (and even treatment within this period may still lead to disability). From the perspective of predictive, preventive and personalised medicine (PPPM/3PM), a prompt and accurate diagnosis of stroke can reduce treatment delay and improve stroke outcomes [5].

Challenges in triaging patients with ischemic stroke

Currently, the diagnosis of stroke in less developed areas relies mainly on neurological examination. However, when performed by a less experienced examiner, this physical examination can yield diagnoses with lower accuracy and reliability [6]. Moreover, the reported prediction models for IS diagnosis have mostly relied on conventional statistical models. For example, the Cox proportional hazards model uses selected features to predict disease occurrence, but it is poorly suited to predicting discrete events and has relatively low efficiency [7, 8]. Therefore, to improve subjective decision making in resource-limited settings, a paradigm change from reactive medicine to PPPM/3PM is needed [9]. We therefore developed and validated a predictive tool for IS using individualised IS patient profiles and assessed its feasibility.

Machine learning is a promising strategy for ischemic stroke diagnosis in the context of PPPM/3PM

In the context of PPPM/3PM, a real-time predictive analytic tool for IS can help identify high-risk individuals who may benefit from prompt intervention, e.g. thrombolysis with alteplase and endovascular treatment [10, 11, 12]. Artificial intelligence (AI) approaches can incorporate high-dimensional and multivariate data to address these challenges [13], and machine learning (ML) is a subdomain of AI involving the automatic discovery of patterns within data [14]. Among the various ML-based models, supervised learning tools, e.g. random forest, neural network, and extreme gradient boosting (XGBoost), can learn complicated structures by incorporating numerous variables with multidimensional data [13, 15]. Furthermore, owing to their outstanding predictive performance [16], ML approaches have been applied to real problems in the framework of PPPM/3PM, including predictor selection, predictive diagnostics, targeted prevention, and personalised medical services [9, 17, 18].

Although complex ML models provide high prediction accuracy, they are less understandable to humans. For critical applications in medicine, explanations of ML-based prediction models are essential for users to understand and trust the models established [19]. Additional techniques for looking inside black-box ML models are thus needed. Permutation feature importance (PFI) is a global explanation method that provides insight into the model's overall behaviour [20]. Apart from global explanation, local interpretable model-agnostic explanations (LIME) and SHapley Additive exPlanations (SHAP) are two well-accepted local explanation approaches that interpret why a certain prediction was made for a specific individual by incorporating the individualised patient profile.
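To illustrate the idea behind PFI, the following is a minimal sketch on synthetic data and is not the pipeline used in this study. It assumes a hypothetical stand-in classifier (`model_predict`) with fixed weights and three invented features. Each feature column is shuffled in turn, and the resulting drop in accuracy is taken as that feature's global importance.

```python
import random

def model_predict(row):
    # Hypothetical stand-in for a trained black-box classifier (an assumption
    # for illustration): it relies strongly on feature 0, weakly on feature 1,
    # and ignores feature 2.
    return 1 if 2.0 * row[0] + 0.5 * row[1] > 0 else 0

def accuracy(predict, X, y):
    # Fraction of rows whose predicted label matches the true label.
    return sum(predict(r) == t for r, t in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    # Importance of feature j = mean drop in accuracy after shuffling column j,
    # which breaks the link between that feature and the outcome.
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances

# Synthetic data: the outcome depends on features 0 and 1 plus noise.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(500)]
y = [1 if 2.0 * r[0] + 0.5 * r[1] + data_rng.gauss(0, 0.5) > 0 else 0 for r in X]

baseline, importances = permutation_importance(model_predict, X, y)
print(f"baseline accuracy: {baseline:.3f}")
for j, imp in enumerate(importances):
    print(f"feature {j}: mean accuracy drop = {imp:.3f}")
```

Because the procedure only needs a prediction function, it is model-agnostic: the same loop applies to a random forest, a neural network, or XGBoost. A feature that the model ignores (feature 2 here) receives an importance of zero.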