Abstract
In many parts of the world, heart disease is the leading cause of death. Preventing or effectively managing cardiac disease often depends on its early detection. Research on using machine learning to estimate the probability of cardiovascular disease has grown significantly. Using a variety of classification methods and stacking as an ensemble technique, this work investigates the problem of predicting cardiovascular disease. The analysis uses clinical records from 1025 patients, each described by 14 attributes (e.g., age, sex, chest pain type, blood pressure, and cholesterol level). The first step of the analysis is to preprocess the data by filling in missing values, standardizing the numerical features, and encoding the categorical ones. The data are then split into a training set and a test set for model building and evaluation. Six classification methods are used in the research: Logistic Regression, Decision Tree, Random Forest, Extreme Gradient Boosting (XGBoost), Naive Bayes, and K-Nearest Neighbors (KNN). Accuracy, precision, recall, and F1-score are among the measures used to assess the classifiers. The findings reveal that Random Forest and Decision Tree both yield an accuracy of 92.68%, with XGBoost a close second at 90.73%. In the second part of the research, ensemble approaches, specifically stacking, are used to boost the accuracy of the classification models. Stacking is an ensemble machine learning method that aims to increase predictive accuracy by using several models in concert: it combines the predictions of multiple base models by training a meta-classifier on those predictions. The results demonstrate that stacking considerably improves on the performance of the original classifiers. The stacked model outperforms each individual classifier, with an accuracy of 98.53%.
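The workflow summarized in the abstract (standardize, split, fit six base classifiers, then train a meta-classifier on their predictions) can be sketched with scikit-learn as follows. This is an illustrative sketch only, not the authors' code. Several details are assumptions that the abstract does not state: the data are synthetic (1025 rows, with the 14 attributes assumed to be 13 predictors plus the target), the split is 80/20, the meta-classifier is logistic regression, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost to avoid an extra dependency. The accuracy it prints will therefore not match the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the clinical table (assumption: 13 predictors + target).
X, y = make_classification(n_samples=1025, n_features=13, n_informative=8,
                           random_state=0)

# Assumed 80/20 split; the abstract does not give the ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Six base classifiers; scale-sensitive ones get a StandardScaler in front.
base_models = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),  # stand-in for XGBoost
    ("nb", GaussianNB()),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
]

# The meta-classifier (assumed: logistic regression) is trained on
# out-of-fold predictions of the base models, produced by 5-fold CV.
stack = StackingClassifier(estimators=base_models,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)
stack.fit(X_train, y_train)

stack_accuracy = accuracy_score(y_test, stack.predict(X_test))
print(f"stacked accuracy on synthetic data: {stack_accuracy:.3f}")
```

Training the meta-classifier on out-of-fold predictions, rather than on predictions for rows the base models have already seen, is what keeps the second stage from simply memorizing the base models' training-set fit.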
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19293-7/MediaObjects/11042_2024_19293_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19293-7/MediaObjects/11042_2024_19293_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19293-7/MediaObjects/11042_2024_19293_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19293-7/MediaObjects/11042_2024_19293_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19293-7/MediaObjects/11042_2024_19293_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19293-7/MediaObjects/11042_2024_19293_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19293-7/MediaObjects/11042_2024_19293_Fig7_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19293-7/MediaObjects/11042_2024_19293_Fig8_HTML.png)
Data availability
Data and materials can be provided on request.
Abbreviations
- IoT: Internet of Things
- XGBoost: Extreme Gradient Boosting
- ML: Machine Learning
- KNN: K-Nearest Neighbors
- RF: Random Forest
- LM: Linear Model
- PCA: Principal Component Analysis
- CHI: Chi-square
- CH: Cleveland-Hungarian
- SVM: Support Vector Machine
- RFBM: Random Forest Bagging Method
- FCMIM: Fast Conditional Mutual Information
- LOSO: Leave-One-Subject-Out
- HRFLM: Hybrid Random Forest with Linear Model
- LASSO: Least Absolute Shrinkage and Selection Operator
- NB: Naive Bayes
- BN: Bayesian Network
- MP: Multilayer Perceptron
- GLM: Generalized Linear Model
- LR: Logistic Regression
- DL: Deep Learning
- DT: Decision Tree
- GBT: Gradient Boosted Trees
- ANN: Artificial Neural Network
- TP: True Positive
- TN: True Negative
- FP: False Positive
- FN: False Negative
- RBF: Radial Basis Function
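The four confusion-matrix counts in the list (TP, TN, FP, FN) are the inputs to the accuracy, precision, recall, and F1-score measures named in the abstract. A minimal sketch of the standard definitions follows; the function name `classification_metrics` and the example counts are made up for illustration and are not results from the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that were detected.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=90, tn=95, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In a screening setting such as heart disease detection, recall is often watched closely because a false negative corresponds to a missed case.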
Acknowledgements
The authors would like to express their gratitude to the reviewers for their valuable and insightful feedback.
Funding
This study received no specific funding from governmental, private, or non-profit bodies.
Ethics declarations
Conflict of interest
The authors declare no conflicts of interest.
About this article
Cite this article
Bhagat, M., Sharma, A. & Agarwal, P. An efficient stacking-based ensemble technique for early heart attack prediction. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19293-7
DOI: https://doi.org/10.1007/s11042-024-19293-7