
TPBFS: two populations based feature selection method for medical data

Published in Cluster Computing

Abstract

The high-dimensional nature of medical data frequently results in suboptimal performance of machine learning models, so applying feature selection before classification is necessary to improve classifier performance. Although evolutionary wrapper feature selection methods are acknowledged for their strength in exploring optimal feature subsets, they carry a risk of overfitting and can lose search efficiency in the later stages of evolution. To address these issues, we propose a generalized wrapper feature selection method called Two Populations Based Feature Selection (TPBFS), which uses two populations evolving in opposite directions to improve convergence speed. It introduces a probability-based crossover operation to mitigate overfitting and a record list that systematically tracks and replaces optimal individuals, helping the search avoid getting stuck in local optima during the later stages of evolution. Experimental results demonstrate that TPBFS effectively reduces the dimensionality of various medical datasets while preserving classifier performance.
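To make the search procedure concrete, the sketch below shows one way a dual-population wrapper selector in the spirit of TPBFS can be organized in Python with scikit-learn. The fitness definition, the dense/sparse initialization, the specific probability-based crossover rule, and the record-list replacement policy are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def fitness(mask, X, y, alpha=0.01):
    """Wrapper fitness: mean CV accuracy of a kNN classifier on the selected
    features minus a small penalty on subset size (an assumed trade-off)."""
    if mask.sum() == 0:
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                          X[:, mask.astype(bool)], y, cv=5).mean()
    return acc - alpha * mask.mean()


def tpbfs_sketch(X, y, pop_size=20, generations=30, seed=0):
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    # Two populations started from opposite ends of the search space:
    # population A is dense (most features selected), population B is sparse.
    pop_a = (rng.random((pop_size, n_features)) < 0.9).astype(int)
    pop_b = (rng.random((pop_size, n_features)) < 0.1).astype(int)
    record = []  # record list of the best mask found in each generation

    def step(pop, other_best):
        fits = np.array([fitness(ind, X, y) for ind in pop])
        order = np.argsort(fits)[::-1]
        pop, fits = pop[order], fits[order]
        record.append(pop[0].copy())
        children = [pop[0].copy()]  # elitism: keep the current best
        while len(children) < pop_size:
            i, j = rng.choice(pop_size // 2, size=2, replace=False)
            # Probability-based crossover: each gene is inherited from parent i
            # with a probability given by its relative fitness.
            w = fits[i] / (fits[i] + fits[j] + 1e-12)
            child = np.where(rng.random(n_features) < w, pop[i], pop[j])
            # Light guidance from the best individual of the other population.
            child = np.where(rng.random(n_features) < 0.05, other_best, child)
            # Bit-flip mutation.
            child ^= (rng.random(n_features) < 1.0 / n_features).astype(int)
            children.append(child)
        new_pop = np.array(children)
        # Replace the last child with a randomly chosen recorded optimum to help
        # the search escape local optima in later generations (assumed policy).
        new_pop[-1] = record[rng.integers(len(record))]
        return new_pop, pop[0]

    best_a, best_b = pop_a[0], pop_b[0]
    for _ in range(generations):
        pop_a, best_a = step(pop_a, best_b)
        pop_b, best_b = step(pop_b, best_a)

    best = max((best_a, best_b), key=lambda m: fitness(m, X, y))
    return best.astype(bool)
```

A returned boolean mask can then be applied directly, for example `X_reduced = X[:, tpbfs_sketch(X, y)]`, before training the downstream classifier.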


Data availability

The datasets used in this study are sourced from the UCI Machine Learning Repository and are publicly available at https://archive.ics.uci.edu/.
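As a minimal sketch, the snippet below shows how such a dataset can be pulled into Python with the official `ucimlrepo` package (`pip install ucimlrepo`); the dataset id used here (45, Heart Disease) is an illustrative assumption and not necessarily one of the datasets evaluated in this paper.

```python
from ucimlrepo import fetch_ucirepo

# Fetch a dataset from https://archive.ics.uci.edu/ by its repository id.
# id=45 (Heart Disease) is shown only as an example.
heart = fetch_ucirepo(id=45)
X = heart.data.features.to_numpy()          # feature matrix
y = heart.data.targets.to_numpy().ravel()   # class labels
print(X.shape, y.shape)
```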


Acknowledgements

This work is supported by the National Natural Science Foundation of China (NSFC) under grant 82374559, the Sichuan Science and Technology Program under grants 2023YFSY0027 and 2023YFS0325, the Natural Science Foundation of Sichuan under grants 2022NSFSC0958 and 2024NSFSC0717, the Fundamental Research Funds for the Central Universities under grants ZYGX2021YGLH012 and ZYGX2021J020, the Ningbo Major Research and Development Plan Project under grant 20241ZDYF020354, and the Committee of Cadre Health of Sichuan Province under grant 2023-220.

Author information


Contributions

Haodi Quan: Conceptualization, Methodology, Software, Writing - original draft. Yun Zhang: Conceptualization, Methodology, Writing - review & editing. Qiaoqin Li: Validation, Supervision. Yongguo Liu: Validation, Supervision. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Yongguo Liu.

Ethics declarations

Conflict of Interest

The authors have no conflicts of interest to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Quan, H., Zhang, Y., Li, Q. et al. TPBFS: two populations based feature selection method for medical data. Cluster Comput (2024). https://doi.org/10.1007/s10586-024-04557-6

