Missing Data Imputation Using Ensemble Learning Technique: A Review

  • Conference paper
Soft Computing for Security Applications

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1428)

Abstract

Over the past two decades, many studies on missing value imputation in bioinformatics have proposed methods for handling datasets with missing values. When only a small fraction of the attribute values in a dataset is missing, the affected records can be removed without a noteworthy influence on the final mining result. When a large number of attribute values are missing, however, the missing data must be handled with careful attention, because otherwise the dataset loses valuable information and quality. In particular, datasets with more than one missing attribute value degrade the performance of the algorithms applied to them. The aim of missing value imputation is to provide a high-quality dataset without losing valuable information, whether few or many values are missing. Meanwhile, ensemble learning techniques have achieved high performance in data mining tasks over the past few years. Researchers therefore prefer to work on missing data imputation using ensemble learning, an approach that can no longer be ignored because missing data in bioinformatics datasets are increasing rapidly. Ensemble learning aims to turn weak learners into a strong learner, and ensemble techniques can process massive amounts of data efficiently. This paper reviews missing value imputation techniques and ensemble learning models for analyzing biological data.
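
As an illustration only, the idea of combining weak learners to impute missing values can be sketched as follows. This is a minimal sketch under stated assumptions, not a method taken from the paper: it assumes a single numeric predictor `x`, a target `y` with some values missing (`None`), simple least-squares lines as the weak learners, and bagging (bootstrap resampling plus averaging) as the ensemble strategy. The function names `fit_line` and `bagged_impute` and the toy data are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; falls back to the mean if x is constant."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return mean_y, 0.0
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return mean_y - slope * mean_x, slope


def bagged_impute(rows, n_learners=25, seed=0):
    """Fill missing y values (None) in (x, y) rows with a bagged ensemble.

    Each weak learner is a line fitted to a bootstrap resample of the
    complete rows; the imputed value is the mean of the learners' predictions.
    """
    rng = random.Random(seed)
    complete = [(x, y) for x, y in rows if y is not None]
    if not complete:
        raise ValueError("need at least one complete row to learn from")
    learners = []
    for _ in range(n_learners):
        # Bootstrap: sample the complete rows with replacement.
        sample = [rng.choice(complete) for _ in complete]
        xs, ys = zip(*sample)
        learners.append(fit_line(xs, ys))
    imputed = []
    for x, y in rows:
        if y is None:
            # Average the predictions of all weak learners.
            y = statistics.fmean([a + b * x for a, b in learners])
        imputed.append((x, y))
    return imputed


# Hypothetical toy data: y is roughly 2*x, with two values missing.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, None), (4.0, 8.2), (5.0, 9.8), (6.0, None)]
filled = bagged_impute(data)
```

Observed values pass through unchanged and only the `None` entries are replaced. Averaging over bootstrap resamples is one of several possible ensemble strategies; boosting or stacking could be substituted in the same structure.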

Author information

Corresponding author

Correspondence to R. Rathipriya.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Cite this paper

Jegadeeswari, K., Ragunath, R., Rathipriya, R. (2023). Missing Data Imputation Using Ensemble Learning Technique: A Review. In: Ranganathan, G., Fernando, X., Piramuthu, S. (eds) Soft Computing for Security Applications. Advances in Intelligent Systems and Computing, vol 1428. Springer, Singapore. https://doi.org/10.1007/978-981-19-3590-9_18
