Missing Data Imputation Using Ensemble Learning Technique: A Review

Jegadeeswari, K.; Ragunath, R.; Rathipriya, R.

doi:10.1007/978-981-19-3590-9_18

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1428)

Abstract

Over the past two decades, many studies have addressed missing value imputation in bioinformatics, each proposing a method or approach for handling datasets with missing values. When only a small fraction of the attribute values in a dataset is missing, they can be removed without a noteworthy influence on the final mining result. However, if a large number of attribute values are missing, careful attention must be given to handling them, because otherwise the dataset loses valuable information and quality. In particular, datasets with more than one missing attribute value degrade the performance of the algorithms. The aim of missing value imputation is to provide a high-quality dataset without losing valuable information, whether few or many values are missing. Meanwhile, ensemble learning techniques have achieved high performance in data mining tasks over the past few years. Researchers therefore prefer to approach missing data imputation with ensemble learning, a technique that is hard to ignore now that the amount of missing data in bioinformatics datasets is increasing rapidly. The aim of ensemble learning is to combine weak learners into a strong learner, and ensemble techniques can process massive amounts of data efficiently. This paper reviews missing value imputation techniques and ensemble learning models for analyzing biological data.
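
To make the idea of ensemble-based imputation concrete, the following is a minimal sketch, not a method taken from the chapter. It assumes a small numeric table in which missing entries are marked with None, and it averages the estimates of several nearest-neighbour imputers, each fitted on a bootstrap sample of the complete rows (a bagging-style ensemble). The function names knn_impute_value and bagged_impute, the parameter defaults, and the toy data are all hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def knn_impute_value(rows, target, col, k=3):
    """Weak learner: estimate target[col] from the k rows closest to target."""
    # Distances use only the columns that are observed in the target row.
    observed = [j for j, v in enumerate(target) if v is not None and j != col]

    def dist(row):
        return sum((row[j] - target[j]) ** 2 for j in observed)

    nearest = sorted(rows, key=dist)[:k]
    return mean(r[col] for r in nearest)


def bagged_impute(data, n_estimators=25, k=3, seed=0):
    """Fill each None by averaging k-NN estimates over bootstrap samples."""
    rng = random.Random(seed)
    # Only fully observed rows are used as donors (an assumption of this sketch).
    complete = [r for r in data if None not in r]
    if not complete:
        raise ValueError("at least one complete row is required")
    result = [list(r) for r in data]  # copy so the input is left untouched
    for i, row in enumerate(data):
        for col, value in enumerate(row):
            if value is None:
                estimates = []
                for _ in range(n_estimators):
                    # Bootstrap: resample the complete rows with replacement.
                    sample = [rng.choice(complete) for _ in complete]
                    estimates.append(knn_impute_value(sample, row, col, k))
                # Aggregate the weak estimates into one imputed value.
                result[i][col] = mean(estimates)
    return result


if __name__ == "__main__":
    # Invented toy data: the second column is missing in the last row.
    toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0], [2.5, None]]
    print(bagged_impute(toy)[-1])
```

Averaging over bootstrap samples is only one way to build such an ensemble; boosting or stacking several different imputers would follow the same pattern of combining individually weak estimates.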


Author information

Authors and Affiliations

Department of Computer Science, Periyar University, Salem, Tamil Nadu, India
K. Jegadeeswari, R. Ragunath & R. Rathipriya

Corresponding author

Correspondence to R. Rathipriya.

Editor information

Editors and Affiliations

Department of Electronics and Communication Engineering, Gnanamani College of Technology, Namakkal, Tamil Nadu, India
G. Ranganathan
Ryerson Communications Lab, Toronto, ON, Canada
Xavier Fernando
Department of Information Systems, University of Florida, Gainesville, FL, USA
Selwyn Piramuthu

About this paper

Cite this paper

Jegadeeswari, K., Ragunath, R., Rathipriya, R. (2023). Missing Data Imputation Using Ensemble Learning Technique: A Review. In: Ranganathan, G., Fernando, X., Piramuthu, S. (eds) Soft Computing for Security Applications. Advances in Intelligent Systems and Computing, vol 1428. Springer, Singapore. https://doi.org/10.1007/978-981-19-3590-9_18

DOI: https://doi.org/10.1007/978-981-19-3590-9_18
Published: 30 September 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-3589-3
Online ISBN: 978-981-19-3590-9
eBook Packages: Intelligent Technologies and Robotics (R0)
