Regression Based Data Pre-processing Technique for Predicting Missing Values

  • Conference paper
Emerging Research in Computing, Information, Communication and Applications

Abstract

Missing values arise in datasets for various reasons and adversely affect the performance of data mining (DM) algorithms. Records with missing values can be ignored when the dataset contains a large number of instances. In smaller datasets, however, deleting such records can lead to improper classification or prediction by the data mining algorithms. Several methods exist for estimating missing values; commonly used ones replace a missing value with the column mean or with a repeated value. However, these techniques do not forecast the missing values accurately. In our work, we consider a regression-based methodology for predicting the missing values. Results on a stock dataset demonstrate that the polynomial regression (PR) model performs better than the linear regression (LR), quadratic regression (QR), pure-quadratic regression (PQR), and interactions regression (IR) models for the prediction of missing values.
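The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea of regression-based imputation, not the authors' implementation. It assumes a single predictor column `x` and a target column `y` with gaps marked as `None`, fits a least-squares polynomial on the complete rows, and predicts the missing entries. The function names (`fit_polynomial`, `impute`, `impute_mean`) and the synthetic data are hypothetical and stand in for the stock dataset, which is not reproduced here.

```python
from statistics import mean


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (pure Python)."""
    n = degree + 1
    # Normal equations: (V^T V) c = V^T y, where V[i][j] = xs[i] ** j.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution; coeffs[j] multiplies x ** j.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def predict(coeffs, x):
    return sum(c * x ** j for j, c in enumerate(coeffs))


def impute(rows, degree):
    """Fill missing y values with predictions from a degree-`degree` fit."""
    known = [(x, y) for x, y in rows if y is not None]
    xs, ys = zip(*known)
    coeffs = fit_polynomial(xs, ys, degree)
    return [(x, y if y is not None else predict(coeffs, x)) for x, y in rows]


def impute_mean(rows):
    """Baseline: fill missing y values with the mean of the observed ones."""
    col_mean = mean(y for _, y in rows if y is not None)
    return [(x, y if y is not None else col_mean) for x, y in rows]


def truth(x):
    # Synthetic ground truth (an assumption for illustration only).
    return 1 + 2 * x + 0.5 * x ** 2


missing = (3, 6)
rows = [(x, None if x in missing else truth(x)) for x in range(8)]


def mean_abs_error(filled):
    return mean(abs(filled[x][1] - truth(x)) for x in missing)


errors = {
    "mean": mean_abs_error(impute_mean(rows)),
    "linear": mean_abs_error(impute(rows, 1)),
    "quadratic": mean_abs_error(impute(rows, 2)),
}
print(errors)
```

On this toy data the quadratic fit recovers the hidden values almost exactly because the data were generated from a quadratic. That is a property of the synthetic example and says nothing about the paper's reported results.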



Author information

Corresponding author

Correspondence to K. Aditya Shastry.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Aditya Shastry, K., Sanjay, H.A., Praveen, M.S. (2022). Regression Based Data Pre-processing Technique for Predicting Missing Values. In: Shetty, N.R., Patnaik, L.M., Nagaraj, H.C., Hamsavath, P.N., Nalini, N. (eds) Emerging Research in Computing, Information, Communication and Applications. Lecture Notes in Electrical Engineering, vol 789. Springer, Singapore. https://doi.org/10.1007/978-981-16-1338-8_9

  • DOI: https://doi.org/10.1007/978-981-16-1338-8_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-1337-1

  • Online ISBN: 978-981-16-1338-8

  • eBook Packages: Engineering, Engineering (R0)
