Abstract
Missing values arise in datasets for various reasons and adversely affect the performance of data mining (DM) algorithms. When a dataset contains a large number of instances, records with missing values can simply be ignored. In smaller datasets, however, deleting such records can lead to improper classification or prediction by the data mining algorithms. Several methods for estimating missing values exist. Commonly used ones replace each missing value with the mean of its column or with a repeated value, but these techniques do not approximate the missing values well. In this work, we consider a regression-based methodology for predicting missing values. Results on a stock dataset show that the polynomial regression (PR) model performed better than the linear regression (LR), quadratic regression (QR), pure-quadratic regression (PQR) and interactions regression (IR) models in predicting missing values.
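The contrast between mean replacement and regression-based prediction can be illustrated with a small sketch. This is not the authors' implementation and does not use their stock dataset. The data are synthetic, the function names (`impute_mean`, `impute_polynomial`, `rmse`) are hypothetical, and a single-predictor cubic fit via NumPy stands in for the regression models compared in the chapter.

```python
import numpy as np


def impute_mean(y):
    """Replace NaN entries with the mean of the observed entries."""
    filled = y.copy()
    filled[np.isnan(y)] = np.nanmean(y)
    return filled


def impute_polynomial(x, y, degree):
    """Fit a polynomial on rows where y is observed, then predict y where it is missing."""
    missing = np.isnan(y)
    coeffs = np.polyfit(x[~missing], y[~missing], degree)
    filled = y.copy()
    filled[missing] = np.polyval(coeffs, x[missing])
    return filled


def rmse(a, b):
    """Root mean squared error between two equally sized arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Synthetic data (an assumption for illustration): a noisy cubic relationship.
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 200)
y_true = 0.5 * x**3 - 2.0 * x + 1.0 + rng.normal(0.0, 0.1, x.size)

# Hide 40 of the 200 target values to simulate missing entries.
missing_idx = rng.choice(x.size, size=40, replace=False)
y_obs = y_true.copy()
y_obs[missing_idx] = np.nan

mean_filled = impute_mean(y_obs)
poly_filled = impute_polynomial(x, y_obs, degree=3)

# Score each method only on the entries that were hidden.
rmse_mean = rmse(mean_filled[missing_idx], y_true[missing_idx])
rmse_poly = rmse(poly_filled[missing_idx], y_true[missing_idx])
print(f"mean imputation RMSE:       {rmse_mean:.3f}")
print(f"polynomial imputation RMSE: {rmse_poly:.3f}")
```

On data like this, where the target depends strongly on another column, the regression fit recovers the hidden values far more closely than a single column mean can. How large the gap is on real data depends on how well the chosen model matches the underlying relationship.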
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Aditya Shastry, K., Sanjay, H.A., Praveen, M.S. (2022). Regression Based Data Pre-processing Technique for Predicting Missing Values. In: Shetty, N.R., Patnaik, L.M., Nagaraj, H.C., Hamsavath, P.N., Nalini, N. (eds) Emerging Research in Computing, Information, Communication and Applications. Lecture Notes in Electrical Engineering, vol 789. Springer, Singapore. https://doi.org/10.1007/978-981-16-1338-8_9
DOI: https://doi.org/10.1007/978-981-16-1338-8_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-1337-1
Online ISBN: 978-981-16-1338-8
eBook Packages: Engineering, Engineering (R0)