Abstract
Handling missing values in time series data plays a key role in prediction and forecasting, because complete and clean historical data leads to higher accuracy. Many research works address multivariate time series imputation, but imputation in univariate time series has received far less attention because no other correlated variables (attributes) are available. Moreover, existing algorithms perform poorly when most of the tuples are clustered, because too few neighbors are available during imputation. This paper proposes an iterative imputation algorithm that clusters univariate time series data using its trend, seasonality, cyclical and residual components. Within each cluster, the method fills missing values with a similarity-based nearest neighbor imputation approach. The method is evaluated on publicly available data sets from the Data Market repository and the UCI repository by randomly simulating missing patterns throughout the series. The results are assessed with MSE, MAE and RMSE, and further validated through prediction accuracy and the Concordance Correlation Coefficient (CCC) statistical test. Experimental results indicate that the proposed method produces values closer to the original time series, with lower error rates than other existing imputation methods.
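The abstract combines two technical pieces: a similarity-based nearest neighbor fill for a single series, and evaluation with MSE, MAE, RMSE and Lin's CCC. The following is a minimal Python/NumPy sketch of both under stated assumptions, not the authors' implementation. The clustering stage (grouping by trend, seasonality, cyclical and residual features) is omitted, so the neighbor search here runs over the whole series rather than within a cluster. The pattern-matching rule (the `window` values before a gap as the query, Euclidean distance, mean of the next values of the k nearest patterns) and the function names `knn_impute_univariate`, `error_metrics` and `lins_ccc` are hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical sketch: the paper's clustering step is omitted, so neighbours
# are searched over the whole series instead of within one cluster.


def knn_impute_univariate(series, window=3, k=2, max_iter=5):
    """Fill NaNs in a 1-D series by similarity-based nearest-neighbour matching.

    For a missing position t, the `window` values just before t form a query
    pattern. Every other fully observed pattern of the same length whose next
    value is also observed is a candidate; the next values of the k candidates
    closest to the query (Euclidean distance) are averaged. Values filled earlier
    can serve as context for later gaps, and gaps whose context is still
    incomplete are retried on the next pass (up to `max_iter` passes).
    Positions with fewer than `window` preceding values are left as NaN.
    """
    x = np.asarray(series, dtype=float).copy()
    n = len(x)
    for _ in range(max_iter):
        missing = np.flatnonzero(np.isnan(x))
        if missing.size == 0:
            break
        progress = False
        for t in missing:
            if t < window:
                continue  # not enough history to build a query pattern
            query = x[t - window:t]
            if np.isnan(query).any():
                continue  # context still has gaps; retry on a later pass
            candidates = []
            for s in range(window, n):
                pattern = x[s - window:s]
                if s == t or np.isnan(x[s]) or np.isnan(pattern).any():
                    continue
                candidates.append((np.linalg.norm(pattern - query), x[s]))
            if not candidates:
                continue
            candidates.sort(key=lambda c: c[0])
            x[t] = np.mean([value for _, value in candidates[:k]])
            progress = True
        if not progress:
            break
    return x


def error_metrics(y_true, y_pred):
    """MSE, MAE and RMSE between original and imputed values."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    mse = float(np.mean(err ** 2))
    return {"MSE": mse, "MAE": float(np.mean(np.abs(err))), "RMSE": mse ** 0.5}


def lins_ccc(y_true, y_pred):
    """Lin's concordance correlation coefficient (population moments)."""
    a = np.asarray(y_true, dtype=float)
    b = np.asarray(y_pred, dtype=float)
    cov = np.mean((a - a.mean()) * (b - b.mean()))
    return float(2 * cov / (a.var() + b.var() + (a.mean() - b.mean()) ** 2))


# Usage: a synthetic seasonal series (period 12) with three simulated gaps.
steps = np.arange(48)
truth = 10 + np.sin(2 * np.pi * steps / 12)
observed = truth.copy()
observed[[15, 30, 31]] = np.nan
imputed = knn_impute_univariate(observed, window=3, k=2)
gaps = np.isnan(observed)
print(error_metrics(truth[gaps], imputed[gaps]))
print(lins_ccc(truth, imputed))
```

On this noise-free seasonal series every gap has an exact periodic match, so the imputed values recover the originals almost exactly and the CCC is essentially 1. Real data, and the per-cluster search the abstract describes, would give nonzero errors.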
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Nickolas, S., Shobha, K. (2021). Similarity-Based Data-Fusion Schemes for Missing Data Imputation in Univariate Time Series Data. In: Dave, M., Garg, R., Dua, M., Hussien, J. (eds) Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-7533-4_12
DOI: https://doi.org/10.1007/978-981-15-7533-4_12
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-7532-7
Online ISBN: 978-981-15-7533-4
eBook Packages: Engineering, Engineering (R0)