Similarity-Based Data-Fusion Schemes for Missing Data Imputation in Univariate Time Series Data

Nickolas, S.; Shobha, K.

doi:10.1007/978-981-15-7533-4_12


Part of the book series: Algorithms for Intelligent Systems (AIS)

Abstract

Handling missing values in time series data plays a key role in prediction and forecasting, because complete and clean historical data helps achieve higher accuracy. Numerous research works address imputation in multivariate time series, but imputation in univariate time series has received the least attention because no other correlated variables (attributes) are available. However, these algorithms do not perform well when most of the tuples are clustered, owing to a lack of neighbors during imputation. This paper proposes an iterative imputation algorithm that clusters univariate time series data, considering the trend, seasonality, cyclical and residual features of the data. The proposed method applies a similarity-based nearest neighbor imputation approach to each cluster to fill missing values. It is evaluated on publicly available data sets from the Data Market repository and the UCI repository by randomly simulating missing patterns throughout the data series. The outcome is evaluated with the metrics mean squared error (MSE), mean absolute error (MAE) and root mean squared error (RMSE), and is also validated through prediction accuracy and the Concordance Correlation Coefficient (CCC) statistical test. Experimental results indicate that the proposed imputation method produces values closer to the original time series data set, resulting in lower error rates than other existing imputation methods.
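The evaluation protocol named in the abstract (mask values at random, impute, then score against the original series) can be sketched as follows. This is only an illustration under stated assumptions: the series is synthetic, the imputer is plain linear interpolation used as a stand-in rather than the proposed clustering and nearest neighbor method, and all function names are hypothetical. The error metrics and Lin's concordance correlation coefficient follow their standard definitions.

```python
import math
import random


def error_metrics(actual, imputed):
    """Return MSE, MAE and RMSE between two equal-length sequences."""
    n = len(actual)
    diffs = [a - b for a, b in zip(actual, imputed)]
    mse = sum(d * d for d in diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    return {"mse": mse, "mae": mae, "rmse": math.sqrt(mse)}


def concordance_cc(x, y):
    """Lin's concordance correlation coefficient, using population moments.

    Undefined (division by zero) when both inputs are the same constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cov / (vx + vy + (mx - my) ** 2)


def mask_at_random(series, rate, seed=0):
    """Replace a fraction `rate` of the interior points with None.

    The first and last points stay observed so the stand-in imputer
    below always has a neighbour on both sides of every gap.
    """
    rng = random.Random(seed)
    k = int(rate * (len(series) - 2))
    missing = set(rng.sample(range(1, len(series) - 1), k))
    return [None if i in missing else v for i, v in enumerate(series)]


def interpolate(series):
    """Stand-in imputer (assumption): linear interpolation across each gap."""
    out = list(series)
    known = [i for i, v in enumerate(out) if v is not None]
    for lo, hi in zip(known, known[1:]):
        for i in range(lo + 1, hi):
            w = (i - lo) / (hi - lo)
            out[i] = out[lo] + w * (out[hi] - out[lo])
    return out


# Synthetic monthly-style series with a linear trend (not data from the paper).
truth = [0.05 * t + math.sin(2 * math.pi * t / 12) for t in range(120)]
masked = mask_at_random(truth, rate=0.2)
filled = interpolate(masked)
scores = error_metrics(truth, filled)
ccc = concordance_cc(truth, filled)
```

Lower MSE, MAE and RMSE and a CCC closer to 1 indicate that the imputed series agrees more closely with the original. Swapping `interpolate` for another imputer leaves the rest of the scoring unchanged, which is what makes a comparison across methods possible.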

Author information

Authors and Affiliations

Department of Computer Applications, National Institute of Technology, Tiruchirappalli, Tamilnadu, 620015, India
S. Nickolas & K. Shobha

Corresponding author

Correspondence to S. Nickolas.

Editor information

Editors and Affiliations

Department of Computer Engineering, National Institute of Technology, Kurukshetra, Kurukshetra, India
Mayank Dave
Department of Computer Engineering, National Institute of Technology, Kurukshetra, Kurukshetra, India
Ritu Garg
Department of Computer Engineering, National Institute of Technology, Kurukshetra, Kurukshetra, India
Mohit Dua
School of Information Technology, Deakin University, Geelong, VIC, Australia
Jemal Hussien

About this paper

Cite this paper

Nickolas, S., Shobha, K. (2021). Similarity-Based Data-Fusion Schemes for Missing Data Imputation in Univariate Time Series Data. In: Dave, M., Garg, R., Dua, M., Hussien, J. (eds) Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-7533-4_12

DOI: https://doi.org/10.1007/978-981-15-7533-4_12
Published: 20 February 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-7532-7
Online ISBN: 978-981-15-7533-4
eBook Packages: Engineering, Engineering (R0)
