Similarity-Based Data-Fusion Schemes for Missing Data Imputation in Univariate Time Series Data

  • Conference paper
Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences

Part of the book series: Algorithms for Intelligent Systems (AIS)

Abstract

Handling missing values in time series data plays a key role in prediction and forecasting, as complete and clean historical data help achieve higher accuracy. Many studies address imputation in multivariate time series, but imputation in univariate time series has received the least attention because no other correlated variables (attributes) are available. However, existing algorithms do not perform well when most of the tuples are clustered, owing to a lack of neighbors during imputation. This paper proposes an iterative imputation algorithm that clusters univariate time series data using the trend, seasonal, cyclical and residual features of the data. The proposed method applies a similarity-based nearest-neighbor imputation approach to each cluster to fill in missing values. It is evaluated on publicly available data sets from the Data Market repository and the UCI repository by randomly simulating missing patterns throughout the data series. The outcome is assessed with mean squared error (MSE), mean absolute error (MAE) and root mean squared error (RMSE), and is also validated through prediction accuracy and the Concordance Correlation Coefficient (CCC) statistical test. Experimental results indicate that the proposed imputation method produces values closer to the original time series, resulting in lower error rates than other existing imputation methods.
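
The evaluation protocol described in the abstract (mask values at random, impute, then score the imputed points against the originals) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the synthetic series and the 10% missing rate are assumptions, and linear interpolation stands in as a placeholder for the proposed clustering-based nearest-neighbor imputer, whose details the abstract does not give. CCC is computed here with Lin's standard formula using population variances.

```python
import numpy as np


def simulate_missing(series, fraction, rng):
    """Return a copy of `series` with a random fraction set to NaN, plus the masked indices."""
    values = np.asarray(series, dtype=float).copy()
    n_missing = int(round(fraction * values.size))
    idx = np.sort(rng.choice(values.size, size=n_missing, replace=False))
    values[idx] = np.nan
    return values, idx


def interpolate_baseline(values):
    """Placeholder imputer (linear interpolation); NOT the method proposed in the paper."""
    values = np.asarray(values, dtype=float).copy()
    missing = np.isnan(values)
    positions = np.arange(values.size)
    values[missing] = np.interp(positions[missing], positions[~missing], values[~missing])
    return values


def error_metrics(original, imputed):
    """MSE, MAE and RMSE between original and imputed values."""
    diff = np.asarray(imputed, dtype=float) - np.asarray(original, dtype=float)
    mse = float(np.mean(diff ** 2))
    return {"MSE": mse, "MAE": float(np.mean(np.abs(diff))), "RMSE": float(np.sqrt(mse))}


def concordance_cc(x, y):
    """Lin's concordance correlation coefficient (population variances)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    covariance = np.mean((x - mean_x) * (y - mean_y))
    return float(2 * covariance / (x.var() + y.var() + (mean_x - mean_y) ** 2))


# Hypothetical example: a synthetic series with a linear trend and a 12-step season.
rng = np.random.default_rng(0)
t = np.arange(200)
truth = 10 + 0.05 * t + 2 * np.sin(2 * np.pi * t / 12)

observed, idx = simulate_missing(truth, 0.10, rng)   # assumed 10% missing rate
filled = interpolate_baseline(observed)

# Score only the positions that were masked.
print(error_metrics(truth[idx], filled[idx]))
print("CCC:", concordance_cc(truth[idx], filled[idx]))
```

Scoring only the masked positions keeps the untouched observations from diluting the error, and a CCC close to 1 indicates that the imputed values agree with the originals in both correlation and scale.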

Author information

Corresponding author

Correspondence to S. Nickolas.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Cite this paper

Nickolas, S., Shobha, K. (2021). Similarity-Based Data-Fusion Schemes for Missing Data Imputation in Univariate Time Series Data. In: Dave, M., Garg, R., Dua, M., Hussien, J. (eds) Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-7533-4_12

  • DOI: https://doi.org/10.1007/978-981-15-7533-4_12

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-7532-7

  • Online ISBN: 978-981-15-7533-4

  • eBook Packages: Engineering, Engineering (R0)
