Abstract
Models are needed that account for the discreteness of data together with its time series properties and correlation. Our focus is on INteger-valued AutoRegressive (INAR) type models, which can be used in conjunction with existing model-based clustering techniques to cluster discrete-valued time series data. Because the approach rests on a finite mixture model, several existing techniques apply, including selection of the number of clusters, estimation via expectation-maximization, and model selection. The proposed model is demonstrated on real data to illustrate its clustering applications.
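For readers unfamiliar with INAR type models, the following is a minimal sketch of the simplest member of the family, a first-order process X_t = α∘X_{t-1} + ε_t, where ∘ denotes binomial thinning. It is an illustration under stated assumptions, not the model proposed in the article: the Poisson innovations, the order of one, and the function names `binomial_thinning`, `poisson_draw`, and `simulate_inar1` are all chosen here for the example.

```python
import math
import random


def binomial_thinning(alpha, count, rng):
    """Return alpha ∘ count: each of `count` units survives independently with probability alpha."""
    return sum(1 for _ in range(count) if rng.random() < alpha)


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_inar1(alpha, lam, length, seed=0):
    """Simulate a Poisson INAR(1) series (illustrative assumption, 0 <= alpha < 1)."""
    rng = random.Random(seed)
    # Initialise from the stationary marginal, Poisson(lam / (1 - alpha)).
    x = poisson_draw(lam / (1.0 - alpha), rng)
    series = []
    for _ in range(length):
        # Survivors from the previous count plus new Poisson arrivals.
        x = binomial_thinning(alpha, x, rng) + poisson_draw(lam, rng)
        series.append(x)
    return series


if __name__ == "__main__":
    print(simulate_inar1(alpha=0.5, lam=2.0, length=20, seed=1))
```

Thinning keeps every value a non-negative integer, which is what distinguishes this construction from a Gaussian autoregression. In a mixture setting, each cluster would carry its own parameter values (here `alpha` and `lam`).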
Acknowledgements
The authors are grateful to anonymous reviewers for their very helpful comments. This work was supported by the Canada Research Chairs program and an E.W.R. Steacie Memorial Fellowship (McNicholas).
Cite this article
Roick, T., Karlis, D. & McNicholas, P.D. Clustering discrete-valued time series. Adv Data Anal Classif 15, 209–229 (2021). https://doi.org/10.1007/s11634-020-00395-7
DOI: https://doi.org/10.1007/s11634-020-00395-7