Abstract
Many clustering algorithms for data consisting of curves or functions have been proposed recently. However, contamination in the sample of curves can degrade the performance of most of them. In this work we propose a robust, model-based clustering method that relies on an approximation to the “density function” for functional data. The robustness follows from the joint application of data-driven trimming, which reduces the effect of contaminated observations, and constraints on the variances, which avoid spurious clusters in the solution. The algorithm is designed to perform clustering and outlier detection simultaneously by maximizing a trimmed “pseudo” likelihood. The proposed method is evaluated and compared with existing methods through a simulation study, in which it performs better when a fraction of contaminating curves is added to a non-contaminated sample. Finally, an application to a real data set previously considered in the literature is given.
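To make the interplay between trimming and variance constraints concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes curves observed on a common grid (rows of `X`), one spherical Gaussian variance per cluster instead of the functional “density” approximation, hard assignments, and a crude clipping step in place of an optimal constrained variance update. The function name `trimmed_cluster` and the parameters `alpha` (trimming fraction) and `c` (maximum allowed variance ratio) are hypothetical names chosen for illustration.

```python
import numpy as np

def trimmed_cluster(X, k, alpha, c=10.0, n_iter=50, seed=0):
    """Illustrative trimmed, variance-constrained clustering (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    keep_n = n - int(np.floor(alpha * n))        # number of untrimmed curves
    centers = X[rng.choice(n, size=k, replace=False)].copy()
    variances = np.ones(k)
    for _ in range(n_iter):
        # Squared distance from every curve to every center, shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        # Spherical Gaussian log-density per cluster, up to an additive constant.
        loglik = -0.5 * d2 / variances - 0.5 * p * np.log(variances)
        labels = loglik.argmax(axis=1)
        best = loglik.max(axis=1)
        # Trimming: keep the keep_n curves with the largest contributions.
        mask = np.zeros(n, dtype=bool)
        mask[np.argsort(best)[n - keep_n:]] = True
        for j in range(k):
            members = mask & (labels == j)
            if members.any():
                centers[j] = X[members].mean(axis=0)
                variances[j] = max(((X[members] - centers[j]) ** 2).mean(), 1e-8)
        # Constraint (assumed form): enforce max/min variance ratio <= c by
        # raising the smallest variances, a crude stand-in for optimal truncation.
        variances = np.clip(variances, variances.max() / c, None)
    labels = labels.copy()
    labels[~mask] = -1                            # -1 marks trimmed curves
    return labels, centers, variances
```

Curves labeled `-1` are those trimmed in the final iteration, so clustering and outlier flagging come out of the same loop. Because the starting centers are drawn at random, a sketch like this would normally be run from several seeds and the solution with the largest trimmed log-likelihood retained.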
Acknowledgements
We would like to thank the Associate Editor and two anonymous reviewers for their helpful suggestions and comments. This work was partly done while DR and JO visited the Departamento de Estadística e I.O., Universidad de Valladolid, Spain, with support from Conacyt, Mexico (DR as visiting graduate student, JO by Projects 169175 Análisis Estadístico de Olas Marinas, Fase II and 234057 Análisis Espectral, Datos Funcionales y Aplicaciones), CIMAT, A.C. and the Universidad de Valladolid. Their hospitality and support are gratefully acknowledged. Research by LA G-E and A M-I was partially supported by the Spanish Ministerio de Economía y Competitividad, grant MTM2017-86061-C2-1-P, and by Consejería de Educación de la Junta de Castilla y León and FEDER, grant VA005P17.
Cite this article
Rivera-García, D., García-Escudero, L.A., Mayo-Iscar, A. et al. Robust clustering for functional data based on trimming and constraints. Adv Data Anal Classif 13, 201–225 (2019). https://doi.org/10.1007/s11634-018-0312-7
DOI: https://doi.org/10.1007/s11634-018-0312-7