Robust clustering for functional data based on trimming and constraints

  • Regular Article
  • Advances in Data Analysis and Classification

Abstract

Many clustering algorithms for data that are curves or functions have recently been proposed. However, contamination in the sample of curves can degrade the performance of most of them. In this work we propose a robust, model-based clustering method that relies on an approximation to the “density function” for functional data. The robustness follows from the joint application of data-driven trimming, which reduces the effect of contaminated observations, and constraints on the variances, which avoid spurious clusters in the solution. The algorithm is designed to perform clustering and outlier detection simultaneously by maximizing a trimmed “pseudo” likelihood. The proposed method is evaluated and compared with other existing methods through a simulation study, in which it performs better when a fraction of contaminating curves is added to a non-contaminated sample. Finally, an application to a real data set previously considered in the literature is given.
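To make the interplay of trimming and variance constraints concrete, the following Python sketch shows one possible iteration on a finite-dimensional summary of the curves. It is an illustration under stated assumptions, not the algorithm of the article: the curves are assumed to be already reduced to score vectors (for instance, coefficients in some basis), each cluster is assumed Gaussian with a diagonal covariance, and the variance constraint is enforced by a crude clipping rule rather than an optimal truncation. The function name `trimmed_step` and the parameters `alpha` (trimming fraction) and `c` (maximum allowed variance ratio) are hypothetical.

```python
import numpy as np

def trimmed_step(scores, means, variances, weights, alpha, c):
    """One trimming-and-update step (illustrative sketch, not the paper's method).

    scores: (n, p) array; means, variances: (K, p) arrays; weights: (K,) array.
    """
    n, _ = scores.shape
    K = means.shape[0]
    # Log of (weight * diagonal Gaussian density) per observation and cluster.
    diff = scores[:, None, :] - means[None, :, :]
    log_dens = (np.log(weights)[None, :]
                - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
                - 0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2))
    labels = np.argmax(log_dens, axis=1)
    best = log_dens[np.arange(n), labels]
    # Trimming: discard the ceil(n * alpha) least plausible observations.
    n_trim = int(np.ceil(n * alpha))
    labels[np.argsort(best)[:n_trim]] = -1  # -1 marks a trimmed observation
    # Update the parameters from the non-trimmed observations only.
    new_means, new_vars = means.copy(), variances.copy()
    counts = np.array([np.sum(labels == k) for k in range(K)])
    for k in range(K):
        if counts[k] > 1:
            members = scores[labels == k]
            new_means[k] = members.mean(axis=0)
            new_vars[k] = members.var(axis=0)
    new_weights = np.maximum(counts, 1) / np.maximum(counts, 1).sum()
    # Assumed simplification: raise small variances so that max/min <= c.
    new_vars = np.maximum(new_vars, max(new_vars.max() / c, 1e-12))
    return labels, new_means, new_vars, new_weights

# Synthetic demonstration: two groups of scores plus five gross outliers.
rng = np.random.default_rng(0)
scores = np.vstack([rng.normal(0.0, 1.0, (40, 2)),
                    rng.normal(6.0, 1.0, (40, 2)),
                    rng.normal(0.0, 1.0, (5, 2)) + np.array([30.0, -30.0])])
means = np.array([[0.0, 0.0], [6.0, 6.0]])
variances = np.ones((2, 2))
weights = np.array([0.5, 0.5])
for _ in range(5):
    labels, means, variances, weights = trimmed_step(
        scores, means, variances, weights, alpha=0.06, c=5.0)
```

In this toy run the last five rows are the planted outliers, so they should end up with label `-1`, while the clipping keeps the ratio between the largest and smallest variance at or below `c`.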

Acknowledgements

We would like to thank the Associate Editor and two anonymous reviewers for their helpful suggestions and comments. This work was partly done while DR and JO visited the Departamento de Estadística e I.O., Universidad de Valladolid, Spain, with support from Conacyt, Mexico (DR as visiting graduate student, JO by Projects 169175 Análisis Estadístico de Olas Marinas, Fase II, and 234057 Análisis Espectral, Datos Funcionales y Aplicaciones), CIMAT, A.C. and the Universidad de Valladolid. Their hospitality and support are gratefully acknowledged. Research by LA G-E and A M-I was partially supported by the Spanish Ministerio de Economía y Competitividad, grant MTM2017-86061-C2-1-P, and by Consejería de Educación de la Junta de Castilla y León and FEDER, grant VA005P17.

Author information

Corresponding author

Correspondence to Diego Rivera-García.

About this article

Cite this article

Rivera-García, D., García-Escudero, L.A., Mayo-Iscar, A. et al. Robust clustering for functional data based on trimming and constraints. Adv Data Anal Classif 13, 201–225 (2019). https://doi.org/10.1007/s11634-018-0312-7

  • DOI: https://doi.org/10.1007/s11634-018-0312-7
