Abstract
A fast nonparametric procedure for classifying functional data is introduced. It consists of a two-step transformation of the original data plus a classifier operating on a low-dimensional space. The functional data are first mapped into a finite-dimensional location-slope space and then transformed by a multivariate depth function into the DD-plot, which is a subset of the unit square. This transformation yields a new notion of depth for functional data. Three alternative depth functions are employed for this, as well as two rules for the final classification in \([0,1]^2\). The resulting classifier has to be cross-validated over a small range of parameters only, which is restricted by a Vapnik–Chervonenkis bound. The entire methodology involves no smoothing techniques, is completely nonparametric, and achieves Bayes optimality under standard distributional settings. It is robust, efficiently computable, and has been implemented in an R environment. Applicability of the new approach is demonstrated by simulations as well as by a benchmark study.
References
Baíllo A, Cuevas A (2008) Supervised functional classification: a theoretical remark and some comparisons. arXiv:0806.2831v1 [stat.ML]
Biau G, Bunea F, Wegkamp MH (2005) Functional classification in Hilbert spaces. IEEE Trans Inf Theory 51:2163–2172
Cambanis S (1973) On some continuity and differentiability properties of paths of Gaussian processes. J Multivar Anal 3:420–434
Carey JR, Liedo P, Müller H-G, Wang J-L, Chiou J-M (1998) Relationship of age patterns of fecundity to mortality, longevity, and lifetime reproduction in a large cohort of Mediterranean fruit fly females. J Gerontol 53A:B245–B251
Chakraborty A, Chaudhuri P (2014) On data depth in infinite dimensional spaces. Ann Inst Stat Math 66:303–324
Cuesta-Albertos JA, Febrero-Bande M, Oviedo de la Fuente M (2015) The DD\(^G\)-classifier in the functional setting. arXiv:1501.00372 [stat.ME]
Cuesta-Albertos JA, Nieto-Reyes A (2008) The random Tukey depth. Comput Stat Data Anal 52:4979–4988
Cuesta-Albertos JA, Nieto-Reyes A (2010) Functional classification and the random Tukey depth. Practical issues. In: Borgelt C, Rodríguez GG, Trutschnig W, Lubiano MA, Angeles Gil M, Grzegorzewski P, Hryniewicz O (eds) Combining soft computing and statistical methods in data analysis. Springer, Berlin/Heidelberg, pp 123–130
Cuevas A, Febrero M, Fraiman R (2007) Robust estimation and classification for functional data via projection-based depth notions. Comput Stat 22:481–496
Delaigle A, Hall P (2012) Achieving near-perfect classification for functional data. J R Stat Soc Ser B 74:267–286
Delaigle A, Hall P, Bathia N (2012) Componentwise classification and clustering of functional data. Biometrika 99:299–313
Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Springer, New York
Dutta S, Ghosh AK (2012a) On robust classification using projection depth. Ann Inst Stat Math 64:657–676
Dutta S, Ghosh AK (2012b) On classification based on \(L_p\) depth with an adaptive choice of \(p\). Technical Report Number R5/2011, Statistics and Mathematics Unit. Indian Statistical Institute, Kolkata
Ferraty F, Hall P, Vieu P (2010) Most-predictive design points for functional data predictors. Biometrika 97:807–824
Ferraty F, Vieu P (2003) Curves discrimination: a nonparametric functional approach. Comput Stat Data Anal 44:161–173
Ferraty F, Vieu P (2006) Nonparametric functional data analysis. Springer, New York
Ferré L, Villa N (2006) Multi-layer perceptron with functional inputs: an inverse regression approach. Scand J Stat 33:807–823
Fraiman R, Muniz G (2001) Trimmed means for functional data. TEST 10:419–440
Ghosh AK, Chaudhuri P (2005) On maximum depth and related classifiers. Scand J Stat 32:327–350
Hall P, Poskitt D, Presnell B (2001) A functional data-analytic approach to signal discrimination. Technometrics 43:1–9
Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J Am Stat Assoc 58:13–30
Huang D-S, Zheng C-H (2006) Independent component analysis-based penalized discriminant method for tumor classification using gene expression data. Bioinformatics 22:1855–1862
James G, Hastie T (2001) Functional linear discriminant analysis for irregularly sampled curves. J R Stat Soc Ser B 63:533–550
Kuelbs J, Zinn J (2013) Concerns with functional depth. Lat Am J Probab Math Stat 10:831–855
Lange T, Mosler K, Mozharovskyi P (2014a) Fast nonparametric classification based on data depth. Stat Pap 55:49–69
Lange T, Mosler K, Mozharovskyi P (2014b). \(DD\alpha \)-classification of asymmetric and fat-tailed data. In: Spiliopoulou M, Schmidt-Thieme L, Janning R (eds) Data analysis, machine learning and knowledge discovery. Springer, Berlin, pp 71–78
Leng XY, Müller H-G (2006) Classification using functional data analysis for temporal gene expression data. Bioinformatics 22:68–76
Li J, Cuesta-Albertos JA, Liu RY (2012) \(DD\)-classifier: nonparametric classification procedure based on \(DD\)-plot. J Am Stat Assoc 107:737–753
Liu X, Zuo Y (2014) Computing projection depth and its associated estimators. Stat Comput 24:51–63
López-Pintado S, Romo J (2006) Depth-based classification for functional data. In: Liu R, Serfling R, Souvaine D (eds) Data depth: robust multivariate analysis. American Mathematical Society, Computational Geometry and Applications, pp 103–120
Mahalanobis P (1936) On the generalized distance in statistics. Proc Natl Inst Sci India 2:49–55
Mosler K, Polyakova Y (2012) General notions of depth for functional data. arXiv:1208.1981v1 [stat.ME]
Mozharovskyi P, Mosler K, Lange T (2015) Classifying real-world data with the \(DD\alpha \)-procedure. Adv Data Anal Classif 9:287–314
Müller H-G, Stadtmüller U (2005) Generalized functional linear models. Ann Stat 33:774–805
Nagy S, Gijbels I, Hlubinka D (2015) Weak convergence of discretely observed functional data with applications. J Multivar Anal. doi:10.1016/j.jmva.2015.06.006
Ramsay JO, Silverman BW (2005) Functional data analysis. Springer series in statistics, 2nd edn. Springer, Berlin
Rossi F, Villa N (2006) Support vector machine for functional data classification. Neurocomputing 69:730–742
Serfling R (2002) A depth function and a scale curve based on spatial quantiles. In: Dodge Y (ed) Statistics and data analysis based on L\(_1\)-norm and related methods. Birkhäuser, Basel, pp 25–38
Sguera C, Galeano P, Lillo RE (2014) Spatial depth-based classification for functional data. TEST 23:725–750
Tian ST, James G (2013) Interpretable dimensionality reduction for classifying functional data. Comput Stat Data Anal 57:282–296
Tuddenham R, Snyder M (1954) Physical growth of California boys and girls from birth to eighteen years. University of California Press, Berkeley
Vapnik VN, Chervonenkis AYa (1974) Teorija raspoznavanija obrazov (statisticheskie problemy obuchenija) (The theory of pattern recognition (statistical learning problems), in Russian). Nauka, Moscow
Vardi Y, Zhang CH (2000) The multivariate \(L_1\)-median and associated data depth. Proc Natl Acad Sci USA 97:1423–1426
Vasil’ev VI, Lange T (1998) The duality principle in learning for pattern recognition (in Russian). Kibern i Vytschislit’elnaya Tech 121:7–16
Vencálek O (2011) Weighted data depth and depth based discrimination. Doctoral thesis. Charles University, Prague
Wang XH, Ray S, Mallick BK (2007) Bayesian curve classification using wavelets. J Am Stat Assoc 102:962–973
Zuo YJ, Serfling R (2000) General notions of statistical depth function. Ann Stat 28:461–482
Acknowledgments
We thank Dominik Liebl for his critical comments on an earlier version of the manuscript, as well as Ondrej Vencalek and Aurore Delaigle for their helpful remarks. The reading and suggestions of two referees are also gratefully acknowledged.
Appendices
Appendix 1: Implementation details
In calculating the depths, \(\mu _Y\) and \(\Sigma _Y\) for the Mahalanobis depth have been determined by the usual moment estimates, and likewise \(\Sigma _Y\) for the spatial depth. The projection depth has been approximated by drawing 1000 directions from the uniform distribution on the unit sphere. Clearly, the number of directions needed for a satisfactory approximation depends on the dimension of the space: for higher-dimensional problems 1000 directions are not enough, which becomes apparent from the analysis of Model 2 in Sect. 7.2, where the location-slope spaces chosen have dimension eight and higher; see also Tables 4 and 8 in Appendix 2. On the other hand, the projection depth is costly even in low dimensions: approximating it with 1000 directions takes substantially more time than computing the exact Mahalanobis or spatial depths (see Tables 2 and 14 in Appendix 2).
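The direction-based approximation of the projection depth can be sketched as follows. This is a minimal illustrative Python sketch of the general technique, not the paper's R/C++ implementation in "ddalpha"; the function name and the median/MAD outlyingness form are assumptions.

```python
import numpy as np

def projection_depth_approx(x, data, n_dir=1000, rng=None):
    """Approximate the projection depth of point x w.r.t. a sample.

    For each unit direction u, the outlyingness of x is
    |u'x - med(u'X)| / MAD(u'X); the depth is 1 / (1 + max outlyingness),
    maximized over n_dir random directions on the unit sphere.
    """
    rng = np.random.default_rng(rng)
    d = data.shape[1]
    # draw directions uniformly on the unit sphere
    u = rng.standard_normal((n_dir, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    proj = data @ u.T                       # (n, n_dir) projected sample
    med = np.median(proj, axis=0)
    mad = np.median(np.abs(proj - med), axis=0)
    mad[mad == 0] = np.finfo(float).eps     # guard degenerate directions
    out = np.abs(x @ u.T - med) / mad
    return 1.0 / (1.0 + out.max())
```

As the appendix notes, the quality of this approximation deteriorates with the dimension, since a fixed number of directions covers a high-dimensional sphere ever more sparsely.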
LDA and QDA are used with classical moment estimates, and priors are estimated by the class proportions in the training set. The kNN-classifier is applied to the location-slope data in its affine-invariant form, based on the covariance matrix of the pooled classes. To save computation time, its parameter k is determined by leave-one-out cross-validation over a reduced range, viz. \(k\in \{1, \dots , \max \{\min \{10(m+n)^{1/d}+1,m+n-1\},2\}\}\). The \(\alpha \)-procedure separating the DD-plot uses polynomial space extensions of maximum degree three; the degree is selected by cross-validation. To keep the training speed of the depth-based kNN-classifier comparable with that of the \(DD\alpha \)-classifier, we also determine its k by leave-one-out cross-validation on a reduced range, \(k\in \{1, \dots , \max \{\min \{10\sqrt{m+n}+1,(m+n)/2\},2\}\}\).
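The two reduced k-ranges quoted above can be written out directly; the following Python helpers are a sketch (function names are ours), with m and n the two class sizes and d the dimension of the location-slope space:

```python
def k_range_affine_knn(m, n, d):
    """Reduced k-range for the affine-invariant kNN on location-slope data:
    k in {1, ..., max(min(10*(m+n)^(1/d) + 1, m+n-1), 2)}."""
    upper = max(min(int(10 * (m + n) ** (1 / d)) + 1, m + n - 1), 2)
    return range(1, upper + 1)

def k_range_depth_knn(m, n):
    """Reduced k-range for the depth-based kNN:
    k in {1, ..., max(min(10*sqrt(m+n) + 1, (m+n)/2), 2)}."""
    upper = max(min(int(10 * (m + n) ** 0.5) + 1, (m + n) // 2), 2)
    return range(1, upper + 1)
```

For example, with \(m=n=50\) and \(d=2\) the affine-invariant range is capped by \(m+n-1=99\), while the depth-based range is capped by \((m+n)/2=50\).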
Due to linear interpolation, the levels are integrated as piecewise-linear functions and the derivatives as piecewise-constant ones. If the dimension of the location-slope space is too large (in particular for inverting the covariance matrix, as can be the case in Model 2), PCA is used to reduce the dimension. Then \(\epsilon _{max}\) is estimated and all further computations are performed in the subspace of principal components having positive loadings.
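The interpolation-based location-slope transform can be sketched as below: average levels of the linear interpolant on L equal subintervals, average slopes on S equal subintervals. This is an illustrative Python sketch under our reading of the construction (names are hypothetical), not the paper's implementation:

```python
import numpy as np

def location_slope(t, y, L, S):
    """Map a curve observed at points (t_i, y_i) into R^{L+S}."""
    t, y = np.asarray(t, float), np.asarray(y, float)

    def avg_level(a, b):
        # exact mean of the piecewise-linear interpolant on [a, b]
        grid = np.unique(np.concatenate(([a, b], t[(t > a) & (t < b)])))
        v = np.interp(grid, t, y)
        return np.sum(0.5 * (v[1:] + v[:-1]) * np.diff(grid)) / (b - a)

    def avg_slope(a, b):
        # the derivative is piecewise constant, so its mean on [a, b]
        # is just the increment of the interpolant over the interval
        return (np.interp(b, t, y) - np.interp(a, t, y)) / (b - a)

    eL = np.linspace(t[0], t[-1], L + 1)
    eS = np.linspace(t[0], t[-1], S + 1)
    return np.array([avg_level(a, b) for a, b in zip(eL[:-1], eL[1:])]
                    + [avg_slope(a, b) for a, b in zip(eS[:-1], eS[1:])])
```

Note that no smoothing is involved: both integrals are computed exactly from the raw interpolant, in line with the fully nonparametric character of the procedure.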
To construct the location-slope space, first all pairs (L, S) satisfying \(2\le L+S\le M/2\) are considered. (M/2 amounts to 26 for the synthetic data sets and to 16 for the real ones.) For each (L, S) the data are transformed to \(\mathbb {R}^{L+S}\), and the Vapnik–Chervonenkis bound \(\epsilon _{max}\) is calculated. Then the five pairs with the smallest \(\epsilon _{max}\) are selected. Tied values of \(\epsilon _{max}\) are taken into account as well, so that on average slightly more than five pairs are selected; see the growth data in Table 2 and both synthetic models in Table 14 of Appendix 2. Finally, among these the best (L, S)-pair is chosen by cross-validation. Note that the goal of this cross-validation is not so much to choose the best location-slope dimension as to discard obviously misleading (L, S)-pairs, which may nevertheless yield relatively small values of \(\epsilon _{max}\); this is seen from Figs. 4 and 5. When determining an optimal (L, S)-pair by crossLS, the same set of (L, S)-pairs is considered as with VCcrossLS.
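The VC-bound-guided pre-selection of (L, S)-pairs, including the tie rule, can be sketched as follows. This is a hedged Python sketch: `eps_max(L, S)` stands for the Vapnik–Chervonenkis bound computed on the transformed data, and the assumption \(L\ge 1\), \(S\ge 0\) is ours.

```python
def select_LS_pairs(M, eps_max, n_best=5):
    """Enumerate all (L, S) with 2 <= L+S <= M//2, find the n_best
    smallest values of the bound eps_max(L, S), and keep every pair
    tied with them (so slightly more than n_best pairs may survive)."""
    pairs = [(L, S) for L in range(1, M // 2 + 1)
                    for S in range(0, M // 2)
                    if 2 <= L + S <= M // 2]
    bounds = {p: eps_max(*p) for p in pairs}         # evaluate each pair once
    cutoff = sorted(bounds.values())[min(n_best, len(pairs)) - 1]
    return [p for p in pairs if bounds[p] <= cutoff]
```

Among the pairs returned here, the final (L, S) would then be picked by cross-validation, as described above.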
In implementing the componentwise method of finite-dimensional space synthesis (crossDHB) we have followed Delaigle et al. (2012) with slight modifications: their original approach is combined with the sequential approach of Ferraty et al. (2010). Initially, a grid of equally spaced (distance \(\Delta t\)) discretization points is built. Then a sequence of finite-dimensional spaces is synthesized by adding grid points step by step. We start with all pairs of discretization points at distance at least \(2\Delta t\). [Note that Delaigle et al. (2012) start with single points instead of pairs.] The best pair is chosen by cross-validation. Then features are added step by step: in each step, the point with the best discrimination power (again, in the sense of cross-validation) when added to the already constructed set is chosen as a new feature. The resulting set of points is used to construct a neighborhood of combinations to be considered further. As a neighborhood we use twenty \(2\Delta t\)-distanced points in the second step and ten in the third; from the fourth step on, only the sequential approach is applied.
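The greedy core of this construction (initial pair plus sequential additions) can be sketched as below. This is a simplified Python sketch, not the authors' implementation: `cv_error(points)` stands for the cross-validated error of a classifier built on those discretization points, and the neighborhood search of the second and third steps is omitted.

```python
from itertools import combinations

def forward_select_points(grid, cv_error, n_steps):
    """Greedy componentwise feature construction: start from the best
    pair of grid points at distance >= 2*dt, then repeatedly add the
    single point that minimizes the cross-validated error of the
    current set."""
    dt = grid[1] - grid[0]
    best = min((p for p in combinations(grid, 2) if abs(p[1] - p[0]) >= 2 * dt),
               key=cv_error)
    chosen = list(best)
    for _ in range(n_steps):
        candidates = [t for t in grid if t not in chosen]
        if not candidates:
            break
        # add the candidate with the best discrimination power so far
        chosen.append(min(candidates, key=lambda t: cv_error(tuple(chosen) + (t,))))
    return chosen
```

Each step requires one cross-validation per remaining candidate point, which is what makes this approach the computationally heaviest in our comparison.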
All our cross-validations are tenfold, except the leave-one-out cross-validations for determining k with the two kNN-classifiers. Of course, partitioning the sample into only ten parts may disadvantage our approach relative to a more comprehensive leave-one-out cross-validation. We have chosen it to keep the computation times of the crossDHB approach (Delaigle et al. 2012) within practical limits and also to make the comparison of approaches equitable throughout our study.
The calculations have been implemented in an R environment, based on the R-package “ddalpha” (Mozharovskyi et al. 2015), with speed-critical parts written in C++. The R-code implementing our methodology, as well as that performing the experiments, can be obtained from the authors upon request. In all experiments a single core of a Core i7-2600 processor (3.4 GHz) with sufficient physical memory has been used. Regarding the methodology of Delaigle et al. (2012), our implementation differs from the original one and, due to its module-based structure, may result in larger computation times. For this reason we report the number of cross-validations performed; see Tables 2 and 14 of Appendix 2. The comparison appears fair, as we always use tenfold cross-validation together with an identical set of classification rules in the finite-dimensional spaces.
Appendix 2: Additional tables
Mosler, K., Mozharovskyi, P. Fast DD-classification of functional data. Stat Papers 58, 1055–1089 (2017). https://doi.org/10.1007/s00362-015-0738-3