Abstract
A fast nonparametric procedure for classifying functional data is introduced. It consists of a two-step transformation of the original data plus a classifier operating on a low-dimensional space. The functional data are first mapped into a finite-dimensional location-slope space and then transformed by a multivariate depth function into the DD-plot, which is a subset of the unit square. This transformation yields a new notion of depth for functional data. Three alternative depth functions are employed for this, as well as two rules for the final classification in \([0,1]^2\). The resulting classifier has to be cross-validated over a small range of parameters only, which is restricted by a Vapnik–Chervonenkis bound. The entire methodology involves no smoothing techniques, is completely nonparametric, and achieves Bayes optimality under standard distributional settings. It is robust, efficiently computable, and has been implemented in an R environment. Applicability of the new approach is demonstrated by simulations as well as by a benchmark study.
References
Baíllo A, Cuevas A (2008) Supervised functional classification: a theoretical remark and some comparisons. arXiv:0806.2831v1 [stat.ML]
Biau G, Bunea F, Wegkamp MH (2005) Functional classification in Hilbert spaces. IEEE Trans Inf Theory 51:2163–2172
Cambanis S (1973) On some continuity and differentiability properties of paths of Gaussian processes. J Multivar Anal 3:420–434
Carey JR, Liedo P, Müller H-G, Wang J-L, Chiou J-M (1998) Relationship of age patterns of fecundity to mortality, longevity, and lifetime reproduction in a large cohort of Mediterranean fruit fly females. J Gerontol 53A:B245–B251
Chakraborty A, Chaudhuri P (2014) On data depth in infinite dimensional spaces. Ann Inst Stat Math 66:303–324
Cuesta-Albertos JA, Febrero-Bande M, Oviedo de la Fuente M (2015) The DD\(^G\)-classifier in the functional setting. arXiv:1501.00372 [stat.ME]
Cuesta-Albertos JA, Nieto-Reyes A (2008) The random Tukey depth. Comput Stat Data Anal 52:4979–4988
Cuesta-Albertos JA, Nieto-Reyes A (2010) Functional classification and the random Tukey depth. Practical issues. In: Borgelt C, Rodríguez GG, Trutschnig W, Lubiano MA, Angeles Gil M, Grzegorzewski P, Hryniewicz O (eds) Combining soft computing and statistical methods in data analysis. Springer, Berlin/Heidelberg, pp 123–130
Cuevas A, Febrero M, Fraiman R (2007) Robust estimation and classification for functional data via projection-based depth notions. Comput Stat 22:481–496
Delaigle A, Hall P (2012) Achieving near-perfect classification for functional data. J R Stat Soc Ser B 74:267–286
Delaigle A, Hall P, Bathia N (2012) Componentwise classification and clustering of functional data. Biometrika 99:299–313
Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Springer, New York
Dutta S, Ghosh AK (2012a) On robust classification using projection depth. Ann Inst Stat Math 64:657–676
Dutta S, Ghosh AK (2012b) On classification based on \(L_p\) depth with an adaptive choice of \(p\). Technical Report Number R5/2011, Statistics and Mathematics Unit. Indian Statistical Institute, Kolkata
Ferraty F, Hall P, Vieu P (2010) Most-predictive design points for functional data predictors. Biometrika 97:807–824
Ferraty F, Vieu P (2003) Curves discrimination: a nonparametric functional approach. Comput Stat Data Anal 44:161–173
Ferraty F, Vieu P (2006) Nonparametric functional data analysis. Springer, New York
Ferré L, Villa N (2006) Multi-layer perceptron with functional inputs: an inverse regression approach. Scand J Stat 33:807–823
Fraiman R, Muniz G (2001) Trimmed means for functional data. TEST 10:419–440
Ghosh AK, Chaudhuri P (2005) On maximum depth and related classifiers. Scand J Stat 32:327–350
Hall P, Poskitt D, Presnell B (2001) A functional data-analytic approach to signal discrimination. Technometrics 43:1–9
Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J Am Stat Assoc 58:13–30
Huang D-S, Zheng C-H (2006) Independent component analysis-based penalized discriminant method for tumor classification using gene expression data. Bioinformatics 22:1855–1862
James G, Hastie T (2001) Functional linear discriminant analysis for irregularly sampled curves. J R Stat Soc Ser B 63:533–550
Kuelbs J, Zinn J (2013) Concerns with functional depth. Lat Am J Probab Math Stat 10:831–855
Lange T, Mosler K, Mozharovskyi P (2014a) Fast nonparametric classification based on data depth. Stat Pap 55:49–69
Lange T, Mosler K, Mozharovskyi P (2014b). \(DD\alpha \)-classification of asymmetric and fat-tailed data. In: Spiliopoulou M, Schmidt-Thieme L, Janning R (eds) Data analysis, machine learning and knowledge discovery. Springer, Berlin, pp 71–78
Leng XY, Müller H-G (2006) Classification using functional data analysis for temporal gene expression data. Bioinformatics 22:68–76
Li J, Cuesta-Albertos JA, Liu RY (2012) \(DD\)-classifier: nonparametric classification procedure based on \(DD\)-plot. J Am Stat Assoc 107:737–753
Liu X, Zuo Y (2014) Computing projection depth and its associated estimators. Stat Comput 24:51–63
López-Pintado S, Romo J (2006) Depth-based classification for functional data. In: Liu R, Serfling R, Souvaine D (eds) Data depth: robust multivariate analysis. American Mathematical Society, Computational Geometry and Applications, pp 103–120
Mahalanobis P (1936) On the generalized distance in statistics. Proc Natl Inst Sci India 2:49–55
Mosler K, Polyakova Y (2012) General notions of depth for functional data. arXiv:1208.1981v1 [stat.ME]
Mozharovskyi P, Mosler K, Lange T (2015) Classifying real-world data with the \(DD\alpha \)-procedure. Adv Data Anal Classif 9:287–314
Müller H-G, Stadtmüller U (2005) Generalized functional linear models. Ann Stat 33:774–805
Nagy S, Gijbels I, Hlubinka D (2015) Weak convergence of discretely observed functional data with applications. J Multivar Anal. doi:10.1016/j.jmva.2015.06.006
Ramsay JO, Silverman BW (2005) Functional data analysis. Springer series in statistics, 2nd edn. Springer, Berlin
Rossi F, Villa N (2006) Support vector machine for functional data classification. Neurocomputing 69:730–742
Serfling R (2002) A depth function and a scale curve based on spatial quantiles. In: Dodge Y (ed) Statistics and data analysis based on L\(_1\)-norm and related methods. Birkhäuser, Basel, pp 25–38
Sguera C, Galeano P, Lillo RE (2014) Spatial depth-based classification for functional data. TEST 23:725–750
Tian ST, James G (2013) Interpretable dimensionality reduction for classifying functional data. Comput Stat Data Anal 57:282–296
Tuddenham R, Snyder M (1954) Physical growth of California boys and girls from birth to eighteen years. University of California Press, Berkeley
Vapnik VN, Chervonenkis AYa (1974) Teorija raspoznavanija obrazov (statisticheskie problemy obuchenija) (The theory of pattern recognition (statistical learning problems), in Russian). Nauka, Moscow
Vardi Y, Zhang CH (2000) The multivariate \(L_1\)-median and associated data depth. Proc Natl Acad Sci USA 97:1423–1426
Vasil’ev VI, Lange T (1998) The duality principle in learning for pattern recognition (in Russian). Kibern i Vytschislit’elnaya Tech 121:7–16
Vencálek O (2011) Weighted data depth and depth based discrimination. Doctoral thesis. Charles University, Prague
Wang XH, Ray S, Mallick BK (2007) Bayesian curve classification using wavelets. J Am Stat Assoc 102:962–973
Zuo YJ, Serfling R (2000) General notions of statistical depth function. Ann Stat 28:461–482
Acknowledgments
We thank Dominik Liebl for his critical comments on an earlier version of the manuscript, as well as Ondrej Vencalek and Aurore Delaigle for their helpful remarks. The reading and suggestions of two referees are also gratefully acknowledged.
Appendices
Appendix 1: Implementation details
In calculating the depths, \(\mu _Y\) and \(\Sigma _Y\) for the Mahalanobis depth have been determined by the usual moment estimates, and likewise \(\Sigma _Y\) for the spatial depth. The projection depth has been approximated by drawing 1000 directions from the uniform distribution on the unit sphere. Clearly, the number of directions needed for a satisfactory approximation depends on the dimension of the space: for higher-dimensional problems 1000 directions are not enough, which becomes apparent from the analysis of Model 2 in Sect. 7.2, where the location-slope spaces chosen have dimension eight and higher; see also Tables 4 and 8 in Appendix 2. On the other hand, the projection depth is costly even in low dimensions: approximating it with 1000 directions takes substantially more time than computing the exact Mahalanobis or spatial depths (see Tables 2 and 14 in Appendix 2).
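The direction-based approximation of the projection depth can be sketched as follows. This is a minimal illustrative Python sketch of the general technique, not the paper's R/C++ implementation in "ddalpha"; the function name and the median/MAD outlyingness form are assumptions.

```python
import numpy as np

def projection_depth_approx(x, data, n_dir=1000, rng=None):
    """Approximate the projection depth of point x w.r.t. a sample.

    For each unit direction u, the outlyingness of x is
    |u'x - med(u'X)| / MAD(u'X); the depth is 1 / (1 + max outlyingness),
    maximized over n_dir random directions on the unit sphere.
    """
    rng = np.random.default_rng(rng)
    d = data.shape[1]
    # draw directions uniformly on the unit sphere
    u = rng.standard_normal((n_dir, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    proj = data @ u.T                       # (n, n_dir) projected sample
    med = np.median(proj, axis=0)
    mad = np.median(np.abs(proj - med), axis=0)
    mad[mad == 0] = np.finfo(float).eps     # guard degenerate directions
    out = np.abs(x @ u.T - med) / mad
    return 1.0 / (1.0 + out.max())
```

As the appendix notes, the quality of this approximation deteriorates with the dimension, since a fixed number of directions covers a high-dimensional sphere ever more sparsely.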
LDA and QDA are used with classical moment estimates, and priors are estimated by the class proportions in the training set. The kNN-classifier is applied to the location-slope data in its affine-invariant form, based on the covariance matrix of the pooled classes. To save computation time, its parameter k is determined by leave-one-out cross-validation over a reduced range, viz. \(k\in \{1, \dots , \max \{\min \{10(m+n)^{1/d}+1,m+n-1\},2\}\}\). The \(\alpha \)-procedure separating the DD-plot uses polynomial space extensions of maximum degree three; the degree is selected by cross-validation. To keep the training speed of the depth-based kNN-classifier comparable with that of the \(DD\alpha \)-classifier, we also determine its k by leave-one-out cross-validation on a reduced range, \(k\in \{1, \dots , \max \{\min \{10\sqrt{m+n}+1,(m+n)/2\},2\}\}\).
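The two reduced k-ranges quoted above can be written out directly; the following Python helpers are a sketch (function names are ours), with m and n the two class sizes and d the dimension of the location-slope space:

```python
def k_range_affine_knn(m, n, d):
    """Reduced k-range for the affine-invariant kNN on location-slope data:
    k in {1, ..., max(min(10*(m+n)^(1/d) + 1, m+n-1), 2)}."""
    upper = max(min(int(10 * (m + n) ** (1 / d)) + 1, m + n - 1), 2)
    return range(1, upper + 1)

def k_range_depth_knn(m, n):
    """Reduced k-range for the depth-based kNN:
    k in {1, ..., max(min(10*sqrt(m+n) + 1, (m+n)/2), 2)}."""
    upper = max(min(int(10 * (m + n) ** 0.5) + 1, (m + n) // 2), 2)
    return range(1, upper + 1)
```

For example, with \(m=n=50\) and \(d=2\) the affine-invariant range is capped by \(m+n-1=99\), while the depth-based range is capped by \((m+n)/2=50\).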
Due to linear interpolation, the levels are integrated as piecewise-linear functions and the derivatives as piecewise-constant ones. If the dimension of the location-slope space is too large (in particular for inverting the covariance matrix, as can be the case in Model 2), PCA is used to reduce the dimension. Then \(\epsilon _{max}\) is estimated and all further computations are performed in the subspace of principal components having positive loadings.
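The interpolation-based location-slope transform can be sketched as below: average levels of the linear interpolant on L equal subintervals, average slopes on S equal subintervals. This is an illustrative Python sketch under our reading of the construction (names are hypothetical), not the paper's implementation:

```python
import numpy as np

def location_slope(t, y, L, S):
    """Map a curve observed at points (t_i, y_i) into R^{L+S}."""
    t, y = np.asarray(t, float), np.asarray(y, float)

    def avg_level(a, b):
        # exact mean of the piecewise-linear interpolant on [a, b]
        grid = np.unique(np.concatenate(([a, b], t[(t > a) & (t < b)])))
        v = np.interp(grid, t, y)
        return np.sum(0.5 * (v[1:] + v[:-1]) * np.diff(grid)) / (b - a)

    def avg_slope(a, b):
        # the derivative is piecewise constant, so its mean on [a, b]
        # is just the increment of the interpolant over the interval
        return (np.interp(b, t, y) - np.interp(a, t, y)) / (b - a)

    eL = np.linspace(t[0], t[-1], L + 1)
    eS = np.linspace(t[0], t[-1], S + 1)
    return np.array([avg_level(a, b) for a, b in zip(eL[:-1], eL[1:])]
                    + [avg_slope(a, b) for a, b in zip(eS[:-1], eS[1:])])
```

Note that no smoothing is involved: both integrals are computed exactly from the raw interpolant, in line with the fully nonparametric character of the procedure.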
To construct the location-slope space, first all pairs (L, S) satisfying \(2\le L+S\le M/2\) are considered. (M/2 amounts to 26 for the synthetic data sets and to 16 for the real ones.) For each (L, S) the data are transformed to \(\mathbb {R}^{L+S}\), and the Vapnik–Chervonenkis bound \(\epsilon _{max}\) is calculated. Then the five pairs with the smallest \(\epsilon _{max}\) are selected. Tied values of \(\epsilon _{max}\) are taken into account as well, so that on average slightly more than five pairs are selected; see the growth data in Table 2 and both synthetic models in Table 14 of Appendix 2. Finally, among these the best (L, S)-pair is chosen by cross-validation. Note that the goal of this cross-validation is not so much to choose the best location-slope dimension as to discard obviously misleading (L, S)-pairs, which may nevertheless yield relatively small values of \(\epsilon _{max}\); this is seen from Figs. 4 and 5. When determining an optimal (L, S)-pair by crossLS, the same set of (L, S)-pairs is considered as with VCcrossLS.
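The VC-bound-guided pre-selection of (L, S)-pairs, including the tie rule, can be sketched as follows. This is a hedged Python sketch: `eps_max(L, S)` stands for the Vapnik–Chervonenkis bound computed on the transformed data, and the assumption \(L\ge 1\), \(S\ge 0\) is ours.

```python
def select_LS_pairs(M, eps_max, n_best=5):
    """Enumerate all (L, S) with 2 <= L+S <= M//2, find the n_best
    smallest values of the bound eps_max(L, S), and keep every pair
    tied with them (so slightly more than n_best pairs may survive)."""
    pairs = [(L, S) for L in range(1, M // 2 + 1)
                    for S in range(0, M // 2)
                    if 2 <= L + S <= M // 2]
    bounds = {p: eps_max(*p) for p in pairs}         # evaluate each pair once
    cutoff = sorted(bounds.values())[min(n_best, len(pairs)) - 1]
    return [p for p in pairs if bounds[p] <= cutoff]
```

Among the pairs returned here, the final (L, S) would then be picked by cross-validation, as described above.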
In implementing the componentwise method of finite-dimensional space synthesis (crossDHB) we have followed Delaigle et al. (2012) with slight modifications: their original approach is combined with the sequential approach of Ferraty et al. (2010). Initially, a grid of equally spaced (distance \(\Delta t\)) discretization points is built. Then a sequence of finite-dimensional spaces is synthesized by adding grid points step by step. We start with all pairs of discretization points at distance at least \(2\Delta t\). [Note that Delaigle et al. (2012) start with single points instead of pairs.] The best pair is chosen by cross-validation. Then features are added step by step: in each step, the point with the best discrimination power (again, in the sense of cross-validation) when added to the already constructed set is chosen as a new feature. The resulting set of points is used to construct a neighborhood of combinations to be considered further. As a neighborhood we use twenty \(2\Delta t\)-distanced points in the second step and ten in the third; from the fourth step on, only the sequential approach is applied.
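The greedy core of this construction (initial pair plus sequential additions) can be sketched as below. This is a simplified Python sketch, not the authors' implementation: `cv_error(points)` stands for the cross-validated error of a classifier built on those discretization points, and the neighborhood search of the second and third steps is omitted.

```python
from itertools import combinations

def forward_select_points(grid, cv_error, n_steps):
    """Greedy componentwise feature construction: start from the best
    pair of grid points at distance >= 2*dt, then repeatedly add the
    single point that minimizes the cross-validated error of the
    current set."""
    dt = grid[1] - grid[0]
    best = min((p for p in combinations(grid, 2) if abs(p[1] - p[0]) >= 2 * dt),
               key=cv_error)
    chosen = list(best)
    for _ in range(n_steps):
        candidates = [t for t in grid if t not in chosen]
        if not candidates:
            break
        # add the candidate with the best discrimination power so far
        chosen.append(min(candidates, key=lambda t: cv_error(tuple(chosen) + (t,))))
    return chosen
```

Each step requires one cross-validation per remaining candidate point, which is what makes this approach the computationally heaviest in our comparison.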
All our cross-validations are tenfold, except the leave-one-out cross-validations for determining k with the two kNN-classifiers. Of course, partitioning the sample into only ten parts may disadvantage our approach relative to a more comprehensive leave-one-out cross-validation. We have chosen it to keep the computation times of the crossDHB approach (Delaigle et al. 2012) within practical limits and also to make the comparison of approaches equitable throughout our study.
The calculations have been implemented in an R environment, based on the R-package “ddalpha” (Mozharovskyi et al. 2015), with speed-critical parts written in C++. The R-code implementing our methodology, as well as that performing the experiments, can be obtained from the authors upon request. In all experiments a single core of a Core i7-2600 processor (3.4 GHz) with sufficient physical memory has been used. Regarding the methodology of Delaigle et al. (2012), our implementation differs from the original one and, due to its module-based structure, may result in larger computation times. For this reason we report the number of cross-validations performed; see Tables 2 and 14 of Appendix 2. The comparison appears fair, as we always use tenfold cross-validation together with an identical set of classification rules in the finite-dimensional spaces.
Appendix 2: Additional tables
Mosler, K., Mozharovskyi, P. Fast DD-classification of functional data. Stat Papers 58, 1055–1089 (2017). https://doi.org/10.1007/s00362-015-0738-3