Abstract
The paper contains some general remarks on the high art of data analysis; some philosophical thoughts about classification; a partial review of outliers and robustness from the point of view of applications, including a discussion of the problem of model choice; and a review of several aspects of robust estimation of covariance matrices, including the pragmatic choice of a weight function based on empirical and theoretical evidence. Several sections contain new (or at least original) ideas. There are some proposals for incorporating robustness into Bayesian practice and theory, including weighted log-likelihoods and Bayes’ theorem for weighted data. Some smaller ideas concern artificial classification in a continuum, a “robust” (Prohorov-type) metric for high-dimensional data, and the use of multiple minimum spanning trees. A promising but difficult research idea for clustering on the real line, based on a new smoothing method, concludes the paper.
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hampel, F. (2002). Some Thoughts about Classification. In: Jajuga, K., Sokołowski, A., Bock, HH. (eds) Classification, Clustering, and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-56181-8_1
DOI: https://doi.org/10.1007/978-3-642-56181-8_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43691-1
Online ISBN: 978-3-642-56181-8
eBook Packages: Springer Book Archive