Some Thoughts about Classification

  • Conference paper
Classification, Clustering, and Data Analysis

Abstract

The paper contains some general remarks on the high art of data analysis, some philosophical thoughts about classification, a partial review of outliers and robustness from the point of view of applications, including a discussion of the problem of model choice, and a review of several aspects of robust estimation of covariance matrices, including the pragmatic choice of a weight function based on empirical and theoretical evidence. Several sections contain new (or at least original) ideas: There are some proposals for incorporating robustness into Bayesian practice and theory, including weighted log likelihoods and Bayes’ theorem for weighted data. Some small ideas refer to artificial classification in a continuum, to a “robust” (Prohorov-type) metric for high-dimensional data, and to the use of multiple minimum spanning trees. A promising but difficult research idea for clustering on the real line, based on a new smoothing method, concludes the paper.
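To make the notion of a weighted log likelihood concrete, the following is a minimal sketch and not the paper's own procedure. It assumes a normal location model, a Huber-type weight function with tuning constant c = 1.345, and a scale fixed at the rescaled median absolute deviation; all function names are hypothetical. For fixed weights, maximising the weighted log likelihood sum_i w_i log f(x_i | mu) over mu gives a weighted mean, so the estimate can be found by iterative reweighting.

```python
import statistics


def huber_weight(r, c=1.345):
    """Huber-type weight (an assumed choice): 1 for |r| <= c, c/|r| beyond."""
    a = abs(r)
    return 1.0 if a <= c else c / a


def weighted_location(xs, c=1.345, iters=100, tol=1e-10):
    """Maximise sum_i w_i * log N(x_i | mu, s^2) over mu by iterative reweighting.

    The scale s is held fixed at the MAD rescaled for the normal model.
    Returns the location estimate and the final weights.
    """
    mu = statistics.median(xs)
    s = 1.4826 * statistics.median([abs(x - mu) for x in xs])
    if s == 0:
        s = 1.0  # degenerate sample: fall back to unit scale
    for _ in range(iters):
        ws = [huber_weight((x - mu) / s, c) for x in xs]
        # For fixed weights the weighted normal log likelihood is
        # maximised by the weighted mean.
        new_mu = sum(w * x for w, x in zip(ws, xs)) / sum(ws)
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    ws = [huber_weight((x - mu) / s, c) for x in xs]
    return mu, ws


if __name__ == "__main__":
    data = [9.8, 10.1, 10.0, 9.9, 10.2, 50.0]  # made-up sample with one gross error
    mu, ws = weighted_location(data)
    print("plain mean:", statistics.mean(data))
    print("weighted-likelihood location:", mu)
    print("weights:", ws)
```

On this made-up sample the gross error at 50.0 receives a weight close to zero, so the estimate stays near the bulk of the data while the plain mean is pulled far away. Other weight functions could be substituted for `huber_weight` without changing the rest of the sketch.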




Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hampel, F. (2002). Some Thoughts about Classification. In: Jajuga, K., Sokołowski, A., Bock, HH. (eds) Classification, Clustering, and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-56181-8_1


  • DOI: https://doi.org/10.1007/978-3-642-56181-8_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43691-1

  • Online ISBN: 978-3-642-56181-8

  • eBook Packages: Springer Book Archive
