Some Thoughts about Classification

Hampel, Frank

doi:10.1007/978-3-642-56181-8_1

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

The paper contains some general remarks on the high art of data analysis, some philosophical thoughts about classification, a partial review of outliers and robustness from the point of view of applications, including a discussion of the problem of model choice, and a review of several aspects of robust estimation of covariance matrices, including the pragmatic choice of a weight function based on empirical and theoretical evidence. Several sections contain new (or at least original) ideas: There are some proposals for incorporating robustness into Bayesian practice and theory, including weighted log likelihoods and Bayes’ theorem for weighted data. Some small ideas refer to artificial classification in a continuum, to a “robust” (Prohorov-type) metric for high-dimensional data, and to the use of multiple minimum spanning trees. A promising but difficult research idea for clustering on the real line, based on a new smoothing method, concludes the paper.

Author information

Authors and Affiliations

Seminar for Statistics of ETH, Swiss Federal Institute of Technology, Leonhardstrasse 27, LEO D2, CH-8092, Zurich, Switzerland
Frank Hampel

Editor information

Editors and Affiliations

Wroclaw University of Economics, ul. Komandorska 118/120, 53-345, Wroclaw, Poland
Krzysztof Jajuga
Department of Statistics, Cracow University of Economics, ul. Rakowicka 27, 31-510, Cracow, Poland
Andrzej Sokołowski
Institute of Statistics, Technical University of Aachen, Wuellnerstrasse 3, 52056, Aachen, Germany
Hans-Hermann Bock

Cite this paper

Hampel, F. (2002). Some Thoughts about Classification. In: Jajuga, K., Sokołowski, A., Bock, H.-H. (eds) Classification, Clustering, and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-56181-8_1

DOI: https://doi.org/10.1007/978-3-642-56181-8_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43691-1
Online ISBN: 978-3-642-56181-8
eBook Packages: Springer Book Archive
