Abstract
There has been an unprecedented and continuing growth in the volume, quality, and complexity of astronomical data sets over the past few years, mainly through large digital sky surveys. The Virtual Observatory (VO) concept represents a scientific and technological framework needed to cope with this data flood. We review some of the applied statistics and computing challenges posed by the analysis of the large and complex data sets expected in VO-based research. The challenges are driven by the size and complexity of the data sets (billions of data vectors in parameter spaces of tens or hundreds of dimensions), by the heterogeneity of the data and measurement errors, by selection effects and censored data, and by the intrinsic clustering properties (functional form, topology) of the data distribution in the parameter space of observed attributes. Examples of scientific questions one may wish to address include: objective determination of the number of object classes present in the data, and of the membership probabilities for each source; searches for unusual, rare, or even new types of objects and phenomena; and discovery of physically interesting multivariate correlations that may be present in some of the clusters.
This paper is followed by a commentary by statistician Dianne Cook.
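As an illustration of the first question listed in the abstract (how many object classes are present, and with what probability each source belongs to each class), the sketch below fits one-dimensional Gaussian mixtures with the EM algorithm and picks the number of components by the Bayesian Information Criterion (BIC). This is one common textbook approach, not necessarily the method used in the chapter. The synthetic two-class sample, the function names (`fit_gmm_1d`, `responsibilities`, `normal_pdf`), the variance floor, and the iteration count are all assumptions made for the example.

```python
import math
from statistics import NormalDist


def normal_pdf(v, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def responsibilities(x, weights, means, variances):
    """E-step: membership probability of every source in every component."""
    resp = []
    for v in x:
        p = [w * normal_pdf(v, m, s2) for w, m, s2 in zip(weights, means, variances)]
        total = max(sum(p), 1e-300)  # guard against numerical underflow
        resp.append([pj / total for pj in p])
    return resp


def fit_gmm_1d(x, k, n_iter=100, var_floor=1e-3):
    """Fit a k-component mixture by EM and return its parameters, memberships, and BIC."""
    n = len(x)
    xs = sorted(x)
    # Deterministic start: means at evenly spaced quantiles, shared broad variance.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(x) / n
    variances = [sum((v - centre) ** 2 for v in x) / n] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        resp = responsibilities(x, weights, means, variances)
        # M-step: re-estimate each component from its weighted members.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # starved component: leave its parameters unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, x)) / nj
            spread = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, x)) / nj
            variances[j] = max(spread, var_floor)  # floor avoids collapse onto one point
    resp = responsibilities(x, weights, means, variances)
    loglik = 0.0
    for v in x:
        dens = sum(w * normal_pdf(v, m, s2) for w, m, s2 in zip(weights, means, variances))
        loglik += math.log(max(dens, 1e-300))
    n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
    bic = -2.0 * loglik + n_params * math.log(n)
    return {"k": k, "weights": weights, "means": means,
            "variances": variances, "resp": resp, "bic": bic}


# Synthetic, noise-free sample (an assumption for the example): two classes of
# 200 sources each, placed at the quantiles of N(0, 1) and N(10, 1).
n_per_class = 200
quantiles = [(i + 0.5) / n_per_class for i in range(n_per_class)]
sample = ([NormalDist(0.0, 1.0).inv_cdf(q) for q in quantiles]
          + [NormalDist(10.0, 1.0).inv_cdf(q) for q in quantiles])

fits = [fit_gmm_1d(sample, k) for k in (1, 2, 3, 4)]
best = min(fits, key=lambda f: f["bic"])  # lowest BIC is preferred
print("preferred number of classes:", best["k"])
print("membership probabilities of the first source:", best["resp"][0])
```

The sample is generated from quantiles rather than random draws so that the run is reproducible. A real VO-scale analysis would face the issues the abstract names (high dimensionality, heterogeneous errors, selection effects, censored data), none of which this one-dimensional sketch attempts to handle.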
Copyright information
© 2003 Springer-Verlag New York, Inc.
Cite this paper
Djorgovski, S.G., Brunner, R., Mahabal, A., Williams, R., Granat, R., Stolorz, P. (2003). Challenges for Cluster Analysis in a Virtual Observatory. In: Statistical Challenges in Astronomy. Springer, New York, NY. https://doi.org/10.1007/0-387-21529-8_9
DOI: https://doi.org/10.1007/0-387-21529-8_9
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-95546-9
Online ISBN: 978-0-387-21529-7
eBook Packages: Springer Book Archive