Summary
In the last few years, dramatic decreases in generalization error have been achieved by growing an ensemble of predictors and combining their outputs. The most common way to generate the ensemble is to perturb the training set and fit the same algorithm (trees, neural nets, etc.) to each perturbed set, though other methods of generating ensembles have also been explored. The predictions are combined by averaging when a numerical output is being predicted (regression), or by weighted or unweighted plurality vote when class membership is being predicted (classification). We review some of the recent developments that seem notable to us, including bagging, boosting, and arcing. The basic algorithm used in our empirical studies is tree-structured CART, but a variety of other algorithms have also been used to form ensembles.
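The recipe the summary describes (perturb the training set, fit the same algorithm to each replicate, combine by averaging or plurality vote) is easy to state in code. Below is a minimal sketch of bagging in Python, assuming NumPy and scikit-learn decision trees as a stand-in for CART; the function names are illustrative and not from the chapter itself.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

    def bagged_classifier_predict(X_train, y_train, X_test,
                                  n_predictors=50, seed=0):
        """Perturb the training set by bootstrap resampling, fit one
        tree per replicate, combine by unweighted plurality vote.
        Assumes integer class labels 0..K-1."""
        rng = np.random.default_rng(seed)
        n = len(X_train)
        votes = []
        for _ in range(n_predictors):
            idx = rng.integers(0, n, size=n)   # one perturbed training set
            tree = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
            votes.append(tree.predict(X_test))
        votes = np.stack(votes)                # shape (n_predictors, n_test)
        # plurality vote over the ensemble, one vote per test point
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

    def bagged_regressor_predict(X_train, y_train, X_test,
                                 n_predictors=50, seed=0):
        """Same perturbation scheme, but combine by averaging (regression)."""
        rng = np.random.default_rng(seed)
        n = len(X_train)
        preds = []
        for _ in range(n_predictors):
            idx = rng.integers(0, n, size=n)
            tree = DecisionTreeRegressor().fit(X_train[idx], y_train[idx])
            preds.append(tree.predict(X_test))
        return np.mean(preds, axis=0)

Boosting and arcing differ from this sketch in that the perturbations are adaptive (training-set weights concentrate on points the current ensemble misclassifies) and the vote may be weighted rather than unweighted.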
References
L. Breiman. Randomizing Outputs to Increase Prediction Accuracy. Technical Report 518, Statistics Department, University of California (available at http://www.stat.berkeley.edu). Submitted to Machine Learning, 1998.
L. Breiman. Prediction Games and Arcing Algorithms. Technical Report 504, Statistics Department, University of California (available at http://www.stat.berkeley.edu). Submitted to Neural Computing, 1997a.
L. Breiman. Pasting Bites Together for Prediction in Large Data Sets and On-Line (available at http://ftp.stat.berkeley.edu/users/breiman/pastebite.ps). Accepted by Machine Learning Journal, 1997b.
L. Breiman. Bagging Predictors. Machine Learning, Vol. 24, pp. 123–140, 1996a.
L. Breiman. Arcing Classifiers. Technical Report 460, Statistics Department, University of California (available at http://www.stat.berkeley.edu); in press, Annals of Statistics, 1996b.
L. Breiman. The heuristics of instability in model selection. Annals of Statistics, Vol. 24, pp. 2350–2383, 1996c.
L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth, 1984.
T. Dietterich. An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting and Randomization. Machine Learning, pp. 1–22, 1998.
H. Drucker. Improving Regressors Using Boosting Techniques. Proceedings of the Fourteenth International Conference on Machine Learning, ed. Douglas H. Fisher, Jr., pp. 107–115, Morgan Kaufmann, 1997.
H. Drucker and C. Cortes. Boosting Decision Trees. Neural Information Processing 8, pp. 479–485, Morgan Kaufmann, 1996.
Y. Freund and R. Schapire. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences, 1995.
Y. Freund and R. Schapire. Experiments with a New Boosting Algorithm. Machine Learning: Proceedings of the Thirteenth International Conference, pp. 148–156, 1996.
J. Friedman. Multivariate Adaptive Regression Splines (with discussion). Annals of Statistics, Vol. 19, pp. 1–141, 1991.
S. Geman, E. Bienenstock, and R. Doursat. Neural Networks and the Bias/Variance Dilemma. Neural Computation, Vol. 4, pp. 1–58, 1992.
T. Hastie and R. Tibshirani. Handwritten Digit Recognition via Deformable Prototypes (available at http://ftp.stat.stanford.edu/pub/hastie/zip.ps.Z), 1994.
C. Ji and S. Ma. Combinations of Weak Classifiers. Special Issue on Neural Networks and Pattern Recognition, IEEE Transactions on Neural Networks, Vol. 8, pp. 32–42, 1997.
Y. Le Cun, B. Boser, J. Denker, D. Henderson, R. Howard, W. Hubbard, and L. Jackel. Handwritten Digit Recognition with a Back-Propagation Network. Advances in Neural Information Processing Systems, Vol. 2, pp. 396–404, 1990.
D. Michie, D. Spiegelhalter, and C. Taylor. Machine Learning, Neural and Statistical Classification. Ellis Horwood, London, 1994.
J. Quinlan. Bagging, Boosting, and C4.5. Proceedings of the AAAI'96 National Conference on Artificial Intelligence, pp. 725–730, 1996.
R. Schapire, Y. Freund, P. Bartlett, and W. Lee. Boosting the Margin (available at http://www.research.att.com/yoav), 1997.
P. Simard, Y. Le Cun, and J. Denker. Efficient Pattern Recognition Using a New Transformation Distance. Advances in Neural Information Processing Systems, Vol. 5, pp. 50–58, 1993.
V. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.
Copyright information
© 1999 Springer-Verlag London Limited
Cite this chapter
Breiman, L. (1999). Combining Predictors. In: Sharkey, A.J.C. (ed.) Combining Artificial Neural Nets. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-0793-4_2
DOI: https://doi.org/10.1007/978-1-4471-0793-4_2
Publisher Name: Springer, London
Print ISBN: 978-1-85233-004-0
Online ISBN: 978-1-4471-0793-4
eBook Packages: Springer Book Archive