When Efficient Model Averaging Out-Performs Boosting and Bagging

Ian Davidson and Wei Fan

Abstract
The Bayes optimal classifier (BOC) is an ensemble technique used extensively in the statistics literature. However, compared to other ensemble techniques such as bagging and boosting, the BOC is less well known and rarely used in data mining. This is partly because the BOC is perceived as inefficient and because bagging and boosting consistently outperform a single model, which raises the question: “Do we even need the BOC in data mining?”. We show that the answer to this question is “yes” by illustrating that several recent efficient model averaging approximations to the BOC can significantly outperform bagging and boosting in realistic situations such as extensive class label noise, sample selection bias, and many-class problems. To our knowledge, the finding that model averaging techniques outperform bagging and boosting in these situations has not previously been published in the machine learning, data mining, or statistics communities.
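To make the averaging rule concrete, below is a minimal sketch in Python (assuming scikit-learn and NumPy, which are not part of the paper) of BOC-style model averaging: the prediction P(y | x, D) = Σ_h P(y | x, h) P(h | D), with the posterior P(h | D) approximated by each candidate model's held-out likelihood under a uniform prior. The hypothesis space here, a handful of depth-limited decision trees, is hypothetical; the sketch illustrates the general BOC idea, not the authors' specific approximation.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def bma_predict_proba(X_train, y_train, X_test, depths=(1, 2, 3, 5, 8)):
    # Hold out part of the training data to score each candidate model.
    X_fit, X_val, y_fit, y_val = train_test_split(
        X_train, y_train, test_size=0.3, random_state=0)
    models, log_liks = [], []
    for d in depths:
        m = DecisionTreeClassifier(max_depth=d, random_state=0).fit(X_fit, y_fit)
        # Held-out log-likelihood serves as a surrogate for log P(h | D).
        # Assumes every class appears in the fitting split.
        proba = np.clip(m.predict_proba(X_val), 1e-12, 1.0)
        col = {c: i for i, c in enumerate(m.classes_)}
        idx = np.array([col[c] for c in y_val])
        log_liks.append(np.log(proba[np.arange(len(y_val)), idx]).sum())
        models.append(m)
    # Posterior weights under a uniform prior: softmax of log-likelihoods.
    log_liks = np.array(log_liks)
    w = np.exp(log_liks - log_liks.max())
    w = w / w.sum()
    # BOC-style prediction: posterior-weighted average of class probabilities.
    return sum(wi * m.predict_proba(X_test) for wi, m in zip(w, models))

The contrast with bagging is the weighting: bagging votes its models uniformly, whereas the posterior weights above can drive a badly fitting model's contribution toward zero, which is what helps under heavy label noise or sample selection bias.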