Abstract
Bagging refers to fitting a learning algorithm on bootstrap samples and aggregating the results. A random forest bags trees and, in addition, considers only a random subset of the x-variables at each split. This promotes the use of a larger number of x-variables and makes the algorithm less dependent on a small number of them. For any one tree, roughly one third of the observations are not in the bootstrap sample; they form the out-of-bag sample. For a given tree, the out-of-bag sample can serve as a validation sample, giving the algorithm the unique ability to tune parameters without a separate validation sample. This is particularly useful when the available training data are limited. A case study predicts the math achievement of Portuguese high school students.
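To make these ideas concrete, here is a minimal Stata sketch using the user-written rforest command mentioned in the notes. The outcome g3 (a math grade) and the predictors x1-x10 are placeholder names, not the chapter's actual variables, and the sketch assumes the option names (type(), iterations(), numvars(), seed()) and the stored result e(OOB_Error) documented for rforest.

    * The command is user written; install it once with: ssc install rforest
    * Fit a regression forest: 500 bagged trees, with 3 randomly chosen
    * x-variables considered at each split. g3 and x1-x10 are placeholders.
    rforest g3 x1-x10, type(reg) iterations(500) numvars(3) seed(12345)

    * The out-of-bag error doubles as a validation estimate, so no
    * separate validation sample is needed.
    display "Out-of-bag error: " e(OOB_Error)

    * Predictions for the data in memory.
    predict g3_hat

Because each tree sees only a bootstrap sample, the roughly one third of observations left out of that sample supply the out-of-bag error displayed above; tuning numvars() then amounts to refitting and comparing e(OOB_Error) across candidate values.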
Notes
- 1.
Rerunning rforest for different numbers of iterations is inefficient (see the sketch after these notes). Data for such a plot can in principle be generated from a single run of a random forest algorithm; however, the WEKA Java plugin that rforest calls does not support that functionality.
- 2.
In linear regression, the denominator of the MSE is not \(n\) but \(n-p\), where \(p\) is the number of parameters estimated (written out after these notes). This usually makes only a small difference when \(p\) is small relative to \(n\), and we ignore it here.
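The rerun approach described in Note 1 can be sketched as follows: rforest is refit at a grid of iteration counts and the out-of-bag error is recorded each time. Variable names are again placeholders, and the sketch assumes rforest stores the out-of-bag error in e(OOB_Error).

    * Record the out-of-bag error for a grid of forest sizes
    * (inefficient: each run regrows the forest from scratch).
    tempname sim
    postfile `sim' iterations oob_error using oob_curve, replace
    forvalues iter = 50(50)500 {
        rforest g3 x1-x10, type(reg) iterations(`iter') seed(12345)
        post `sim' (`iter') (e(OOB_Error))
    }
    postclose `sim'

    * Plot the error curve against the number of trees.
    use oob_curve, clear
    line oob_error iterations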
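And the two variants of the mean squared error contrasted in Note 2, with \(y_i\) the observed and \(\hat{y}_i\) the predicted outcome for observation \(i\):

\[
\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
\qquad \text{versus} \qquad
\frac{1}{n-p} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 .
\]

When \(p\) is small relative to \(n\), the two differ only by the factor \(n/(n-p) \approx 1\), which is why the distinction is ignored here.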
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Schonlau, M. (2023). Random Forests. In: Applied Statistical Learning. Statistics and Computing. Springer, Cham. https://doi.org/10.1007/978-3-031-33390-3_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33389-7
Online ISBN: 978-3-031-33390-3
eBook Packages: Mathematics and Statistics (R0)