Classification and Regression Trees

A chapter in Predictive Analytics with KNIME

Abstract

This chapter discusses classification and regression trees, which are widely used in data mining for predictive analytics. It begins by explaining the two principal types of decision trees: classification trees and regression trees. In a classification tree the dependent variable is categorical, while in a regression tree it is continuous.

The first section discusses classification trees, using an example of customer targeting in a marketing campaign. The chapter emphasizes that classification trees are “automatic” models: they select independent variables by searching for the split that most improves node purity, as measured by impurity criteria such as the Gini index or entropy.
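
To make the splitting criteria concrete, here is a minimal sketch (in Python, not taken from the chapter itself) of the two standard impurity measures computed from a node's class proportions; the function names and example proportions are illustrative:

```python
import numpy as np

def gini(proportions):
    """Gini impurity: 1 - sum(p_k^2); equals 0 for a pure node."""
    p = np.asarray(proportions, dtype=float)
    return 1.0 - np.sum(p ** 2)

def entropy(proportions):
    """Shannon entropy: -sum(p_k * log2(p_k)); equals 0 for a pure node."""
    p = np.asarray(proportions, dtype=float)
    p = p[p > 0]  # treat 0 * log(0) as 0
    return -np.sum(p * np.log2(p))

print(gini([1.0, 0.0]), entropy([1.0, 0.0]))  # pure node: both zero
print(gini([0.5, 0.5]), entropy([0.5, 0.5]))  # evenly mixed: 0.5 and 1.0
```

A candidate split is preferred when it most reduces the weighted impurity of the resulting child nodes relative to the parent.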

The second section covers regression trees, illustrating their application in predicting continuous target variables using an example of head acceleration measurements from simulated motorcycle accidents.
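
As a hedged sketch of the idea (the chapter's motorcycle measurements are not reproduced here; the synthetic data below merely stand in for time-versus-acceleration readings), a regression tree predicts the mean target value within each leaf, producing a piecewise-constant fit:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for time (ms) vs. head acceleration (g).
rng = np.random.default_rng(0)
time_ms = np.sort(rng.uniform(0, 60, 200)).reshape(-1, 1)
accel_g = 50 * np.sin(time_ms.ravel() / 6.0) + rng.normal(0, 10, 200)

# min_samples_leaf limits how finely the tree partitions the time axis,
# so each leaf prediction averages at least 10 observations.
tree = DecisionTreeRegressor(min_samples_leaf=10, random_state=0)
tree.fit(time_ms, accel_g)

print(tree.predict([[10.0], [30.0], [50.0]]))  # leaf means at three times
```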

The chapter then explores how classification trees are grown: nodes are split repeatedly until they are pure or no further splits are possible. It emphasizes the importance of pruning to avoid overfitting, which leads to poor generalization on unseen data.

The author discusses different pruning techniques, including pre-pruning and post-pruning. Pre-pruning sets stopping rules during tree growth, while post-pruning trims the tree after it has been fully grown.
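
In scikit-learn terms (a hedged analogue, since the chapter itself works in KNIME and the parameter names below are the library's, not the book's), pre-pruning maps to growth limits such as max_depth or min_samples_leaf, while post-pruning maps to cost-complexity pruning via ccp_alpha:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Pre-pruning: stopping rules enforced while the tree is grown.
pre = DecisionTreeClassifier(max_depth=4, min_samples_leaf=20,
                             random_state=0).fit(X_tr, y_tr)

# Post-pruning: grow the full tree, then trim it back by increasing
# ccp_alpha; a middle value from the pruning path is picked here for
# brevity, though in practice it would be tuned by cross-validation.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]
post = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_tr, y_tr)

print("pre-pruned accuracy: ", pre.score(X_te, y_te))
print("post-pruned accuracy:", post.score(X_te, y_te))
```

Either way, the goal is the same: a smaller tree that sacrifices a little training accuracy for better generalization to unseen data.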

The strengths and weaknesses of decision trees are then weighed: interpretability and intuitiveness count as strengths, while the risk of overfitting and sensitivity to small changes in the training data count as weaknesses.

Overall, this chapter provides a comprehensive overview of decision trees, their applications, and essential considerations for creating accurate and robust models using this popular data mining technique.




Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter


Cite this chapter

Acito, F. (2023). Classification and Regression Trees. In: Predictive Analytics with KNIME. Springer, Cham. https://doi.org/10.1007/978-3-031-45630-5_8
