Abstract
This chapter discusses Classification and Regression Trees, widely used in data mining for predictive analytics. The chapter starts by explaining the two principal types of decision trees: classification trees and regression trees. In a classification tree, the dependent variable is categorical, while in a regression tree, it is continuous.
The first section discusses classification trees, using an example of customer targeting in a marketing campaign. The chapter emphasizes that classification trees are “automatic” models, in that they select independent variables themselves by searching for the splits that most improve node purity, as measured by impurity criteria such as the Gini index or entropy.
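The impurity measures mentioned above are simple to compute. A minimal sketch of the two common criteria, Gini impurity and Shannon entropy, applied to illustrative class proportions (the proportions are invented, not taken from the chapter's example):

```python
from math import log2

def gini(proportions):
    """Gini impurity for a list of class proportions summing to 1."""
    return 1.0 - sum(p ** 2 for p in proportions)

def entropy(proportions):
    """Shannon entropy (in bits) for a list of class proportions."""
    return sum(-p * log2(p) for p in proportions if p > 0)

# A pure node has zero impurity under both measures.
print(gini([1.0]), entropy([1.0]))            # 0.0 0.0
# A 50/50 two-class node is maximally impure.
print(gini([0.5, 0.5]), entropy([0.5, 0.5]))  # 0.5 1.0
```

A candidate split is scored by the weighted impurity of the child nodes it produces; the tree-growing algorithm picks the split with the largest impurity reduction.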
The second section covers regression trees, illustrating their application in predicting continuous target variables using an example of head acceleration measurements from simulated motorcycle accidents.
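The core regression-tree step — choosing the split threshold that minimizes the squared error around each side's mean — can be sketched in a few lines. The miniature time/acceleration values below are invented to echo the motorcycle example, not taken from the chapter's data:

```python
def best_split(x, y):
    """Return the threshold on x that minimizes the summed squared
    error of predicting each side's mean (one regression-tree split)."""
    def sse(vals):
        if not vals:
            return 0.0
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals)

    pairs = sorted(zip(x, y))
    best_err, best_thr = float("inf"), None
    for i in range(1, len(pairs)):
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        err = sse(left) + sse(right)
        if err < best_err:
            # Split midway between adjacent x values.
            best_err, best_thr = err, (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_thr

# Hypothetical miniature of the motorcycle data: time (ms) vs. head
# acceleration (g), with an abrupt change around t = 14.
time = [3, 6, 9, 12, 15, 18, 21, 24]
accel = [0, -1, -2, -1, -60, -80, -70, -50]
print(best_split(time, accel))  # 13.5
```

A full regression tree applies this search recursively to each child node, predicting the leaf mean for new observations.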
The chapter then explores how classification trees are developed, explaining that nodes are split repeatedly until they are pure or no further splits are possible. It emphasizes the importance of pruning to avoid overfitting, which leads to poor generalization on unseen data.
The author discusses different pruning techniques, including pre-pruning and post-pruning. Pre-pruning applies stopping rules during tree growth, while post-pruning trims the tree after it has been fully grown.
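The two approaches can be contrasted in code. Below is a hypothetical minimal sketch of reduced-error post-pruning on a toy tree of nested dicts — not the chapter's KNIME workflow — where a subtree is collapsed to its majority-class leaf whenever doing so does not hurt accuracy on held-out validation rows:

```python
def classify(tree, row):
    """Walk the tree; internal nodes are dicts, leaves are class labels."""
    while isinstance(tree, dict):
        tree = tree["left"] if row["x"] < tree["thr"] else tree["right"]
    return tree

def prune(node, val_rows):
    """Reduced-error post-pruning: collapse a subtree to its majority
    leaf when that does not increase error on the validation rows."""
    if not isinstance(node, dict):
        return node
    node["left"] = prune(node["left"],
                         [r for r in val_rows if r["x"] < node["thr"]])
    node["right"] = prune(node["right"],
                          [r for r in val_rows if r["x"] >= node["thr"]])

    def errors(tree, rows):
        return sum(classify(tree, r) != r["y"] for r in rows)

    if errors(node["majority"], val_rows) <= errors(node, val_rows):
        return node["majority"]  # the simpler leaf does at least as well
    return node

# An overgrown tree whose right subtree fits noise: every validation
# row is class "A", so the whole tree collapses to a single leaf.
overgrown = {"thr": 5, "majority": "A", "left": "A",
             "right": {"thr": 7, "majority": "A",
                       "left": "B", "right": "A"}}
validation = [{"x": 1, "y": "A"}, {"x": 6, "y": "A"}, {"x": 8, "y": "A"}]
print(prune(overgrown, validation))  # prints A
```

Pre-pruning, by contrast, would simply refuse to create the noisy subtree in the first place, e.g. by enforcing a maximum depth or a minimum number of training rows per node during growth.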
The chapter also weighs the strengths and weaknesses of decision trees: interpretability and intuitiveness are cited as strengths, while the risk of overfitting and sensitivity to small changes in the data are cited as weaknesses.
Overall, this chapter provides a comprehensive overview of decision trees, their applications, and essential considerations for creating accurate and robust models using this popular data mining technique.
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Acito, F. (2023). Classification and Regression Trees. In: Predictive Analytics with KNIME. Springer, Cham. https://doi.org/10.1007/978-3-031-45630-5_8
Print ISBN: 978-3-031-45629-9
Online ISBN: 978-3-031-45630-5
eBook Packages: Mathematics and Statistics (R0)