Construction of Artificial Most Representative Trees by Minimizing Tree-Based Distance Measures

Laabs, Björn-Hergen; Kronziel, Lea L.; König, Inke R.; Szymczak, Silke

doi:10.1007/978-3-031-63797-1_15

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2154)

Included in the following conference series:

World Conference on Explainable Artificial Intelligence

Abstract

The random forest (RF) algorithm is known for its strong predictive performance but has been criticized for its lack of interpretability, which stems from its complex ensemble nature. Addressing explainability, our study questions the traditional approach of simplifying RF interpretation with most representative trees (MRTs) selected from the ensemble, and highlights how non-informative early splits in such trees can lead to misinterpretation. To overcome these limitations, we propose constructing artificial representative trees (ARTs) with a greedy algorithm that builds a tree iteratively so as to minimize its distance to the RF ensemble, thereby preserving the predictive performance of the RF. We describe the methodological framework for ART construction in detail, including strategies that reduce computational complexity through variable preselection and quantile-based splitting. Extensive simulations show that ARTs reflect the RF's predictive performance more accurately and substantially reduce the false discovery rate, making them a more reliable interpretative model. These findings suggest that ARTs are an advance in the interpretation of RF models.
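A minimal Python sketch may help illustrate the greedy idea described in the abstract. It is not the timbR implementation and rests on several assumptions: the ensemble is given only as each tree's predictions on the training rows; the tree-based distance measures used in the chapter are replaced by a simple prediction-based distance (the mean squared difference between the candidate tree's predictions and each ensemble tree's predictions, averaged over trees); each leaf predicts the mean of the ensemble's averaged predictions for its samples; variable preselection is represented only by a caller-supplied list `candidate_vars`; and quantile-based splitting means that only a few empirical quantiles of a variable within a node are tried as thresholds. The names `build_art`, `ensemble_distance`, `quantile_thresholds` and `predictions` are hypothetical.

```python
from statistics import mean

# Hypothetical sketch, not the timbR API. ASSUMPTION: the RF is given only
# as per-tree predictions on the training rows, and a prediction-based
# distance stands in for the chapter's tree-based distance measures.


def ensemble_distance(art_pred, ensemble_preds):
    """Mean over ensemble trees of the mean squared prediction difference."""
    n = len(art_pred)
    return mean(
        sum((a - b) ** 2 for a, b in zip(art_pred, tree_pred)) / n
        for tree_pred in ensemble_preds
    )


def quantile_thresholds(values, n_quantiles=4):
    """Candidate thresholds at empirical quantiles of the node's values."""
    s = sorted(values)
    cuts = {s[q * (len(s) - 1) // n_quantiles] for q in range(1, n_quantiles)}
    return sorted(cuts)


def predictions(leaves, target, n):
    """ASSUMPTION: each leaf predicts the mean averaged-ensemble value of its samples."""
    pred = [0.0] * n
    for _, idx in leaves:
        leaf_value = mean(target[i] for i in idx)
        for i in idx:
            pred[i] = leaf_value
    return pred


def build_art(X, ensemble_preds, candidate_vars, max_leaves=4, n_quantiles=4):
    """Greedily split leaves, keeping the split that most lowers the distance."""
    n = len(X)
    # Per-sample average of the ensemble predictions, used for leaf values.
    target = [mean(tp[i] for tp in ensemble_preds) for i in range(n)]
    leaves = [([], list(range(n)))]  # (rule path, sample indices) per leaf
    best_dist = ensemble_distance(predictions(leaves, target, n), ensemble_preds)
    while len(leaves) < max_leaves:
        best = None
        for li, (rule, idx) in enumerate(leaves):
            for v in candidate_vars:  # preselected variables only
                for t in quantile_thresholds([X[i][v] for i in idx], n_quantiles):
                    left = [i for i in idx if X[i][v] <= t]
                    right = [i for i in idx if X[i][v] > t]
                    if not left or not right:
                        continue  # skip splits that leave a child empty
                    trial = (leaves[:li]
                             + [(rule + [(v, "<=", t)], left),
                                (rule + [(v, ">", t)], right)]
                             + leaves[li + 1:])
                    d = ensemble_distance(predictions(trial, target, n), ensemble_preds)
                    if d < best_dist - 1e-12:
                        best_dist, best = d, trial
        if best is None:  # no candidate split lowers the distance: stop
            break
        leaves = best
    return leaves, best_dist
```

Calling `build_art(X, ensemble_preds, candidate_vars=[0, 1], max_leaves=4)` returns the leaves as (rule path, sample indices) pairs together with the final distance; growth stops early once no candidate split lowers the distance. Shrinking `candidate_vars` or `n_quantiles` reduces the number of candidate trees evaluated per iteration, which is the role the abstract assigns to variable preselection and quantile-based splitting.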

Data Availability

All methods developed in this manuscript are publicly available within the R package timbR on GitHub (https://github.com/imbs-hl/timbR). Additionally, precalculated results of the simulations can be accessed in a shiny app (https://bioweb2.imbs.uksh.de/rsconnect/ART_paper/) and code for reproducing the simulation study and regenerating figures is publicly available on GitHub (https://github.com/imbs-hl/ART_paper).

Acknowledgments

This study was funded by the Medical Section of the University of Lübeck (J01–2024, BL).

Disclosure of Interests

All authors declare that they have no conflict of interest.

Author information

Authors and Affiliations

Institute of Medical Biometry and Statistics, University of Lübeck, Lübeck, Germany
Björn-Hergen Laabs, Lea L. Kronziel, Inke R. König & Silke Szymczak

Corresponding author

Correspondence to Björn-Hergen Laabs.

Editor information

Editors and Affiliations

Technological University Dublin, Dublin, Ireland
Luca Longo
Fraunhofer Institute for Telecommunications, Berlin, Germany
Sebastian Lapuschkin
University of Marburg, Marburg, Germany
Christin Seifert

Appendix

About this paper

Cite this paper

Laabs, BH., Kronziel, L.L., König, I.R., Szymczak, S. (2024). Construction of Artificial Most Representative Trees by Minimizing Tree-Based Distance Measures. In: Longo, L., Lapuschkin, S., Seifert, C. (eds) Explainable Artificial Intelligence. xAI 2024. Communications in Computer and Information Science, vol 2154. Springer, Cham. https://doi.org/10.1007/978-3-031-63797-1_15

DOI: https://doi.org/10.1007/978-3-031-63797-1_15
Published: 10 July 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-63796-4
Online ISBN: 978-3-031-63797-1
eBook Packages: Computer Science, Computer Science (R0)