Abstract
We present SD4Py, a free, open-source Python package for subgroup discovery and analysis. SD4Py makes it easy to discover subgroups in data stored in a Pandas data frame, to carry out follow-on analysis of how much the quality of those subgroups varies, and to visualise important parameters. The core discovery algorithms are implemented by an existing, well-established and efficient Java back-end, and are exposed through a user-friendly Python interface. SD4Py offers a concise workflow not only for discovering subgroups but also for comparing them, in order to select those of interest, and for gaining insight into what is distinctive about individual subgroups.
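To make the idea of discovering and ranking subgroups concrete, the following is a minimal sketch in plain Python. It does not use SD4Py's interface or the algorithms of its Java back-end. The function names `wracc` and `discover`, the toy records, and the choice of weighted relative accuracy (a common quality function in subgroup discovery) are assumptions made purely for illustration.

```python
from itertools import combinations


def wracc(rows, description, target):
    """Weighted relative accuracy of a description for a binary (0/1) target.

    `description` is a tuple of (attribute, value) selectors that must all hold.
    """
    n = len(rows)
    covered = [r for r in rows if all(r[a] == v for a, v in description)]
    if not covered:
        return 0.0
    p_all = sum(r[target] for r in rows) / n
    p_sub = sum(r[target] for r in covered) / len(covered)
    # Coverage of the subgroup times the lift of its target share.
    return (len(covered) / n) * (p_sub - p_all)


def discover(rows, target, max_depth=2, k=3):
    """Exhaustively score conjunctions of selectors and return the top k."""
    attrs = sorted(a for a in rows[0] if a != target)
    selectors = sorted({(a, r[a]) for r in rows for a in attrs})
    results = []
    for depth in range(1, max_depth + 1):
        for desc in combinations(selectors, depth):
            # Skip conjunctions that constrain the same attribute twice.
            if len({a for a, _ in desc}) < depth:
                continue
            results.append((wracc(rows, desc, target), desc))
    results.sort(key=lambda item: item[0], reverse=True)
    return results[:k]


# Hypothetical toy data: one dict per record, "ill" is the binary target.
rows = [
    {"smoker": "yes", "age": "old", "ill": 1},
    {"smoker": "yes", "age": "old", "ill": 1},
    {"smoker": "yes", "age": "young", "ill": 1},
    {"smoker": "yes", "age": "young", "ill": 0},
    {"smoker": "no", "age": "old", "ill": 0},
    {"smoker": "no", "age": "old", "ill": 0},
    {"smoker": "no", "age": "young", "ill": 0},
    {"smoker": "no", "age": "young", "ill": 0},
]

top = discover(rows, "ill")
for quality, desc in top:
    print(" AND ".join(f"{a}={v}" for a, v in desc), round(quality, 4))
```

The exhaustive enumeration here is only workable for tiny inputs. A dedicated package is useful precisely because efficient search, follow-on comparison of subgroups, and visualisation go well beyond a sketch of this kind.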
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hudson, D., Atzmueller, M. (2024). Subgroup Discovery with SD4Py. In: Nowaczyk, S., et al. Artificial Intelligence. ECAI 2023 International Workshops. ECAI 2023. Communications in Computer and Information Science, vol 1947. Springer, Cham. https://doi.org/10.1007/978-3-031-50396-2_19
DOI: https://doi.org/10.1007/978-3-031-50396-2_19
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-50395-5
Online ISBN: 978-3-031-50396-2
eBook Packages: Computer Science (R0)