Subgroup Discovery with SD4Py

  • Conference paper
Artificial Intelligence. ECAI 2023 International Workshops (ECAI 2023)

Abstract

We present SD4Py, a free open-source Python package for performing subgroup discovery and analysis. SD4Py makes it easy to discover subgroups from data stored in a Pandas data frame, to undertake follow-on analysis to examine the variability in the quality of the subgroups and to visualise important parameters. The core algorithms for discovering subgroups are implemented by an existing well-established and efficient Java back-end, but are exposed through a user-friendly Python interface. SD4Py offers a concise workflow for not only discovering but also comparing subgroups, in order to select those of interest, and for gaining insights into what is distinctive about individual subgroups.
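
As a rough illustration of what discovering and comparing subgroups involves, the following sketch scores attribute=value subgroups on a toy table using weighted relative accuracy, then bootstraps one subgroup's quality to show how much it varies. It is plain Python rather than SD4Py's interface: the function names, the toy columns, the choice of quality function, and the use of bootstrapping are all assumptions made for this example, not details taken from the paper.

```python
import random

# Toy data (hypothetical columns); "high_income" is the binary target.
ROWS = [
    {"education": "degree", "hours": "high", "high_income": 1},
    {"education": "degree", "hours": "high", "high_income": 1},
    {"education": "degree", "hours": "low", "high_income": 1},
    {"education": "degree", "hours": "low", "high_income": 0},
    {"education": "school", "hours": "high", "high_income": 1},
    {"education": "school", "hours": "high", "high_income": 0},
    {"education": "school", "hours": "low", "high_income": 0},
    {"education": "school", "hours": "low", "high_income": 0},
]


def covers(row, selectors):
    """True if the row satisfies every (attribute, value) selector."""
    return all(row[attr] == value for attr, value in selectors)


def wracc(rows, selectors, target):
    """Weighted relative accuracy: coverage * (subgroup rate - overall rate)."""
    covered = [r for r in rows if covers(r, selectors)]
    if not rows or not covered:
        return 0.0
    p_all = sum(r[target] for r in rows) / len(rows)
    p_sub = sum(r[target] for r in covered) / len(covered)
    return len(covered) / len(rows) * (p_sub - p_all)


def discover(rows, target, k=3):
    """Score every single attribute=value selector and return the k best."""
    candidates = sorted({((a, r[a]),) for r in rows for a in r if a != target})
    scored = [(wracc(rows, sel, target), sel) for sel in candidates]
    # sorted() is stable, so ties keep the alphabetical candidate order.
    return sorted(scored, key=lambda pair: -pair[0])[:k]


def bootstrap_qualities(rows, selectors, target, n_resamples=200, seed=0):
    """Recompute the quality on resampled data to gauge its variability."""
    rng = random.Random(seed)
    return [
        wracc(rng.choices(rows, k=len(rows)), selectors, target)
        for _ in range(n_resamples)
    ]


if __name__ == "__main__":
    for quality, selectors in discover(ROWS, "high_income"):
        print(f"{quality:+.3f}", selectors)
    spread = bootstrap_qualities(ROWS, (("education", "degree"),), "high_income")
    print("bootstrap range:", min(spread), max(spread))
```

A subgroup whose bootstrapped quality swings widely is a weaker candidate than one with the same point estimate and a narrow spread, which is the kind of comparison the follow-on analysis is meant to support.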

Notes

  1. https://rsubgroup.org

  2. https://github.com/SIMIDAT/SDEFSR

  3. http://kt.ijs.si/petra_kralj/SubgroupDiscovery/

  4. https://archive.ics.uci.edu/ml/datasets/Adult

  5. https://cslab-hub.github.io/sd4py/sd4py/sd4py.html#discover_subgroups

Author information

Corresponding author

Correspondence to Dan Hudson.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Hudson, D., Atzmueller, M. (2024). Subgroup Discovery with SD4Py. In: Nowaczyk, S., et al. Artificial Intelligence. ECAI 2023 International Workshops. ECAI 2023. Communications in Computer and Information Science, vol 1947. Springer, Cham. https://doi.org/10.1007/978-3-031-50396-2_19

  • DOI: https://doi.org/10.1007/978-3-031-50396-2_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-50395-5

  • Online ISBN: 978-3-031-50396-2

  • eBook Packages: Computer Science, Computer Science (R0)