An Experimental Study of the Joint Effects of Class Imbalance and Class Overlap

  • Conference paper
Next Generation Data Science (SDSC 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2113)

Abstract

The class imbalance problem has been identified as one of the critical challenges in classification. Furthermore, the existing literature shows that other factors such as class overlap, small disjuncts, and noise further degrade classification performance when they are combined with class imbalance. In this work, we focus on the joint effects of class imbalance and class overlap, and study the binary classification performance of six algorithms under different combinations of imbalance ratios and overlap degrees. The experiments corroborate that different types of classifiers show distinct robustness to class imbalance and class overlap. We conclude that, essentially, the densities of different regions of the data space affect classification performance. In addition, based on observations from our experiments, we infer that changing the densities of different regions of the data space should be a good way to address the problem of class imbalance and class overlap.
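To make the experimental setup concrete, the following is a minimal sketch of how one might vary imbalance ratio and overlap degree together on synthetic data. It is not the authors' protocol: the abstract does not name the six algorithms or the data generator, so the two unit-variance Gaussian classes, the use of mean separation as a proxy for overlap, the plain k-nearest-neighbour classifier, and the names `make_data`, `knn_predict`, and `minority_recall` are all assumptions chosen for illustration.

```python
import random


def make_data(n_major, imbalance_ratio, separation, seed=0):
    """Draw two 2-D Gaussian classes with unit variance (illustrative assumption).

    imbalance_ratio is n_major / n_minor; a smaller separation between the
    class means gives a larger overlap region.
    """
    rng = random.Random(seed)
    n_minor = max(1, round(n_major / imbalance_ratio))
    major = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_major)]
    minor = [(rng.gauss(separation, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_minor)]
    return major, minor


def knn_predict(train, point, k=5):
    """Majority vote among the k nearest training samples (label 1 = minority)."""
    nearest = sorted(
        train,
        key=lambda s: (s[0][0] - point[0]) ** 2 + (s[0][1] - point[1]) ** 2,
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def minority_recall(imbalance_ratio, separation, n_major=400, k=5, seed=0):
    """Fraction of fresh minority-class points that the classifier recovers."""
    major, minor = make_data(n_major, imbalance_ratio, separation, seed)
    train = [(p, 0) for p in major] + [(p, 1) for p in minor]
    # Held-out minority points drawn from the same (assumed) distribution.
    rng = random.Random(seed + 1)
    test_minor = [(rng.gauss(separation, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
    hits = sum(knn_predict(train, p, k) for p in test_minor)
    return hits / len(test_minor)


if __name__ == "__main__":
    # Sweep a small grid of imbalance ratios and separations.
    for ratio in (1, 10, 50):
        for sep in (0.5, 2.0, 4.0):
            print(ratio, sep, round(minority_recall(ratio, sep), 3))
```

Reading the grid row by row shows the effect of overlap at a fixed imbalance ratio, and column by column the effect of imbalance at a fixed overlap. Minority recall is used here because overall accuracy can stay high even when the minority class is ignored entirely.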

Acknowledgments

The authors acknowledge the National Natural Science Foundation of China (Grant: 62066039), the Natural Science Foundation of Qinghai Province (Grant: 2022-ZJ-925), and the “111” Project (D20035).

Author information

Corresponding author

Correspondence to Yutao Fan.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Fan, Y., Huang, H., DangZhi, C., Ji, X., Wu, Q. (2024). An Experimental Study of the Joint Effects of Class Imbalance and Class Overlap. In: Han, H., Baker, E. (eds) Next Generation Data Science. SDSC 2023. Communications in Computer and Information Science, vol 2113. Springer, Cham. https://doi.org/10.1007/978-3-031-61816-1_9

  • DOI: https://doi.org/10.1007/978-3-031-61816-1_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-61815-4

  • Online ISBN: 978-3-031-61816-1

  • eBook Packages: Computer Science, Computer Science (R0)
