An Iterated Greedy Algorithm for Improving the Generation of Synthetic Patterns in Imbalanced Learning

Maestre-García, Francisco Javier; García-Martínez, Carlos; Pérez-Ortiz, María; Gutiérrez, Pedro Antonio

doi:10.1007/978-3-319-59147-6_44

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10306)

Included in the following conference series:

International Work-Conference on Artificial Neural Networks

Abstract

Real-world classification datasets often present a skewed distribution of patterns, where one or more classes are under-represented with respect to the rest. One of the most successful approaches for alleviating this problem is the generation of synthetic minority samples by convex combination of available ones. Within this framework, adaptive synthetic (ADASYN) sampling is a relatively new method which imposes weights on minority examples according to their learning complexity, in such a way that difficult examples are more prone to be over-sampled. This paper proposes an improvement of the ADASYN method, where the learning complexity of these patterns is also used to decide which sample of the neighbourhood is selected. Moreover, to avoid suboptimal results when performing the random convex combination, this paper explores the application of an iterative greedy algorithm which refines the synthetic patterns by repeatedly replacing a part of them. For the experiments, six binary datasets and four over-sampling methods are considered. The results show that the new version of ADASYN leads to more robust results and that the application of the iterative greedy metaheuristic significantly improves the quality of the generated patterns, presenting a positive effect on the final classification model.
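The two ingredients named in the abstract, difficulty-based weighting of minority examples and synthetic generation by convex combination, can be illustrated with a minimal sketch. It follows the general ADASYN idea only, under several assumptions that are not taken from the paper: the function names `adasyn_weights` and `oversample` are hypothetical, difficulty is measured as the fraction of majority patterns among the `k` nearest neighbours, and the partner for the convex combination is drawn uniformly from the nearest minority neighbours. The neighbour selection guided by learning complexity and the iterated greedy refinement proposed in the paper are not reproduced here.

```python
import numpy as np


def adasyn_weights(X, y, minority_label, k=5):
    """Return minority indices and normalised difficulty weights.

    Assumption: difficulty of a minority pattern is the fraction of
    majority patterns among its k nearest neighbours in the whole set.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    min_idx = np.flatnonzero(y == minority_label)
    k = min(k, len(X) - 1)
    # Euclidean distances from every minority pattern to every pattern.
    d = np.linalg.norm(X[min_idx, None, :] - X[None, :, :], axis=2)
    d[np.arange(len(min_idx)), min_idx] = np.inf  # a pattern is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]
    ratio = (y[nn] != minority_label).mean(axis=1)
    total = ratio.sum()
    if total == 0:
        # No minority pattern has majority neighbours: fall back to uniform weights.
        weights = np.full(len(min_idx), 1.0 / len(min_idx))
    else:
        weights = ratio / total
    return min_idx, weights


def oversample(X, y, minority_label, n_new, k=5, seed=0):
    """Generate n_new synthetic minority patterns by random convex combination.

    Requires at least two minority patterns.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    min_idx, w = adasyn_weights(X, y, minority_label, k)
    X_min = X[min_idx]
    # Nearest minority neighbours of each minority pattern (simplification).
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    k_min = min(k, len(min_idx) - 1)
    nn = np.argsort(d, axis=1)[:, :k_min]
    # Difficult patterns are chosen as seeds more often.
    seeds = rng.choice(len(min_idx), size=n_new, p=w)
    partners = nn[seeds, rng.integers(0, k_min, size=n_new)]
    lam = rng.random((n_new, 1))  # one mixing coefficient in [0, 1) per new pattern
    return X_min[seeds] + lam * (X_min[partners] - X_min[seeds])
```

Because each synthetic pattern is a convex combination of two minority patterns, it always lies on the segment joining them; which segment is used, and where on it the point falls, is left to chance in this sketch, which is the source of the suboptimal results the abstract mentions.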

This work has been partially subsidised by the TIN2014-54583-C2-1-R, TIN2015-70308-REDT, and TIN2014-55252-P projects of the Spanish Ministerial Commission of Science and Technology (MINECO, Spain) and FEDER funds (EU).

Author information

Authors and Affiliations

Department of Computer Science and Numerical Analysis, University of Córdoba, Campus de Rabanales, C2 building, 14071, Córdoba, Spain
Francisco Javier Maestre-García, Carlos García-Martínez & Pedro Antonio Gutiérrez
Department of Quantitative Methods, Universidad Loyola Andalucía, Córdoba, Spain
María Pérez-Ortiz

Corresponding author

Correspondence to Pedro Antonio Gutiérrez.

Editor information

Editors and Affiliations

Universidad de Granada, Granada, Spain
Ignacio Rojas
University of Malaga, Malaga, Spain
Gonzalo Joya
Polytechnic University of Catalonia, Vilanova i la Geltrú, Barcelona, Spain
Andreu Catala

About this paper

Cite this paper

Maestre-García, F.J., García-Martínez, C., Pérez-Ortiz, M., Gutiérrez, P.A. (2017). An Iterated Greedy Algorithm for Improving the Generation of Synthetic Patterns in Imbalanced Learning. In: Rojas, I., Joya, G., Catala, A. (eds) Advances in Computational Intelligence. IWANN 2017. Lecture Notes in Computer Science, vol 10306. Springer, Cham. https://doi.org/10.1007/978-3-319-59147-6_44

DOI: https://doi.org/10.1007/978-3-319-59147-6_44
Published: 18 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59146-9
Online ISBN: 978-3-319-59147-6
eBook Packages: Computer Science, Computer Science (R0)
