Abstract
Real-world classification datasets often present a skewed distribution of patterns, where one or more classes are under-represented with respect to the rest. One of the most successful approaches for alleviating this problem is the generation of synthetic minority samples by convex combination of the available ones. Within this framework, adaptive synthetic (ADASYN) sampling is a relatively recent method that assigns weights to minority examples according to their learning complexity, so that difficult examples are more likely to be over-sampled. This paper proposes an improvement of the ADASYN method in which the learning complexity of these patterns is also used to decide which sample of the neighbourhood is selected. Moreover, to avoid suboptimal results when performing the random convex combination, this paper explores the application of an iterated greedy algorithm that refines the synthetic patterns by repeatedly replacing a part of them. For the experiments, six binary datasets and four over-sampling methods are considered. The results show that the new version of ADASYN leads to more robust results and that applying the iterated greedy metaheuristic significantly improves the quality of the generated patterns, with a positive effect on the final classification model.
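As a rough illustration of the baseline idea described in the abstract, the following Python sketch generates synthetic minority patterns by convex combination, drawing difficult minority examples more often. It is not the authors' implementation: the function name `adasyn_like`, its parameters, and the choice to measure difficulty as the fraction of majority-class points among the k nearest neighbours are assumptions made for this example, and the partner sample is drawn uniformly from the minority neighbourhood rather than by the complexity-guided selection the paper proposes. The iterated greedy refinement is not shown.

```python
import numpy as np

def adasyn_like(X, y, minority=1, k=5, n_new=None, seed=0):
    """ADASYN-style over-sampling sketch (hypothetical helper, not the paper's code).

    Assumes at least two minority samples and no exact duplicate points.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority]
    n_min = len(X_min)
    if n_new is None:
        # Default assumption: generate enough samples to balance the two classes.
        n_new = int((y != minority).sum()) - n_min

    # Difficulty of each minority sample: fraction of majority-class points
    # among its k nearest neighbours in the full dataset (column 0 is the
    # sample itself, at distance zero, so it is skipped).
    d_all = np.linalg.norm(X_min[:, None, :] - X[None, :, :], axis=2)
    nn_all = np.argsort(d_all, axis=1)[:, 1:k + 1]
    r = (y[nn_all] != minority).mean(axis=1)

    # Turn difficulties into a sampling distribution (uniform if all are zero).
    p = r / r.sum() if r.sum() > 0 else np.full(n_min, 1.0 / n_min)

    # Neighbourhood used for interpolation: nearest minority samples only.
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    nn_min = np.argsort(d_min, axis=1)[:, 1:k + 1]

    # Pick seed samples by difficulty, then a uniformly random minority neighbour.
    seeds = rng.choice(n_min, size=n_new, p=p)
    cols = rng.integers(0, nn_min.shape[1], size=n_new)
    partners = nn_min[seeds, cols]

    # Random convex combination: x + lam * (x_neighbour - x), with lam in [0, 1).
    lam = rng.random((n_new, 1))
    return X_min[seeds] + lam * (X_min[partners] - X_min[seeds])
```

Because every synthetic pattern is a convex combination of two minority samples, the output always lies inside the convex hull of the minority class. The random choice of `lam` and of the partner sample is the step that, according to the abstract, can give suboptimal patterns and motivates the refinement stage.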
This work has been partially subsidised by the TIN2014-54583-C2-1-R, TIN2015-70308-REDT, and TIN2014-55252-P projects of the Spanish Ministerial Commission of Science and Technology (MINECO, Spain) and FEDER funds (EU).
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Maestre-García, F.J., García-Martínez, C., Pérez-Ortiz, M., Gutiérrez, P.A. (2017). An Iterated Greedy Algorithm for Improving the Generation of Synthetic Patterns in Imbalanced Learning. In: Rojas, I., Joya, G., Catala, A. (eds) Advances in Computational Intelligence. IWANN 2017. Lecture Notes in Computer Science, vol 10306. Springer, Cham. https://doi.org/10.1007/978-3-319-59147-6_44
DOI: https://doi.org/10.1007/978-3-319-59147-6_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59146-9
Online ISBN: 978-3-319-59147-6
eBook Packages: Computer Science, Computer Science (R0)