Abstract
Accurate classification of gene expression data is crucial for disease diagnosis and drug discovery. However, gene expression data usually contain a large number of features, which makes accurate classification challenging. This paper proposes a novel feature selection method based on minimal redundancy maximal relevance (mRMR) and the aquila optimizer. First, mRMR is introduced in the population initialization stage to generate a high-quality initial population. Second, a random opposition-based learning strategy is used to improve the diversity of the aquila population and accelerate convergence. Finally, an inertia weight is introduced into the position update formula in the late iterations of the aquila optimizer to keep the algorithm from falling into local optima and to improve its ability to find the optimum. To verify the effectiveness of the proposed method, it is compared with several meta-heuristic algorithms on ten real gene expression datasets. Experimental results show that the proposed method is significantly superior to the other meta-heuristic algorithms in terms of fitness value, classification accuracy, and number of selected features. Compared with the original aquila optimizer, the average classification accuracy of the proposed method with KNN and SVM classifiers improves by 3.48–12.41% and 0.53–18.63%, respectively. The proposed method significantly reduces the feature dimension of gene expression data while retaining important features and achieving higher classification accuracy, providing a new approach to feature selection for gene expression data.
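To make the random opposition-based learning step concrete, the following is a minimal sketch and not the authors' implementation. It assumes the commonly used formulation in which the opposite of a coordinate x within bounds [lb, ub] is lb + ub - r * x, with r drawn uniformly from [0, 1]. The function names, the clipping to the bounds, and the greedy keep-the-better rule are illustrative assumptions.

```python
import random


def random_opposition(position, lower, upper, rng=None):
    """Return a random-opposite candidate for one solution vector.

    Assumed formulation: opposite_j = lower_j + upper_j - r * x_j,
    where r is a fresh uniform random number in [0, 1) per dimension.
    """
    rng = rng or random.Random()
    opposite = []
    for x, lo, hi in zip(position, lower, upper):
        r = rng.random()
        candidate = lo + hi - r * x
        # Clip to the search bounds (an assumption; needed when lo != 0).
        opposite.append(min(max(candidate, lo), hi))
    return opposite


def keep_better(position, lower, upper, fitness, rng=None):
    """Greedy selection: keep the opposite only if it has lower fitness."""
    candidate = random_opposition(position, lower, upper, rng)
    return candidate if fitness(candidate) < fitness(position) else position
```

In a feature selection setting, each coordinate would typically be thresholded (for example at 0.5) to decide whether the corresponding gene is selected, and `fitness` would combine classification error with the number of selected features; both of these details are assumptions here.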
Data availability
Data will be made available on request.
Funding
This work was supported by the Department of Science and Technology of Jilin Province project (20210101149JC, 20200403182SF) and the Education Department of Jilin Province project (JJKH20220662KJ).
Author information
Contributions
** Yuan: Software, Writing – review & editing.
Ethics declarations
Competing interests
The authors have no relevant financial or non-financial interests to disclose.
About this article
Cite this article
Qin, X., Zhang, S., Dong, X. et al. Improved aquila optimizer with mRMR for feature selection of high-dimensional gene expression data. Cluster Comput (2024). https://doi.org/10.1007/s10586-024-04614-0
DOI: https://doi.org/10.1007/s10586-024-04614-0