
Improved aquila optimizer with mRMR for feature selection of high-dimensional gene expression data


Abstract

Accurate classification of gene expression data is crucial for disease diagnosis and drug discovery. However, gene expression data usually contain a very large number of features, which poses a challenge for accurate classification. In this paper, a novel feature selection method based on minimal-redundancy-maximal-relevance (mRMR) and the Aquila optimizer is proposed. First, the mRMR method is introduced in the population initialization stage to generate a high-quality initial population. Second, a random opposition-based learning strategy is used to improve the diversity of the Aquila population and accelerate the algorithm's convergence. Finally, an inertia weight is introduced into the position-update formula in the late iterations of the Aquila optimizer to prevent the algorithm from falling into local optima and to improve its ability to locate the global optimum. To verify the effectiveness of the proposed method, ten real gene expression datasets are selected and the method is compared with several meta-heuristic algorithms. Experimental results show that the proposed method is significantly superior to the other meta-heuristic algorithms in terms of fitness value, classification accuracy, and number of selected features. Compared with the original Aquila optimizer, the average classification accuracy of the proposed method on the KNN and SVM classifiers improves by 3.48–12.41% and 0.53–18.63%, respectively. The proposed method significantly reduces the feature dimension of gene expression data, retains important features, and obtains higher classification accuracy, providing a new method and approach for feature selection of gene expression data.
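The three modifications described above can be illustrated with a short sketch. The Python code below is a minimal, hypothetical illustration of (i) mRMR-guided population initialization, (ii) random opposition-based learning, and (iii) an inertia-weighted late-iteration position update. All function names, probability thresholds, and parameter values are illustrative assumptions made for this sketch; they are not the authors' implementation.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif


def mrmr_rank(X, y, n_keep):
    """Rank features with a simple mRMR-style score: relevance to the label
    (mutual information) minus average redundancy (absolute correlation)
    with features already selected."""
    relevance = mutual_info_classif(X, y)
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_keep:
        best, best_score = None, -np.inf
        for j in remaining:
            redundancy = (np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                                   for s in selected]) if selected else 0.0)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
        remaining.remove(best)
    return selected


def init_population(pop_size, dim, top_features):
    """Bias the initial binary population toward mRMR-ranked features
    (thresholds 0.1 / 0.7 are illustrative)."""
    pop = (np.random.rand(pop_size, dim) < 0.1).astype(float)
    pop[:, top_features] = (np.random.rand(pop_size, len(top_features)) < 0.7)
    return pop


def random_opposition(pop, lb=0.0, ub=1.0):
    """Random opposition-based learning: reflect each solution through a
    randomly scaled opposite point to diversify the population."""
    return lb + ub - np.random.rand(*pop.shape) * pop


def late_update(position, best, t, t_max, w_max=0.9, w_min=0.4):
    """Inertia-weighted position update for the late iterations:
    a linearly decreasing weight damps the current position while the
    solutions are pulled toward the best individual found so far."""
    w = w_max - (w_max - w_min) * t / t_max
    return w * position + np.random.rand(*position.shape) * (best - position)


if __name__ == "__main__":
    # Toy high-dimensional data used only to exercise the sketch.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 200))
    y = rng.integers(0, 2, size=60)
    top = mrmr_rank(X, y, n_keep=20)
    pop = init_population(pop_size=30, dim=200, top_features=top)
    pop = np.clip(random_opposition(pop), 0.0, 1.0)
    print(pop.shape, len(top))
```

In a full wrapper method, each row of the population would be thresholded into a binary feature mask, evaluated with a classifier-based fitness (e.g., KNN or SVM accuracy penalized by the number of selected features), and updated by the Aquila optimizer's search equations, with the inertia-weighted update applied only in the late iterations.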


Data availability

Data will be made available on request.


Funding

This work was supported by the Department of Science and Technology of Jilin Province (projects 20210101149JC and 20200403182SF) and the Education Department of Jilin Province (project JJKH20220662KJ).

Author information


Contributions

** Yuan: Software, Writing – review & editing.

Corresponding author

Correspondence to Siqi Zhang.

Ethics declarations

Competing interests

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Qin, X., Zhang, S., Dong, X. et al. Improved aquila optimizer with mRMR for feature selection of high-dimensional gene expression data. Cluster Comput (2024). https://doi.org/10.1007/s10586-024-04614-0

