Semi-supervised filter feature selection based on natural Laplacian score and maximal information coefficient

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics

Abstract

As a crucial preprocessing step in data mining, feature selection aims to obtain an excellent feature set, so as to improve the accuracy of classifiers and reduce the training time. This task is non-trivial, especially when there are missing labels in datasets. Although some semi-supervised filter feature selection methods have been proposed, they generally fall short in effectively leveraging both labeled and unlabeled information, and lack adaptability to specific datasets. This paper proposes a novel semi-supervised filter feature selection method called NM Score to overcome these shortcomings. Specifically, to calculate the NM Score of a feature, its power of locality preserving and label discrimination in the whole data space is measured via the natural Laplacian score (NLS), which is an improved parameter-free Laplacian score based on natural neighbors. Meanwhile, its correlation with the limited available label information is measured via the general and equitable maximal information coefficient (MIC). Then, NLS and MIC are combined adaptively based on conflict ratios between neighborhood and labels to determine the NM Score of a feature and hence assess its importance. Experiments are conducted based on UCI datasets and high-dimensional gene datasets, and results reveal that NM Score is more effective than several state-of-the-art methods.
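The scoring pipeline described above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: the natural-neighbor search is simplified to growing k until every point has at least one reverse neighbor, scikit-learn's `mutual_info_classif` stands in for MIC, and a fixed weight `alpha` replaces the adaptive conflict-ratio combination. The names `natural_neighbor_graph`, `laplacian_scores`, `nm_like_score`, and `alpha` are hypothetical, introduced only for this sketch.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.feature_selection import mutual_info_classif

def natural_neighbor_graph(X, max_k=20):
    """Simplified natural-neighbor search: grow k until every point
    appears in some other point's k-NN list, so no k is user-supplied."""
    n = X.shape[0]
    for k in range(1, min(max_k, n - 1) + 1):
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
        _, idx = nn.kneighbors(X)            # idx[:, 0] is the point itself
        in_degree = np.zeros(n, dtype=int)
        for i in range(n):
            in_degree[idx[i, 1:]] += 1
        if (in_degree > 0).all():            # everyone has a reverse neighbor
            break
    S = np.zeros((n, n))
    for i in range(n):
        S[i, idx[i, 1:]] = 1
    return np.maximum(S, S.T)                # symmetrize the adjacency

def laplacian_scores(X, S):
    """Classic Laplacian score per feature on graph S (lower = smoother,
    i.e. better locality preserving)."""
    D = np.diag(S.sum(axis=1))
    L = D - S
    d = np.diag(D)
    scores = []
    for r in range(X.shape[1]):
        f = X[:, r]
        f = f - (f @ d) / d.sum()            # remove degree-weighted mean
        denom = f @ (d * f)
        scores.append((f @ L @ f) / denom if denom > 0 else 1.0)
    return np.array(scores)

def nm_like_score(X, y_partial, alpha=0.5):
    """Hypothetical combination: locality term (1 - Laplacian score) plus
    label relevance on the labeled subset; -1 marks a missing label.
    Mutual information is used here in place of MIC."""
    S = natural_neighbor_graph(X)
    ls = laplacian_scores(X, S)
    labeled = y_partial >= 0
    mi = mutual_info_classif(X[labeled], y_partial[labeled], random_state=0)
    mi = mi / mi.max() if mi.max() > 0 else mi
    return alpha * (1 - ls) + (1 - alpha) * mi   # higher = more important
```

On a toy dataset with one class-informative feature and one noise feature, the informative feature should receive the larger score, since it is both smoother on the natural-neighbor graph and more relevant to the available labels.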


Data availability

The data that support the findings of this study are available from the corresponding author, Jie Zeng, upon reasonable request.

Notes

  1. http://archive.ics.uci.edu/ml/.

  2. https://file.biolab.si/biolab/supp/bi-cancer/projections/.


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 62172065, Grant U22A2026, and Grant 62072097. The authors would like to thank the editor and anonymous reviewers for their valuable comments/suggestions to improve this paper.

Author information

Authors and Affiliations

Authors

Contributions

Quanwang Wu was in charge of conceptualization, methodology, validation, and writing—review & editing. Kun Cai was in charge of project administration, visualization, software, formal analysis, and writing—original draft. Jianxun Sun was in charge of conceptualization, methodology, software, formal analysis, investigation, data curation, visualization, and writing—original draft. Shanwei Wang was in charge of visualization and writing—original draft. Jie Zeng was in charge of resources, project administration, and writing—review & editing.

Corresponding author

Correspondence to Jie Zeng.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article


Cite this article

Wu, Q., Cai, K., Sun, J. et al. Semi-supervised filter feature selection based on natural Laplacian score and maximal information coefficient. Int. J. Mach. Learn. & Cyber. (2024). https://doi.org/10.1007/s13042-024-02246-9
