Abstract
As a crucial preprocessing step in data mining, feature selection aims to obtain an excellent feature set so as to improve the accuracy of classifiers and reduce training time. This task is non-trivial, especially when labels are missing from datasets. Although some semi-supervised filter feature selection methods have been proposed, they generally fall short in effectively leveraging both labeled and unlabeled information, and they lack adaptability to specific datasets. This paper proposes a novel semi-supervised filter feature selection method called NM Score to overcome these shortcomings. Specifically, to calculate the NM Score of a feature, its power of locality preservation and label discrimination over the whole data space is measured via the natural Laplacian score (NLS), an improved parameter-free Laplacian score based on natural neighbors. Meanwhile, its correlation with the limited available label information is measured via the general and equitable maximal information coefficient (MIC). NLS and MIC are then combined adaptively, based on conflict ratios between neighborhoods and labels, to determine the NM Score of a feature and hence assess its importance. Experiments are conducted on UCI datasets and high-dimensional gene datasets, and the results reveal that NM Score is more effective than several state-of-the-art methods.
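To make the two-part construction concrete, the sketch below shows one plausible reading of such a score: a classic Laplacian score (He et al., 2005) over all samples, combined with a label-relevance term computed on the labeled subset only. The combination weight, the use of absolute Pearson correlation in place of MIC, and the function names (`laplacian_score`, `nm_score_sketch`, `conflict_ratio`) are illustrative assumptions, not the paper's actual formulas.

```python
import numpy as np

def laplacian_score(X, W):
    """Classic Laplacian score (He et al., 2005); lower = better locality preservation.
    X: (n_samples, n_features) data matrix; W: (n, n) symmetric affinity matrix."""
    d = W.sum(axis=1)              # degree vector
    D = np.diag(d)                 # degree matrix
    L = D - W                      # graph Laplacian
    scores = []
    for f in X.T:
        f_tilde = f - (f @ d) / d.sum()          # degree-weighted centering
        num = f_tilde @ L @ f_tilde              # local variation along the graph
        den = f_tilde @ D @ f_tilde              # overall weighted variance
        scores.append(num / den if den > 0 else np.inf)
    return np.array(scores)

def nm_score_sketch(X, labeled_idx, y, W, conflict_ratio):
    """Hypothetical NM-Score-style combination: the paper adaptively mixes NLS and
    MIC via conflict ratios between neighborhoods and labels; here a simple convex
    mix with |Pearson correlation| standing in for MIC. Higher = more important."""
    nls = laplacian_score(X, W)
    # Label relevance, computed on the labeled subset only (semi-supervised setting).
    relevance = np.array([
        abs(np.corrcoef(X[labeled_idx, j], y[labeled_idx])[0, 1])
        for j in range(X.shape[1])
    ])
    # Lower Laplacian score is better, so convert it to a "goodness" in [0, 1].
    locality = 1.0 - nls / (nls.max() + 1e-12)
    return (1.0 - conflict_ratio) * locality + conflict_ratio * relevance
```

Under this reading, a dataset-specific conflict ratio shifts weight toward the unsupervised locality term when labels disagree with the neighborhood structure, and toward the supervised relevance term otherwise.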
Data availability
The data that support the findings of this study are available from the corresponding author, Jie Zeng, upon reasonable request.
References
Zhu H, Zhou M, Xie Y, Albeshri A (2024) A self-adapting and efficient dandelion algorithm and its application to feature selection for credit card fraud detection. IEEE/CAA J Automat Sin 11(2):38–51
Chandrashekar G, Sahin F (2014) A survey on feature selection methods. Comput Electr Eng 40(1):16–28
Battiti R (1994) Using mutual information for selecting features in supervised neural net learning. IEEE Trans Neural Netw 5(4):537–550
He X, Cai D, Niyogi P (2005) Laplacian score for feature selection. Adv Neural Inf Process Syst 18:507–514
Sheikhpour R, Sarram MA, Gharaghani S, Chahooki MAZ (2017) A survey on semi-supervised feature selection methods. Pattern Recogn 64:141–158
Yang M, Chen Y-J, Ji G-L (2010) Semi_Fisher score: a semi-supervised method for feature selection. In: 2010 International Conference on Machine Learning and Cybernetics, vol. 1, pp 527–532
Zhao J, Lu K, He X (2008) Locality sensitive semi-supervised feature selection. Neurocomputing 71(10):1842–1849
Xu J, Tang B, He H, Man H (2016) Semisupervised feature selection based on relevance and redundancy criteria. IEEE Trans Neural Netw Learn Syst 28(9):1974–1984
Pang Q-Q, Zhang L (2020) Semi-supervised neighborhood discrimination index for feature selection. Knowl-Based Syst 204:106224
Sheikhpour R, Berahmand K, Forouzandeh S (2023) Hessian-based semi-supervised feature selection using generalized uncorrelated constraint. Knowl-Based Syst 269:110521
Lai J, Chen H, Li T, Yang X (2022) Adaptive graph learning for semi-supervised feature selection with redundancy minimization. Inf Sci 609:465–488
Li J, Cheng K, Wang S, Morstatter F, Trevino RP, Tang J, Liu H (2017) Feature selection: a data perspective. ACM Comput Surv 50(6):1–45
Bishop CM (1995) Neural networks for pattern recognition. Oxford University Press, Oxford
Robnik-Šikonja M, Kononenko I (2003) Theoretical and empirical analysis of ReliefF and RReliefF. Mach Learn 53(1):23–69
Peng H, Long F, Ding C (2005) Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238
Solorio-Fernández S, Carrasco-Ochoa JA, Martínez-Trinidad JF (2020) A review of unsupervised feature selection methods. Artif Intell Rev 53(2):907–948
Breaban M, Luchian H (2011) A unifying criterion for unsupervised clustering and feature selection. Pattern Recogn 44(4):854–865
Guo J, Zhu W (2018) Dependence guided unsupervised feature selection. Proc AAAI Conf Artif Intell 32(1)
Chen X, Yuan G, Nie F, Ming Z (2020) Semi-supervised feature selection via sparse rescaled linear square regression. IEEE Trans Knowl Data Eng 32(1):165–176
Sechidis K, Brown G (2018) Simple strategies for semi-supervised feature selection. Mach Learn 107(2):357–395
Karimi F, Dowlatshahi MB, Hashemi A (2023) SemiAco: a semi-supervised feature selection based on ant colony optimization. Expert Syst Appl 214:119130
Lai J, Chen H, Li W, Li T, Wan J (2022) Semi-supervised feature selection via adaptive structure learning and constrained graph learning. Knowl-Based Syst 251:109243
Wang Y, Wang J, Liao H, Chen H (2017) An efficient semi-supervised representatives feature selection algorithm based on information theory. Pattern Recogn 61:511–523
Doquire G, Verleysen M (2013) A graph Laplacian based approach to semi-supervised feature selection for regression problems. Neurocomputing 121:5–13
Van Engelen JE, Hoos HH (2020) A survey on semi-supervised learning. Mach Learn 109(2):373–440
Reshef DN, Reshef YA, Finucane HK, Grossman SR, McVean G, Turnbaugh PJ, Lander ES, Mitzenmacher M, Sabeti PC (2011) Detecting novel associations in large data sets. Science 334(6062):1518–1524
Zhu Q, Feng J, Huang J (2016) Natural neighbor: a self-adaptive neighborhood method without parameter K. Pattern Recogn Lett 80:30–36
Ding S, Du W, Xu X, Shi T, Wang Y, Li C (2023) An improved density peaks clustering algorithm based on natural neighbor with a merging strategy. Inf Sci 624:252–276
Peng H, Zhang J, Huang X, Hao Z, Li A, Yu Z, Yu PS (2024) Unsupervised social bot detection via structural information theory. arXiv preprint arXiv:2404.13595
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 62172065, Grant U22A2026, and Grant 62072097. The authors would like to thank the editor and anonymous reviewers for their valuable comments/suggestions to improve this paper.
Author information
Authors and Affiliations
Contributions
Quanwang Wu was in charge of conceptualization, methodology, validation, and writing—review & editing. Kun Cai was in charge of project administration, visualization, software, formal analysis, and writing—original draft. Jianxun Sun was in charge of conceptualization, methodology, software, formal analysis, investigation, data curation, visualization, and writing—original draft. Shanwei Wang was in charge of visualization and writing—original draft. Jie Zeng was in charge of resources, project administration, and writing—review & editing.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wu, Q., Cai, K., Sun, J. et al. Semi-supervised filter feature selection based on natural Laplacian score and maximal information coefficient. Int. J. Mach. Learn. & Cyber. (2024). https://doi.org/10.1007/s13042-024-02246-9
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s13042-024-02246-9