Abstract
As a crucial preprocessing step in data mining, feature selection aims to obtain an excellent feature set so as to improve the accuracy of classifiers and reduce training time. This task is non-trivial, especially when labels are missing from datasets. Although some semi-supervised filter feature selection methods have been proposed, they generally fall short in effectively leveraging both labeled and unlabeled information, and they lack adaptability to specific datasets. This paper proposes a novel semi-supervised filter feature selection method called NM Score to overcome these shortcomings. Specifically, to calculate the NM Score of a feature, its power of locality preservation and label discrimination over the whole data space is measured via the natural Laplacian score (NLS), an improved parameter-free Laplacian score based on natural neighbors. Meanwhile, its correlation with the limited available label information is measured via the general and equitable maximal information coefficient (MIC). NLS and MIC are then combined adaptively, based on conflict ratios between neighborhoods and labels, to determine the NM Score of a feature and hence assess its importance. Experiments are conducted on UCI datasets and high-dimensional gene datasets, and the results reveal that NM Score is more effective than several state-of-the-art methods.
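To make the two-part construction concrete, the sketch below shows one plausible reading of such a score: a classic Laplacian score (He et al., 2005) over all samples, combined with a label-relevance term computed on the labeled subset only. The combination weight, the use of absolute Pearson correlation in place of MIC, and the function names (`laplacian_score`, `nm_score_sketch`, `conflict_ratio`) are illustrative assumptions, not the paper's actual formulas.

```python
import numpy as np

def laplacian_score(X, W):
    """Classic Laplacian score (He et al., 2005); lower = better locality preservation.
    X: (n_samples, n_features) data matrix; W: (n, n) symmetric affinity matrix."""
    d = W.sum(axis=1)              # degree vector
    D = np.diag(d)                 # degree matrix
    L = D - W                      # graph Laplacian
    scores = []
    for f in X.T:
        f_tilde = f - (f @ d) / d.sum()          # degree-weighted centering
        num = f_tilde @ L @ f_tilde              # local variation along the graph
        den = f_tilde @ D @ f_tilde              # overall weighted variance
        scores.append(num / den if den > 0 else np.inf)
    return np.array(scores)

def nm_score_sketch(X, labeled_idx, y, W, conflict_ratio):
    """Hypothetical NM-Score-style combination: the paper adaptively mixes NLS and
    MIC via conflict ratios between neighborhoods and labels; here a simple convex
    mix with |Pearson correlation| standing in for MIC. Higher = more important."""
    nls = laplacian_score(X, W)
    # Label relevance, computed on the labeled subset only (semi-supervised setting).
    relevance = np.array([
        abs(np.corrcoef(X[labeled_idx, j], y[labeled_idx])[0, 1])
        for j in range(X.shape[1])
    ])
    # Lower Laplacian score is better, so convert it to a "goodness" in [0, 1].
    locality = 1.0 - nls / (nls.max() + 1e-12)
    return (1.0 - conflict_ratio) * locality + conflict_ratio * relevance
```

Under this reading, a dataset-specific conflict ratio shifts weight toward the unsupervised locality term when labels disagree with the neighborhood structure, and toward the supervised relevance term otherwise.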
Data availability
The data that support the findings of this study are available from the corresponding author, Jie Zeng, upon reasonable request.
References
Zhu H, Zhou M, Xie Y, Albeshri A (2024) A self-adapting and efficient dandelion algorithm and its application to feature selection for credit card fraud detection. IEEE/CAA J Automat Sin 11(2):38–51
Chandrashekar G, Sahin F (2014) A survey on feature selection methods. Comput Electr Eng 40(1):16–28
Battiti R (1994) Using mutual information for selecting features in supervised neural net learning. IEEE Trans Neural Netw 5(4):537–550
He X, Cai D, Niyogi P (2005) Laplacian score for feature selection. Adv Neural Inf Process Syst 18:507–514
Sheikhpour R, Sarram MA, Gharaghani S, Chahooki MAZ (2017) A survey on semi-supervised feature selection methods. Pattern Recogn 64:141–158
Yang M, Chen Y-J, Ji G-L (2010) Semi_Fisher score: a semi-supervised method for feature selection. In: 2010 International Conference on Machine Learning and Cybernetics, vol. 1, pp 527–532
Zhao J, Lu K, He X (2008) Locality sensitive semi-supervised feature selection. Neurocomputing 71(10):1842–1849
Xu J, Tang B, He H, Man H (2016) Semisupervised feature selection based on relevance and redundancy criteria. IEEE Trans Neural Netw Learn Syst 28(9):1974–1984
Pang Q-Q, Zhang L (2020) Semi-supervised neighborhood discrimination index for feature selection. Knowl-Based Syst 204:106224
Sheikhpour R, Berahmand K, Forouzandeh S (2023) Hessian-based semi-supervised feature selection using generalized uncorrelated constraint. Knowl-Based Syst 269:110521
Lai J, Chen H, Li T, Yang X (2022) Adaptive graph learning for semi-supervised feature selection with redundancy minimization. Inf Sci 609:465–488
Li J, Cheng K, Wang S, Morstatter F, Trevino RP, Tang J, Liu H (2017) Feature selection: a data perspective. ACM Comput Surv 50(6):1–45
Bishop CM (1995) Neural networks for pattern recognition. Oxford University Press, Oxford
Robnik-Šikonja M, Kononenko I (2003) Theoretical and empirical analysis of ReliefF and RReliefF. Mach Learn 53(1):23–69
Peng H, Long F, Ding C (2005) Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238
Solorio-Fernández S, Carrasco-Ochoa JA, Martínez-Trinidad JF (2020) A review of unsupervised feature selection methods. Artif Intell Rev 53(2):907–948
Breaban M, Luchian H (2011) A unifying criterion for unsupervised clustering and feature selection. Pattern Recogn 44(4):854–865
Guo J, Zhu W (2018) Dependence guided unsupervised feature selection. Proc AAAI Conf Artif Intell 32(1)
Chen X, Yuan G, Nie F, Ming Z (2020) Semi-supervised feature selection via sparse rescaled linear square regression. IEEE Trans Knowl Data Eng 32(1):165–176
Sechidis K, Brown G (2018) Simple strategies for semi-supervised feature selection. Mach Learn 107(2):357–395
Karimi F, Dowlatshahi MB, Hashemi A (2023) SemiAco: a semi-supervised feature selection based on ant colony optimization. Expert Syst Appl 214:119130
Lai J, Chen H, Li W, Li T, Wan J (2022) Semi-supervised feature selection via adaptive structure learning and constrained graph learning. Knowl-Based Syst 251:109243
Wang Y, Wang J, Liao H, Chen H (2017) An efficient semi-supervised representatives feature selection algorithm based on information theory. Pattern Recogn 61:511–523
Doquire G, Verleysen M (2013) A graph Laplacian based approach to semi-supervised feature selection for regression problems. Neurocomputing 121:5–13
Van Engelen JE, Hoos HH (2020) A survey on semi-supervised learning. Mach Learn 109(2):373–440
Reshef DN, Reshef YA, Finucane HK, Grossman SR, McVean G, Turnbaugh PJ, Lander ES, Mitzenmacher M, Sabeti PC (2011) Detecting novel associations in large data sets. Science 334(6062):1518–1524
Zhu Q, Feng J, Huang J (2016) Natural neighbor: a self-adaptive neighborhood method without parameter K. Pattern Recogn Lett 80:30–36
Ding S, Du W, Xu X, Shi T, Wang Y, Li C (2023) An improved density peaks clustering algorithm based on natural neighbor with a merging strategy. Inf Sci 624:252–276
Peng H, Zhang J, Huang X, Hao Z, Li A, Yu Z, Yu PS (2024) Unsupervised social bot detection via structural information theory. arXiv preprint arXiv:2404.13595
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 62172065, Grant U22A2026, and Grant 62072097. The authors would like to thank the editor and anonymous reviewers for their valuable comments/suggestions to improve this paper.
Author information
Authors and Affiliations
Contributions
Quanwang Wu was in charge of conceptualization, methodology, validation, and writing—review & editing. Kun Cai was in charge of project administration, visualization, software, formal analysis, and writing—original draft. Jianxun Sun was in charge of conceptualization, methodology, software, formal analysis, investigation, data curation, visualization, and writing—original draft. Shanwei Wang was in charge of visualization and writing—original draft. Jie Zeng was in charge of resources, project administration, and writing—review & editing.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wu, Q., Cai, K., Sun, J. et al. Semi-supervised filter feature selection based on natural Laplacian score and maximal information coefficient. Int. J. Mach. Learn. & Cyber. (2024). https://doi.org/10.1007/s13042-024-02246-9
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s13042-024-02246-9