Semi-supervised feature selection based on discernibility matrix and mutual information

Qian, Wenbin; Wan, Lijuan; Shu, Wenhao

doi:10.1007/s10489-024-05481-3

Published: 03 June 2024

Volume 54, pages 7278–7295 (2024)
Applied Intelligence

Abstract

Feature selection is a vital technique for reducing data dimensionality. While many granular computing-based feature selection algorithms have been proposed, most treat feature selection as a supervised learning task that requires a large number of labeled instances. However, obtaining sufficient labeled data is expensive and time-consuming. To address this limitation, a novel semi-supervised feature selection framework is developed that leverages both labeled and unlabeled data. Specifically, the discernibility matrix is used to measure feature relevance on the labeled data, and mutual information is employed to evaluate feature significance on the unlabeled data. By combining these supervised and unsupervised metrics, a greedy feature selection algorithm is proposed for semi-supervised learning scenarios. The proposed discernibility matrix and mutual information-based feature measure can select more discriminative features, improving the generalization performance of the learning model. Finally, experiments conducted on ten UCI datasets under semi-supervised settings demonstrate that the proposed approach outperforms five state-of-the-art semi-supervised feature selection methods.
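The abstract does not give the paper's exact formulas, so the following is only a minimal sketch of how the two ingredients it names could be combined, not the authors' algorithm. Assumptions (all hypothetical): features are discrete; the discernibility matrix keeps, for every pair of labeled instances with different labels, the set of features on which they differ; the supervised term is the share of still-undiscerned pairs a feature separates, in the spirit of the classic greedy reduct heuristic; the unsupervised term uses symmetric uncertainty (mutual information normalized to [0, 1]) on the unlabeled instances as representativeness minus redundancy; the weight `alpha` and the fixed budget `k` are illustrative choices. The function names `discernibility_entries`, `symmetric_uncertainty`, and `select_features` are invented for this example.

```python
from collections import Counter
from itertools import combinations
from math import log2


def discernibility_entries(X_lab, y):
    """Non-empty discernibility-matrix entries: for each pair of labeled
    instances with different labels, the set of features on which they differ."""
    entries = []
    for i, j in combinations(range(len(X_lab)), 2):
        if y[i] != y[j]:
            diff = frozenset(a for a, (u, v) in enumerate(zip(X_lab[i], X_lab[j])) if u != v)
            if diff:
                entries.append(diff)
    return entries


def entropy(col):
    """Shannon entropy (bits) of a discrete column."""
    n = len(col)
    return -sum(c / n * log2(c / n) for c in Counter(col).values())


def symmetric_uncertainty(a, b):
    """Mutual information I(A;B) = H(A) + H(B) - H(A,B), normalized to [0, 1]."""
    ha, hb = entropy(a), entropy(b)
    if ha + hb == 0:
        return 0.0
    mi = ha + hb - entropy(list(zip(a, b)))
    return 2.0 * mi / (ha + hb)


def select_features(X_lab, y, X_unlab, k, alpha=0.5):
    """Greedy forward selection of k feature indices (hypothetical scoring)."""
    m = len(X_lab[0])
    k = min(k, m)
    entries = discernibility_entries(X_lab, y)
    cols = [[row[a] for row in X_unlab] for a in range(m)]
    su = [[symmetric_uncertainty(cols[a], cols[b]) for b in range(m)] for a in range(m)]
    # Representativeness of a feature: mean SU with every other feature.
    rep = [sum(su[a][b] for b in range(m) if b != a) / max(m - 1, 1) for a in range(m)]
    uncovered, selected = list(entries), []
    while len(selected) < k:
        best, best_score = None, float("-inf")
        for a in range(m):
            if a in selected:
                continue
            # Supervised term: share of still-undiscerned pairs that feature a separates.
            rel = sum(a in e for e in uncovered) / len(entries) if entries else 0.0
            # Unsupervised term: representativeness minus mean redundancy with the selected set.
            red = sum(su[a][s] for s in selected) / len(selected) if selected else 0.0
            score = alpha * rel + (1 - alpha) * (rep[a] - red)
            if score > best_score:
                best, best_score = a, score
        selected.append(best)
        # Pairs discerned by the new feature no longer count toward relevance.
        uncovered = [e for e in uncovered if best not in e]
    return selected


if __name__ == "__main__":
    # Toy discrete data: feature 0 matches the label on the labeled part.
    X_lab = [[0, 0, 1], [0, 1, 0], [1, 0, 1], [1, 1, 0]]
    y = [0, 0, 1, 1]
    X_unlab = [[0, 0, 1], [1, 1, 0], [0, 1, 0], [1, 0, 1]]
    print(select_features(X_lab, y, X_unlab, k=2, alpha=0.7))
```

In this sketch `alpha` trades off the labeled (discernibility) evidence against the unlabeled (mutual information) evidence; a real implementation might instead stop once every discernibility entry is covered, or use a different significance measure altogether.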

Data availability and access

The data that support the findings of this study are openly available in the UCI machine learning repository at http://archive.ics.uci.edu/ml.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Nos. 62366019 and 61966016), the Natural Science Foundation of Jiangxi Province, China (No. 20224BAB202020), and the National Key Research and Development Program of China (No. 2022YFD1600202).

Author information

Authors and Affiliations

School of Software, Jiangxi Agricultural University, Nanchang, 330045, People’s Republic of China
Wenbin Qian & Lijuan Wan
School of Information Engineering, East China Jiaotong University, Nanchang, 330013, People’s Republic of China
Wenhao Shu

Contributions

Wenbin Qian: Conceptualization, Methodology, Formal analysis, Writing-original draft. Lijuan Wan: Data curation, Software, Visualization, Writing-original draft. Wenhao Shu: Validation, Writing-review & editing.

Corresponding author

Correspondence to Wenbin Qian.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Qian, W., Wan, L. & Shu, W. Semi-supervised feature selection based on discernibility matrix and mutual information. Appl Intell 54, 7278–7295 (2024). https://doi.org/10.1007/s10489-024-05481-3

Accepted: 20 April 2024
Published: 03 June 2024
Issue Date: July 2024
DOI: https://doi.org/10.1007/s10489-024-05481-3
