Abstract
Regression and classification are the two main learning tasks in supervised learning, and both can be solved by learning a hyperplane from training samples. However, the hyperplane in a regression task aims to approximate the labels of the samples as closely as possible, whereas the hyperplane in a classification task aims to separate the samples belonging to different classes as far as possible. From this perspective, regression and classification are two completely different learning tasks. Nevertheless, linear regression is often used to solve multi-class/multi-label classification problems, which can be decomposed into a set of binary classification problems. In this paper, we focus on analyzing the issues that arise when regression models are used for classification tasks. Firstly, when \(\{-1, +1\}\) is used to denote the negative and positive classes, we derive that solving a binary classification problem by learning a linear regression model is essentially equivalent to optimizing the square loss as a surrogate for the zero-one loss. Then, we derive what happens to the model when \(\{-1, +1\}\) is replaced with \(\{0, 1\}\) for three different versions of linear regression. Finally, extensive experiments are conducted on multi-label/multi-class classification tasks and the results are discussed in detail.
Data availability statement
The data sets used in Sect. 6 are publicly available at https://mulan.sourceforge.net/datasets-mlc.html. The data sets used in Sect. 7 are publicly available at https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html.
Notes
According to the different problem settings, each instance has exactly one relevant label in multi-class classification, while it can have multiple relevant labels in multi-label classification. Here, we use \(+1\) and \(-1\) to represent relevant and irrelevant labels, respectively; 1 and 0 can be used instead, as the sketch below illustrates.
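To make the two settings concrete, the following minimal NumPy sketch (ours, with hypothetical label names, not code from the paper) builds the \(\{-1, +1\}\) label matrix for both cases and converts it to the equivalent \(\{0, 1\}\) encoding; the only difference between the settings is how many entries of each row equal \(+1\):

```python
import numpy as np

labels = ["cat", "dog", "bird"]              # hypothetical label space

# Multi-class: exactly one relevant label per instance.
y_mc = ["dog", "cat"]
Y_mc = np.full((len(y_mc), len(labels)), -1)
for i, lab in enumerate(y_mc):
    Y_mc[i, labels.index(lab)] = +1          # exactly one +1 per row

# Multi-label: possibly several relevant labels per instance.
y_ml = [{"cat", "bird"}, {"dog"}]
Y_ml = np.full((len(y_ml), len(labels)), -1)
for i, labs in enumerate(y_ml):
    for lab in labs:
        Y_ml[i, labels.index(lab)] = +1      # one or more +1 per row

Y_01 = (Y_ml + 1) // 2                       # the equivalent {0, 1} encoding
```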
For example, \({\mathcal{R}}({{\textbf{W}}}) = \left\| {{\textbf{W}}} \right\| _F^2\) and \({\mathcal{R}}({{\textbf{W}}}) = \left\| {{\textbf{W}}} \right\| _1\) aim at obtaining balanced weights and sparse weights, respectively.
The threshold should be set to 0.5 if we use 1 and 0 to represent relevant and irrelevant labels.
Generally speaking, \(\ell _2\)-regularization \(\left\| {{{\varvec{w}}}} \right\| _2^2\) aims at obtaining balanced model parameters so as to avoid overfitting to particular features. For example, if the j-th entry \(w_j\) of \({{\varvec{w}}}\) is very large compared with the other entries, then a relatively small variation in the j-th feature will lead to a large difference in the model output. In contrast, the bias term b affects all instances equally, so it is unnecessary to regularize it.
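As an illustration of leaving the bias unpenalized, here is a minimal NumPy sketch (our own, on synthetic data, not the paper's code) of the closed-form ridge solution in which the appended constant column is excluded from the \(\ell _2\) penalty:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))               # 100 instances, 5 features
y = X @ rng.normal(size=5) + 3.0            # synthetic targets with a true bias of 3.0
lam = 1.0                                   # regularization strength

Xb = np.hstack([X, np.ones((len(X), 1))])   # append a constant column for the bias b
D = np.eye(Xb.shape[1])
D[-1, -1] = 0.0                             # leave the bias term unpenalized

# Closed-form solution of  min_w ||Xb w - y||^2 + lam * w' D w
w = np.linalg.solve(Xb.T @ Xb + lam * D, Xb.T @ y)
print("weights:", w[:-1], "bias:", w[-1])   # the recovered bias is close to 3.0
```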
In fact, the label set \(\{0, 1\}\) can be regarded as the ground-truth posterior probability of an instance \({{\varvec{x}}}_i\) belonging to the positive class, i.e., \(p(+1 \mid {{{\varvec{x}}}}_i) = 1\) holds for positive samples while \(p(+1 \mid {{{\varvec{x}}}}_i) = 0\) holds for negative samples. Thus, in this section we use \(p_i\) to denote \({{\varvec{x}}}_i\)’s label whenever it takes values in \(\{0, 1\}\).
In the experiments, the linear system \(\textbf{A}{{{\varvec{x}}}} = {{\varvec{b}}}\) is solved via the Matlab command “\({{{\varvec{x}}}} = \textbf{A} \setminus {{\varvec{b}}}\)”, which is recommended over “\({{{\varvec{x}}}} = \text{pinv}(\textbf{A}) * {{\varvec{b}}}\)”.
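A rough NumPy analogue of the two Matlab routes, assuming a square nonsingular \(\textbf{A}\) (for rank-deficient systems the two routes can return different solutions); the direct solver avoids explicitly forming a (pseudo-)inverse, which is both cheaper and numerically more stable:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x_solve = np.linalg.solve(A, b)      # direct solver, analogous to Matlab's A \ b
x_pinv = np.linalg.pinv(A) @ b       # explicit pseudo-inverse, analogous to pinv(A) * b

assert np.allclose(x_solve, x_pinv)  # identical here because A is nonsingular
```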
Acknowledgements
The authors wish to thank the associate editor and anonymous reviewers for their helpful comments and suggestions. This work was supported by the National Natural Science Foundation of China (62306131, 62176055), the Fundamental Research Funds for the Central Universities, and the Red Willow Outstanding Youth Talent Support Program of Lanzhou University of Technology. We thank the Big Data Center of Southeast University for providing the facility support for the numerical calculations in this paper.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix 1: Detailed experimental results
In this appendix, Tables 18, 19, 20, 21, 22, 23 and 24 report the detailed experimental results on the multi-label data sets, and Tables 25 and 26 report the detailed experimental results on the multi-class data sets; the corresponding Wilcoxon signed-ranks test results are reported and discussed in the main text. We deliberately keep five decimal places in order to expose the tiny performance differences that may exist among the compared approaches. For convenience, the performance rank of each approach on each data set is shown in parentheses, and the average ranks over all data sets are shown in the last row of each table. Note that when the RBF kernel is used, KerRegBias and KerNoBias achieve identical performance because the extra bias does not affect the distance between any two instances; we therefore report their results together under the name KerNo/RegBias.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jia, BB., Liu, JY. & Zhang, ML. Towards exploiting linear regression for multi-class/multi-label classification: an empirical analysis. Int. J. Mach. Learn. & Cyber. (2024). https://doi.org/10.1007/s13042-024-02114-6