Abstract
Credit risk evaluation, which requires predicting default probabilities and deriving risk classifications, is a difficult task, and many classification methods and techniques have been applied to it. In this paper, to address the limited feature reduction and weak interpretability of the multi-criteria optimization classifier (MCOC), an improved LASSO-based MCOC (LASSO-MCOC) that performs classification and feature selection simultaneously is proposed, and the corresponding algorithm is constructed. On four real-world credit risk datasets, the LASSO-MCOC with linear and RBF kernels is tested and compared with the SMCOC proposed by Zhang et al. (2019) and with six basic classification methods: logistic regression, multilayer perceptron, support vector machines, Naïve Bayes, k-nearest neighbors, and random forest. The experimental results and statistical comparisons show that the proposed LASSO-MCOC is more effective for credit risk assessment, with better accuracy, efficiency, and interpretability than the other classifiers, and that it can be extended to other real-world applications.
References
Arora, N., & Kaur, P. D. (2019). A bolasso based consistent feature selection enabled random forest classification algorithm: An application to credit risk assessment. Applied Soft Computing, 86, 105936.
Bhattacharya, A., Biswas, S. K., & Mandal, A. (2022). Credit risk evaluation: A comprehensive study. Multimedia Tools and Applications. https://doi.org/10.1007/s11042-022-13952-3
Bijak, K., & Thomas, L. C. (2012). Does segmentation always improve model performance in credit scoring? Expert Systems with Applications, 39(3), 2433–2442.
Brown, I., & Mues, C. (2012). An experimental comparison of classification algorithms for imbalanced credit scoring data sets. Expert Systems with Applications, 39(3), 3446–3453.
Bussmann, N., Giudici, P., Marinelli, D., & Papenbrock, J. (2021). Explainable machine learning in credit risk management. Computational Economics, 57, 203–216.
Chen, N., Ribeiro, B., & Chen, A. (2016). Financial credit risk assessment: A recent review. Artificial Intelligence Review, 45, 1–23.
Danenas, P., Garsva, G., & Gudas, S. (2011). Credit risk evaluation model development using support vector based classifiers. Procedia Computer Science, 4, 1699–1707.
Dastile, X., & Celik, T. (2021). Making deep learning-based predictions for credit scoring explainable. IEEE Access, 9, 50426–50440.
Fan, Y., Huang, H., & Yang, Z. (2022). Research on personal credit evaluation based on feature engineering and tree enhanced Bayesian Network. Journal of Guilin University of Aerospace Technology, 27(4), 573–579.
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861–874.
Galindo, J., & Tamayo, P. (2000). Credit risk assessment using statistical and machine learning: Basic methodology and risk modeling applications. Computational Economics, 15(1/2), 107–143.
Hand, D. J., & Henley, W. E. (1997). Statistical classification methods in consumer credit scoring: A review. Journal of the Royal Statistical Society: Series A (Statistics in Society), 160(3), 523–541.
Hand, D. J., & Vinciotti, V. (2003). Choosing k for two-class nearest neighbour classifiers with unbalanced classes. Pattern Recognition Letters, 24, 1555–1562.
Hofmann, H. (1994). Statlog (German Credit Data). UCI Machine Learning Repository. https://doi.org/10.24432/C5NC77
Huang, X. B., Liu, X. L., & Ren, Y. Q. (2018). Enterprise credit risk evaluation based on neural network algorithm. Cognitive Systems Research, 52, 317–324.
Huang, Y., Song, Y., & Wang, B. (2023). Improved forest optimization feature selection algorithm for credit evaluation. Computer Science, 50(S1), 531–536.
Islam, M. J., Wu, Q. M. J., Ahmadi, M., & Sid-Ahmed, M. A. (2010). Investigating the performance of Naïve-Bayes classifiers and K-nearest neighbor classifiers. Journal of Convergence Information Technology, 5(2), 133–137.
Kou, G. (2006). Multi-class multi-criteria mathematical programming and its applications in large scale data mining problems. Ph.D. Dissertation, University of Nebraska Omaha.
Lessmann, S., Baesens, B., Seow, H. V., & Thomas, L. C. (2015). Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. European Journal of Operational Research, 247(1), 124–136.
Leong, C. K. (2016). Credit risk scoring with Bayesian network models. Computational Economics, 47(3), 423–446.
Louzada, F., Ara, A., & Fernandes, G. B. (2016). Classification methods applied to credit scoring: A systematic review and overall comparison. Surveys in Operations Research and Management Science, 21(2), 117–134.
Pavlenko, T., & Chernyak, O. (2010). Credit risk modeling using Bayesian networks. International Journal of Intelligent Systems, 25(4), 326–344.
Peng, Y., Kou, G., Shi, Y., & Chen, Z. (2008). A multi-criteria convex quadratic programming model for credit data analysis. Decision Support Systems, 44, 1016–1030.
Pérez-Martín, A., Pérez-Torregrosa, A., Rabasa, A., & Vaca, M. (2020). Feature selection to optimize credit banking risk evaluation decisions for the example of home equity loans. Mathematics, 8(11), 1971.
Quinlan, J. R. (1987). Statlog (Australian Credit Approval). UCI Machine Learning Repository. https://doi.org/10.24432/C59012
Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288.
Roy, A. G., & Urolagin, S. (2019). Credit risk assessment using decision tree and support vector machine based data analytic. In M. Mateev & P. Poutziouris (Eds.), Creative business and social innovations for a sustainable future (pp. 79–84). Cham: Springer Nature Switzerland AG.
Shi, Y. (2010). Multiple criteria optimization-based data mining methods and applications: A systematic survey. Knowledge and Information Systems, 24(3), 369–391.
Shi, Y., Peng, Y., Xu, W., & Tang, X. (2002). Data mining via multiple criteria linear programming: Applications in credit card portfolio management. International Journal of Information Technology & Decision Making, 1, 131–151.
Sohn, S. Y., Kim, D. H., & Yoon, J. H. (2016). Technology credit scoring model with fuzzy logistic regression. Applied Soft Computing, 43, 150–158.
Twala, B. (2010). Multiple classifier application to credit risk assessment. Expert Systems with Applications, 37(4), 3326–3336.
Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., & Knight, K. (2005). Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(1), 91–108.
Trivedi, S. K. (2020). A study on credit scoring modeling with different feature selection and machine learning approaches. Technology in Society, 63, 101413.
Varetto, F. (1998). Genetic algorithms applications in the analysis of insolvency risk. Journal of Banking and Finance, 22, 1421–1439.
Wei, L. W. (2008). Research on data mining classification model based on the multiple criteria programming and its application. Ph.D. Dissertation, Institute of Policy and Management, Chinese Academy of Sciences.
West, D. (2000). Neural network credit scoring models. Computers and Operations Research, 27(11/12), 1131–1152.
Witten, I. H., & Frank, E. (2011). Data mining: Practical machine learning tools and techniques. ACM SIGMOD Record, 31(1), 76–77.
Wu, Y., Li, X., Liu, Q., & Tong, G. (2022). The analysis of credit risks in agricultural supply chain finance assessment model based on genetic algorithm and backpropagation neural network. Computational Economics, 60, 1269–1292.
Rao, C., Liu, M., Goh, M., & Wen, J. (2020). 2-stage modified random forest model for credit risk assessment of P2P network lending to “Three Rurals” borrowers. Applied Soft Computing, 95, 106570.
Zhang, D., Zhou, X., Leung, S. C. H., & Zheng, J. (2010). Vertical bagging decision trees model for credit scoring. Expert Systems with Applications, 37(12), 7838–7843.
Zhang, H., Shi, Y., Yang, X., & Zhou, R. (2021). A firefly algorithm modified support vector machine for the credit risk assessment of supply chain finance. Research in International Business and Finance, 58, 101482.
Zhang, L., Hu, H., & Zhang, D. (2015a). A credit risk assessment model based on SVM for small and medium enterprises in supply chain finance. Financial Innovation, 1(1), 1–21.
Zhang, Z., Gao, G., & Shi, Y. (2014). Credit risk evaluation using multicriteria optimization classifier with kernel, fuzzification and penalty factors. European Journal of Operational Research, 237(1), 335–348.
Zhang, Z., Gao, G., & Tian, Y. (2015b). Multi-kernel multi-criteria optimization classifier with fuzzification and penalty factors for predicting biological activity. Knowledge-Based Systems, 89, 301–313.
Zhang, Z., He, J., Gao, G., & Tian, Y. (2019). Sparse multi-criteria optimization classifier for credit risk evaluation. Soft Computing, 23, 3053–3066.
Zhang, Z., He, J., Cao, J., Li, S., Li, X., Zhang, K., & Wang, P. (2022). An explainable multi-sparsity multi-kernel nonconvex optimization least-squares classifier method via ADMM. Neural Computing and Applications, 34, 16103–16128.
Zhang, Z., He, J., Zheng, H., Cao, J., Wang, G., & Shi, Y. (2023). Alternating minimization-based sparse least-squares classifier for accuracy and interpretability improvement of credit risk assessment. International Journal of Information Technology & Decision Making, 20(1), 537–567.
Zhang, Z., Shi, Y., & Gao, G. (2009). A rough set-based multiple criteria linear programming approach for the medical diagnosis and prognosis. Expert Systems with Applications, 36(5), 8932–8937.
Zhao, J., & Li, B. (2022). Credit risk assessment of small and medium-sized enterprises in supply chain finance based on SVM and BP neural network. Neural Computing and Applications, 34(15), 12467–12478.
Zhao, X., Shi, Y., & Niu, L. (2015). Kernel based simple regularized multiple criteria linear program for binary classification and regression. Intelligent Data Analysis, 19(3), 505–527.
Funding
This study was supported in part by the National Natural Science Foundation of China under Grant 61877061, in part by the Major Program of Natural Science Foundation of the Higher Education Institutions of Jiangsu Province under Grant 22KJA520003, and in part by the Yantai School Land Integration Development Project under Grant 2021PT02.
Author information
Authors and Affiliations
Contributions
All authors contributed to the study conception and design. Algorithm design and optimization were performed by ZZ. Data collection, experiments, and analysis were performed by XL, LL and HP. The first draft of the manuscript was written by XL, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships which could have appeared to influence the work reported in this paper.
Financial interests
All authors have previously participated in the development of bank management projects. Zhiwang Zhang had begun researching credit risk evaluation before this study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, X., Zhang, Z., Li, L. et al. Combining Feature Selection and Classification Using LASSO-Based MCO Classifier for Credit Risk Evaluation. Comput Econ (2024). https://doi.org/10.1007/s10614-023-10535-8
DOI: https://doi.org/10.1007/s10614-023-10535-8