Combining Feature Selection and Classification Using LASSO-Based MCO Classifier for Credit Risk Evaluation

Computational Economics

Abstract

Credit risk evaluation, which involves predicting default probabilities and deriving risk classifications, is a difficult task, and many classification methods and techniques have already been applied to credit risk prediction. In this paper, in view of the significant limitations of the multi-criteria optimization classifier (MCOC) in feature reduction and its weak interpretability, an improved LASSO-based MCOC (LASSO-MCOC) for simultaneous classification and feature selection is proposed, and the corresponding algorithm is constructed. On four real-world credit risk datasets, LASSO-MCOC with linear and RBF kernels is tested and compared with the sparse MCOC (SMCOC) proposed by Zhang et al. (2019) and with six basic classification methods: logistic regression, multilayer perceptron, support vector machines, Naïve Bayes, k-nearest neighbors, and random forest. The experimental and statistical comparison results show that the proposed LASSO-MCOC is more effective for credit risk assessment, achieving better accuracy, efficiency, and interpretability than the other classifiers, and that it can be extended to other real-world applications.
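
The abstract does not spell out the LASSO-MCOC formulation. The following is therefore only a minimal sketch of the general idea it builds on: placing an L1 (LASSO) penalty inside a classifier's training objective, so that feature selection happens during classification rather than in a separate step.

As an assumption, the sketch swaps in a plain L1-penalized least-squares linear classifier. It minimizes (1/(2n))·Σ_i (y_i − w·x_i − b)² + λ‖w‖₁ with labels y_i ∈ {−1, +1}, using proximal gradient descent (ISTA). It is not the authors' MCOC objective or algorithm, and it covers only the linear case, not the RBF kernel. The function names, the penalty weight lam, and the toy data are hypothetical.

```python
import random

# Hypothetical sketch: a generic L1-penalized least-squares linear classifier
# solved by ISTA. It is NOT the LASSO-MCOC formulation from the paper.


def soft_threshold(z, t):
    """Proximal operator of t*|z| (the LASSO shrinkage step): exactly 0 inside [-t, t]."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_l1_classifier(X, y, lam=0.1, n_iter=500):
    """ISTA for  min_{w,b} (1/(2n)) * sum_i (y_i - w.x_i - b)^2 + lam * ||w||_1.

    Labels are in {-1, +1}; the intercept b is left unpenalized.
    """
    n, d = len(X), len(X[0])
    # Step 1/L with L = ||[X, 1]||_F^2 / n, an upper bound on the Lipschitz
    # constant of the squared-loss gradient, which guarantees convergence.
    L = (sum(v * v for row in X for v in row) + n) / n
    step = 1.0 / L
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        # Residuals of the current linear score: r_i = w.x_i + b - y_i.
        r = [sum(wj * xj for wj, xj in zip(w, row)) + b - yi for row, yi in zip(X, y)]
        grad_w = [sum(r[i] * X[i][j] for i in range(n)) / n for j in range(d)]
        grad_b = sum(r) / n
        # Gradient step on the squared loss, then soft-threshold the weights;
        # weights whose gradient stays below lam are driven to exactly zero.
        w = [soft_threshold(w[j] - step * grad_w[j], step * lam) for j in range(d)]
        b -= step * grad_b
    return w, b


def predict(w, b, X):
    """Classify by the sign of the linear score (ties go to +1)."""
    return [1 if sum(wj * xj for wj, xj in zip(w, row)) + b >= 0 else -1 for row in X]


# Hypothetical toy data: 6 features, only x0 and x1 determine the label.
random.seed(0)
X = [[random.uniform(-1, 1) for _ in range(6)] for _ in range(400)]
y = [1 if row[0] + row[1] > 0 else -1 for row in X]

w, b = fit_l1_classifier(X, y, lam=0.1)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
acc = sum(p == t for p, t in zip(predict(w, b, X), y)) / len(y)
print("selected features:", selected, "training accuracy:", round(acc, 3))
```

With lam = 0.1, the soft-thresholding step is expected to leave the four noise features with weights of exactly zero. The fitted model then reads as a short list of selected features plus their signs, and raising lam tends to prune more features at some cost in fit. This is the kind of interpretability benefit the abstract attributes to the LASSO penalty, shown here on synthetic data only.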

Fig. 1
Fig. 2

References

  • Arora, N., & Kaur, P. D. (2019). A bolasso based consistent feature selection enabled random forest classification algorithm: An application to credit risk assessment. Applied Soft Computing, 86, 105936.

  • Bhattacharya, A., Biswas, S. K., & Mandal, A. (2022). Credit risk evaluation: A comprehensive study. Multimedia Tools and Applications. https://doi.org/10.1007/s11042-022-13952-3

  • Bijak, K., & Thomas, L. C. (2012). Does segmentation always improve model performance in credit scoring? Expert Systems with Applications, 39(3), 2433–2442.

  • Brown, I., & Mues, C. (2012). An experimental comparison of classification algorithms for imbalanced credit scoring data sets. Expert Systems with Applications, 39(3), 3446–3453.

  • Bussmann, N., Giudici, P., Marinelli, D., & Papenbrock, J. (2021). Explainable machine learning in credit risk management. Computational Economics, 57, 203–216.

  • Chen, N., Ribeiro, B., & Chen, A. (2016). Financial credit risk assessment: A recent review. Artificial Intelligence Review, 45, 1–23.

  • Danenas, P., Garsva, G., & Gudas, S. (2011). Credit risk evaluation model development using support vector based classifiers. Procedia Computer Science, 4, 1699–1707.

  • Dastile, X., & Celik, T. (2021). Making deep learning-based predictions for credit scoring explainable. IEEE Access, 9, 50426–50440.

  • Fan, Y., Huang, H., & Yang, Z. (2022). Research on personal credit evaluation based on feature engineering and tree enhanced Bayesian Network. Journal of Guilin University of Aerospace Technology, 27(4), 573–579.

  • Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861–874.

  • Galindo, J., & Tamayo, P. (2000). Credit risk assessment using statistical and machine learning: Basic methodology and risk modeling applications. Computational Economics, 15(1/2), 107–143.

  • Hand, D. J., & Henley, W. E. (1997). Statistical classification methods in consumer credit scoring: A review. Journal of the Royal Statistical Society: Series A (Statistics in Society), 160(3), 523–541.

  • Hand, D. J., & Vinciotti, V. (2003). Choosing k for two-class nearest neighbour classifiers with unbalanced classes. Pattern Recognition Letters, 24, 1555–1562.

  • Hofmann, H. (1994). Statlog (German Credit Data). UCI Machine Learning Repository. https://doi.org/10.24432/C5NC77

  • Huang, X. B., Liu, X. L., & Ren, Y. Q. (2018). Enterprise credit risk evaluation based on neural network algorithm. Cognitive Systems Research, 52, 317–324.

  • Huang, Y., Song, Y., & Wang, B. (2023). Improved forest optimization feature selection algorithm for credit evaluation. Computer Science, 50(S1), 531–536.

  • Islam, M. J., Wu, Q. M. J., Ahmadi, M., & Sid-Ahmed, M. A. (2010). Investigating the performance of Naïve-Bayes classifiers and K-nearest neighbor classifiers. Journal of Convergence Information Technology, 5(2), 133–137.

  • Kou, G. (2006). Multi-class multi-criteria mathematical programming and its applications in large scale data mining problems. Ph.D. Dissertation, University of Nebraska Omaha.

  • Lessmann, S., Baesens, B., Seow, H. V., & Thomas, L. C. (2015). Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. European Journal of Operational Research, 247(1), 124–136.

  • Leong, C. K. (2016). Credit risk scoring with Bayesian network models. Computational Economics, 47(3), 423–446.

  • Louzada, F., Ara, A., & Fernandes, G. B. (2016). Classification methods applied to credit scoring: A systematic review and overall comparison. Surveys in Operations Research and Management Science, 21(2), 117–134.

  • Pavlenko, T., & Chernyak, O. (2010). Credit risk modeling using Bayesian networks. International Journal of Intelligent Systems, 25(4), 326–344.

  • Peng, Y., Kou, G., Shi, Y., & Chen, Z. (2008). A multi-criteria convex quadratic programming model for credit data analysis. Decision Support Systems, 44, 1016–1030.

  • Pérez-Martín, A., Pérez-Torregrosa, A., Rabasa, A., & Vaca, M. (2020). Feature selection to optimize credit banking risk evaluation decisions for the example of home equity loans. Mathematics, 8(11), 1971.

  • Quinlan, J. R. (1987). Statlog (Australian Credit Approval). UCI Machine Learning Repository. https://doi.org/10.24432/C59012

  • Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288.

  • Roy, A. G., & Urolagin, S. (2019). Credit risk assessment using decision tree and support vector machine based data analytic. In M. Mateev & P. Poutziouris (Eds.), Creative business and social innovations for a sustainable future (pp. 79–84). Cham: Springer Nature Switzerland AG.

  • Shi, Y. (2010). Multiple criteria optimization-based data mining methods and applications: A systematic survey. Knowledge and Information Systems, 24(3), 369–391.

  • Shi, Y., Peng, Y., Xu, W., & Tang, X. (2002). Data mining via multiple criteria linear programming: Applications in credit card portfolio management. International Journal of Information Technology & Decision Making, 1, 131–151.

  • Sohn, S. Y., Kim, D. H., & Yoon, J. H. (2016). Technology credit scoring model with fuzzy logistic regression. Applied Soft Computing, 43, 150–158.

  • Twala, B. (2010). Multiple classifier application to credit risk assessment. Expert Systems with Applications, 37(4), 3326–3336.

  • Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., & Knight, K. (2005). Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(1), 91–108.

  • Trivedi, S. K. (2020). A study on credit scoring modeling with different feature selection and machine learning approaches. Technology in Society, 63, 101413.

  • Varetto, F. (1998). Genetic algorithms applications in the analysis of insolvency risk. Journal of Banking and Finance, 22, 1421–1439.

  • Wei, L. W. (2008). Research on data mining classification model based on the multiple criteria programming and its application. Ph.D. Dissertation, Institute of Policy and Management, Chinese Academy of Sciences.

  • West, D. (2000). Neural network credit scoring models. Computers and Operations Research, 27(11/12), 1131–1152.

  • Witten, I. H., & Frank, E. (2011). Data mining: Practical machine learning tools and techniques. ACM SIGMOD Record, 31(1), 76–77.

  • Wu, Y., Li, X., Liu, Q., & Tong, G. (2022). The analysis of credit risks in agricultural supply chain finance assessment model based on genetic algorithm and backpropagation neural network. Computational Economics, 60, 1269–1292.

  • Rao, C., Liu, M., Goh, M., & Wen, J. (2020). 2-stage modified random forest model for credit risk assessment of P2P network lending to “Three Rurals” borrowers. Applied Soft Computing, 95, 106570.

  • Zhang, D., Zhou, X., Leung, S. C. H., & Zheng, J. (2010). Vertical bagging decision trees model for credit scoring. Expert Systems with Applications, 37(12), 7838–7843.

  • Zhang, H., Shi, Y., Yang, X., & Zhou, R. (2021). A firefly algorithm modified support vector machine for the credit risk assessment of supply chain finance. Research in International Business and Finance, 58, 101482.

  • Zhang, L., Hu, H., & Zhang, D. (2015a). A credit risk assessment model based on SVM for small and medium enterprises in supply chain finance. Financial Innovation, 1(1), 1–21.

  • Zhang, Z., Gao, G., & Shi, Y. (2014). Credit risk evaluation using multicriteria optimization classifier with kernel, fuzzification and penalty factors. European Journal of Operational Research, 237(1), 335–348.

  • Zhang, Z., Gao, G., & Tian, Y. (2015b). Multi-kernel multi-criteria optimization classifier with fuzzification and penalty factors for predicting biological activity. Knowledge-Based Systems, 89, 301–313.

  • Zhang, Z., He, J., Gao, G., & Tian, Y. (2019). Sparse multi-criteria optimization classifier for credit risk evaluation. Soft Computing, 23, 3053–3066.

  • Zhang, Z., He, J., Cao, J., Li, S., Li, X., Zhang, K., & Wang, P. (2022). An explainable multi-sparsity multi-kernel nonconvex optimization least-squares classifier method via ADMM. Neural Computing and Applications, 34, 16103–16128.

  • Zhang, Z., He, J., Zheng, H., Cao, J., Wang, G., & Shi, Y. (2023). Alternating minimization-based sparse least-squares classifier for accuracy and interpretability improvement of credit risk assessment. International Journal of Information Technology & Decision Making, 20(1), 537–567.

  • Zhang, Z., Shi, Y., & Gao, G. (2009). A rough set-based multiple criteria linear programming approach for the medical diagnosis and prognosis. Expert Systems with Applications, 36(5), 8932–8937.

  • Zhao, J., & Li, B. (2022). Credit risk assessment of small and medium-sized enterprises in supply chain finance based on SVM and BP neural network. Neural Computing and Applications, 34(15), 12467–12478.

  • Zhao, X., Shi, Y., & Niu, L. (2015). Kernel based simple regularized multiple criteria linear program for binary classification and regression. Intelligent Data Analysis, 19(3), 505–527.

Funding

This study was partially supported by the National Natural Science Foundation of China under Grant 61877061, by the Major Program of the Natural Science Foundation of the Higher Education Institutions of Jiangsu Province under Grant 22KJA520003, and by the Yantai School Land Integration Development Project under Grant 2021PT02.

Author information

Contributions

All authors contributed to the study conception and design. Algorithm design and optimization were performed by ZZ. Data collection, experiments, and analysis were performed by XL, LL, and HP. The first draft of the manuscript was written by XL, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Zhiwang Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships which could have appeared to influence the work reported in this paper.

Financial interests

All authors have previously participated in the development of bank management projects. Zhiwang Zhang began researching credit risk evaluation earlier.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, X., Zhang, Z., Li, L. et al. Combining Feature Selection and Classification Using LASSO-Based MCO Classifier for Credit Risk Evaluation. Comput Econ (2024). https://doi.org/10.1007/s10614-023-10535-8

  • DOI: https://doi.org/10.1007/s10614-023-10535-8
