Incremental Machine Learning-Based Approach for Credit Scoring in the Age of Big Data

  • Conference paper
Towards Digitally Transforming Accounting and Business Processes (ICAB 2023)

Part of the book series: Springer Proceedings in Business and Economics ((SPBE))

Abstract

A credit score quantifies the financial credibility of a loan applicant as assessed by a financial institution. Sources of credit, such as banks and other financial institutions, play a crucial role in sustaining economies and keeping cash flowing in the market. Financial institutions address the lack of data in credit scoring by extracting customer information from data sources such as social networks, which store data in large quantities. Traditional data mining techniques fail to distinguish accurately between creditworthy and non-creditworthy applicants using big data. This has necessitated machine learning algorithms capable of sifting through large volumes of credit data sourced from social networks. Machine learning approaches have recently been widely used to automate, streamline and digitise business processes such as credit scoring. However, designing and deploying effective and robust credit scoring models takes a lot of time, and if customer behaviour changes or customer variables drift over time, the credit scoring model becomes obsolete. Credit scoring should therefore be treated as an ephemeral scenario under big data, as variables tend to drift over time. Incremental and adaptive credit scoring models can help to mitigate the time lost in re-creating credit models because of drifting variables, big data challenges and changes in customer behaviour. This calls for robust and effective credit scoring models that learn incrementally, adapt, and detect changes. This paper proposes the Incremental Adaptive and Heterogeneous Ensemble (IAHE) credit scoring model, which is capable of learning incrementally, adapting to drifting variables, detecting changes in customer behaviour and learning from big data in a streaming fashion.
Empirical experiments indicate that, compared with nine credit scoring models on four public datasets, IAHE has the strongest ability to recognise default samples and the best generalisation ability, while at the same time maintaining strong interpretability of the results. The superior generalisation performance of IAHE is statistically significant, and the model demonstrated excellent robustness and adaptation to drifting variables.
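
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea it describes: a heterogeneous ensemble that learns one sample at a time and reacts to drift. It is not the IAHE model. The two member learners, the weighted vote, the windowed error-rate drift rule, the thresholds and the synthetic stream are all assumptions chosen for illustration.

```python
import math
import random
from collections import deque


class OnlineLogistic:
    """Logistic regression updated one sample at a time with SGD."""

    def __init__(self, n_features, lr=0.1):
        self.n_features, self.lr = n_features, lr
        self.reset()

    def reset(self):
        self.w, self.b = [0.0] * self.n_features, 0.0

    def predict_one(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1 if z >= 0 else 0

    def learn_one(self, x, y):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        p = 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))
        err = p - y  # gradient of the log-loss with respect to z
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


class OnlineCentroid:
    """Nearest-centroid classifier that keeps running class means."""

    def __init__(self, n_features):
        self.n_features = n_features
        self.reset()

    def reset(self):
        self.count = {0: 0, 1: 0}
        self.mean = {c: [0.0] * self.n_features for c in (0, 1)}

    def predict_one(self, x):
        if self.count[0] == 0 or self.count[1] == 0:
            return 1  # arbitrary default until both classes have been seen
        dist = {c: sum((xi - mi) ** 2 for xi, mi in zip(x, self.mean[c]))
                for c in (0, 1)}
        return 1 if dist[1] < dist[0] else 0

    def learn_one(self, x, y):
        self.count[y] += 1
        n = self.count[y]
        self.mean[y] = [m + (xi - m) / n for m, xi in zip(self.mean[y], x)]


class StreamingEnsemble:
    """Weighted vote over heterogeneous members, with a simple drift reset."""

    def __init__(self, n_features, window=50, drift_threshold=0.4, beta=0.9):
        self.members = [OnlineLogistic(n_features), OnlineCentroid(n_features)]
        self.weights = [1.0 / len(self.members)] * len(self.members)
        self.errors = [deque(maxlen=window) for _ in self.members]
        self.drift_threshold, self.beta = drift_threshold, beta
        self.resets = 0

    def predict_one(self, x):
        votes_for_1 = sum(w for w, m in zip(self.weights, self.members)
                          if m.predict_one(x) == 1)
        return 1 if votes_for_1 >= 0.5 * sum(self.weights) else 0

    def learn_one(self, x, y):
        for i, m in enumerate(self.members):
            wrong = int(m.predict_one(x) != y)
            self.errors[i].append(wrong)
            if wrong:
                self.weights[i] *= self.beta  # penalise a member that erred
            recent = self.errors[i]
            if (len(recent) == recent.maxlen
                    and sum(recent) / len(recent) > self.drift_threshold):
                m.reset()  # assumed drift rule: forget the old concept
                recent.clear()
                self.weights[i] = 1.0 / len(self.members)
                self.resets += 1
            m.learn_one(x, y)
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]


def make_stream(n, flipped, rng):
    """Toy stream: the label depends on the sign of the first feature."""
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = int(x[0] > 0)
        yield x, (1 - y if flipped else y)


def run_demo(seed=0):
    rng = random.Random(seed)
    model = StreamingEnsemble(n_features=2)
    stream = list(make_stream(500, False, rng)) + list(make_stream(500, True, rng))
    hits = []
    for x, y in stream:
        hits.append(int(model.predict_one(x) == y))  # predict first,
        model.learn_one(x, y)                        # then learn
    return sum(hits[-200:]) / 200, model.resets
```

Calling `run_demo()` returns the accuracy over the last 200 samples of the stream and the number of member resets. The labels flip halfway through, so the sketch shows the point made in the abstract: a model that updates incrementally and detects change can recover after drift without being rebuilt from scratch.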

Author information

Corresponding author

Correspondence to Tinofirei Museba.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Museba, T. (2024). Incremental Machine Learning-Based Approach for Credit Scoring in the Age of Big Data. In: Moloi, T., George, B. (eds) Towards Digitally Transforming Accounting and Business Processes. ICAB 2023. Springer Proceedings in Business and Economics. Springer, Cham. https://doi.org/10.1007/978-3-031-46177-4_29
