Abstract
Financial institutions quantify the financial credibility of a loan applicant using a credit score. Sources of credit, such as banks and other financial institutions, play a crucial role in sustaining economies and keeping cash flowing in the market. Financial institutions address the lack of data in credit scoring by extracting customer information from data sources such as social networks, which store data in large quantities. Traditional data mining techniques fail to distinguish accurately between creditworthy and non-creditworthy applicants when applied to big data. This has necessitated machine learning algorithms capable of sifting through large volumes of credit data sourced from social networks. Machine learning approaches have recently been widely used to automate, streamline and digitise business processes such as credit scoring. However, designing and deploying effective and robust credit scoring models takes a lot of time, and if customer behaviour changes or customer variables drift over time, the credit scoring model becomes obsolete. As a result, credit scoring should be treated as an ephemeral scenario in the big data setting, because variables tend to drift over time. Incremental and adaptive credit scoring models can reduce the time lost re-creating credit models because of drifting variables, big data challenges and changes in customer behaviour. This calls for robust and effective credit scoring models that learn incrementally, adapt, and detect changes. This paper proposes the Incremental Adaptive and Heterogeneous Ensemble (IAHE) credit scoring model, which learns incrementally, adapts to drifting variables, detects changes in customer behaviour and learns from big data in a streaming fashion.
Empirical experiments indicate that, when compared to nine credit scoring models on four public datasets, IAHE has the strongest ability to recognise default samples and the best generalisation ability, while maintaining strong interpretability of the results. The superior generalisation performance of IAHE is statistically significant, and the model demonstrated excellent robustness and adaptation to drifting variables.
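The abstract does not specify how IAHE is built, so the following is only a minimal sketch of the general idea it names: a heterogeneous ensemble that learns one sample at a time and resets a member when its recent error rate suggests drift. Every class name, the window size, the drift threshold and the synthetic stream are assumptions made for illustration and are not taken from the paper.

```python
import math
import random
from collections import deque


class OnlineLogistic:
    """Logistic regression updated one sample at a time with SGD."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * (n_features + 1)  # last entry is the bias
        self.lr = lr

    def _proba(self, x):
        z = self.w[-1] + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() in range
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return 1 if self._proba(x) >= 0.5 else 0

    def learn_one(self, x, y):
        err = y - self._proba(x)
        for i, xi in enumerate(x):
            self.w[i] += self.lr * err * xi
        self.w[-1] += self.lr * err


class OnlinePerceptron:
    """Perceptron that only updates its weights on mistakes."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * (n_features + 1)
        self.lr = lr

    def predict(self, x):
        z = self.w[-1] + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1 if z >= 0 else 0

    def learn_one(self, x, y):
        err = y - self.predict(x)
        if err:
            for i, xi in enumerate(x):
                self.w[i] += self.lr * err * xi
            self.w[-1] += self.lr * err


class IncrementalEnsemble:
    """Hypothetical sketch: accuracy-weighted vote with per-member drift resets."""

    def __init__(self, n_features, window=50, drift_threshold=0.45):
        self.n_features = n_features
        self.factories = [OnlineLogistic, OnlinePerceptron]
        self.members = [make(n_features) for make in self.factories]
        self.hits = [deque(maxlen=window) for _ in self.members]
        self.drift_threshold = drift_threshold
        self.resets = 0

    def _weight(self, k):
        h = self.hits[k]
        return sum(h) / len(h) if h else 0.5

    def predict(self, x):
        score = sum(self._weight(k) * (1 if m.predict(x) == 1 else -1)
                    for k, m in enumerate(self.members))
        return 1 if score >= 0 else 0

    def learn_one(self, x, y):
        for k, m in enumerate(self.members):
            h = self.hits[k]
            h.append(1 if m.predict(x) == y else 0)  # test, then train
            # Assumed drift rule: a full window with a high error rate.
            if len(h) == h.maxlen and 1 - sum(h) / len(h) > self.drift_threshold:
                self.members[k] = self.factories[k](self.n_features)
                h.clear()
                self.resets += 1
            self.members[k].learn_one(x, y)


def make_stream(n, flipped, rng):
    """Synthetic two-feature stream; `flipped` inverts the labels (drift)."""
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = 1 if x[0] + x[1] > 0 else 0
        yield x, (1 - y if flipped else y)


rng = random.Random(0)
model = IncrementalEnsemble(n_features=2)
for x, y in make_stream(1000, False, rng):
    model.learn_one(x, y)
for x, y in make_stream(1000, True, rng):  # the concept changes here
    model.learn_one(x, y)

held_out = list(make_stream(500, True, rng))
accuracy = sum(model.predict(x) == y for x, y in held_out) / len(held_out)
print(f"resets={model.resets}, post-drift accuracy={accuracy:.2f}")
```

The windowed error check is a deliberately simple stand-in for a proper drift detector, and the two linear learners stand in for whatever base classifiers a real heterogeneous ensemble would combine.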
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Museba, T. (2024). Incremental Machine Learning-Based Approach for Credit Scoring in the Age of Big Data. In: Moloi, T., George, B. (eds) Towards Digitally Transforming Accounting and Business Processes. ICAB 2023. Springer Proceedings in Business and Economics. Springer, Cham. https://doi.org/10.1007/978-3-031-46177-4_29
DOI: https://doi.org/10.1007/978-3-031-46177-4_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46176-7
Online ISBN: 978-3-031-46177-4
eBook Packages: Business and Management, Business and Management (R0)