Abstract
In this paper, we apply machine learning techniques to analyze default probability in financial institutions, using a large dataset of variables collected from 2325 banks over 17 years. We first extract the most relevant variables with a feature selection method, then predict default and systemic risk, and finally use explainable artificial intelligence techniques to investigate how each relevant feature contributes to the overall financial stress of banking institutions. We find that the most important variables for default risk prediction include the bailout probability, the market share in terms of assets, and the market-to-book ratio, which highlights the relevance of issues such as moral hazard and market leverage. For systemic risk prediction, by contrast, the number of banks in the country and the level of interest rates are among the most relevant features, indicating that more competitive markets fare better against systemic risk. These findings provide an empirical assessment of the main factors that explain financial stress in banking institutions, reconciling the versatility of machine learning models with practical interpretability and causal inference, and are of potential interest to researchers in quantitative finance and to market practitioners.
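To make the idea of per-feature contributions concrete, the following is a minimal sketch of exact Shapley value attribution, one common basis for explainable artificial intelligence techniques. The abstract does not state which attribution method, model, or data the paper uses, so everything here is an assumption for illustration: the function `toy_default_score`, its coefficients, and the `bank` and `baseline` values are hypothetical. Only the three feature names are taken from the abstract.

```python
from itertools import combinations
from math import factorial

# Feature names taken from the abstract; everything else is hypothetical.
FEATURES = ("bailout_probability", "asset_market_share", "market_to_book")


def toy_default_score(x):
    """Hypothetical risk score with made-up coefficients (not estimated)."""
    return (
        0.5 * x["bailout_probability"]
        + 0.3 * x["asset_market_share"]
        - 0.2 * x["market_to_book"]
        + 0.4 * x["bailout_probability"] * x["asset_market_share"]
    )


def shapley_values(model, instance, baseline):
    """Exact Shapley values of each feature for one instance.

    A coalition's value is the model output when features in the coalition
    take the instance's values and all other features take baseline values.
    """
    n = len(FEATURES)

    def value(coalition):
        mixed = {
            f: (instance[f] if f in coalition else baseline[f])
            for f in FEATURES
        }
        return model(mixed)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical bank and reference point (illustrative values only).
bank = {"bailout_probability": 0.8, "asset_market_share": 0.5, "market_to_book": 1.5}
baseline = {"bailout_probability": 0.2, "asset_market_share": 0.1, "market_to_book": 1.0}

phi = shapley_values(toy_default_score, bank, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the score of the bank and the score of the baseline, which is the efficiency property that makes Shapley values attractive for attributing a prediction to its inputs. Exact enumeration costs grow exponentially with the number of features, so practical tools rely on approximations.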
Funding
The authors have not disclosed any funding.
Ethics declarations
Conflict of interest
The authors declare that they do not have a financial interest or personal relationships that could influence the work carried out in this paper.
About this article
Cite this article
de Moraes Souza, J.G., de Castro, D.T., Peng, Y. et al. A Machine Learning-Based Analysis on the Causality of Financial Stress in Banking Institutions. Comput Econ (2023). https://doi.org/10.1007/s10614-023-10514-z
DOI: https://doi.org/10.1007/s10614-023-10514-z