Abstract
Extreme financial risk prediction is an important component of risk management in financial markets. Taking the China Securities Index 300 (CSI300) as an example, we introduce the kernel method into the fuzzy c-means (FCM) algorithm and the synthetic minority over-sampling technique (SMOTE), and combine them with a support vector machine (SVM) to propose a hybrid KFCM-KSMOTE-SVM model for predicting extreme financial risks, which we compare with various other prediction models. In addition, we investigate how its parameters influence the prediction performance of KFCM-KSMOTE-SVM. The empirical results show that KFCM-KSMOTE-SVM significantly outperforms the other prediction models, which verifies that it can solve the class imbalance problem in financial markets and is more appropriate for predicting extreme financial risks. Meanwhile, the parameter set plays an important role in constructing the KFCM-KSMOTE-SVM prediction model. Moreover, an experiment on the Shanghai Stock Exchange Composite Index shows that KFCM-KSMOTE-SVM is robust in predicting extreme financial risks.
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10614-020-09975-3/MediaObjects/10614_2020_9975_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10614-020-09975-3/MediaObjects/10614_2020_9975_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10614-020-09975-3/MediaObjects/10614_2020_9975_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10614-020-09975-3/MediaObjects/10614_2020_9975_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10614-020-09975-3/MediaObjects/10614_2020_9975_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10614-020-09975-3/MediaObjects/10614_2020_9975_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10614-020-09975-3/MediaObjects/10614_2020_9975_Fig7_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10614-020-09975-3/MediaObjects/10614_2020_9975_Fig8_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10614-020-09975-3/MediaObjects/10614_2020_9975_Fig9_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10614-020-09975-3/MediaObjects/10614_2020_9975_Fig10_HTML.png)
References
Ahn, J. J., Oh, K. J., Kim, T. Y., & Kim, D. H. (2011). Usefulness of support vector machine to develop an early warning system for financial crisis. Expert Systems with Applications,38, 2966–2973.
Altman, E. I. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. Journal of Finance,23, 589–609.
Barandela, R., Ferri, F. J., & Sanchez, J. S. (2005). Decision boundary preserving prototype selection for nearest neighbor classification. International Journal of Pattern Recognition and Artificial Intelligence,19(6), 787–806.
Bezdek, J. C. (1981). Pattern recognition with fuzzy objective function algorithms. New York: Plenum Press.
Chang, C. C., & Lin, C. J. (2013). LIBSVM: A library for support vector machines. Software available at: http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research,16, 321–357.
Chawla, N. V., Cieslas, D. A., Hall, L. O., & Joshi, A. (2008). Automatically countering imbalance and its empirical relationship to cost. Data Mining and Knowledge Discovery,17(2), 225–252.
Cohen, G., Hilario, M., Sax, H., Hogonnet, S., & Geissbuhler, A. (2006). Learning from imbalanced data in surveillance of nosocomial infection. Artificial Intelligence in Medicine,37, 7–18.
Cumperayot, P., & Kouwenberg, R. (2013). Early warning systems for currency crises: A multivariate extreme value approach. Journal of International Money and Finance,36, 151–171.
Daoud, A. E., & Turabieh, H. (2013). New empirical nonparametric kernels for support vector machine classification. Applied Soft Computing,13(4), 1759–1765.
DuMouchel, W. H. (1983). Estimating the stable index α in order to measure tail thickness: A critique. Annals of Statistics,11(4), 1019–1031.
Fernandez, A. D., Lopez, M. M., Montero, T. R., & Martinez, F. E. (2018). Financial soundness prediction using a multi-classification model: Evidence from current financial crisis in OECD banks. Computational Economics,52, 275–297.
Gao, M., Hong, X., Chen, S., & Harris, C. J. (2011). A combined SMOTE and PSO based RBF classifier for two-class imbalanced problems. Neurocomputing,74, 3456–3466.
Garcia, S., Derrac, J., Triguero, I., Carmona, C. J., & Herrera, F. (2012a). Evolutionary-based selection of generalized instances for imbalanced classification. Knowledge-Based Systems,25(1), 3–12.
Garcia, V., Sanchez, J. S., & Mollineda, R. A. (2012b). On the effectiveness of preprocessing methods when dealing with different levels of class imbalance. Knowledge-Based Systems,25, 13–31.
Gower, J. C. (1968). Adding a point to vector diagrams in multivariate analysis. Biometrika,55(3), 582–585.
Guillermo, S. B., Juan, F. S., & Ignacio, V. R. (2015). Volatility forecasting using support vector regression and a hybrid genetic algorithm. Computational Economics,45, 111–133.
Hill, B. M. (1975). A simple general approach to inference about the tail of a distribution. The Annals of Statistics,3(5), 1163–1174.
Ho, D., & Mauceri, C. (2007). Clustering by kernel density. Computational Economics,29(2), 199–212.
Hu, Y. X., & Li, X. B. (2012). Bayes discriminant analysis method to identify risky of complicated goaf in mines and its application. Transactions of Nonferrous Metals Society of China,22(2), 425–431.
Japkowicz, N. (2000). The class imbalance problem: Significance and strategies. In Proceedings of the 2000 International Conference on Artificial Intelligence (pp. 111-117).
**, X. Y., Lin, L., Zhong, S. S., & Ding, G. (2011). Rotor fault analysis of classification accuracy optimition base on kernel principal component analysis and SVM. Procedia Engineering,15, 5279–5283.
Kia, A. N., Haratizadeh, S., & Shouraki, S. B. (2018). A hybrid supervised semi-supervised graph-based model to predict one-day ahead movement of global stock markets and commodity prices. Expert Systems with Applications,105, 159–173.
Kim, T. Y., Hwang, C., & Lee, J. (2004a). Korean economic condition indicator using a neural network trained on the 1997 crisis. Journal of Data Science,2, 371–381.
Kim, T. Y., Oh, K. J., Sohn, I., & Hwang, C. (2004b). Usefulness of artificial neural networks for early warning system of economic crisis. Expert Systems with Applications,26, 583–590.
Kole, E., Koedijk, K., & Verbeek, M. (2007). Selecting copulas for risk management. Journal of Banking & Finance,31, 2405–2423.
Kubat, M., & Matwin, S. (1997). Addressing the curse of imbalanced training sets: One-sided selection. In Proceedings of the 14th International Conference on Machine Learning (pp. 179–186), Nashville, USA.
Kwok, J. T., & Tsang, I. W. (2004). The pre-image problem in kernel methods. IEEE Transactions on Neural Networks,15(6), 1517–1525.
Laurikkala, J. (2001). Improving identification of difficult small classes by balancing class distribution. In: Proceedings of the 8th Conference on Artificial Intelligence in Medicine (pp. 63–66), Cascais, Portugal.
Liu, Y. H., Lin, S. H., Hsueh, Y. L., & Lee, M. J. (2009). Automatic target defect identification for TFT-LCD array process inspection using kernel FCM-based fuzzy SVDD ensemble. Expert Systems with Applications,36, 1978–1998.
Long, W., Lu, Z. C., & Cui, L. X. (2019). Deep learning-based feature engineering for stock price movement prediction. Knowledge-Based Systems,164, 163–173.
McNeil, A. J., & Frey, R. (2000). Estimation of tail-related risk measures for heteroscedastic financial time series: An extreme value approach. Journal of Empirical Finance,7, 271–300.
Neftci, S. (2000). Value at Risk calculations, extreme events and tail estimation. The Journal of Derivatives, Spring,1, 1–15.
Ohlson, J. (1980). Financial ratios and probabilistic prediction of bankruptcy. Journal of Accounting Research,18, 109–131.
Peng, Y., Wang, G. X., Kou, G., & Shi, Y. (2011). An empirical study of classification algorithm evaluation for financial risk prediction. Applied Soft Computing,11, 2906–2915.
Provost, F., & Fawcett, T. (2001). Robust classification for imprecise environments. Machine Learning,42(3), 203–231.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning,1, 81–106.
Saini, I., Singh, D., & Khosla, A. (2013). QRS detection using K-Nearest Neighbor algorithm (KNN) and evaluation on standard ECG databases. Journal of Advanced Research,4(4), 331–344.
Sanchez, A. D. A. (2003). Advanced support vector machines and kernel methods. Neurocomputing,55(1), 5–20.
Sokolova, M., Japkowicz, N., & Szpakowicz, S. (2006). Beyond accuracy, F-score and ROC: A family of discriminant measures for performance evaluation. In Proceedings of the 19th ACS Australian Joint Conference on Artificial Intelligence (pp. 1015–1021), Hobart, Australia.
Stelios, D. B., & Dimitris, A. G. (2005). Estimation of Value-at-Risk by extreme value and conventional methods: A comparative evaluation of their predictive performance. Journal of International Financial Markets, Institutions and Money,15, 209–228.
Tam, K. Y., & Kiang, M. (1992). Predicting bank failures: A neural network approach. Management Science,8, 926–947.
Vapnik, V. N. (1995). The nature of statistical learning theory. New York: Springer.
Wang, J., Li, X. B., Cui, T., & Yang, J. L. (2011). Application of distance discriminant analysis method to headstream recognition of water-bursting source. Procedia Engineering,26, 374–381.
Witten, I. H., & Frank, E. (2005). Data mining: Practical machine learning tools and techniques (2nd ed.). San Francisco: Morgan Kaufmann.
Wu, Z. D., & Xie, W. X. (2003). Fuzzy c-means clustering algorithm based on kernel method. In Proceedings of the 5th International Conference on Computational Intelligence and Multimedia Applications (pp. 49–54).
Wu, X. H., & Zhou, J. J. (2006). Fuzzy discriminant analysis with kernel methods. Pattern Recognition,39(11), 2236–2239.
Yen, S. J., Lee, Y. S., Lin, C. H., & Ying, J. C. (2006). Investigating the effect of sampling methods for imbalanced data distributions. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (pp. 4163–4168), Taipei, Taiwan.
Zbikowski, K. (2015). Using volume weighted support vector machines with walk forward testing and feature selection for the purpose of creating stock trading strategy. Expert Systems with Applications,42, 1797–1805.
Zmijewski, M. E. (1984). Methodological issues related to the estimation of financial distress prediction models. Journal of Accounting Research,22(1), 59–82.
Acknowledgements
We thank several anonymous referees for their comments and suggestions, which helped us improve the quality of this paper. This research was supported by the National Social Science Fund of China (Grant No. 15CGL029) led by Professor Jia Yuan from Chengdu Institute of Public Administration.
Appendix A: The Process of Kernel Mapping in KSMOTE
Given positive instances \( x_{q} \) and \( x_{a} \), where \( q = 1,2, \ldots ,\left| P \right| \) and \( a = 1,2, \ldots ,\left| P \right| \), in order to find the \( k \) nearest neighbors of \( x_{q} \), the distance between \( x_{q} \) and \( x_{a} \) in the high dimensional feature space is given by:
Formula (30) can be simplified by substituting formula (8) into formula (29):
where the distance \( d_{qa} \) is set to \( + \infty \) when \( q = a \). All \( d_{qa} \) form a \( \left| P \right| \times \left| P \right| \) matrix, whose \( q \)-th row, denoted \( d_{q} \), contains the distances between \( x_{q} \) and all instances \( x_{a} \). For each \( d_{q} \), the instances \( x_{a} \) are sorted in ascending order of distance row by row, which yields a new instance set \( S \), again a \( \left| P \right| \times \left| P \right| \) matrix. Finally, the first \( k \) instances in each row of \( S \) are selected as the \( k \) nearest neighbors of \( x_{q} \) in the high dimensional feature space, and they constitute the nearest neighbor set \( D = \left\{ {x_{v}^{q} ,v = 1,2, \ldots ,k} \right\} \) of \( x_{q} \) in the vector space.
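To make the neighbor search concrete: the feature-space distances can be computed from the Gram matrix alone, since \( d^{2} \left( {\varPhi (x_{q} ),\varPhi (x_{a} )} \right) = K(x_{q} ,x_{q} ) - 2K(x_{q} ,x_{a} ) + K(x_{a} ,x_{a} ) \). The NumPy sketch below assumes an RBF kernel with bandwidth \( \gamma \) and toy data; the function names and parameter values are illustrative choices, not the authors' implementation.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_knn(X_pos, k, gamma=0.5):
    """Indices of the k nearest neighbors of each positive instance,
    with distances measured in the kernel-induced feature space:
    d^2(Phi(x_q), Phi(x_a)) = K(q, q) - 2 K(q, a) + K(a, a)."""
    K = rbf_kernel(X_pos, gamma)
    diag = np.diag(K)
    D2 = diag[:, None] + diag[None, :] - 2.0 * K
    np.fill_diagonal(D2, np.inf)      # d_qq -> +infinity, as in the text
    S = np.argsort(D2, axis=1)        # each row sorted by ascending distance
    return S[:, :k]                   # first k columns: the neighbor set D

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
nbrs = kernel_knn(X, k=2)
```

Because the RBF kernel distance is monotone in the input-space distance, the selected neighbors here coincide with ordinary nearest neighbors; for other kernels the two rankings can differ.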
In addition, in order to find the original instances corresponding to the generated instances, the relationship between distances in the vector space and distances in the high dimensional feature space must be obtained first.
For the nearest neighbor set \( D = \left\{ {x_{v}^{q} ,v = 1,2, \ldots ,k} \right\} \) of \( x_{q} \), the distance between \( x_{v}^{q} \) in the high dimensional feature space and \( O_{j}^{q} \) is given by:
Simultaneously, the distance in the vector space between \( x_{v}^{q} \) and \( u_{j}^{q} \), the original instance of \( O_{j}^{q} \) in the high dimensional feature space, is obtained by:
and hence,
From formula (33), the relationship between distances in the vector space and in the high dimensional feature space can be obtained. Moreover, because \( d_{q}^{2} \left( {\varPhi (u_{j}^{q} ),\varPhi (x_{v}^{q} )} \right) = d_{q}^{2} \left( {O_{j}^{q} ,\varPhi (x_{v}^{q} )} \right) \), formula (34) can be simplified by substituting formula (31) into formula (33):
Generally, the distance between an instance and its nearest neighbors plays an important role in locating the instance. Therefore, the vector \( d^{2} \), denoting the distance between the original instance \( u_{j}^{q} \) of \( O_{j}^{q} \) and the nearest neighbor set \( D = \left\{ {x_{v}^{q} ,v = 1,2, \ldots ,k} \right\} \) of \( x_{q} \), is given as follows:
where \( d_{v}^{2} ,v = 1,2, \ldots ,k \) is the distance between \( u_{j}^{q} \) and nearest neighbors \( x_{v}^{q} \) in the vector space.
Following Kwok and Tsang (2004) and Gower (1968), where the coordinates of an instance are determined by distance constraints between the instance and other instances, the original instance \( u_{j}^{q} \) of \( O_{j}^{q} \) can be located. For \( D = \left\{ {x_{v}^{q} ,v = 1,2, \ldots ,k} \right\} \), the mean value \( \bar{x} = (1/k)\sum\nolimits_{v = 1}^{k} {x_{v}^{q} } \) of the original instances \( \left\{ {x_{1}^{q} ,x_{2}^{q} , \ldots ,x_{k}^{q} } \right\} \) corresponding to the \( k \) nearest neighbors \( \left\{ {\varPhi (x_{1}^{q} ),\varPhi (x_{2}^{q} ), \ldots ,\varPhi (x_{k}^{q} )} \right\} \) of \( x_{q} \) in the high dimensional feature space is taken as the centroid of \( \left\{ {x_{1}^{q} ,x_{2}^{q} , \ldots ,x_{k}^{q} } \right\} \), and a new coordinate system can be defined.
First, construct a matrix \( X_{v}^{q} = \left[ {x_{1}^{q} ,x_{2}^{q} , \ldots ,x_{k}^{q} } \right] \) and a \( k \times k \) centering matrix \( H \) given by:
where \( I \) is a \( k \times k \) identity matrix and \( L = \left[ {1,1, \ldots ,1} \right]^{T} \) is a \( k \times 1 \) vector. Thus \( X_{v}^{q} H \) is the data matrix centered at \( \bar{x} \):
Assuming that the rank of the centering matrix \( X_{v}^{q} H \) is \( p \), the singular value decomposition (SVD) of \( X_{v}^{q} H \) can be obtained as:
where \( U_{1} = \left[ {e_{1} ,e_{2} , \ldots ,e_{p} } \right] \) is a matrix with orthonormal columns \( e_{i} \), \( i = 1,2, \ldots ,p \), and \( \varGamma = \varLambda_{1} V_{1}^{T} = \left[ {c_{1} ,c_{2} , \ldots ,c_{k} } \right] \) is a matrix whose columns \( c_{v} \) are the projections of \( x_{v}^{q} - \bar{x} \) onto the columns of \( U_{1} \), with \( \left\| {c_{v} } \right\|^{2} = \left\| {x_{v}^{q} - \bar{x}} \right\|^{2} \). From these, a \( k \times 1 \) vector \( d_{c}^{2} = \left[ {\left\| {c_{1} } \right\|^{2} ,\left\| {c_{2} } \right\|^{2} , \ldots ,\left\| {c_{k} } \right\|^{2} } \right]^{T} \) is obtained. Obviously, to obtain the approximate original instance \( u_{j}^{q} \), the distances \( d_{q}^{2} (u_{j}^{q} ,x_{v}^{q} ) \) should be as close as possible to the values obtained in formula (35), i.e.,
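The centering and SVD steps above admit a direct numerical check; the toy data and variable names in this NumPy sketch are illustrative assumptions, not taken from the paper:

```python
import numpy as np

k = 4
rng = np.random.default_rng(0)
Xq = rng.normal(size=(2, k))            # columns are x_v^q, v = 1..k (2-D toy data)

L = np.ones((k, 1))
H = np.eye(k) - (1.0 / k) * (L @ L.T)   # k x k centering matrix H = I - (1/k) L L^T
Xc = Xq @ H                             # columns are x_v^q - x_bar

# Thin SVD of the centered matrix: Xc = U1 @ Gamma, with Gamma = Lambda1 @ V1^T
U1, sing, V1t = np.linalg.svd(Xc, full_matrices=False)
Gamma = np.diag(sing) @ V1t             # columns c_v: coordinates of x_v^q - x_bar
d_c2 = np.sum(Gamma ** 2, axis=0)       # ||c_v||^2 = ||x_v^q - x_bar||^2
```

The check confirms the two properties the derivation relies on: the columns of \( \varGamma \) sum to zero (so \( \varGamma L = 0 \)), and the column norms of \( \varGamma \) equal those of the centered data.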
Then, define \( c \in R^{p \times 1} \) with \( U_{1} c = u_{j}^{q} - \bar{x} \), so
Because the centering matrix \( X_{v}^{q} H \) makes the inner-product terms in formula (40) sum to zero, summing formula (40) over \( v \) yields the simplified formula (41):
Substituting formula (41) for \( \left\| c \right\|^{2} \) in formula (40) gives:
Formula (42) can be rewritten in matrix form as:
Now, \( \varGamma LL^{T} = 0 \) because \( \varGamma \) is centered. Hence, formula (43) further reduces to:
Finally, \( u_{j}^{q} \) is obtained by transforming \( c \) in formula (44) back to the original coordinate system in the vector space:
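Formulas (40)–(45) amount to a small least-squares problem: since \( \varGamma L = 0 \), the normal equations reduce to \( c = - \tfrac{1}{2}(\varGamma \varGamma^{T} )^{ - 1} \varGamma (d^{2} - d_{c}^{2} ) \), after which \( u_{j}^{q} = U_{1} c + \bar{x} \). The sketch below is written in the spirit of Kwok and Tsang (2004); the function name and the rank threshold are hypothetical choices for illustration.

```python
import numpy as np

def preimage(Xq, d2):
    """Recover u = U1 c + x_bar from the squared distances d2 to the k
    neighbors stored as the columns of Xq (least-squares pre-image)."""
    k = Xq.shape[1]
    xbar = Xq.mean(axis=1, keepdims=True)
    H = np.eye(k) - np.ones((k, k)) / k            # centering matrix
    U1, s, V1t = np.linalg.svd(Xq @ H, full_matrices=False)
    keep = s > 1e-12                               # drop zero singular values
    U1, Gamma = U1[:, keep], np.diag(s[keep]) @ V1t[keep]
    d_c2 = np.sum(Gamma ** 2, axis=0)
    # Gamma @ L = 0, so the ||c||^2 term drops out of the normal equations:
    # c = -1/2 (Gamma Gamma^T)^{-1} Gamma (d2 - d_c2)
    c = -0.5 * np.linalg.solve(Gamma @ Gamma.T, Gamma @ (d2 - d_c2))
    return (U1 @ c[:, None] + xbar).ravel()

# Recover a point from its squared distances to three non-collinear neighbors
Xq = np.array([[0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0]])                   # columns: (0,0), (1,0), (0,1)
u_true = np.array([0.3, 0.4])
d2 = np.sum((u_true[:, None] - Xq) ** 2, axis=0)
u_hat = preimage(Xq, d2)
```

When the target point lies in the affine hull of its neighbors, as in this toy check, the recovery is exact; otherwise the result is the least-squares approximation within that hull.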
After \( u_{j}^{q} \) is obtained in Step 8 as above, the synthetic positive instances \( u_{j}^{q} \) can be added to the original positive instance set \( P = \{ x_{q} ,q = 1,2, \ldots ,\left| P \right|\} \) to obtain a new positive instance set \( P_{new} \).
Cite this article
Huang, X., Zhang, CZ. & Yuan, J. Predicting Extreme Financial Risks on Imbalanced Dataset: A Combined Kernel FCM and Kernel SMOTE Based SVM Classifier. Comput Econ 56, 187–216 (2020). https://doi.org/10.1007/s10614-020-09975-3