Abstract
In an increasingly complex and volatile environment, government officials, researchers, and investors alike need models that accurately forecast markets in order to make appropriate decisions. This research investigates the efficacy of Transformer-based deep neural networks in predicting financial market returns compared with traditional models, focusing on ten market indexes. The study employs a comprehensive methodology that involves iterative dropout tests and batch size optimization to enhance model performance. By leveraging the power of deep learning, the research aims to improve prediction accuracy and capture complex patterns in stock market data. Twelve neural network architectures are compared across the ten indexes, and the proposed Transformer variants produce significantly better results than the benchmark models in all cases. Ablative experiments reveal the superiority of Transformer models in capturing long-term dependencies and extracting meaningful features from time series data. The findings suggest that Transformer neural networks outperform LSTM networks and other traditional models in forecasting financial market trends. This research contributes to the growing body of literature on deep learning applications in finance and provides valuable insights for government officials, researchers, and investors seeking to make informed decisions in the stock market. The implications of this study extend beyond academia, offering practical guidance for enhancing prediction accuracy and optimizing investment strategies in a dynamic and volatile financial market landscape.
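For illustration only, the sketch below shows the general kind of Transformer encoder regressor the abstract describes: sliding windows of past index returns as input, a one-step-ahead return as output, with the dropout rate and training batch size exposed as the hyperparameters the study tunes iteratively. This is not the authors' code; the Keras framework, layer widths, window length, and the random toy data are assumptions made purely to keep the example self-contained and runnable.

```python
# Hypothetical minimal sketch of a Transformer-style regressor for index returns.
# Input: sliding windows of past returns; output: next-period return.
import numpy as np
from tensorflow.keras import layers, Model

def build_transformer_regressor(window=30, n_features=1,
                                d_model=64, n_heads=4, ff_dim=128,
                                dropout=0.1):
    inputs = layers.Input(shape=(window, n_features))
    x = layers.Dense(d_model)(inputs)                      # project returns to model width
    attn = layers.MultiHeadAttention(num_heads=n_heads,
                                     key_dim=d_model // n_heads,
                                     dropout=dropout)(x, x)
    x = layers.LayerNormalization()(x + attn)              # residual connection + norm
    ff = layers.Dense(ff_dim, activation="relu")(x)        # position-wise feed-forward block
    ff = layers.Dropout(dropout)(ff)
    ff = layers.Dense(d_model)(ff)
    x = layers.LayerNormalization()(x + ff)
    x = layers.GlobalAveragePooling1D()(x)                 # pool over the time dimension
    outputs = layers.Dense(1)(x)                           # one-step-ahead return
    model = Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

# Toy usage: random data stands in for scaled index returns; in the study these
# would come from sliding windows over each of the ten market indexes, and the
# dropout rate and batch_size would be swept over a grid of candidate values.
X = np.random.randn(1024, 30, 1).astype("float32")
y = np.random.randn(1024, 1).astype("float32")
model = build_transformer_regressor(dropout=0.2)
model.fit(X, y, batch_size=32, epochs=2, validation_split=0.2, verbose=0)
```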
Data availability
The data will be made available on reasonable request.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Consent to publish
The authors agreed with the content and gave explicit consent to submit the manuscript.
Human and animal rights
The study does not involve human participants or animals.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yañez, C., Kristjanpoller, W. & Minutolo, M.C. Stock market index prediction using transformer neural network models and frequency decomposition. Neural Comput & Applic (2024). https://doi.org/10.1007/s00521-024-09931-4