Abstract
Forecasting future events is a challenging task that can significantly affect decision-making and policy-making. In this research, we focus on forecasting news related to Pakistan. Despite the importance of accurate predictions in this field, no dataset currently exists for forecasting Pakistani news, particularly political news. Unlike numerical time series data, textual data carries information about an event's potential causes in addition to its impact, and this richer information is expected to yield better forecasts. To address this gap, we create the first Pakistani news dataset for forecasting, consisting mostly of news about Pakistani politics. The dataset was collected from various sources, including Pakistani news websites and social media platforms, as well as frequently asked questions about Pakistani politics. We develop a forecasting model using this dataset and evaluate the effectiveness of cutting-edge deep hybrid learning techniques that combine natural language processing (NLP) with neural networks, random forest, Word2vec, and Naive Bayes. To the best of our knowledge, no prior research has applied a deep hybrid learning model, that is, a blend of deep learning and machine learning, to news forecasting. The forecasting model achieves an accuracy of 97%. According to our findings, the model's performance is adequate when compared with that of other forecasting models. Our research not only fills a gap in the current literature but also presents a new challenge for large language models and has the potential to bring significant practical advantages to the field of forecasting. The unique contribution of this study lies in the intelligent modeling of the prediction problem, which allows content-rich text to be used for forecasting.
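To make one of the named components concrete, the following is a minimal sketch of a multinomial Naive Bayes classifier that maps a headline to a coarse outcome label. It is an illustration only and not the authors' pipeline: the class name `NaiveBayesText`, the whitespace tokenizer, the two labels, and the toy headlines are all assumptions invented for this example, and the neural network, random forest, and Word2vec stages mentioned in the abstract are not shown.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real NLP preprocessing."""
    return text.lower().split()


class NaiveBayesText:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (hypothetical helper)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, text):
        """Return the unnormalized log posterior of each class for the text."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy data (headline, outcome label); not taken from the paper's dataset.
train_texts = [
    "opposition announces protest march on capital",
    "opposition calls nationwide strike over prices",
    "government and opposition agree on election date",
    "parliament passes budget after talks succeed",
]
train_labels = ["unrest", "unrest", "settlement", "settlement"]

model = NaiveBayesText().fit(train_texts, train_labels)
prediction = model.predict("opposition plans protest strike")
```

In a hybrid setup of the kind the abstract describes, a classifier like this would presumably be one of several models whose inputs or outputs are combined with learned text representations; how the authors combine them is not stated here.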
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs41870-024-01781-6/MediaObjects/41870_2024_1781_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs41870-024-01781-6/MediaObjects/41870_2024_1781_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs41870-024-01781-6/MediaObjects/41870_2024_1781_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs41870-024-01781-6/MediaObjects/41870_2024_1781_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs41870-024-01781-6/MediaObjects/41870_2024_1781_Figa_HTML.png)
About this article
Cite this article
Ihsan, R., Khurshid, S.K., Shoaib, M. et al. A technique to forecast Pakistan’s news using deep hybrid learning model. Int. j. inf. tecnol. 16, 2505–2516 (2024). https://doi.org/10.1007/s41870-024-01781-6
DOI: https://doi.org/10.1007/s41870-024-01781-6