Abstract
Several researchers have used standard time series models to analyze future patterns of COVID-19 and the causal impact of vaccination in various countries. Bayesian structural time series (BSTS) and ARIMA (Autoregressive Integrated Moving Average) models are both used to forecast time series. The goal of this study is to examine a more flexible and effective methodology that decomposes a time series into its main components. Using data from March 1, 2020, to June 30, 2021, we applied these state-space models to explore the forecast patterns of COVID-19 in five affected nations. In addition, we used intervention analysis under BSTS models to examine the causal effect of vaccines in these countries, and the BSTS models achieved higher accuracy than ARIMA models. According to the forecasts, the number of confirmed cases in the United States, the United Kingdom, the United Arab Emirates, Bahrain, and India will rise by 1.17%, 19.4%, 15.5%, 13.8%, and 8%, respectively, over the next 60 days. The number of deaths in the United States, the United Kingdom, the United Arab Emirates, Bahrain, and India is expected to rise by 2.7%, 3.5%, 15.8%, 9.4%, and 14.8%, respectively. Through effective and rapid vaccination, the United States, the United Kingdom, and the UAE have been able to reduce the number of deaths. In India, by contrast, vaccination is currently unable to decrease the rate of cases and deaths, and the Indian healthcare system is likely to be seriously overburdened in the next month. Although the USA and the UK have managed to reduce COVID-19 death rates, the number of confirmed cases in the UK and the UAE remains high compared with the other nations, so serious efforts will be required to keep it under control. To keep the pandemic under control, Bahrain and the other four countries need to speed up vaccination.
1 Introduction
In this era of big data, the availability of government-provided health, agricultural, energy, tourism, and economic data [1] has increased [2], and analysts use these data to assist governments in making decisions. The Internet of Things revolves around real-time decision making; it is about decision informatics and embraces the advanced technology of Big Data [3]. Many methods are used to analyze, optimize, and forecast large datasets [4]. COVID-19 has created a global emergency that affects many aspects of life, including health, economics, and politics [5]. It is an infectious disease caused by a coronavirus. Most people infected with the COVID-19 virus experience mild to moderate respiratory illness and recover without requiring special treatment. Older people and people with underlying medical problems such as cardiovascular disease, diabetes, chronic respiratory disease, and cancer are more likely to develop serious illness. Although COVID-19 has a lower mortality rate than severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS), the virus has higher transmissibility [6]. The future trends of COVID-19 are unknown, and the end of the pandemic is uncertain. Where specific information and computing resources are available, mechanistic models can be sufficient to predict coronavirus infection patterns and to model the impact of different intervention strategies more accurately, informing decision-makers and health care workers [7]. Many studies have examined future trends of the epidemic in various regions, such as Nigeria [8], Iran [9], the USA [10], and India [11]. These papers used traditional methods, such as ARIMA and regression models, to forecast the pandemic's future behavior. These models suffer from over-adjustment, especially when covariates are present [12]. Time series analysis is regularly used for forecasting, which requires information on trend and seasonality in addition to regression components.
For small datasets, estimating these components with adequate precision is difficult using traditional time series methods.
Bayesian structural time series (BSTS) models [13] are a viable alternative because they have a number of attractive features. These models can accommodate a large number of covariates and properly reflect stochastic behavior by allowing model parameters to vary over time [14, 15]. The Bayesian technique can place prior beliefs on the parameters, which is an advantage over the classical approach. This approach is more transparent than ARIMA models and represents uncertainty in a more general way; it is more transparent because its representation does not depend on differencing, lags, and moving averages. These models could be useful in setting public health priorities and in developing and implementing policies to address and prevent adverse health outcomes [16]. They have already been used to predict the health consequences of alcohol consumption [17] and the effects of local alcohol licensing regulations on health and crime [18]. They also make it possible to select appropriate variables through spike-and-slab priors [19]. The BSTS models predicted future health consequences of alcohol consumption better than ARIMA; although the training dataset contained only eight data points, they produced a reasonably accurate 1-5 year estimate [17].
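To make the spike-and-slab idea concrete, the following Python sketch works on a handful of hypothetical posterior draws (the coefficient names `x1`, `x2`, `x3` and all values are invented for illustration, not output from the bsts package). It assumes the usual convention that a draw is exactly zero when a covariate is excluded, so the posterior inclusion probability is the share of non-zero draws, and the model-averaged coefficient is the plain mean over all draws, zeros included.

```python
# Hypothetical posterior draws of three regression coefficients. Under a
# spike-and-slab prior a draw is exactly 0.0 when the covariate is excluded.
draws = [
    {"x1": 0.82, "x2": 0.0, "x3": 0.10},
    {"x1": 0.79, "x2": 0.0, "x3": 0.0},
    {"x1": 0.85, "x2": 0.30, "x3": 0.0},
    {"x1": 0.80, "x2": 0.0, "x3": 0.12},
]

def inclusion_probabilities(draws):
    """Fraction of draws in which each coefficient is non-zero (in the slab)."""
    return {name: sum(d[name] != 0.0 for d in draws) / len(draws) for name in draws[0]}

def model_averaged_coefficients(draws):
    """Bayesian model averaging: mean over all draws, counting exclusions as 0."""
    return {name: sum(d[name] for d in draws) / len(draws) for name in draws[0]}

print(inclusion_probabilities(draws))  # {'x1': 1.0, 'x2': 0.25, 'x3': 0.5}
print(model_averaged_coefficients(draws))
```

Because excluded draws count as zero, a covariate with a low inclusion probability is automatically shrunk toward zero in the averaged coefficient.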
In a nutshell, BSTS models are stochastic state-space models that can examine trend, seasonality, and regression components individually. Spike-and-slab priors are used in these models to choose appropriate covariates, and the final projections are produced using Bayesian model averaging. As a result, the estimates depend less on particular modelling assumptions: each candidate covariate contributes according to its posterior inclusion probability rather than being fixed in or out by hand. A separate variable-selection step for the linear regression component is therefore not required, since the models select the most informative predictors themselves. These models also improve the representation of estimate uncertainty and of its change over time [17]. Computing the Bayesian posterior distribution analytically is, however, rather complex, so the computations are carried out with Markov Chain Monte Carlo (MCMC) methods such as Gibbs sampling [20]. The BSTS models [21] are implemented in the recently developed bsts package for R.
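The Gibbs sampling idea can be illustrated on a much smaller problem than a full BSTS model. The sketch below is a hypothetical example, not the sampler used by bsts: it assumes independent normal observations with unknown mean and precision, uses conjugate priors so that both full conditional distributions are available in closed form, alternates draws from them, discards a burn-in, and averages the retained draws. The function name `gibbs_normal` and the prior settings are illustrative choices.

```python
import random
import statistics

def gibbs_normal(y, n_iter=5000, burn_in=1000, m0=0.0, s0_sq=100.0, a0=1.0, b0=1.0, seed=1):
    """Gibbs sampler for y_i ~ N(mu, 1/tau) with priors
    mu ~ N(m0, s0_sq) and tau ~ Gamma(shape=a0, rate=b0)."""
    rng = random.Random(seed)
    n, sum_y = len(y), sum(y)
    mu, tau = statistics.fmean(y), 1.0          # starting values
    mu_draws, sigma_draws = [], []
    for it in range(n_iter):
        # Full conditional of mu given tau is normal (conjugate update).
        prec = 1.0 / s0_sq + n * tau
        mean = (m0 / s0_sq + tau * sum_y) / prec
        mu = rng.gauss(mean, (1.0 / prec) ** 0.5)
        # Full conditional of tau given mu is gamma; gammavariate takes a
        # scale parameter, so we pass 1 / rate.
        rate = b0 + 0.5 * sum((yi - mu) ** 2 for yi in y)
        tau = rng.gammavariate(a0 + n / 2.0, 1.0 / rate)
        if it >= burn_in:                       # keep only post-burn-in draws
            mu_draws.append(mu)
            sigma_draws.append(tau ** -0.5)
    # Average the retained draws to obtain posterior-mean estimates.
    return statistics.fmean(mu_draws), statistics.fmean(sigma_draws)

data_rng = random.Random(42)
y = [data_rng.gauss(5.0, 2.0) for _ in range(200)]   # simulated data
mu_hat, sigma_hat = gibbs_normal(y)
print(round(mu_hat, 2), round(sigma_hat, 2))
```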
In response to the severe COVID-19 epidemic, global vaccine development efforts have intensified. Even with minimal protection against infection, vaccination can have a significant influence on preventing COVID-19 outbreaks in the United States [22]. Evaluating the effects of vaccination in the target countries is therefore of considerable interest. Intervention analysis can be used to investigate the causal impact of vaccination in these nations. Causal impact analysis with Bayesian structural time series models is used to assess the effect of an intervention in the post-intervention period. In contrast to traditional ARIMA models, these models report both the absolute and the relative effect, and they perform better than conventional models owing to their dynamic, time-indexed formulation and additional benefits such as the use of past data and complex covariate structures [23].
The study's first goal is to develop BSTS models for investigating COVID-19's future trends and to compare their predictive power with that of the commonly used ARIMA models. The study examines COVID-19's temporal dynamics in five affected countries: the United States, the United Kingdom, the United Arab Emirates, Bahrain, and India. We also examined the causal effects of vaccination in these nations. To attain these goals, we used BSTS models and intervention analysis based on BSTS models. Compared with ARIMA models, the BSTS models achieved higher accuracy. The proposed methods can be used to examine these trends in any other country.
2 Methodology
The data, comprising the cumulative number of confirmed cases, the cumulative number of deaths, and the cumulative number of vaccine doses administered, were obtained from Our World in Data through the Humanitarian Data Exchange website. These sources produce and update statistics for many nations on a regular schedule. Because the data are publicly released, no permission from the authorities was required for this study.
A time series intervention analysis can be conducted using BSTS models. These methods estimate the discrepancy between the observed time series and the series that would have been observed had the intervention not occurred. The causal effect of vaccination can be examined with these computations in three steps. First, the BSTS model is estimated using data up to the intervention date (the start of vaccination; here we take February 15, 2021, as the vaccination start date for all five countries). Second, the estimated model is used to forecast the vaccination period under the counterfactual scenario in which the intervention did not occur (no vaccination). Finally, the causal influence of vaccination is determined by evaluating the difference between the predicted and actual data over the vaccination period.
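The three steps above can be sketched in Python as follows. This is a deliberately simplified stand-in: a least-squares linear trend replaces the BSTS model (CausalImpact uses the posterior predictive distribution of a BSTS model instead), and the series and the function names `fit_linear_trend` and `causal_effect` are hypothetical. Only the fit-forecast-subtract logic mirrors the procedure described in the text.

```python
def fit_linear_trend(y):
    """OLS fit of y_t = a + b*t for t = 0, 1, ...; a stand-in for the BSTS fit."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (yt - y_mean) for t, yt in enumerate(y))
    b = sxy / sxx
    return y_mean - b * t_mean, b

def causal_effect(series, intervention_index):
    """Return the counterfactual path, cumulative absolute effect and relative effect."""
    pre, post = series[:intervention_index], series[intervention_index:]
    # Step 1: estimate the model on pre-intervention data only.
    a, b = fit_linear_trend(pre)
    # Step 2: forecast the post-period as if there had been no intervention.
    counterfactual = [a + b * t for t in range(intervention_index, len(series))]
    # Step 3: effect = observed minus counterfactual over the post-period.
    absolute = sum(obs - cf for obs, cf in zip(post, counterfactual))
    relative = absolute / sum(counterfactual)
    return counterfactual, absolute, relative

series = [10, 12, 14, 16, 18, 19, 20, 21]   # hypothetical series; intervention at index 5
cf, abs_effect, rel_effect = causal_effect(series, 5)
print(cf, abs_effect, round(rel_effect, 3))  # [20.0, 22.0, 24.0] -6.0 -0.091
```

A negative effect here means the observed values fell below the no-intervention forecast, which is the direction one would hope to see for cases or deaths after vaccination.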
The BSTS models were used to generate forecasts for various pandemic parameters. These models incorporate the Bayesian approach: the likelihood function (current data) is combined with prior information (such as expert opinion) to update existing knowledge and obtain the posterior distributions on which the final Bayesian models are based. The models employ Bayesian model averaging and Kalman filtering to generate more accurate forecasts [16]. Because of the complexity of these models, closed-form estimators for the model parameters are not available [21]. We therefore estimated the model parameters numerically in R using the Markov Chain Monte Carlo (MCMC) method, which draws random samples of the BSTS parameters from their conditional distributions and then averages the results to obtain the final estimates. The Ljung-Box test was used to perform diagnostic checks on the models. Various forecast accuracy measures, namely root mean square error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE), were used to compare the forecast accuracy of the BSTS models with that of the commonly used ARIMA models. Because the BSTS models showed higher forecast accuracy, we present projections of the pandemic parameters from the BSTS models only. We also treated the start of vaccination (in the aforementioned countries) as an intervention and conducted an intervention analysis using BSTS models.
The numerical findings for the causal impacts were acquired using the R package CausalImpact.
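The Ljung-Box diagnostic mentioned above tests whether model residuals are jointly uncorrelated up to lag h using \(Q = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)\), which is compared with a chi-square distribution. The Python sketch below is illustrative only: the residuals are invented, and because the standard library has no chi-square CDF it uses the closed-form upper tail that holds for an even number of degrees of freedom. In practice one would use a library routine such as R's `Box.test(..., type = "Ljung-Box")`, with `fitdf` set to the number of fitted ARMA parameters.

```python
import math

def autocorrelation(e, k):
    """Sample autocorrelation of the residuals at lag k."""
    n = len(e)
    mean = sum(e) / n
    denom = sum((x - mean) ** 2 for x in e)
    num = sum((e[t] - mean) * (e[t - k] - mean) for t in range(k, n))
    return num / denom

def ljung_box_q(e, h):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} r_k^2 / (n-k)."""
    n = len(e)
    return n * (n + 2) * sum(autocorrelation(e, k) ** 2 / (n - k) for k in range(1, h + 1))

def chi2_upper_tail_even_df(x, df):
    """P(X > x) for X ~ chi-square(df); this closed form holds only for even df."""
    if df % 2 != 0:
        raise ValueError("df must be even for this closed form")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

residuals = [1.0, -1.0] * 50             # hypothetical, strongly autocorrelated residuals
q = ljung_box_q(residuals, h=10)
# df = h minus the number of fitted ARMA parameters; none are fitted here.
p = chi2_upper_tail_even_df(q, df=10)
print(round(q, 1), p < 0.05)             # 963.9 True
```

A small p-value, as here, signals leftover autocorrelation, meaning the model has not captured all of the structure in the series.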
3 Bayesian Structural Time-series Models
The basic structural model breaks a time series down into four components: a level, a local trend, seasonal effects, and an error term. A structural time series model can be defined by a pair of equations [24]:
\[ y_{t} = Z_{t}^{T} \alpha_{t} + \varepsilon_{t} \tag{1} \]
\[ \alpha_{t+1} = T_{t} \alpha_{t} + R_{t} \eta_{t} \tag{2} \]
Equation (1) is the observation equation, which ties the observed data \(y_{t}\) to the unobserved state vector \(\alpha_{t}\); equation (2) is the state equation, which describes how the state evolves over time. Here \(\varepsilon _{t} \sim \mathcal {N}\left( 0, \sigma _{t}^{2}\right) \) and \(\eta _{t} \sim \mathcal {N}\left( 0, Q_{t}\right) \) are the observation error and system error, respectively, and are independent of all other unknowns. The output vector, transition matrix, control matrix, and state-diffusion matrix are represented by \(Z_t\), \(T_t\), \(R_t\), and \(Q_t\), respectively.
3.1 Local Level Model
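The local level model is the simplest structural time series model. It assumes that the level follows a random walk:
\[ y_{t} = \mu_{t} + \epsilon_{t}, \quad \epsilon_{t} \sim \mathcal{N}\left(0, \sigma_{\epsilon}^{2}\right) \]
\[ \mu_{t+1} = \mu_{t} + \eta_{t}, \quad \eta_{t} \sim \mathcal{N}\left(0, \sigma_{\eta}^{2}\right) \]
In the local level model, the matrices \(Z_{t}\), \(T_{t}\), and \(R_{t}\) in equations (1) and (2) collapse to the scalar value 1, and the state \(\alpha_{t}\) is the level \(\mu_{t}\). The parameters of the model are the variances of the error terms, \(\left( \sigma _{\epsilon }^{2}, \sigma _{\eta }^{2}\right) \).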
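For the local level model with known variances, the Kalman filtering mentioned in the methodology reduces to a few scalar updates. The Python sketch below assumes the variances are given (in bsts they are sampled by MCMC) and uses invented observations; the function names `local_level_filter` and `local_level_forecast` are hypothetical. The forecast function shows that the local level forecast is flat while its variance grows linearly in the horizon h.

```python
def local_level_filter(y, sigma_eps2, sigma_eta2, a0=0.0, p0=1e7):
    """Kalman filter for y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t (known variances).
    Returns the filtered mean and variance of the final level mu_T."""
    a, p = a0, p0                  # prior for mu_1; a large p0 is nearly diffuse
    for t, obs in enumerate(y):
        if t > 0:
            p += sigma_eta2        # predict: the random-walk step adds sigma_eta2
        f = p + sigma_eps2         # variance of the one-step-ahead prediction error
        k = p / f                  # Kalman gain
        a += k * (obs - a)         # update the level with the new observation
        p *= 1.0 - k               # filtered variance shrinks after the update
    return a, p

def local_level_forecast(a, p, sigma_eps2, sigma_eta2, h):
    """Mean and variance of y_{T+h}: flat mean, variance growing linearly in h."""
    return a, p + h * sigma_eta2 + sigma_eps2

level, var = local_level_filter([5.1, 4.9, 5.3, 5.0, 5.2], sigma_eps2=0.04, sigma_eta2=0.01)
for h in (1, 10, 60):
    print(h, local_level_forecast(level, var, 0.04, 0.01, h))
```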
3.2 Local Linear Trend Model
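The local linear trend model assumes that both the mean and the slope follow random walks. The equation for the mean is
\[ \mu_{t+1} = \mu_{t} + \delta_{t} + \eta_{0t}, \quad \eta_{0t} \sim \mathcal{N}\left(0, \sigma_{\eta_0}^{2}\right) \]
and the equation for the slope is
\[ \delta_{t+1} = \delta_{t} + \eta_{1t}, \quad \eta_{1t} \sim \mathcal{N}\left(0, \sigma_{\eta_1}^{2}\right) \]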
Because it quickly adapts to local variability, the local linear trend model is a common choice for modeling trends. This is useful when making short-term forecasts. When making longer-term forecasts, this kind of flexibility may be undesirable, as such predictions frequently have implausibly large uncertainty intervals.
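The cost of this flexibility can be quantified. Writing the model as \(y_t = \mu_t + \epsilon_t\), \(\mu_{t+1} = \mu_t + \delta_t + \eta_{0t}\), \(\delta_{t+1} = \delta_t + \eta_{1t}\), with disturbance variances \(\sigma_{\epsilon}^{2}\), \(\sigma_{\eta_0}^{2}\), and \(\sigma_{\eta_1}^{2}\), and assuming the current level \(\mu_T\) and slope \(\delta_T\) were known exactly, iterating the two random walks gives \(\operatorname{Var}(y_{T+h}) = \sigma_{\epsilon}^{2} + h\sigma_{\eta_0}^{2} + \sigma_{\eta_1}^{2}\sum_{m=1}^{h-1} m^{2}\), which grows like \(h^3\). The Python sketch below evaluates this expression and checks it by simulation; the variance values and function names are illustrative assumptions, and in practice the uncertainty about \(\mu_T\) and \(\delta_T\) themselves would add to these figures.

```python
import random
import statistics

def llt_forecast_variance(h, s_eps2, s_level2, s_slope2):
    """Var(y_{T+h}) for a local linear trend when mu_T and delta_T are known.
    The slope noise contributes s_slope2 * sum_{m=1}^{h-1} m^2 (order h^3)."""
    return s_eps2 + h * s_level2 + s_slope2 * (h - 1) * h * (2 * h - 1) / 6.0

def simulate_llt(mu, delta, h, s_eps2, s_level2, s_slope2, rng):
    """Simulate y_{T+h} by iterating the two random-walk equations h times."""
    for _ in range(h):
        # mu_{t+1} = mu_t + delta_t + eta_0t and delta_{t+1} = delta_t + eta_1t
        mu, delta = (mu + delta + rng.gauss(0.0, s_level2 ** 0.5),
                     delta + rng.gauss(0.0, s_slope2 ** 0.5))
    return mu + rng.gauss(0.0, s_eps2 ** 0.5)

rng = random.Random(0)
sims = [simulate_llt(100.0, 2.0, 10, 1.0, 1.0, 1.0, rng) for _ in range(20000)]
print(llt_forecast_variance(10, 1.0, 1.0, 1.0))   # 296.0
print(round(statistics.variance(sims), 1))        # Monte Carlo check of the formula
for h in (1, 10, 60):
    print(h, llt_forecast_variance(h, 1.0, 1.0, 1.0))
```

Even with unit variances, the 60-step variance is in the tens of thousands, which is why such intervals quickly become implausibly wide.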
4 ARIMA Models
The \({\text {ARIMA}}(p, d, q)\) model is determined by three parameters. The parameter p is the order of the autoregressive part \({\text {AR}}(p)\): the current value depends on its own p previous values. The parameter q is the order of the moving-average part \({\text {MA}}(q)\): the current deviation from the mean depends on the q previous errors. The parameter \(d\) is the order of differencing. The \({\text {ARIMA}}(p, d, q)\) model has the form [25]
\[ \left(1 - \sum_{i=1}^{p} \phi_{i} u^{i}\right) (1 - u)^{d} Y_{t} = \left(1 + \sum_{j=1}^{q} \theta_{j} u^{j}\right) \varepsilon_{t} \]
where \(\phi_i\) and \(\theta_j\) are the autoregressive and moving-average coefficients, \(\varepsilon_t\) is white noise, and u is the time-shift (backshift) operator, \(u^{d} Y_{t}=Y_{t-d}\). Here \(Y_t\) is the cumulative number of reported COVID-19 cases, so the first difference \(\Delta Y_t=Y_t-Y_{t-1}\) is the daily number of infections. The Akaike Information Criterion (AIC) was used to select the final orders p, d, and q. All calculations were performed in the open-source software R.
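As a minimal illustration of the differencing and integration steps, the Python sketch below fits an ARIMA(1,1,0) model with a constant to a hypothetical cumulative series: it differences the series to daily counts, estimates the AR(1) coefficient by conditional least squares (rather than the maximum likelihood estimation and AIC-based order search that R's ARIMA tools would perform), forecasts the daily counts, and cumulates them back. The function names and data are assumptions for illustration.

```python
def difference(cumulative):
    """First difference Delta Y_t = Y_t - Y_{t-1}: cumulative totals -> daily counts."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]

def fit_ar1(x):
    """Conditional least squares for x_t = c + phi * x_{t-1} + e_t."""
    lagged, current = x[:-1], x[1:]
    n = len(current)
    mean_lag, mean_cur = sum(lagged) / n, sum(current) / n
    sxx = sum((a - mean_lag) ** 2 for a in lagged)
    sxy = sum((a - mean_lag) * (b - mean_cur) for a, b in zip(lagged, current))
    phi = sxy / sxx
    return mean_cur - phi * mean_lag, phi

def forecast_arima_110(cumulative, h):
    """Forecast h steps of the cumulative series with ARIMA(1,1,0) plus a constant."""
    daily = difference(cumulative)
    c, phi = fit_ar1(daily)
    last_daily, level, path = daily[-1], cumulative[-1], []
    for _ in range(h):
        last_daily = c + phi * last_daily   # forecast the differenced (daily) series
        level += last_daily                 # undo the differencing by cumulating
        path.append(level)
    return path

# Hypothetical cumulative counts whose daily increments follow d_t = 10 + 0.5 d_{t-1}.
cumulative = [0, 4, 16, 32, 50, 69, 88.5, 108.25]
print(fit_ar1(difference(cumulative)))
print(forecast_arima_110(cumulative, 2))
```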
5 Results and Discussions
We performed this research to determine COVID-19's future behavior in five nations affected by the virus, as well as the causal influence of vaccination in these countries. As of June 30, 2021, the United States of America had 33,664,970 confirmed cases, 604,714 deaths, and 326,521,526 vaccinations. The UK had 4,817,298 confirmed cases, 128,404 deaths, and 77,592,212 vaccinations. The UAE had 632,907 confirmed cases, 1,811 deaths, and 15,362,342 vaccinations. Bahrain had 265,827 confirmed cases, 1,352 deaths, and 1,057,888 vaccinations. India, with 30,411,634 confirmed cases, 399,459 deaths, and 271,410,023 vaccinations, was facing a major outbreak. With these data in hand, we set out to compare the predictive accuracy of the proposed BSTS models with that of the more commonly used ARIMA models, using the forecast accuracy measures RMSE, MAE, and MAPE. Table 1 compares the two model families on these measures. Table 2 and Figs. 1–15 provide the projections for the following sixty days, including the projected numbers of cases, deaths, and total vaccinations. Finally, the causal effects of vaccines are summarised in Table 3 and Figs. 16–25. According to Table 1, the BSTS models delivered more accurate forecasts than the ARIMA models. There are a few deviations, which could be related to the data's unpredictable behavior. BSTS models have therefore proven to be a suitable alternative to ARIMA models for projecting COVID-19 trends in these nations, and only the BSTS models' forecasts are presented in full.
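The three accuracy measures used in Table 1 have standard definitions, sketched below in Python. The example values are hypothetical, and the functions assume the actual and predicted sequences are aligned and of equal length; MAPE additionally assumes that no actual value is zero. For all three measures, smaller values indicate more accurate forecasts.

```python
import math

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

actual = [100.0, 200.0, 400.0]       # hypothetical observations
predicted = [110.0, 190.0, 400.0]    # hypothetical forecasts
print(round(rmse(actual, predicted), 3), round(mae(actual, predicted), 3),
      round(mape(actual, predicted), 3))  # 8.165 6.667 5.0
```

RMSE penalizes large errors more heavily than MAE, while MAPE is scale-free, which helps when comparing countries with very different case counts.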
Table 2 presents two-month projections for the five countries involved. By August 30, 2021, the cumulative numbers of confirmed cases and deaths in the United States are predicted to grow by 1.17 percent and 2.7 percent, respectively, and the share of the population that has received at least one vaccine dose by 12.5 percentage points. As of June 30, 54 percent of the US population had received at least one dose of the COVID-19 vaccine and 46.61 percent were fully vaccinated. We project that nearly 66.5 percent of the population will have had at least one dose by August 30, which should further reduce the number of deaths and cases in the United States. If people become fully vaccinated and take precautions, the coronavirus could be brought under control in the United States in the coming months. In the UK, the cumulative numbers of confirmed cases and deaths are predicted to rise by 19.4% and 3.5%, respectively, and the share of the population that has received at least one dose by 17 percentage points. As of June 30, 66% of the UK population had received at least one dose of the COVID-19 vaccine and 48% were fully vaccinated, and we expect approximately 83% to have had at least one dose by August 30. The increased vaccination rate helps the UK control daily COVID-19 deaths, but owing to a lack of sufficient measures, the UK continues to have a substantial number of daily cases.
Similarly, the cumulative numbers of confirmed cases, deaths, and vaccinations in the UAE are predicted to grow by 15.5%, 15.8%, and 25.2%, respectively. As of June 30, 74% of the UAE population had received at least one dose of the COVID-19 vaccine, and 64% were fully vaccinated. Rapid vaccination in the UAE has kept daily deaths to a minimum and daily cases under control.
In Bahrain, the cumulative numbers of confirmed cases and deaths are expected to increase by 13.8% and 9.4%, respectively, and the share of the population that has received at least one vaccine dose by 14 percentage points. As of June 30, 62.17 percent of Bahrain's population had received at least one dose of the COVID-19 vaccine and 58% had been fully vaccinated, and we predict that about 76% of the population will have had at least one dose by August 30. If Bahrain accelerates its immunization program, it will be able to reduce daily COVID-19 cases and deaths.
In India, the cumulative numbers of confirmed cases and deaths are expected to rise by 8% and 14.8%, respectively, and the share of the population that has received at least one vaccine dose by 15.3 percentage points. As of June 30, 15.48 percent of India's population had received at least one dose of the COVID-19 vaccine, while 19.66 percent had been fully vaccinated, and we predict that about 31% of the population will have had at least one dose by August 30.
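The projected percentages can be turned back into levels with simple arithmetic. The Python sketch below assumes that each projected increase is measured relative to the 30 June 2021 total (the text does not state the base explicitly) and applies it to the reported US case count; the helper name `project_level` is hypothetical. It also shows that the vaccination figures combine additively with the reported coverage (for the United States, 54 + 12.5 = 66.5), that is, they behave as percentage points rather than relative growth.

```python
def project_level(current, pct_increase):
    """Apply a relative percentage increase to a current cumulative total."""
    return current * (1.0 + pct_increase / 100.0)

us_cases_jun30 = 33_664_970                          # reported cumulative US cases, 30 June 2021
us_cases_aug30 = project_level(us_cases_jun30, 1.17)  # assumes the 1.17% is relative to 30 June
print(round(us_cases_aug30))                         # 34058850

# Coverage is a share of the population, so the reported increase adds
# in percentage points: 54 + 12.5 = 66.5 for the United States.
us_coverage_aug30 = 54.0 + 12.5
print(us_coverage_aug30)                             # 66.5
```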
The next stage was to investigate the role of vaccines in the development of cumulative cases and deaths in the five countries concerned. Vaccination began in the United Kingdom on December 8, 2020, in the United States on December 14, 2020, in the UAE on December 14, 2020, in Bahrain on December 22, 2020, and in India on January 16, 2021. A common intervention date of February 15, 2021, was used in the BSTS intervention analysis. We compared the observed figures with what might have happened if these countries had not vaccinated their citizens. The validity of the findings was assessed using posterior tail-area probabilities and the posterior probability of a causal effect. The results are presented in Table 3 and Figs. 16–25. As these show, the posterior probabilities that these effects occurred by chance are very low, whereas the probabilities of causal effects are relatively high. This indicates the importance of the causal effects of immunization in the five countries involved. Vaccination reduced the number of cases by 9.7 percent, 10 percent, and 12.3 percent in the United States, the United Kingdom, and the United Arab Emirates, respectively. These countries also saw a decline in deaths, of 17.9 percent, 7.7 percent, and 3.8 percent for the United States, the United Kingdom, and the United Arab Emirates, respectively. These countries may therefore have benefited from high immunization rates in their populations. In contrast, there is no evident impact of vaccination on cases and deaths in Bahrain. In India, vaccination rates are poor (only about 15 percent of the population had received the vaccine by June 30), and vaccination is not yet having a visible impact on cases and deaths: the observed cumulative numbers of cases and deaths are significantly higher than those predicted for the vaccination period.
India therefore needs to accelerate the administration of vaccines so that people can return to their normal lives.
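The posterior tail-area probability reported by CausalImpact can be understood through posterior draws of the counterfactual post-period total. The Python sketch below is an illustrative approximation rather than the package's code: the draws are simulated from an assumed normal distribution, the observed total is invented, and the hypothetical function `tail_area_probability` counts how often the no-vaccination draws are at least as extreme as the observed total on the nearer side. One minus this probability corresponds to the posterior probability of a causal effect.

```python
import random

def tail_area_probability(counterfactual_totals, observed_total):
    """One-sided tail-area probability: how often the no-intervention posterior
    draws are at least as extreme as the observed post-period total. The +1
    counts the observed value itself so the result is never exactly zero."""
    n = len(counterfactual_totals)
    at_least = sum(s >= observed_total for s in counterfactual_totals)
    at_most = sum(s <= observed_total for s in counterfactual_totals)
    return (min(at_least, at_most) + 1) / (n + 1)

# Hypothetical posterior draws of the cumulative post-period count that the
# model predicts without vaccination (assumed mean 1000, sd 20).
rng = random.Random(7)
draws = [rng.gauss(1000.0, 20.0) for _ in range(999)]
observed = 900.0                   # hypothetical observed total after vaccination
p = tail_area_probability(draws, observed)
print(p, 1 - p)                    # tail-area probability, probability of a causal effect
```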
6 Conclusion
Our literature survey found no previous research that separates COVID-19 trends into their components in relation to their changing behavior. The BSTS models decompose COVID-19 trends into various components, which is an important aspect of this study. The proposed method also allows the coefficients to vary over time, allowing better detection of the data-generating process. We showed that BSTS models could help with early preparation, prioritization, and distribution of healthcare resources to mitigate COVID-19 effects in the nations studied. Furthermore, the causal effects of vaccination have been studied. With a few exceptions, the study's findings imply that the proposed models' forecasting accuracy is superior to that of the commonly used ARIMA models. Among these countries, the percentage increase in the total number of cases is predicted to be highest in the UK, with the ranking UK > UAE > Bahrain > India > USA. The largest percentage increases in the total number of deaths are predicted for the UAE and India, with the ranking UAE > India > Bahrain > UK > USA. Our research also suggests that the United States, the United Kingdom, and the United Arab Emirates have implemented successful vaccination plans to lower the number of cases and deaths; through effective and rapid vaccination, the United States and the United Kingdom in particular have reduced the number of deaths. India, on the other hand, is still struggling to control the number of deaths owing to slow immunization or its large population, and may need to rethink its immunization strategy; the situation in India may become more problematic during the following sixty days.
These findings, we believe, will assist these countries in efficiently prioritizing, devising, and implementing policies to prevent the pandemic’s expected consequences.
This study also has several limitations. We assume that the data obtained are correct; however, because not all patients are admitted to clinics and some are asymptomatic, the data may be underreported. No risk factors were evaluated owing to a lack of corresponding data. Although the BSTS models gave better projections than the ARIMA models, the precision of these forecasts may be affected by the data's inherent uncertainty. However, the study's goal is not to produce perfectly accurate projections but rather to provide key signals to stakeholders so that they can plan their strategies accordingly.
Data and Code availability
The data source is given in the manuscript; code will be provided on request.
References
Olson DL, Shi Y (2007) Introduction to business data mining, volume 10. McGraw-Hill/Irwin New York
Shi Y (2022) Advances in big data analytics: theory, algorithms and practices. Springer Nature
Tien JM (2017) Internet of things real-time decision making and artificial intelligence. Annals of Data Science. 4(2):149–178
Shi Y, Tian Y, Kou G, Peng Y, Li J (2011) Optimization based data mining: theory and applications. Springer Science & Business Media
Jianping L, Kun G, Herrera VE, Heesoek L, Jiming L, Ning Z, Monteiro GLFA, Gheorghe FF, Shu-Cherng F, Özdemir MS et al (2020) Culture versus policy: more global collaboration to effectively combat covid-19. The Innovation. 1(2):100023
Tomar A, Gupta N (2020) Prediction for the spread of covid-19 in india and effectiveness of preventive measures. Science of The Total Environment 728:138762
Liu Y, Gu Z, Xia S, Shi B, Zhou X-N, Shi Y, Liu J (2020) What are the underlying transmission patterns of covid-19 outbreak? an age-specific social contact characterization. EClinicalMedicine 22:100354
Olisah CC, Ilori OO, Adelaja K, Usip PU, Uzoechi LO, Adeyanju IA, Odumuyiwa VT (2021) Data-driven approach to covid-19 infection forecast for nigeria using negative binomial regression model. In: Data Science for COVID-19, pages 583–596. Elsevier
Talkhi N, Fatemi NA, Ataei Z, Nooghabi MJ (2021) Modeling and forecasting number of confirmed and death caused covid-19 in iran: a comparison of time series forecasting methods. Biomed Signal Process Control 66
Xie L (2021) The analysis and forecasting covid-19 cases in the united states using bayesian structural time series models. Biostatistics & Epidemiology, pages 1–15
Kumar S (2020) Monitoring novel corona virus (covid-19) infections in india by cluster analysis. Ann. Data Sci 7(3):417–425
Brockwell PJ, Davis RA (2016) Introduction to time series and forecasting. Springer
Brodersen KH, Gallusser F, Koehler J, Remy N, Scott SL (2015) Inferring causal impact using bayesian structural time-series models. Ann. Appl. Stat. 9(1):247–274
Feroze N (2020) Forecasting the patterns of covid-19 and causal impacts of lockdown in top five affected countries using bayesian structural time series models. Chaos, Solitons & Fractals 140:110196
Harvey AC (1990) Forecasting, structural time series models and the kalman filter
Scott SL, Varian HR (2014) Predicting the present with bayesian structural time series. Int. J. Math. Model. Num. Optim. 5(1–2):4–23
De Vocht F, Tilling K, Pliakas T, Angus C, Egan M, Brennan A, Campbell R, Hickman M (2017) The intervention effect of local alcohol licensing policies on hospital admission and crime: a natural experiment using a novel bayesian synthetic time-series method. J Epidemiol Commun. Health 71(9):912–918
McQuire C, Tilling K, Hickman M, De Vocht F (2019) Forecasting the 2021 local burden of population alcohol-related harms using bayesian structural time-series. Addict. 114(6):994–1003
Poyser O (2019) Exploring the dynamics of bitcoin’s price: a bayesian structural time series approach. Eurasian Econ. Rev. 9(1):29–60
George EI, McCulloch RE (1997) Approaches for bayesian variable selection. Statistica sinica, pages 339–373
Feroze N (2021) Assessing the future progression of covid-19 in iran and its neighbors using bayesian models. Infect. Dis. Modelling 6:343–350
Moghadas SM, Vilches TN, Zhang K, Wells CR, Shoukat A, Singer BH, Meyers LA, Neuzil KM, Langley JM, Fitzpatrick MC, et al. (2021) The impact of vaccination on covid-19 outbreaks in the united states. medRxiv
Liu J, Spakowicz DJ, Ash GI, Hoyd R, Zhang A, Lou S, Lee D, Zhang J, Presley C, Greene A, et al. (2020b) Bayesian structural time series for biomedical sensor data: A flexible modeling framework for evaluating interventions. bioRxiv
Scott SL, Varian HR, et al. (2015) Bayesian variable selection for nowcasting economic time series. Economic analysis of the digital economy, 119
Box GEP, Jenkins GM, Reinsel GC, Ljung GM (2015) Time series analysis: forecasting and control. John Wiley & Sons
Funding
No funding was received for conducting the study.
Author information
Contributions
All authors contributed equally and jointly wrote, reviewed, and edited the manuscript.
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Ethical approval
The authors did not copy this work from any source, and this work does not cause harm to humans or society.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Navas Thorakkattle, M., Farhin, S. & khan, A.A. Forecasting the Trends of Covid-19 and Causal Impact of Vaccines Using Bayesian Structural time Series and ARIMA. Ann. Data. Sci. 9, 1025–1047 (2022). https://doi.org/10.1007/s40745-022-00418-4
DOI: https://doi.org/10.1007/s40745-022-00418-4