Abstract
It is often of primary interest to analyze and forecast the levels of a continuous phenomenon as a categorical variable. In this paper, we propose a new spatio-temporal model for this problem in the binary setting, with an application to the COVID-19 pandemic, a phenomenon that depends on both spatial proximity and temporal autocorrelation. Our model is defined through a hierarchical structure for a latent variable that corresponds to the probit link function. The mean of the latent variable is designed to capture the trend and the seasonal pattern, as well as the lagged effects of relevant regressors. The covariance structure of the model arises from an additive combination of a zero-mean, spatio-temporally correlated process and a white-noise process. The parameters of the space-time process allow us to analyze how the proximity of two points in space or in time influences the overall process. For estimation and prediction, we adopt a fully Bayesian framework with suitable prior specifications and use Gibbs sampling. Using county-level data from the state of New York, we show that the proposed methodology outperforms benchmark techniques. We also use our model to devise a novel predictive clustering mechanism that can be leveraged to develop localized policies.
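The latent-variable construction described above can be illustrated with a small data-augmentation Gibbs sampler in the spirit of the well-known Albert and Chib scheme for probit models. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a separable exponential space-time covariance, fixes the white-noise variance at 1 (the usual probit identifiability constraint), places a normal prior on the regression coefficients, and holds the covariance parameters (`sigma2`, `phi_s`, `phi_t`) fixed, whereas a full scheme would also need updates for them. The function names `st_covariance`, `sample_truncated_normal` and `gibbs_probit_st` are hypothetical, and the design matrix uses only an intercept and one synthetic covariate in place of the trend, seasonal and lagged-regressor terms.

```python
import numpy as np
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for inverse-CDF sampling


def st_covariance(coords, times, sigma2, phi_s, phi_t):
    """Separable exponential space-time covariance (an assumed form).

    phi_s and phi_t are range parameters: larger values keep points that are
    farther apart in space or in time more strongly correlated.
    """
    ds = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    dt = np.abs(times[:, None] - times[None, :])
    return sigma2 * np.exp(-ds / phi_s - dt / phi_t)


def sample_truncated_normal(mean, positive, rng):
    """Draw z ~ N(mean, 1) restricted to z > 0 (positive=True) or z <= 0."""
    p0 = STD_NORMAL.cdf(-mean)  # P(z <= 0)
    u = rng.uniform()
    p = p0 + u * (1.0 - p0) if positive else u * p0
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # numerical guard for inv_cdf
    return mean + STD_NORMAL.inv_cdf(p)


def gibbs_probit_st(y, X, K, n_iter, rng, prior_var_beta=100.0):
    """Gibbs sampler for z = X beta + w + eps, y = 1{z > 0}.

    w ~ N(0, K) is the space-time effect, eps ~ N(0, I) is unit-variance
    white noise (assumption), beta ~ N(0, prior_var_beta * I) (assumption).
    K is held fixed in this sketch.
    """
    n, p = X.shape
    beta, w = np.zeros(p), np.zeros(n)
    # Covariance of w | z, beta: (K^{-1} + I)^{-1} = (K + I)^{-1} K.
    V_w = np.linalg.solve(K + np.eye(n), K)
    V_w = (V_w + V_w.T) / 2.0
    # Covariance of beta | z, w: (X'X + I / prior_var_beta)^{-1}.
    S_beta = np.linalg.inv(X.T @ X + np.eye(p) / prior_var_beta)
    S_beta = (S_beta + S_beta.T) / 2.0
    draws = np.empty((n_iter, p))
    for it in range(n_iter):
        # 1. z_i | beta, w, y_i: N(x_i'beta + w_i, 1) truncated by the value of y_i.
        mu = X @ beta + w
        z = np.array([sample_truncated_normal(m, yi == 1, rng) for m, yi in zip(mu, y)])
        # 2. w | z, beta ~ N(V_w (z - X beta), V_w).
        w = rng.multivariate_normal(V_w @ (z - X @ beta), V_w)
        # 3. beta | z, w ~ N(S_beta X'(z - w), S_beta).
        beta = rng.multivariate_normal(S_beta @ X.T @ (z - w), S_beta)
        draws[it] = beta
    return draws


# Synthetic example: 10 hypothetical locations, each observed at 4 time points.
rng = np.random.default_rng(42)
coords = np.repeat(rng.uniform(size=(10, 2)), 4, axis=0)
times = np.tile(np.arange(4.0), 10)
X = np.column_stack([np.ones(40), rng.normal(size=40)])
K = st_covariance(coords, times, sigma2=0.5, phi_s=0.3, phi_t=2.0)
z_true = X @ np.array([0.0, 1.5]) + rng.multivariate_normal(np.zeros(40), K) + rng.normal(size=40)
y = (z_true > 0).astype(int)
draws = gibbs_probit_st(y, X, K, n_iter=500, rng=rng)
print(draws[100:].mean(axis=0))  # posterior means after discarding 100 burn-in draws
```

Marginalizing over `w`, the latent `z` has covariance `K + I`, which is the additive space-time plus white-noise structure described in the abstract; sampling `w` explicitly keeps every full conditional in closed form.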
Data availability
Data used in the main analysis were extracted from the COVID-19 GitHub repository maintained by the Center for Systems Science and Engineering at Johns Hopkins University (link: https://github.com/CSSEGISandData/COVID-19). The code for the spatio-temporal model for binary data discussed in this paper, along with the pre-processed and cleaned data, is available in the GitHub repository maintained by the first author (link: https://github.com/anaghchattopadhyay/Spatio-temporal-model-for-binary-data).
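The JHU CSSE time-series files record cumulative confirmed cases per US county, with one column per date. The exact rule used to turn these counts into a binary response is not stated here, so the following is a minimal hypothetical sketch: it assumes the binary variable marks whether weekly new cases rose relative to the previous week. The functions `weekly_new_cases` and `direction_indicator` are illustrative names, and clipping negative weekly differences (which can arise from downward data revisions) to zero is an assumption of this sketch.

```python
from typing import List


def weekly_new_cases(cumulative: List[int]) -> List[int]:
    """Weekly new cases from a daily cumulative series.

    Takes the cumulative count at the end of each complete 7-day block and
    differences consecutive values; the first block only serves as a baseline.
    Negative differences are clipped to zero (an assumption of this sketch).
    """
    week_ends = cumulative[6::7]
    return [max(b - a, 0) for a, b in zip(week_ends, week_ends[1:])]


def direction_indicator(weekly: List[int]) -> List[int]:
    """Hypothetical binary response: 1 if weekly new cases increased, else 0."""
    return [int(b > a) for a, b in zip(weekly, weekly[1:])]


# Usage with a synthetic cumulative series covering 4 weeks of daily data.
daily = [1] * 7 + [10] * 7 + [5] * 7 + [20] * 7
cumulative = [sum(daily[: d + 1]) for d in range(len(daily))]
weekly = weekly_new_cases(cumulative)
print(weekly, direction_indicator(weekly))  # prints [70, 35, 140] [0, 1]
```

Aggregating to weeks before comparing is one common way to dampen day-of-week reporting effects in daily case counts.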
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chattopadhyay, A., Deb, S. A spatio-temporal model for binary data and its application in analyzing the direction of COVID-19 spread. AStA Adv Stat Anal (2024). https://doi.org/10.1007/s10182-024-00507-0
DOI: https://doi.org/10.1007/s10182-024-00507-0