An Overview of ARMA-Like Models for Count and Binary Data

Chapter in: Trends and Challenges in Categorical Data Analysis

Abstract

A comprehensive overview of the literature on models for discrete-valued time series is provided, with a special focus on count and binary data. ARMA-like models such as the BARMA, GARMA, M-GARMA, GLARMA and log-linear Poisson are illustrated in detail and critically compared. Methods for deriving the stochastic properties of specific models are delineated and likelihood-based inference is discussed. The review is concluded with two empirical applications. The first regards the analysis of the daily number of deaths from COVID-19 in Italy, under the assumption of both a Poisson and a negative binomial distribution for the data generating process. The second analyses the binary series of signs of log-returns for the weekly closing prices of Johnson & Johnson with BARMA, Bernoulli GARMA and GLARMA models.


References

  1. Ahmad, A., Francq, C.: Poisson QMLE of count time series models. J. Time Ser. Anal. 37, 291–314 (2016)

  2. Al-Osh, M.A., Alzaid, A.A.: First-order integer-valued autoregressive (INAR(1)) process. J. Time Ser. Anal. 8, 261–275 (1987)

  3. Alzaid, A.A., Al-Osh, M.: An integer-valued pth-order autoregressive structure (INAR(p)) process. J. Appl. Probab. 27, 314–324 (1990)

  4. Basawa, I.V., Prakasa Rao, B.L.S.: Statistical Inference for Stochastic Processes. Probability and Mathematical Statistics. Academic Press [Harcourt Brace Jovanovich], London (1980)

  5. Benjamin, M., Rigby, R., Stasinopoulos, D.M.: Generalized autoregressive moving average models. J. Amer. Stat. Assoc. 98(461), 214–223 (2003)

  6. Billingsley, P.: Probability and Measure, 3rd edn. Wiley, Hoboken (1995)

  7. Bollerslev, T.: Generalized autoregressive conditional heteroskedasticity. J. Econ. 31(3), 307–327 (1986)

  8. Box, G.E.P., Jenkins, G.M.: Time Series Analysis: Forecasting and Control. Holden Day, San Francisco (1970)

  9. Box, G.E.P., Jenkins, G.M.: Time Series Analysis: Forecasting and Control. Prentice-Hall, Hoboken (1976)

  10. Breen, W., Glosten, L.R., Jagannathan, R.: Economic significance of predictable variations in stock index returns. J. Finance 44(5), 1177–1189 (1989)

  11. Brockwell, P.J., Davis, R.A.: Time Series: Theory and Methods. Springer Series in Statistics. Springer, Berlin (1991)

  12. Christou, V., Fokianos, K.: Quasi-likelihood inference for negative binomial time series models. J. Time Ser. Anal. 35, 55–78 (2014)

  13. Christou, V., Fokianos, K.: On count time series prediction. J. Stat. Comput. Simul. 85(2), 357–373 (2015)

  14. Clark, N.J., Kaiser, M.S., Dixon, P.M.: A spatially correlated auto-regressive model for count data (2018). Preprint arXiv:1805.08323

  15. Cox, D.R.: Statistical analysis of time series: some recent developments. Scand. J. Stat. 8, 93–115 (1981)

  16. Creal, D., Koopman, S.J., Lucas, A.: Generalized autoregressive score models with applications. J. Appl. Econ. 28(5), 777–795 (2013)

  17. Czado, C., Gneiting, T., Held, L.: Predictive model assessment for count data. Biometrics 65(4), 1254–1261 (2009)

  18. Davis, R.A., Liu, H.: Theory and inference for a class of nonlinear models with application to time series of counts. Stat. Sinica 26(4), 1673–1707 (2016)

  19. Davis, R.A., Dunsmuir, W.T.M., Streett, S.B.: Observation-driven models for Poisson counts. Biometrika 90(4), 777–790 (2003)

  20. Davis, R.A., Dunsmuir, W.T.M., Streett, S.B.: Maximum likelihood estimation for an observation driven model for Poisson counts. Methodol. Comput. Appl. Probab. 7(2), 149–159 (2005)

  21. Davis, R.A., Holan, S.H., Lund, R., Ravishanker, N.: Handbook of Discrete-valued Time Series. CRC Press, Boca Raton (2016)

  22. Diaconis, P., Freedman, D.: Iterated random functions. SIAM Rev. 41(1), 45–76 (1999)

  23. Douc, R., Doukhan, P., Moulines, E.: Ergodicity of observation-driven time series models and consistency of the maximum likelihood estimator. Stoch. Process. Appl. 123(7), 2620–2647 (2013)

  24. Douc, R., Fokianos, K., Moulines, E.: Asymptotic properties of quasi-maximum likelihood estimators in observation-driven time series models. Electron. J. Stat. 11(2), 2707–2740 (2017)

  25. Dunsmuir, W., Scott, D.: The GLARMA package for observation-driven time series regression of counts. J. Stat. Softw. 67(7), 1–36 (2015)

  26. Engle, R.F.: Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50(4), 987–1007 (1982)

  27. Ferland, R., Latour, A., Oraichi, D.: Integer-valued GARCH process. J. Time Ser. Anal. 27, 923–942 (2006)

  28. Fokianos, K., Tjøstheim, D.: Log-linear Poisson autoregression. J. Multivar. Anal. 102, 563–578 (2011)

  29. Fokianos, K., Kedem, B.: Regression theory for categorical time series. Stat. Sci. 18(3), 357–376 (2003)

  30. Fokianos, K., Rahbek, A., Tjøstheim, D.: Poisson autoregression. J. Amer. Stat. Assoc. 104, 1430–1439 (2009)

  31. Fokianos, K., Støve, B., Tjøstheim, D., Doukhan, P.: Multivariate count autoregression. Bernoulli 26(1), 471–499 (2020)

  32. Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. J. Amer. Stat. Assoc. 102(477), 359–378 (2007)

  33. Gneiting, T., Balabdaoui, F., Raftery, A.E.: Probabilistic forecasts, calibration and sharpness. J. Roy. Stat. Soc. Ser. B (Stat. Methodol.) 69(2), 243–268 (2007)

  34. Gorgi, P.: Beta-negative binomial auto-regressions for modelling integer-valued time series with extreme observations. J. Roy. Stat. Soc. Ser. B (Stat. Methodol.) 82, 1325–1347 (2020)

  35. Hairer, M., Mattingly, J.C.: Ergodicity of the 2D Navier-Stokes equations with degenerate stochastic forcing. Ann. Math. 164(3), 993–1032 (2006)

  36. Harvey, A.C.: Dynamic Models for Volatility and Heavy Tails: With Applications to Financial and Economic Time Series. Cambridge University Press, Cambridge (2013)

  37. Hayashi, F.: Econometrics. Princeton University Press, Princeton (2000)

  38. Heyde, C.C.: A general approach to optimal parameter estimation. In: Quasi-Likelihood and Its Application. Springer Series in Statistics. Springer, New York (1997)

  39. Ho, S.-L., Xie, M., Goh, T.N.: A comparative study of neural network and Box-Jenkins ARIMA modeling in time series prediction. Comput. Ind. Eng. 42(2–4), 371–375 (2002)

  40. Kauppi, H., Saikkonen, P.: Predicting U.S. recessions with dynamic binary response models. Rev. Econ. Stat. 90(4), 777–791 (2008)

  41. Li, W.K.: Time series models based on generalized linear models: some further results. Biometrics 50(2), 506–511 (1994)

  42. Matteson, D.S., Woodard, D.B., Henderson, S.G.: Stationarity of generalized autoregressive moving average models. Electron. J. Stat. 5, 800–828 (2011)

  43. McCullagh, P., Nelder, J.A.: Generalized Linear Models, 2nd edn. Chapman & Hall, Boca Raton (1989)

  44. Meyn, S., Tweedie, R.L., Glynn, P.W.: Markov Chains and Stochastic Stability, 2nd edn. Cambridge University Press, Cambridge (2009)

  45. Moysiadis, T., Fokianos, K.: On binary and categorical time series models with feedback. J. Multivar. Anal. 131, 209–228 (2014)

  46. Neumann, M.H.: Absolute regularity and ergodicity of Poisson count processes. Bernoulli 17(4), 1268–1284 (2011)

  47. Rydberg, T.H., Shephard, N.: Dynamics of trade-by-trade price movements: decomposition and models. J. Financ. Econ. 1(1), 2–25 (2003)

  48. Scotto, M.G., Weiß, C.H., Gouveia, S.: Thinning-based models in the analysis of integer-valued time series: a review. Stat. Modell. 15(6), 590–618 (2015)

  49. Sen, P., Roy, M., Pal, P.: Application of ARIMA for forecasting energy consumption and GHG emission: a case study of an Indian pig iron manufacturing organization. Energy 116, 1031–1038 (2016)

  50. Startz, R.: Binomial autoregressive moving average models with an application to U.S. recessions. J. Business Econ. Stat. 26(1), 1–8 (2008)

  51. Steutel, F.W., van Harn, K.: Discrete analogues of self-decomposability and stability. Ann. Probab. 7(5), 893–899 (1979)

  52. Taylor, S.: Modeling Financial Time Series. Wiley, Hoboken (1986)

  53. Teräsvirta, T.: Specification, estimation, and evaluation of smooth transition autoregressive models. J. Amer. Stat. Assoc. 89(425), 208–218 (1994)

  54. Tong, H., Lim, K.S.: Threshold autoregression, limit cycles and cyclical data (with discussion). J. Roy. Stat. Soc. Ser. B (Stat. Methodol.) 42(3), 245–292 (1980)

  55. Tweedie, R.L.: Invariant measures for Markov chains with no irreducibility assumptions. J. Appl. Probab. 25(A), 275–285 (1988)

  56. Walker, G.T.: On periodicity in series of related terms. Proc. R. Soc. Lond. Ser. A Contain. Pap. Math. Phys. Char. 131(818), 518–532 (1931)

  57. Wang, Y., Wang, J., Zhao, G., Dong, Y.: Application of residual modification approach in seasonal ARIMA for electricity demand forecasting: a case study of China. Energy Policy 48, 284–294 (2012)

  58. Weiß, C.H., Feld, M.H.-J.M., Khan, N.M., Sunecher, Y.: INARMA modeling of count time series. Stats 2(2), 284–320 (2019)

  59. Yule, G.U.: On a method of investigating periodicities in disturbed series, with special reference to Wolfer's sunspot numbers. Philos. Trans. R. Soc. Lond. Ser. A Contain. Pap. Math. Phys. Char. 226(636–646), 267–298 (1927)

  60. Zeger, S.L.: A regression model for time series of counts. Biometrika 75, 621–629 (1988)

  61. Zeger, S.L., Liang, K.-Y.: Longitudinal data analysis for discrete and continuous outcomes. Biometrics 42, 121–130 (1986)

  62. Zheng, T., Xiao, H., Chen, R.: Generalized ARMA models with martingale difference errors. J. Econ. 189(2), 492–506 (2015)


Author information

Correspondence to Monia Lupparelli.

Appendix

8.1.1 Technical Details

8.1.1.1 Markov Chain Specification

In order to derive strict stationarity and ergodicity conditions, the problem is reformulated in terms of Markov chain theory. Let us consider an observation driven model in the most general form:

$$\displaystyle \begin{aligned} Y_{t}\left|\right. \mathcal{F}_{t-1} \sim q(\cdot; \mu_{t}) {} \end{aligned} $$
(8.23)
$$\displaystyle \begin{aligned} \mu_{t}=c_{\delta}(Y_{0:t-1}) {} \end{aligned} $$
(8.24)

where we adopt the shorthand notation Yt for the process and, as before, yt for its realization. The function q is the density function coming from (8.1), whereas cδ is a function describing the form of the dependence on past observations. In general, Ys:t = (Ys, Ys+1, …, Yt) for s ≤ t. The symbol δ denotes the vector of parameters of the model. The initial values μ0:p−1 are assumed known. The model in (8.24) can be rewritten as:

$$\displaystyle \begin{aligned} \mu_{t}=g_{\delta}(Y_{t-p:t-1},\mu_{t-p:t-1}). \end{aligned}$$

This way of writing the observation driven model [15] gives a Markov structure of order p for μt and implies that the vector μt−p:t−1 forms the state of a Markov chain indexed by t. It is then possible to prove stationarity and ergodicity of {Yt}t∈ℕ by first showing these properties for the multivariate Markov chain {μt−p:t−1}t≥p, and then transferring the results back to the time series model {Yt}t∈ℕ.
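The state recursion above can be made concrete with a short simulation. The sketch below assumes a first-order (p = 1) log-link Poisson specification, νt = d + aνt−1 + b log(Yt−1 + 1), with purely illustrative parameter values; it is not the general model, only a minimal instance of the Markov structure for μt = exp(νt).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for a first-order log-link Poisson model:
# nu_t = d + a*nu_{t-1} + b*log(Y_{t-1} + 1),  Y_t | F_{t-1} ~ Poisson(exp(nu_t)).
# Here mu_t = exp(nu_t) is the state of the Markov chain described above.
d, a, b = 0.3, 0.5, 0.2

def simulate(T, nu0=0.0):
    nu, ys, nus = nu0, [], []
    for _ in range(T):
        y = rng.poisson(np.exp(nu))           # draw Y_t ~ q(.; mu_t)
        nu = d + a * nu + b * np.log(y + 1)   # state update: the Markov recursion
        ys.append(y)
        nus.append(nu)
    return np.array(ys), np.array(nus)

ys, nus = simulate(5000)
print(ys[:10], round(float(nus.mean()), 3))
```

Given νt−1 and Yt−1 the next state is deterministic, so all randomness enters through the draw of Yt; this is exactly the iterated-random-function structure exploited in the theory below.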

Some definitions needed for the Markov chain theorems asserted throughout the chapter are now introduced. Define a general Markov chain X = {Xt}t∈ℕ on a state space S with σ-algebra ℱ, and let Pt(x, A) = P(Xt ∈ A | X0 = x), for A ∈ ℱ, be the t-step transition probability starting from state X0 = x.

Definition 8.1

A Markov chain X is φ-irreducible if there exists a non-trivial measure φ on ℱ such that, whenever φ(A) > 0, Pt(x, A) > 0 for some t = t(x, A), for all x ∈ S.

Also, the definition of aperiodicity as stated in [44] is needed. Define a period d(α) = gcd {t ≥ 1 : Pt(α, α) > 0}.

Definition 8.2

An irreducible Markov chain X is aperiodic if d(x) ≡ 1 for all x ∈ S.

Definition 8.3

A set A ∈ ℱ is called a small set if there exist an m ≥ 1, a non-trivial measure ν on ℱ, and a λ > 0 such that, for all x ∈ A and all C ∈ ℱ, Pm(x, C) ≥ λν(C).

Let Ex(⋅) denote the expectation under the probability Px(⋅) induced on the path space of the chain, defined by Ω = ∏t=0∞ Xt with respect to ℱ = ⋁t=0∞ ℬ(Xt), when the initial state is X0 = x; here ℬ(Xt) is the Borel σ-field on Xt.

Theorem 8.7 (Drift Conditions)

Suppose that X = {Xt}t∈ℕ is φ-irreducible on S. Let A ⊆ S be small, and suppose that there exist b ∈ (0, ∞), ε > 0, and a function V : S → [0, ∞) such that, for all x ∈ S,

$$\displaystyle \begin{aligned} \mathrm{E}_{x}\left[ V(X_{1})\right] \leq V(x) - \varepsilon + b\boldsymbol{1}_{\left\lbrace x\in A\right\rbrace }, {} \end{aligned} $$
(8.25)

then X is positive Harris recurrent.

The function V  is called the Lyapunov function or energy function.

Positive Harris recurrent chains possess a unique stationary probability distribution π. Moreover, if X0 is distributed according to π, then the chain X is a stationary process. If the chain is also aperiodic, then X is ergodic; in this case, if the chain is initialized according to some other distribution, the distribution of Xt converges to π as t → ∞.

A stronger form of ergodicity, called geometric ergodicity, arises if (8.25) is replaced by the condition

$$\displaystyle \begin{aligned} \mathrm{E}_{x}\left[ V(X_{1})\right] \leq \beta V(x) + b\boldsymbol{1}_{\left\lbrace x\in A\right\rbrace } {} \end{aligned} $$
(8.26)

for some β ∈ (0, 1) and some V : S → [1, ∞). Indeed, (8.26) implies (8.25). Stationarity and ergodicity for the GARMA model are thus established if at least one of the sufficient conditions (8.25) and (8.26) above is fulfilled.
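The geometric drift condition (8.26) can sometimes be verified in closed form. The sketch below does this for a Gaussian AR(1) chain (an illustrative continuous-state example, not one of the count models above), with Lyapunov function V(x) = 1 + x², and candidate constants β, b, M chosen for this illustration.

```python
import numpy as np

# Gaussian AR(1): X_{t+1} = rho*X_t + eps_t, eps_t ~ N(0, sigma^2),
# with Lyapunov function V(x) = 1 + x^2.  Here E_x[V(X_1)] is available
# in closed form, so the drift inequality can be checked on a grid.
rho, sigma = 0.8, 1.0
beta, M = 0.9, 2.1                  # candidate rate and small set A = [-M, M]
b = 1.0 + sigma**2 - beta           # constant absorbing the drift inside A

def expected_V(x):
    # E_x[V(X_1)] = 1 + rho^2 x^2 + sigma^2
    return 1.0 + rho**2 * x**2 + sigma**2

xs = np.linspace(-10, 10, 2001)
lhs = expected_V(xs)
rhs = beta * (1 + xs**2) + b * (np.abs(xs) <= M)
print(bool((lhs <= rhs + 1e-12).all()))   # prints True: (8.26) holds
```

Outside A the inequality reduces to 1 + σ² − β ≤ (β − ρ²)x², which holds for |x| ≥ √(1.1/0.26) ≈ 2.06 < M, mirroring the role of the small set in absorbing the drift near the origin.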

Unfortunately, a problem can occur when the distribution in (8.23) is not continuous (e.g., Bernoulli, Poisson, …). In these cases the Markov chain {μt−p:t−1}t≥p may not be φ-irreducible. This occurs whenever Yt can take only a countable set of values while the state space of μt−p:t−1 is ℝp. Then, given a particular initial vector μ0:p−1, the set of possible values for μt is countable and Definition 8.1 is not satisfied. For this reason, additional theoretical tools are required:

  • Perturbation approach

  • Feller conditions.

8.1.1.2 Perturbation Approach

First, define the perturbed form of an observation driven time series model:

$$\displaystyle \begin{aligned} Y_{t}^{(m)}\left|\right. Y^{(m)}_{0:t-1} \sim q(\cdot; \mu^{(m)}_{t}) {} \end{aligned} $$
(8.27)
$$\displaystyle \begin{aligned} \mu^{(m)}_{t}=g_{\delta, t}(Y^{(m)}_{0:t-1}, m Z_{0:t-1}), {} \end{aligned} $$
(8.28)

where Zt ∼ ϕ are independent, identically distributed random perturbations with density function ϕ, m > 0 is a scale factor associated with the perturbation, and gδ,t(⋅, mZ0:t−1) is a continuous function of Z0:t−1 such that gδ,t(y, 0) = gδ,t(y) for any y. The value μ0(m) is a fixed constant taken to be independent of m, so that μ0(m) = μ0. The perturbed model is constructed to be φ-irreducible, so that the usual drift conditions can be applied to prove its stationarity.

Then, it can be proved that the likelihood of the parameter vector δ calculated from (8.28) converges uniformly, as m → 0, to the likelihood calculated from the unperturbed model. More precisely, the joint density of the observations Y = Y0:t(m) and of the first t perturbations Z = Z0:t−1, conditional on the parameter vector δ, the perturbation scale m, and the initial value μ0, is:

$$\displaystyle \begin{aligned} \begin{array}{rcl} & f(Y,Z\left| \right. \delta, m, \mu_{0})&\displaystyle = f(Z\left| \right. \delta, m, \mu_{0}) \times f(Y\left| \right. Z, \delta, m, \mu_{0}) \\ & &\displaystyle = \left[ \prod_{k=0}^{t-1} \phi(Z_{k}) \right]\prod_{k=0}^{t}f\left( Y^{(m)}_{k}; \mu_{k}(m Z)\right) \end{array} \end{aligned} $$

where μk(mZ) is the value of μk(m) induced by the perturbation vector mZ through (8.28), with μ0(mZ) = μ0. The likelihood function for the parameter vector δ implied by the perturbed model is the marginal density of Y obtained by integrating over Z, i.e.,

$$\displaystyle \begin{aligned} \mathcal{L}_{m}(\delta)=f(Y\left| \right. \delta, m, \mu_{0})=\int f(Y,Z\left| \right. \delta, m, \mu_{0})dZ. \end{aligned}$$

Let the likelihood function without the perturbations be denoted by ℒ, so that

$$\displaystyle \begin{aligned} \mathcal{L}(\delta)=\prod_{k=0}^{t}f\left( Y_{k}; \mu_{k}(0)\right). \end{aligned}$$

Theorem 8.8

Under regularity conditions 1 and 2 below, the likelihood function ℒm based on the perturbed model (8.27)–(8.28) converges uniformly on any compact set K to the likelihood function ℒ based on the original model, i.e.,

$$\displaystyle \begin{aligned} \lim_{m \to 0}\, \sup_{\delta \in K}\left| \mathcal{L}_{m}(\delta)-\mathcal{L}(\delta)\right| = 0, \end{aligned}$$

for any fixed sequence of observations y0:t and conditional on the initial value μ0.

So if ℒ is continuous in δ and has a finite number of local maxima and a unique global maximum on K, the maximum-likelihood estimate of δ based on ℒm converges to that based on ℒ. The proof is in [42]. Regularity Conditions:

  1. For any fixed y the function q(y; μ) is bounded and Lipschitz continuous in μ, uniformly in δ ∈ K.

  2. For each t, μt(mZ) is Lipschitz in some bounded neighbourhood of zero, uniformly in δ ∈ K.

Regularity condition 1 holds, e.g., for q(y; μ) equal to a Poisson or binomial density with mean μ, or to a negative binomial density with mean μ and precision parameter φ. The function μt(mZ) can easily be constructed to satisfy condition 2. One can thus use the perturbed model (with a fixed and sufficiently small perturbation scale m) instead of the original model, without significantly affecting finite-sample parameter estimates, in order to obtain the strong theoretical properties associated with stationarity and ergodicity.
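The closeness of the perturbed and unperturbed likelihoods can be illustrated numerically. The sketch below uses a toy first-order log-link Poisson recursion with an additive perturbation m·Zt on the link scale; the parameter values and the form of the perturbation are assumptions of this illustration, and the likelihood is evaluated along a single fixed perturbation path Z rather than as the full marginal over Z.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy first-order log-link Poisson recursion with an additive perturbation
# m*Z_t in the link; parameters and the perturbation form are illustrative.
d, a, b = 0.2, 0.4, 0.3
y = rng.poisson(2.0, size=200)             # a fixed observed series y_{0:t}

def loglik(m, Z):
    nu, ll = 0.0, 0.0
    for t in range(1, len(y)):
        nu = d + a * nu + b * np.log(y[t - 1] + 1) + m * Z[t - 1]
        ll += y[t] * nu - np.exp(nu)       # Poisson log-density, constants dropped
    return ll

Z = rng.standard_normal(len(y))            # one fixed perturbation path
L0 = loglik(0.0, Z)                        # unperturbed log-likelihood
gaps = [abs(loglik(m, Z) - L0) for m in (1.0, 0.1, 0.01)]
print(gaps)                                # gaps for m = 1, 0.1, 0.01
```

For small m the gap is approximately linear in m, consistent with the uniform convergence asserted by Theorem 8.8.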

Although it has been shown that the perturbed and original models are closely related, and although drift conditions can be used to show the stationarity and ergodicity properties of the perturbed model, this approach does not yield stationarity and ergodicity for the original model. In fact, it addresses consistency of parameter estimation for the perturbed model as t → ∞ for fixed m, and then shows that, as m → 0, the finite-sample estimates (for a fixed number of observations t) of the perturbed model approach those of the original one. In order to establish these properties for the original model one should consider the limits t → ∞ and m → 0 jointly, which raises a substantial technical difficulty associated with interchanging the limits. For this reason, the Feller properties introduced in the next section are needed.

8.1.1.3 Feller Conditions

To deal with the lack of the φ-irreducibility condition, the Feller properties can be used instead.

Definition 8.4

A chain evolving on a complete separable metric space S is said to be “weak Feller” if P(x, ⋅) satisfies P(x, ⋅) ⇒ P(y, ⋅) as x → y, for any y ∈ S, where ⇒ denotes convergence in distribution.

In the absence of φ-irreducibility, the “weak Feller” condition can be combined with a drift condition (8.25) or (8.26) to show the existence of a stationary distribution [55]:

Theorem 8.9

Suppose that S is a locally compact complete separable metric space with ℱ the Borel σ-field on S, and that the Markov chain {Xt}t∈ℕ with transition kernel P is weak Feller. Let A ∈ ℱ be compact, and suppose that there exist b ∈ (0, ∞), ε > 0, and a function V : S → [0, ∞) such that for all x ∈ S the drift condition (8.25) holds. Then there exists a stationary distribution for P.

Uniqueness of the stationary distribution can be established using the “asymptotic strong Feller” property, defined in [35]. Before doing so, further definitions are required:

Definition 8.5

Let S be a Polish (complete, separable, metrizable) space. A “totally separating system of metrics” {dt}t∈ℕ for S is a set of metrics such that for any x, y ∈ S with x ≠ y, the value dt(x, y) is non-decreasing in t and limt→∞ dt(x, y) = 1.

Definition 8.6

A metric d on S induces the following distance between probability measures μ1 and μ2:

$$\displaystyle \begin{aligned} \left\| \mu_{1} - \mu_{2} \right\|{}_{d} = \sup_{\mathrm{Lip}_{d}\phi=1}\left( \int\phi(x)\mu_{1}(dx) - \int\phi(x)\mu_{2}(dx)\right) {} \end{aligned} $$
(8.29)

where

$$\displaystyle \begin{aligned} \mathrm{Lip}_{d}\phi= \sup_{x,y\in S:x\neq y} \frac{\left|\phi(x)-\phi(y)\right|}{d(x,y)} \end{aligned}$$

is the minimal Lipschitz constant for ϕ with respect to d.

Definition 8.7

A chain is “asymptotically strong Feller” if, for every fixed x ∈ S, there exist a totally separating system of metrics {dt}t∈ℕ for S and a sequence tn > 0 such that

$$\displaystyle \begin{aligned} \lim_{\delta \to 0} \limsup_{n \to \infty} \sup_{y\in B(x,\delta)} \left\| P^{t_{n}}(x,\cdot) - P^{t_{n}}(y,\cdot) \right\|{}_{d_{t_{n}}} = 0 \end{aligned}$$

where B(x, δ) is the open ball of radius δ centred at x, measured in some metric defining the topology of S.

Definition 8.8

A point x ∈ S is called “reachable” if, for all open sets A containing x, ∑t=1∞ Pt(y, A) > 0 for all y ∈ S.

Theorem 8.10

Suppose that S is a Polish space and that the Markov chain {Xt}t∈ℕ with transition kernel P is asymptotically strong Feller. If there is a reachable point x ∈ S, then P can have at most one stationary distribution.

This is an extension of [35]. The results of this section lay the foundation for showing the convergence and asymptotic properties of maximum likelihood estimators for the discrete-valued observation driven models.

8.1.1.4 Coupling Construction

Introduce a kernel H̄ from (X2, X⊗2) to (Y2, Y⊗2) satisfying the following conditions on the marginals: for all (x, x′) ∈ X2 and A ∈ Y,

$$\displaystyle \begin{aligned} \bar{H}((x,x^\prime);A \times \mathsf{Y})=H(x,A), \quad \bar{H}((x,x^\prime); \mathsf{Y} \times A)=H(x^\prime,A). {} \end{aligned} $$
(8.30)

Let C ∈ Y⊗2 be such that H̄((x, x′); C) ≠ 0 and consider the chain {Zt = (Xt, Xt′, Ut)}t∈ℤ on the “extended” space (X2 ×{0, 1}, X⊗2 ⊗ P({0, 1})), with transition kernel Q̄ implicitly defined as follows. Given Zt = (x, x′, u) ∈ X2 ×{0, 1}, draw (Yt+1, Yt+1′) according to H̄((x, x′); ⋅) and set

$$\displaystyle \begin{aligned} X_{t+1}=f_{Y_{t+1}}(x), \quad X^{\prime}_{t+1}=f_{Y^{\prime}_{t+1}}(x^\prime), \end{aligned}$$
$$\displaystyle \begin{aligned} U_{t+1}={\mathbf{1}}_C(Y_{t+1},Y^{\prime}_{t+1}), \end{aligned}$$
$$\displaystyle \begin{aligned} Z_{t+1}=(X_{t+1},X^{\prime}_{t+1},U_{t+1}). \end{aligned}$$

The conditions on the marginals of H̄ given by (8.30) also imply conditions on the marginals of Q̄: for all A ∈ X and z = (x, x′, u) ∈ X2 ×{0, 1},

$$\displaystyle \begin{aligned} \bar{Q}(z; A \times \mathsf{X} \times \left\lbrace 0,1\right\rbrace)=Q(x,A), \quad \bar{Q}(z;\mathsf{X} \times A \times \left\lbrace 0,1\right\rbrace)=Q(x^\prime,A). {} \end{aligned} $$
(8.31)

For z = (x, x′, u) ∈ X2 ×{0, 1}, write

$$\displaystyle \begin{aligned} \alpha(x,x^\prime)=\bar{Q}(z;\mathsf{X}^2 \times \left\lbrace 1\right\rbrace )=\bar{H}((x,x^\prime);C)\neq0. {} \end{aligned} $$
(8.32)

The quantity α(x, x′) is thus the probability of the event {U1 = 1} conditionally on Z0 = z. Denote by Q♯ the kernel on (X2, X⊗2) defined as follows: for all z = (x, x′, u) ∈ X2 ×{0, 1} and A ∈ X⊗2,

$$\displaystyle \begin{aligned} Q^\sharp((x,x^\prime);A)=\frac{\bar{Q}(z;A \times \left\lbrace 1\right\rbrace )}{\bar{Q}(z;\mathsf{X}^2 \times \left\lbrace 1\right\rbrace )} \end{aligned}$$

so that using (8.32),

$$\displaystyle \begin{aligned} \bar{Q}(z;A \times \left\lbrace 1\right\rbrace)=\alpha(x,x^\prime)\,Q^\sharp((x,x^\prime);A) . {} \end{aligned} $$
(8.33)

This shows that Q♯((x, x′); ⋅) is the distribution of (X1, X1′) conditionally on (X0, X0′, U1) = (x, x′, 1).
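A minimal numerical sketch of such a coupling may help. The code below implements the successful-coupling branch (the kernel Q♯) for a toy log-linear Poisson chain in which both coordinates are driven by the same innovation Y, anticipating the construction used for Lemma 8.2 in the proofs; the parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two coupled chains driven by a COMMON Poisson innovation Y drawn from
# the smaller intensity e^{x ^ x'}: the Q-sharp branch of the coupling.
# Parameters are illustrative and satisfy |a+b| v |a| v |b| < 1.
d, a, b = 0.1, 0.5, 0.3
x, xp = 2.0, -1.0                          # two starting states, gap 3.0

for t in range(20):
    y = rng.poisson(np.exp(min(x, xp)))    # shared innovation Y
    x = d + a * x + b * np.log(y + 1)      # X_1  = d + a x  + b ln(Y+1)
    xp = d + a * xp + b * np.log(y + 1)    # X_1' = d + a x' + b ln(Y+1)

# The log-terms cancel in the difference, so the gap contracts
# deterministically: |X_t - X_t'| = |a|^t |x_0 - x_0'|.
print(abs(x - xp), abs(a) ** 20 * 3.0)
```

Because both coordinates receive the same log(Y + 1) term, the distance between them is multiplied by exactly |a| at every step, which is the kind of contraction required by the drift-type bounds of the next section.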

8.1.1.5 Assumptions and Results of the Alternative Markov Chain Approach Without Irreducibility

In what follows, if (E, ℰ) is a measurable space, ξ a probability distribution on (E, ℰ), and R a Markov kernel on (E, ℰ), denote by PξR the probability induced on (Eℕ, ℰ⊗ℕ) by a Markov chain with transition kernel R and initial distribution ξ. Denote by EξR the associated expectation. Consider the following assumptions.

  1. (A1)

The Markov kernel Q is weak Feller. Moreover, there exist a compact set C ⊂ X, (b, ε) ∈ ℝ+∗ ×ℝ+∗, and a function V : X →ℝ+ such that

    $$\displaystyle \begin{aligned} QV\leq V-\varepsilon+b{\mathbf{1}}_C . \end{aligned}$$
  2. (A2)

    The Markov kernel Q has a reachable point.

  3. (A3)

There exist a kernel Q̄ on (X2 ×{0, 1}, X⊗2 ⊗ P({0, 1})), a kernel Q♯ on (X2, X⊗2), measurable functions α : X2 → [0, 1] and W : X2 → [1, ∞), and real numbers (D, ζ1, ζ2, ρ) ∈ (ℝ+)3 ×(0, 1) such that for all (x, x′) ∈ X2,

    $$\displaystyle \begin{aligned} 1-\alpha(x,x^\prime)\leq d(x,x^\prime)W(x,x^\prime) {} \end{aligned} $$
    (8.34)
    $$\displaystyle \begin{aligned} \mathrm{E}^{Q^\sharp}_{\delta_x\otimes\delta_{x^\prime}}[d(X_t,X^{\prime}_t)] \leq D\rho^td(x,x^\prime) {} \end{aligned} $$
    (8.35)
    $$\displaystyle \begin{aligned} \mathrm{E}^{Q^\sharp}_{\delta_x\otimes\delta_{x^\prime}}[d(X_t,X^{\prime}_t)W(X_t,X^{\prime}_t)] \leq D\rho^td^{\zeta_1}(x,x^\prime)W^{\zeta_2}(x,x^\prime) . {} \end{aligned} $$
    (8.36)

    Moreover, for all x ∈X, there exists γx > 0 such that

    $$\displaystyle \begin{aligned} \sup_{x^\prime\in B(x,\gamma_x)}W(x,x^\prime)<\infty \end{aligned}$$

Some practical conditions for checking (8.35) and (8.36) in (A3) can now be stated.

Lemma 8.1

Assume that either (i) or (ii) or (iii) (defined below) holds.

  1. (i)

There exist (ρ, β) ∈ (0, 1) ×ℝ+ such that for all (x, x′) ∈ X2

    $$\displaystyle \begin{aligned} d(X_1,X^\prime_1)\leq\rho d(x,x^\prime),\quad \mathrm{P}^{Q^\sharp}_{\delta_x\otimes\delta_{x^\prime}}-a.s. {} \end{aligned} $$
    (8.37)
    $$\displaystyle \begin{aligned} Q^\sharp W\leq W+\beta {} \end{aligned} $$
    (8.38)
  2. (ii)

    Equation (8.35) holds and W is bounded.

  3. (iii)

Equation (8.35) holds and there exist 0 < α < α′ and β ∈ℝ+ such that for all (x, x′) ∈ X2

    $$\displaystyle \begin{aligned} d(x,x^\prime)\leq W^\alpha(x,x^\prime) \end{aligned}$$
    $$\displaystyle \begin{aligned} Q^\sharp W^{1+\alpha^\prime}\leq W^{1+\alpha^\prime}+\beta \end{aligned}$$

Then, (8.35) and (8.36) hold.

Assumption (A1) implies, by Tweedie [55], that the Markov kernel Q admits at least one stationary distribution. Assumptions (A2)–(A3) are then used to show that this stationary distribution is unique.

Note that assumptions (A1)–(A2) are the same as those of Theorems 8.9 and 8.10 in Appendix Section “Feller Conditions”, and they can be proved for each observation driven model as was done for the GARMA model in Sect. 8.5.1; assumption (A3) weakens the Lipschitz condition (8.18) by introducing the function W in (8.34). This makes it possible to treat models which do not satisfy the Lipschitz condition (8.18), for example the log-linear Poisson autoregression of [28]; see Sect. 8.5.2.

Theorem 8.11

Assume that (A1)–(A3) hold. Then, the Markov kernel Q in (8.19) admits a unique invariant probability measure.

Proposition 8.1

Assume that the Markov kernel Q admits a unique invariant probability measure. Then, there exists a strict-sense stationary ergodic process {Yt}t∈ℤ solving the recursion (8.19).

These results can be found in [23].

Main Proofs

8.1.1 Proof of Theorem 8.2

Following [42], Theorem 8.9 is applied to the chain {g(μt)}t∈ℕ to show that it has a stationary distribution; this implies the same result for the chain {μt}t∈ℕ. The state space S = ℝ of {g(μt)}t∈ℕ is a locally compact complete separable metric space with Borel σ-field. A drift condition for {g(μt)}t∈ℕ holds under the conditions of Theorem 8.1, with compact set A = [−M, M] (the drift condition holds when the perturbation scale is m = 0). All that remains is to show that the chain {g(μt)}t∈ℕ is weak Feller. Let Xt = g(μt). For X0 = x, the GARMA model can be rewritten as

$$\displaystyle \begin{aligned} X_1(x)=\gamma+\phi(g(Y_0^*(g^{-1}(x)))-\gamma)+\theta(g(Y_0^*(g^{-1}(x)))-x). \end{aligned}$$

Since g−1 is continuous, Y0(g−1(x′)) ⇒ Y0(g−1(x)) as x′→ x. Since the map taking Y0 into the domain of g (that is, Y0 ↦ Y0∗) is continuous, it follows that Y0∗(g−1(x′)) ⇒ Y0∗(g−1(x)) as x′→ x. Since g is continuous, g(Y0∗(g−1(x′))) ⇒ g(Y0∗(g−1(x))). So X1(x′) ⇒ X1(x) as x′→ x, which shows the weak Feller property. This ends the proof.

8.1.2 Proof of Theorem 8.4

The proof is based on the results of Appendix Sections “Coupling Construction”–“Assumptions and Results of the Alternative Markov Chain Approach Without Irreducibility”. The conditions (A1)–(A2) for the log-linear Poisson autoregression are proved as in Sect. 8.5.1 for the GARMA model. We report the proof of (A3).

Lemma 8.2

If |a + b|∨|a|∨|b| < 1, then (A3) holds.

Proof

Define Q̄ as the transition kernel of the Markov chain {Zt}t∈ℤ with Zt = (Xt, Xt′, Ut) in the following way. Given Zt = (x, x′, u), if x ≤ x′, draw independently Yt+1 ∼ P(e^x) and Vt+1 ∼ P(e^{x′} − e^x) and set Y′t+1 = Yt+1 + Vt+1. Otherwise, draw independently Y′t+1 ∼ P(e^{x′}) and Vt+1 ∼ P(e^x − e^{x′}) and set Yt+1 = Y′t+1 + Vt+1. Then, set

$$\displaystyle \begin{aligned} \begin{array}{rcl} & &\displaystyle X_{t+1}=d+a\,x+b\ln(Y_{t+1}+1),\\ & &\displaystyle X_{t+1}^\prime=d+a\,x^\prime+b\ln(Y_{t+1}^\prime+1),\\ & &\displaystyle U_{t+1}={\mathbf{1}}_{Y_{t+1}=Y_{t+1}^\prime}={\mathbf{1}}_{V_{t+1}=0},\\ & &\displaystyle Z_{t+1}=(X_{t+1},X_{t+1}^\prime,U_{t+1}) \end{array} \end{aligned} $$

where Q̄ satisfies the marginal condition (8.31). Moreover, define for all x♯ = (x, x′) ∈ X², Q♯(x♯, ⋅) as the law of (X1, X1′) where

$$\displaystyle \begin{aligned} \begin{array}{rcl} & &\displaystyle X_1=d+a\,x+b\ln(Y+1),\quad Y\sim\mathcal{P}(e^{x\wedge x^\prime}), {}\\ & &\displaystyle X_1^\prime=d+a\,x^\prime+b\ln(Y+1), \end{array} \end{aligned} $$
(8.39)

and set, for all x♯ = (x, x′) ∈ ℝ²,

$$\displaystyle \begin{aligned} \alpha(x^\sharp)=\exp\left\lbrace -e^{x\vee x^\prime}+e^{x\wedge x^\prime}\right\rbrace . \end{aligned}$$

Then, Q̄ and Q♯ satisfy (8.33). Using twice the inequality 1 − e^{−u} ≤ u, it follows that

$$\displaystyle \begin{aligned} \begin{array}{rcl} 1-\alpha(x^\sharp)& =&\displaystyle 1-\exp\left\lbrace -e^{x\vee x^\prime}+e^{x\wedge x^\prime}\right\rbrace\leq e^{x\vee x^\prime}-e^{x\wedge x^\prime} \\ & =&\displaystyle e^{x\vee x^\prime}(1-e^{-|x-x^\prime|})\leq W(x,x^\prime)|x-x^\prime| \end{array} \end{aligned} $$

with W(x, x′) = e^{|x|∨|x′|}, so that (8.34) holds true. To check (8.35) and (8.36), Lemma 8.1 is applied, by checking option (i). Note first that

$$\displaystyle \begin{aligned} \mathrm{P}^{Q^\sharp}_{\delta_x\otimes\delta_{x^\prime}}\left\lbrace |X_1-X_1^\prime|=|a||x-x^\prime|\right\rbrace =1 , \end{aligned} $$
(8.40)

so that (8.37) is satisfied. To check (8.38), it can be shown that

$$\displaystyle \begin{aligned} \lim_{|x|\vee|x^\prime|\to\infty} \frac{Q^\sharp W(x,x^\prime)}{W(x,x^\prime)}=0 {} \end{aligned} $$
(8.41)

and for all M > 0,

$$\displaystyle \begin{aligned} \sup_{|x|\vee|x^\prime|\leq M} Q^\sharp W(x,x^\prime)<\infty {} \end{aligned} $$
(8.42)

Without loss of generality, assume x ≤ x′. Using (8.39) provides

$$\displaystyle \begin{aligned} Q^\sharp W(x,x^\prime)=\mathrm{E}\left( e^{|X_1|\vee|X_1^\prime|}\right) \leq \mathrm{E}\left( e^{|X_1|}\right)+\mathrm{E}\left( e^{|X_1^\prime|}\right) . {} \end{aligned} $$
(8.43)

First, consider the second term of the right-hand side of (8.43),

$$\displaystyle \begin{aligned} \mathrm{E}\left( e^{|X_1^\prime|}\right)\leq e^{|d|}\,\mathrm{E}\left(e^{|ax^\prime+b\ln(1+Y)|}\right). {} \end{aligned} $$
(8.44)

Note that if u and v have different signs or if v = 0, then |u + v| ≤ |u|∨|v|. Otherwise, |u + v| = (u + v)1v>0 ∨ (−u − v)1v<0. This implies that

$$\displaystyle \begin{aligned} e^{|u+v|}\leq e^{|u|}+e^{|v|}+e^{u+v}{\mathbf{1}}_{v>0}+e^{-u-v}{\mathbf{1}}_{v<0} . \end{aligned}$$

Plugging this into (8.44),

$$\displaystyle \begin{aligned} \mathrm{E}(e^{|X_1^\prime|})&\leq e^{|d|}\left( e^{|a||x^\prime|}+\mathrm{E}[(1+Y)^{|b|}]+e^{ax^\prime}\mathrm{E}[(1+Y)^b]{\mathbf{1}}_{b>0}\right.\\&\quad \left.+e^{-ax^\prime}\mathrm{E}[(1+Y)^{-b}]{\mathbf{1}}_{b<0}\right) . \end{aligned} $$

Note that for all γ ∈ [0, 1],

$$\displaystyle \begin{aligned} \mathrm{E}[(1+Y)^\gamma]\leq[\mathrm{E}(1+Y)]^\gamma=(1+e^x)^\gamma\leq1+e^{\gamma x}\leq1+e^{\gamma x^\prime} . \end{aligned}$$

Moreover, since |b|∈ [0, 1], b1b>0 ∈ [0, 1] and − b1b<0 ∈ [0, 1]. Therefore,

$$\displaystyle \begin{aligned} \begin{array}{rcl} & \mathrm{E}(e^{|X_1^\prime|})&\displaystyle \leq e^{|d|}\left( e^{|a||x^\prime|}+1+e^{|b||x|}+e^{ax^\prime}(1+e^{bx^\prime}){\mathbf{1}}_{b>0}+e^{-ax^\prime}(1+e^{-bx^\prime}){\mathbf{1}}_{b<0}\right) \\ & &\displaystyle \leq e^{|d|}\left( e^{|a||x^\prime|}+1+e^{|b||x|}+e^{|a||x^\prime|}+e^{|a+b||x^\prime|}\right) \\ & &\displaystyle \leq e^{|d|}\left( 1+4e^{\gamma(|x|\vee|x^\prime|)}\right)\,, \end{array} \end{aligned} $$

where γ = |a|∨|b|∨|a + b| < 1. The first term of the right-hand side of (8.43) is treated in the same way as the second term, by setting x′ = x. So

$$\displaystyle \begin{aligned} \mathrm{E}(e^{|X_1|})\leq e^{|d|}\left( 1+4e^{\gamma(|x|\vee|x^\prime|)}\right)\,, \end{aligned}$$

so that using (8.43),

$$\displaystyle \begin{aligned} Q^\sharp W(x,x^\prime)\leq 2 e^{|d|}\left( 1+4e^{\gamma(|x|\vee|x^\prime|)}\right)\,. \end{aligned}$$

Since γ ∈ (0, 1) and W(x, x′) = e^{|x|∨|x′|}, the bound above on Q♯W(x, x′) clearly implies (8.41) and (8.42). This proves (A3) and, together with (A1)–(A2), provides stationarity conditions for the process {Yt} of Theorem 8.4.
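The coupling construction above is easy to check by simulation. The following sketch is illustrative only (the values of d, a, b, x, x′ are arbitrary assumptions, chosen so that |a + b|∨|a|∨|b| < 1): it verifies that the superposition Y′ = Y + V has the correct Poisson marginal, that P(V = 0) equals α(x♯), and that under Q♯ the distance between the two coordinates contracts exactly as in (8.40).

```python
import numpy as np

rng = np.random.default_rng(1)
d, a, b = 0.1, 0.5, 0.3        # satisfies |a + b| v |a| v |b| < 1
x, xp = 0.4, 1.2               # a pair with x <= x'
n = 200_000

# Superposition step of the coupling kernel Q-bar: Y ~ P(e^x) and
# V ~ P(e^{x'} - e^x) independent, so Y' = Y + V ~ P(e^{x'}).
Y = rng.poisson(np.exp(x), n)
V = rng.poisson(np.exp(xp) - np.exp(x), n)
Yp = Y + V
assert abs(Yp.mean() - np.exp(xp)) < 0.05

# P(V = 0) equals alpha(x#) = exp{-e^{x v x'} + e^{x ^ x'}}.
alpha = np.exp(-np.exp(xp) + np.exp(x))
assert abs((V == 0).mean() - alpha) < 0.01

# Under Q#, both coordinates share the same Y ~ P(e^{x ^ x'}),
# so |X1 - X1'| = |a| |x - x'| almost surely, as in (8.40).
Yc = rng.poisson(np.exp(min(x, xp)), n)
X1 = d + a * x + b * np.log(Yc + 1)
X1p = d + a * xp + b * np.log(Yc + 1)
assert np.allclose(np.abs(X1 - X1p), abs(a) * abs(x - xp))
```

The last assertion makes the deterministic contraction of the coupled chain concrete: the shared innovation cancels, leaving only the autoregressive gap a(x − x′).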

Computational Aspects

The replication code for the application in Sect. 8.7 is available at https://github.com/mirkoarmillotta/covid_code. First, a function for the log-likelihood and the gradient of the log-linear Poisson autoregression is provided. The code for the other models works similarly and is available upon request. Then, a function to perform the QMLE is presented. Finally, we give the code for the COVID-19 example and the related plots. The code to perform the PIT is due to [17] and is available in the reference therein.
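As a rough, self-contained sketch of the QMLE step (in Python rather than the authors' replication code; the simulated parameter values and the optimizer starting point are illustrative assumptions), one can maximize the Poisson quasi-log-likelihood of the log-linear autoregression νt = d + a νt−1 + b log(1 + Yt−1) numerically:

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, y):
    """Negative Poisson quasi-log-likelihood (up to a constant) of the
    log-linear autoregression nu_t = d + a nu_{t-1} + b log(1 + y_{t-1})."""
    d, a, b = params
    nu = np.zeros(len(y))
    for t in range(1, len(y)):
        nu[t] = d + a * nu[t - 1] + b * np.log(1 + y[t - 1])
    return -np.sum(y * nu - np.exp(nu))

# Simulate a stationary series (|a + b| v |a| v |b| = 0.7 < 1), then fit by QMLE.
rng = np.random.default_rng(2)
d0, a0, b0 = 0.3, 0.4, 0.3
T = 2000
y = np.zeros(T)
nu = 0.0
for t in range(1, T):
    nu = d0 + a0 * nu + b0 * np.log(1 + y[t - 1])
    y[t] = rng.poisson(np.exp(nu))

fit = minimize(neg_loglik, x0=np.array([0.1, 0.1, 0.1]), args=(y,),
               method="Nelder-Mead")
```

The derivative-free Nelder-Mead step is a simplification; the chapter's replication code supplies an analytic gradient, which is preferable for the full COVID-19 application.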

Copyright information

© 2023 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Armillotta, M., Luati, A., Lupparelli, M. (2023). An Overview of ARMA-Like Models for Count and Binary Data. In: Kateri, M., Moustaki, I. (eds) Trends and Challenges in Categorical Data Analysis. Statistics for Social and Behavioral Sciences. Springer, Cham. https://doi.org/10.1007/978-3-031-31186-4_8
