Abstract
Air pollution is a major problem in modern cities, as it contributes significantly to the poor quality of life of the general population. Many recent studies link excess levels of major air pollutants with health-related incidents, in particular respiratory diseases. This creates the need for on-line monitoring of city pollution, both to enable quick identification of deviations from “normal” pollution levels and to provide useful information to public authorities for public protection. This article considers dynamic monitoring of pollution data (the output of multivariate processes) using Kalman filters and multivariate statistical process control techniques. A state space model involving trend and seasonality is used to define the in-control process dynamics. Distribution-free monitoring of the residuals of that model is proposed, based on binomial-type and generalised binomial-type statistics as well as on rank statistics. We discuss the general problem of detecting a change in pollutant levels that affects either the entire city (globally) or specific sub-areas (locally). The proposed methodology is illustrated using ozone, nitrogen oxides and sulfur dioxide data collected over the air-quality monitoring network of Athens.
References
Antzoulakos DL, Bersimis S, Koutras MV (2003) On the distribution of the total number of run lengths. Ann Inst Stat Math 55:865–884
Atkinson RW, Anderson HR, Sunyer J, Ayres J, Baccini M, Vonk JM, Boumghar A, Forastiere F, Forsberg B, Touloumi G, Schwartz J, Katsouyanni K (2001) Acute effects of particulate air pollution on respiratory admissions: results from APHEA 2 project. Am J Respir Crit Care Med 164:1860–1866
Balakrishnan N, Koutras MV (2002) Runs and scans with applications. Wiley, New York
Balakrishnan N, Bersimis S, Koutras MV (2009) Run and frequency quota rules in process monitoring and acceptance sampling. J Qual Technol 41:66–81
Bersimis S, Psarakis S, Panaretos J (2007) Multivariate statistical process control charts: an overview. Qual Reliab Eng Int 23:517–543
Chakraborti S, van der Laan P, van de Wiel MA (2004) A class of distribution-free control charts. J R Stat Soc Ser C 53:443–462
Christodoulakis J, Tzanis CG, Varotsos CA, Ferm M, Tidblad J (2017) Impacts of air pollution and climate on materials in Athens, Greece. Atmos Chem Phys 17:439–448
Frisén M (2008) Financial surveillance. Wiley, Chichester
Gibbons JD, Chakraborti S (2010) Nonparametric statistical inference, 5th edn. Chapman and Hall, New York
Jiang X-Q, Mei X-D, Feng D (2016) Air pollution and chronic airway diseases: what should people know and do? J Thoracic Dis 8:E31–E40
Junninen H, Niska H, Tuppurainen K, Ruuskanen J, Kolehmainen M (2004) Methods for imputation of missing values in air quality data sets. Atmos Environ 38:2895–2907
Koutras MV, Bersimis S, Maravelakis PE (2007) Statistical process control using Shewhart control charts with supplementary runs rules. Methodol Comput Appl Probab 9:207–224
Mudway IS, Kelly FJ (2000) Ozone and the lung: a sensitive issue. Mol Asp Med 21:1–48
O’Neill MS, Ebi KL (2009) Temperature extremes and health: impacts of climate variability and change in the United States. J Occup Environ Med 51:13–25
Pan JN, Chen ST (2008) Monitoring long-memory air quality data using ARFIMA model. Environmetrics 19:209–219
Paroissin C, Penalva L, Pétrau A, Verdier G (2016) New control chart for monitoring and classification of environmental data. Environmetrics 27:182–193
Petris G, Petrone S, Campagnoli P (2010) Dynamic linear models with R. Springer, New York
Prado R, West M (2010) Time series: modelling, computation and inference. Chapman and Hall, New York
Qiu P (2018) Some perspectives on nonparametric statistical process control. J Qual Technol 50:49–65
Qiu P, Li Z (2011) On nonparametric statistical process control of univariate processes. Technometrics 53:390–405
Raaschou-Nielsen O, Beelen R, Wang M, Hoek M, Andersen ZJ, Hoffmann B, Stafoggia M, Samoli E, Weinmayr G, Dimakopoulou K, Nieuwenhuijsen M, Xun MM, Fischer P, Eriksen KT, Sørensen M, Tjønneland A, Ricceri F, de Hoogh K, Vineis P (2016) Particulate matter air pollution components and risk for lung cancer. Environ Int 87:66–77
Rosenlund M, Picciotto S, Forastiere F, Stafoggia M, Perucci CA (2008) Traffic-related air pollution in relation to incidence and prognosis of coronary heart disease. Epidemiology 19:121–128
Triantafyllopoulos K (2007) Covariance estimation for multivariate conditionally Gaussian dynamic linear models. J Forecast 26:551–569
Triantafyllopoulos K (2008) Missing observation analysis for matrix-variate time series data. Statist Probab Lett 78:2647–2653
Triantafyllopoulos K, Bersimis S (2016) Phase II control charts for autocorrelated processes. Qual Technol Quant Manag 13:88–108
Triantafyllopoulos K, Harrison PJ (2008) Posterior mean and variance approximation for regression and time series problems. Statistics 42:329–350
Acknowledgements
This work is supported by the General Secretariat for Research and Technology (GSRT, Ministry of Education, Greece) research funding action ARISTEIA II.
Open Access
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Appendices
Appendix A: Derivation of the Run Length Distribution of the Control Chart Using Rules 1 and 2 Based on \(T_{B,1}\)
The ARL is computed using the Markov chain embedding technique. According to this technique, a discrete random variable W defined on a sequence of multi-state trials may be described by a Markov chain \(\{V_t, t = 0, 1, 2, \ldots\}\) defined on a finite state space \(\Omega = \{\alpha_1, \alpha_2, \ldots, \alpha_s\}\). If \(\alpha_s\) is the absorbing state, the cumulative distribution function of W is given by
\[\Pr(W \le n) = \boldsymbol{\pi}_0^{\top}\Lambda^{n}\mathbf{e}_s, \qquad n = 0, 1, 2, \ldots,\]
where \(\boldsymbol{\pi}_0^{\top} = (\Pr(V_0 = \alpha_1), \Pr(V_0 = \alpha_2), \ldots, \Pr(V_0 = \alpha_s))\) is the vector of initial probabilities of the Markov chain, \(\Lambda = [\Pr(V_t = \alpha_j \mid V_{t-1} = \alpha_i)]_{s\times s}\) is the transition probability matrix and \(\mathbf{e}_s^{\top} = (0, 0, \ldots, 0, 1)_{1\times s}\), with \(\mathbf{e}_s \in \mathbb{R}^s\). Using this procedure, the probability distribution of W may be computed (Balakrishnan et al. 2009).
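As a minimal sketch of the formula \(\Pr(W \le n) = \boldsymbol{\pi}_0^{\top}\Lambda^{n}\mathbf{e}_s\), the Python function below propagates the row vector \(\boldsymbol{\pi}_0^{\top}\Lambda^{n}\) one step at a time and reads off its last coordinate, so no matrix power is ever formed. The function name and the two-state chain (a single in-control state that signals with a hypothetical probability p at each step) are illustrative assumptions, not the chart studied here; for that chain the formula reduces to \(1-(1-p)^n\).

```python
def run_length_cdf(pi0, Lam, n_max):
    """Return [Pr(W <= n) for n = 0, ..., n_max] via Pr(W <= n) = pi0^T Lam^n e_s.

    pi0 : initial probabilities (length s)
    Lam : s x s transition matrix given as a list of rows; the last state
          is assumed to be the absorbing state alpha_s.
    """
    s = len(pi0)
    row = list(pi0)                   # current row vector pi0^T Lam^n, starting at n = 0
    cdf = []
    for _ in range(n_max + 1):
        cdf.append(row[-1])           # multiplying by e_s keeps only the last entry
        row = [sum(row[i] * Lam[i][j] for i in range(s)) for j in range(s)]
    return cdf


# Hypothetical two-state chain: the in-control state signals w.p. p at each step.
p = 0.2
Lam = [[1 - p, p],
       [0.0, 1.0]]
cdf = run_length_cdf([1.0, 0.0], Lam, 5)
print(cdf)  # should agree with 1 - (1 - p)**n up to rounding
```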
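A suitable state space for \(V_t\) is \(\Omega = \Omega_1 \cup \{\alpha_s\}\), where \(\Omega_1\) is the set of non-absorbing states \((i_0, i_1, i_2, i_3)\).
The number of states in Ω is \(\binom{7}{4} + 1 = 36\), and the four coordinates of each state in \(\Omega_1\) may be interpreted as follows: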
- \(i_0\) records the number of points that fall into interval \(I_2\) within a window of (at most) 7 points as the process evolves in time; and
- \(i_1\), \(i_2\) and \(i_3\) specify the positions of the last three points that fell into interval \(I_2\) within that window.
Finally, the non-vanishing transition probabilities associated with the transition probability matrix are the following:
- \(\Pr(V_t = (i_0+1, i_1+1, i_2+1, i_3+1) \mid V_{t-1} = (i_0, i_1, i_2, i_3)) = p_2\), for \(i_0 = 0, 1, 2\), \(i_1 = 1, 2, 3\), \(i_2 = 0, 1, 2\), \(i_3 = 0, 1\);
- \(\Pr(V_t = \alpha_s \mid V_{t-1} = (i_0, i_1, i_2, i_3)) = p_2 + p_3\), for \(i_0 = 3\), \(i_1 = 1, 2, 3\), \(i_2 = 0, 1, 2\), \(i_3 = 0, 1\);
- \(\Pr(V_t = \alpha_s \mid V_{t-1} = (i_0, i_1, i_2, i_3)) = p_3\), for \(i_0 = 0, 1, 2\), \(i_1 = 1, 2, 3\), \(i_2 = 0, 1, 2\), \(i_3 = 0, 1\); and
- \(\Pr(V_t = (i_1-2, i_2+1, i_3+1, 0) \mid V_{t-1} = (i_0, i_1, i_2, i_3)) = p_1\), for \(i_0 = 0, 1, 2, 3\), \(i_1 = 2, 3, 4, 5\), \(i_2 = 1, 2, 3, 4\), \(i_3 = 0, 1, 2, 3\).

Since \(\alpha_s\) is the absorbing state, the transition matrix Λ may be written in the block form
\[\Lambda = \begin{pmatrix} \Lambda^{*} & \mathbf{h} \\ \mathbf{0}^{\top} & 1 \end{pmatrix}, \tag{5}\]
where \(\mathbf{h} = \mathbf{1} - \Lambda^{*}\mathbf{1}\), \(\Lambda^{*}\) is the matrix Λ with its last row and last column removed, and \(\mathbf{0}\), \(\mathbf{1}\) are \((s-1)\times 1\) column vectors of zeros and ones, respectively.
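The transition probabilities above suggest that the chart signals either when a single point falls into \(I_3\) (every state moves to \(\alpha_s\) with probability \(p_3\)) or when a fourth point within a window of 7 falls into \(I_2\) (states with \(i_0 = 3\) move to \(\alpha_s\) with probability \(p_2 + p_3\)). Reading Rules 1 and 2 this way is an assumption. Under that reading, and assuming independent points that fall into \(I_1\), \(I_2\), \(I_3\) with probabilities \(p_1\), \(p_2\), \(p_3\), a short Monte Carlo simulation offers an independent check on the exact ARL from the embedded chain. All function names and the truncation limit `max_n` are hypothetical.

```python
import random
from collections import deque


def run_length(zones, window=7, needed=4):
    """Return the 1-based index of the first signal in a sequence of zone
    labels (1, 2 or 3), or None if no signal occurs.

    Assumed rules: a point in I3 signals at once; otherwise the chart signals
    when `needed` of the last `window` points (current one included) are in I2.
    """
    recent = deque(maxlen=window)     # True where the point fell in I2
    for n, zone in enumerate(zones, start=1):
        if zone == 3:
            return n
        recent.append(zone == 2)
        if sum(recent) >= needed:
            return n
    return None


def simulate_run_length(p2, p3, rng, max_n=10**6):
    """One simulated run length with i.i.d. zones: I3 w.p. p3, I2 w.p. p2,
    I1 otherwise (so p1 = 1 - p2 - p3). Truncated at max_n points."""
    def zones():
        for _ in range(max_n):
            u = rng.random()
            yield 3 if u < p3 else (2 if u < p3 + p2 else 1)
    w = run_length(zones())
    return max_n if w is None else w


def estimate_arl(p2, p3, reps=10_000, seed=1):
    """Monte Carlo estimate of the ARL (mean run length)."""
    rng = random.Random(seed)
    return sum(simulate_run_length(p2, p3, rng) for _ in range(reps)) / reps
```

With the in-control values of \(p_2\) and \(p_3\), `estimate_arl(p2, p3)` should agree with the exact ARL up to Monte Carlo error if the assumed reading of the rules is correct; a clear disagreement would point to a different interpretation of the rules.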
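Using the partition (5) of Λ, we deduce the cumulative distribution function of W as follows:
\[\Pr(W \le n) = 1 - \boldsymbol{\pi}_1^{\top}(\Lambda^{*})^{n}\mathbf{1}, \qquad n = 0, 1, 2, \ldots, \tag{6}\]
where \(\boldsymbol{\pi}_1\) is the \((s-1)\times 1\) column vector that contains all entries of the initial probability vector \(\boldsymbol{\pi}_0\) except the last one. Using Eq. 6, we obtain the following expression for the probability generating function of W:
\[G(z) = \mathrm{E}\left(z^{W}\right) = 1 + (z-1)\,\boldsymbol{\pi}_1^{\top}(I - z\Lambda^{*})^{-1}\mathbf{1},\]
where I denotes the \((s-1)\times(s-1)\) identity matrix.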
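The cumulative probabilities \(\Pr(W \le n) = 1 - \boldsymbol{\pi}_1^{\top}(\Lambda^{*})^{n}\mathbf{1}\) and the ARL can be evaluated directly from \(\Lambda^{*}\). The plain-Python sketch below (no external libraries) computes \(\mathbf{h} = \mathbf{1} - \Lambda^{*}\mathbf{1}\), the cumulative probabilities of Eq. 6 and the ARL through \(\mathrm{E}(W) = G'(1) = \boldsymbol{\pi}_1^{\top}(I - \Lambda^{*})^{-1}\mathbf{1}\), which follows from differentiating the probability generating function at \(z = 1\). The toy \(\Lambda^{*}\) (a hypothetical chart that signals after two consecutive points beyond a limit, each beyond it with probability q) and all function names are assumptions for illustration; the article's \(35 \times 35\) matrix would be passed in the same way.

```python
def vec_mat(x, A):
    """Row vector x times matrix A."""
    return [sum(x[i] * A[i][j] for i in range(len(x))) for j in range(len(A[0]))]


def h_vector(Lstar):
    """h = 1 - Lstar 1: one-step absorption probabilities of the transient states."""
    return [1.0 - sum(row) for row in Lstar]


def cdf_eq6(pi1, Lstar, n_max):
    """[Pr(W <= n) for n = 0, ..., n_max] via Pr(W <= n) = 1 - pi1^T Lstar^n 1."""
    row, out = list(pi1), []
    for _ in range(n_max + 1):
        out.append(1.0 - sum(row))    # row times the ones vector is the sum of entries
        row = vec_mat(row, Lstar)
    return out


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(r) + [bi] for r, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def arl(pi1, Lstar):
    """ARL = E(W) = G'(1) = pi1^T (I - Lstar)^{-1} 1."""
    m = len(Lstar)
    i_minus_l = [[(1.0 if i == j else 0.0) - Lstar[i][j] for j in range(m)]
                 for i in range(m)]
    x = solve(i_minus_l, [1.0] * m)
    return sum(p * xi for p, xi in zip(pi1, x))


# Hypothetical chart: signal after two consecutive points beyond a limit,
# each point beyond the limit with probability q.
q = 0.5
Lstar = [[1 - q, q],      # transient state 1: last point not beyond the limit
         [1 - q, 0.0]]    # transient state 2: last point beyond the limit
print(h_vector(Lstar), cdf_eq6([1.0, 0.0], Lstar, 3), arl([1.0, 0.0], Lstar))
```

For this toy chain the ARL is \((1+q)/q^{2} = 6\) when \(q = 0.5\), and solving a linear system avoids summing the series \(\sum_{n} \Pr(W > n)\) term by term.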
More details on the development of these formulae may be found in Balakrishnan et al. (2009). Substituting the specific form of \(\Lambda^{*}\) into these formulae and performing the necessary calculations, we obtain the following recursive scheme for the probability function
where \(n > 35\). This scheme is much faster than Eq. 6, since it requires no high powers of \(\Lambda^{*}\). For \(n \le 35\), the corresponding probabilities may be calculated from \(G(z)\) or from Eq. 6.
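A recursion of this kind can be obtained from the Cayley-Hamilton theorem. Since \(\Pr(W = n) = \boldsymbol{\pi}_1^{\top}(\Lambda^{*})^{n-1}\mathbf{h}\) for \(n \ge 1\) and \(\Lambda^{*}\) has \(s - 1 = 35\) rows here, the coefficients \(c_0, \ldots, c_{35}\) (with \(c_{35} = 1\)) of the characteristic polynomial of \(\Lambda^{*}\) give \(\Pr(W = n) = -\sum_{k=0}^{34} c_k \Pr(W = n - 35 + k)\) for \(n > 35\). That this matches the authors' exact scheme is an assumption. The sketch below computes the \(c_k\) with the Faddeev-LeVerrier algorithm (chosen here for simplicity) and applies the recursion; function names and the toy matrix are hypothetical.

```python
def char_poly_coeffs(A):
    """Return [c_0, ..., c_m] with det(zI - A) = sum_k c_k z^k and c_m = 1,
    computed with the Faddeev-LeVerrier algorithm (pure Python)."""
    m = len(A)
    c = [0.0] * (m + 1)
    c[m] = 1.0
    M = [[0.0] * m for _ in range(m)]            # M_0 = 0
    for k in range(1, m + 1):
        # M_k = A M_{k-1} + c_{m-k+1} I
        AM = [[sum(A[i][l] * M[l][j] for l in range(m)) for j in range(m)]
              for i in range(m)]
        M = [[AM[i][j] + (c[m - k + 1] if i == j else 0.0) for j in range(m)]
             for i in range(m)]
        # c_{m-k} = -trace(A M_k) / k
        trace = sum(A[i][l] * M[l][i] for i in range(m) for l in range(m))
        c[m - k] = -trace / k
    return c


def run_length_pmf(pi1, Lstar, n_max):
    """Return [Pr(W = n) for n = 1, ..., n_max], Pr(W = n) = pi1^T Lstar^(n-1) h.

    Values for n <= m (m = order of Lstar) come from direct vector-matrix
    products; for n > m the Cayley-Hamilton recursion
        Pr(W = n) = -sum_{k=0}^{m-1} c_k Pr(W = n - m + k)
    is used, so no high powers of Lstar are formed.
    """
    m = len(Lstar)
    h = [1.0 - sum(row) for row in Lstar]        # h = 1 - Lstar 1
    pmf = [0.0]                                  # placeholder so pmf[n] = Pr(W = n); never used
    row = list(pi1)                              # pi1^T Lstar^(n-1)
    for n in range(1, min(m, n_max) + 1):
        pmf.append(sum(r * hi for r, hi in zip(row, h)))
        row = [sum(row[i] * Lstar[i][j] for i in range(m)) for j in range(m)]
    c = char_poly_coeffs(Lstar)
    for n in range(m + 1, n_max + 1):
        pmf.append(-sum(c[k] * pmf[n - m + k] for k in range(m)))
    return pmf[1:]


# Hypothetical 2-state example: signal after two consecutive points beyond a limit (q = 0.5).
Lstar = [[0.5, 0.5], [0.5, 0.0]]
print(run_length_pmf([1.0, 0.0], Lstar, 6))
```

In floating-point arithmetic a recursion of order 35 can accumulate rounding error, so it is prudent to check a few recursive values against Eq. 6.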
Appendix B: Derivation of the Joint Probability of \(T_{B,1}\) and \(T_{B,2}\)/\(T_{B,3}\)
We have
From Theorem 4.2 of Antzoulakos et al. (2003), we have
where a = j − i1 + j2 − wp(j1 + j2) and b = i − wpℓ − i1 − a, while \(\Pr(T_{B,1} = i)\) is given by the binomial distribution with success probability 0.5. Substituting these into Eq. 7 yields formula (3).
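The marginal term \(\Pr(T_{B,1} = i)\) is a binomial probability with success probability 0.5, that is \(\binom{n}{i}2^{-n}\). The sketch below evaluates it exactly with Python's `fractions` module, which avoids rounding when these terms are multiplied by the conditional probabilities in Eq. 7. The number of trials n is not stated in this appendix, so it is left as a parameter of the hypothetical function `binomial_half_pmf`.

```python
from fractions import Fraction
from math import comb


def binomial_half_pmf(n):
    """Exact Pr(T = i), i = 0, ..., n, for T ~ Binomial(n, 0.5)."""
    return [Fraction(comb(n, i), 2 ** n) for i in range(n + 1)]


print(binomial_half_pmf(4))   # 1/16, 1/4, 3/8, 1/4, 1/16 as Fraction objects
```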
About this article
Cite this article
Bersimis, S., Triantafyllopoulos, K. Dynamic Non-parametric Monitoring of Air-Pollution. Methodol Comput Appl Probab 22, 1457–1479 (2020). https://doi.org/10.1007/s11009-018-9661-0
DOI: https://doi.org/10.1007/s11009-018-9661-0
Keywords
- Multivariate statistical process control
- Time series monitoring
- Air surveillance
- Air pollution
- Non-parametric control chart
- Generalised binomial-type statistics
- Markov chain embedded variables