Dynamic Non-parametric Monitoring of Air-Pollution

Bersimis, Sotiris; Triantafyllopoulos, Kostas

doi:10.1007/s11009-018-9661-0

Open access
Published: 06 September 2018

Volume 22, pages 1457–1479, (2020)
Methodology and Computing in Applied Probability

Abstract

Air pollution poses a major problem in modern cities, as it significantly degrades the quality of life of the general population. Many recent studies link excess levels of major air pollutants with health-related incidents, in particular respiratory diseases. This creates the need for on-line monitoring of city pollution, to enable quick identification of deviations from “normal” pollution levels and to provide useful information to public authorities for public protection. This article considers dynamic monitoring of pollution data (the output of multivariate processes) using Kalman filters and multivariate statistical process control techniques. A state space model is used to define the in-control process dynamics, involving trend and seasonality. Distribution-free monitoring of the residuals of that model is proposed, based on binomial-type and generalised binomial-type statistics as well as on rank statistics. We discuss the general problem of detecting a change in pollutant levels that affects either the entire city (globally) or specific sub-areas (locally). The proposed methodology is illustrated using data on ozone, nitrogen oxides and sulfur dioxide collected over the air-quality monitoring network of Athens.

References

Antzoulakos DL, Bersimis S, Koutras MV (2003) On the distribution of the total number of run lengths. Ann Inst Stat Math 55:865–884
Atkinson RW, Anderson HR, Sunyer J, Ayres J, Baccini M, Vonk JM, Boumghar A, Forastiere F, Forsberg B, Touloumi G, Schwartz J, Katsouyanni K (2001) Acute effects of particulate air pollution on respiratory admissions results from APHEA 2 Project. Am J Respir Crit Care Med 164:1860–1866
Balakrishnan N, Koutras MV (2002) Runs and scans with applications. Wiley, New York
Balakrishnan N, Bersimis S, Koutras MV (2009) Run and frequency quota rules in process monitoring and acceptance sampling. J Qual Technol 41:66–81
Bersimis S, Psarakis S, Panaretos J (2007) Multivariate statistical process control charts: an overview. Qual Reliab Eng Int 23:517–543
Chakraborti S, van der Laan P, van de Wiel MA (2004) A class of distribution-free control charts. J R Stat Soc Ser C 53:443–462
Christodoulakis J, Tzanis CG, Varotsos CA, Ferm M, Tidblad J (2017) Impacts of air pollution and climate on materials in Athens, Greece. Atmos Chem Phys 17:439–448
Frisen M (2008) Financial surveillance. Chichester
Gibbons JD, Chakraborti S (2010) Nonparametric statistical inference, 5th edn. Chapman and Hall, New York
Jiang X-Q, Mei X-D, Feng D (2016) Air pollution and chronic airway diseases: what should people know and do? J Thoracic Dis 8:E31–E40
Junninen H, Niska H, Tuppurainen K, Ruuskanen J, Kolehmainen M (2004) Methods for imputation of missing values in air quality data sets. Atmos Environ 38:2895–2907
Koutras MV, Bersimis S, Maravelakis PE (2007) Statistical process control using Shewhart control charts with supplementary runs rules. Methodol Comput Appl Probab 9:207–224
Mudway IS, Kelly FJ (2000) Ozone and the lung: a sensitive issue. Mol Asp Med 21:1–48
O’Neill MS, Ebi KL (2009) Temperature extremes and health: impacts of climate variability and change in the United States. J Occup Environ Med 51:13–25
Pan JN, Chen ST (2008) Monitoring long-memory air quality data using ARFIMA model. Environmetrics 19:209–219
Paroissin C, Penalva L, Pétrau A, Verdier G (2016) New control chart for monitoring and classification of environmental data. Environmetrics 27:182–193
Petris G, Petrone S, Campagnoli P (2010) Dynamic linear models with R. Springer, New York
Prado R, West M (2010) Time series: modelling, computation and inference. Chapman and Hall, New York
Qiu P (2018) Some perspectives on nonparametric statistical process control. J Qual Technol 50:49–65
Qiu P, Li Z (2011) On nonparametric statistical process control of univariate processes. Technometrics 53:390–405
Raaschou-Nielsen O, Beelen R, Wang M, Hoek M, Andersen ZJ, Hoffmann B, Stafoggia M, Samoli E, Weinmayr G, Dimakopoulou K, Nieuwenhuijsen M, Xun MM, Fischer P, Eriksen KT, Sørensen M, Tjønneland A, Ricceri F, de Hoogh K, Vineis P (2016) Particulate matter air pollution components and risk for lung cancer. Environ Int 87:66–77
Rosenlund M, Picciotto S, Forastiere F, Stafoggia M, Perucci CA (2008) Traffic-related air pollution in relation to incidence and prognosis of coronary heart disease. Epidemiology 19:121–128
Triantafyllopoulos K (2007) Covariance estimation for multivariate conditionally Gaussian dynamic linear models. J Forecast 26:551–569
Triantafyllopoulos K (2008) Missing observation analysis for matrix-variate time series data. Statist Probab Lett 78:2647–2653
Triantafyllopoulos K, Bersimis S (2016) Phase II control charts for autocorrelated processes. Qual Technol Quant Manag 13:88–108
Triantafyllopoulos K, Harrison PJ (2008) Posterior mean and variance approximation for regression and time series problems. Statistics 42:329–350

Acknowledgements

This work is supported by the General Secretariat for Research and Technology (GSRT, Ministry of Education, Greece) research funding action ARISTEIA II.

Author information

Authors and Affiliations

Department of Statistics and Insurance Science, University of Piraeus, 80, Karaoli and Dimitriou Street, 185 34, Piraeus, Greece
Sotiris Bersimis
School of Mathematics and Statistics, Hicks Building, University of Sheffield, S3 7RH, Sheffield, UK
Kostas Triantafyllopoulos

Corresponding author

Correspondence to Sotiris Bersimis.

Additional information

Open Access

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendices

Appendix A: Derivation of the Run Length Distribution of the Control Chart Using Rules 1 and 2 Based on $T_{B,1}$

The average run length (ARL) is computed using the Markov chain embedding technique. According to this technique, a discrete random variable, say $W$, defined on a sequence of multi-state trials may be described by a Markov chain $\{V_{t}, t = 0, 1, 2, \ldots\}$ defined on a finite state space $\Omega = \{\alpha_{1}, \alpha_{2}, \ldots, \alpha_{s}\}$. If we let $\alpha_{s}$ be the absorbing state, the cumulative distribution function of $W$ is given by

$$\text{Pr}(W\leq n)=\text{Pr}(V_{n}=\alpha_{s})=\boldsymbol{\pi }_{0}^{\top}\boldsymbol{{\Lambda} }^{n}\mathbf{e}_{s}, $$

where $\boldsymbol{\pi}_{0}^{\top}=\{\text{Pr}(V_{0}=\alpha_{1}),\text{Pr}(V_{0}=\alpha_{2}),\ldots,\text{Pr}(V_{0}=\alpha_{s})\}$ is the vector of initial probabilities of the Markov chain, $\boldsymbol{\Lambda} = [\text{Pr}(V_{t} = \alpha_{j} \mid V_{t-1} = \alpha_{i})]_{s\times s}$ is the transition probability matrix and, finally, $\mathbf{e}_{s}^{\top}=(0,0,\ldots,0,1)_{1\times s}\in \mathbb{R}^{s}$. Using the above procedure, the probability distribution of $W$ may be computed (Balakrishnan et al. 2009).
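The cumulative distribution formula above can be illustrated with a minimal sketch. The chain used here is a toy example and not the chart of this appendix: it is assumed to signal when two consecutive points fall in a given zone, each with a hypothetical probability `p`. The names `run_length_cdf`, `Lam` and `pi0` are illustrative only.

```python
import numpy as np

def run_length_cdf(pi0, Lam, n):
    """Pr(W <= n) = pi0^T Lam^n e_s, with the absorbing state stored last."""
    e_s = np.zeros(Lam.shape[0])
    e_s[-1] = 1.0
    return float(pi0 @ np.linalg.matrix_power(Lam, n) @ e_s)

# Toy chain (an assumption for illustration):
# state 0 = no recent hit, state 1 = one hit, state 2 = signal (absorbing).
p = 0.5
Lam = np.array([
    [1 - p, p,   0.0],
    [1 - p, 0.0, p  ],
    [0.0,   0.0, 1.0],
])
pi0 = np.array([1.0, 0.0, 0.0])  # the chain starts with no recent hit

cdf_values = [run_length_cdf(pi0, Lam, n) for n in range(1, 6)]
print(cdf_values)
```

For this toy chain, Pr(W ≤ 2) equals p², the probability that the first two points both fall in the zone.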

A suitable state space for $V_{t}$ is $\Omega = \Omega_{1} \cup \{\alpha_{s}\}$, where

$$\begin{array}{@{}rcl@{}} {\Omega}_{1}&=&\{(i_{0},i_{1},i_{2},i_{3}): i_{0}= 0,1,2,3, \quad i_{1}= 2,3,4,5, \quad i_{2}= 1,2,3,4, \\ && i_{3}= 0,1,2,3 \quad \text{and} \quad i_{1}>i_{2}>i_{3}\}. \end{array} $$

The number of states in $\Omega$ is $\binom{7}{4}+ 1$, while the four coordinates may be interpreted as follows:

$i_{0}$ records the number of points falling into interval $I_{2}$ in a window of length 7 (at most) as the process evolves in time, and
$i_{1}$, $i_{2}$ and $i_{3}$ specify the positions of the last three points falling into interval $I_{2}$ in the window of length 7 (at most) as the process evolves in time.

Finally, the non-vanishing transition probabilities associated with the transition probability matrix are

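$\text{Pr}(V_{t} = (i_{0}+1, i_{1}+1, i_{2}+1, i_{3}+1) \mid V_{t-1} = (i_{0}, i_{1}, i_{2}, i_{3})) = p_{2}$, for $i_{0} = 0, 1, 2$, $i_{1} = 1, 2, 3$, $i_{2} = 0, 1, 2$, $i_{3} = 0, 1$;
$\text{Pr}(V_{t} = \alpha_{s} \mid V_{t-1} = (i_{0}, i_{1}, i_{2}, i_{3})) = p_{2} + p_{3}$, for $i_{0} = 3$, $i_{1} = 1, 2, 3$, $i_{2} = 0, 1, 2$, $i_{3} = 0, 1$;
$\text{Pr}(V_{t} = \alpha_{s} \mid V_{t-1} = (i_{0}, i_{1}, i_{2}, i_{3})) = p_{3}$, for $i_{0} = 0, 1, 2$, $i_{1} = 1, 2, 3$, $i_{2} = 0, 1, 2$, $i_{3} = 0, 1$; and
$\text{Pr}(V_{t} = (i_{1}-2, i_{2}+1, i_{3}+1, 0) \mid V_{t-1} = (i_{0}, i_{1}, i_{2}, i_{3})) = p_{1}$, for $i_{0} = 0, 1, 2, 3$, $i_{1} = 2, 3, 4, 5$, $i_{2} = 1, 2, 3, 4$, $i_{3} = 0, 1, 2, 3$,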

respectively. Taking into account that α_s is the absorbing state, the transition matrix Λ may be written in the following block-form

$$ \boldsymbol{{\Lambda} }=\left[\begin{array}{cc} \boldsymbol{{\Lambda} }^{*} & \mathbf{h} \\ \mathbf{0}^{T} & 1 \end{array}\right], $$

(5)

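where $\mathbf{h} = \mathbf{1} - \boldsymbol{\Lambda}^{*}\mathbf{1}$ and $\boldsymbol{\Lambda}^{*}$ is the matrix $\boldsymbol{\Lambda}$ after removing the last column and the last row, while $\mathbf{0}$ and $\mathbf{1}$ are $(s-1) \times 1$ column vectors of zeros and ones, respectively.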

Using the partition of $\boldsymbol{\Lambda}$ in Eq. 5, we deduce the probability mass function of $W$ as follows:

$$ \text{Pr}(W= n)=\boldsymbol{\pi }_{1}^{\top}(\boldsymbol{{\Lambda} }^{*})^{n-1}\mathbf{h}, $$

(6)

where $\boldsymbol{\pi}_{1}$ is an $(s-1) \times 1$ column vector that contains all entries of the initial probability vector $\boldsymbol{\pi}_{0}$ except the last one. Using Eq. 6 we may obtain the following expression for the probability generating function of the random variable $W$

$$G(z)=z\boldsymbol{\pi }_{1}^{\top} (\mathbf{I}-z\boldsymbol{{\Lambda} }^{*})^{-1}\mathbf{h}. $$
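As a minimal sketch, the block form of the transition matrix, the probability mass function of Eq. 6 and the generating function can be evaluated numerically as below. The chain is again a toy example (signal on two consecutive hits, hypothetical probability `p`), not the chart of this appendix. The ARL expression used here, $\boldsymbol{\pi}_{1}^{\top}(\mathbf{I}-\boldsymbol{\Lambda}^{*})^{-1}\mathbf{1}$, is the standard mean of an absorption time and is not stated in the text above; all function names are illustrative.

```python
import numpy as np

def split_transition_matrix(Lam):
    """Return (Lam_star, h): the transient block and h = 1 - Lam_star 1."""
    Lam_star = Lam[:-1, :-1]
    h = 1.0 - Lam_star.sum(axis=1)
    return Lam_star, h

def run_length_pmf(pi1, Lam_star, h, n):
    """Pr(W = n) = pi1^T Lam_star^(n-1) h, as in Eq. 6."""
    return float(pi1 @ np.linalg.matrix_power(Lam_star, n - 1) @ h)

def pgf(pi1, Lam_star, h, z):
    """G(z) = z pi1^T (I - z Lam_star)^(-1) h."""
    m = Lam_star.shape[0]
    return float(z * pi1 @ np.linalg.solve(np.eye(m) - z * Lam_star, h))

def average_run_length(pi1, Lam_star):
    """E[W] = pi1^T (I - Lam_star)^(-1) 1 (standard identity, assumed here)."""
    m = Lam_star.shape[0]
    return float(pi1 @ np.linalg.solve(np.eye(m) - Lam_star, np.ones(m)))

# Toy chain (an assumption for illustration): signal after two consecutive hits.
p = 0.5
Lam = np.array([
    [1 - p, p,   0.0],
    [1 - p, 0.0, p  ],
    [0.0,   0.0, 1.0],
])
Lam_star, h = split_transition_matrix(Lam)
pi1 = np.array([1.0, 0.0])

pmf_values = [run_length_pmf(pi1, Lam_star, h, n) for n in range(1, 6)]
arl = average_run_length(pi1, Lam_star)
print(pmf_values, arl, pgf(pi1, Lam_star, h, 1.0))
```

Solving the linear system with `np.linalg.solve` avoids forming the matrix inverse explicitly, which is usually the more stable choice when the transient block is large.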

More details on the development of these formulae may be found in Balakrishnan et al. (2009). Substituting appropriately into the last formula and performing the necessary calculations, we obtain the following recursive scheme for the probability mass function

$$\begin{array}{@{}rcl@{}} f(n) &=& p_{1}f(n-1)+{p_{1}^{2}}p_{2}f(n-3) + {p_{1}^{3}}{p_{2}^{2}}f(n-5)+{p_{1}^{4}}{p_{2}^{2}}f(n-6) \\ &&+ 5{p_{1}^{4}}{p_{2}^{3}}f(n-7) + {p_{1}^{5}}{p_{2}^{3}}f(n-8)-3{p_{1}^{6}}{p_{2}^{4}}f(n-10) -{p_{1}^{8}}{p_{2}^{5}}f(n-13) \\ &&-10{p_{1}^{8}}{p_{2}^{6}}f(n-14) -5{p_{1}^{9}}{p_{2}^{6}}f(n-15) - p_{1}^{10}{p_{2}^{6}}f(n-16) \\ && + 3p_{1}^{10}{p_{2}^{7}}f(n-17) -p_{1}^{11}{p_{2}^{7}}f(n-18) + 6p_{1}^{11}{p_{2}^{8}}f(n-19) \\ && + 10 p_{1}^{12}{p_{2}^{9}}f(n-21) + 3p_{1}^{13}{p_{2}^{9}}f(n-22) -p_{1}^{14}p_{2}^{10}f(n-24) \\ && -4p_{1}^{15}p_{2}^{11}f(n-26) -5p_{1}^{16}p_{2}^{12} f(n-28) + p_{1}^{19}p_{2}^{14}f(n-33) \\ &&+p_{1}^{20}p_{2}^{15}f(n-35), \end{array} $$

where $n > 35$. This scheme is far faster than evaluating Eq. 6 directly, since there is no need to compute high powers of $\boldsymbol{\Lambda}^{*}$. For $n \leq 35$, the corresponding probabilities may be calculated from $G(z)$ or from Eq. 6.
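The reason such a recursion exists can be sketched on a small example. By the Cayley-Hamilton theorem, the transient block satisfies its own characteristic polynomial, so the probabilities $f(n)$ obey a linear recursion whose order equals the size of that block. The code below is an illustration on a toy two-state transient block (an assumption, not the chain of this appendix) and is not claimed to be the route the authors took to obtain their coefficients.

```python
import numpy as np

def pmf_direct(pi1, Lam_star, h, n):
    """Pr(W = n) from Eq. 6, using an explicit matrix power."""
    return float(pi1 @ np.linalg.matrix_power(Lam_star, n - 1) @ h)

def pmf_by_recursion(pi1, Lam_star, h, n_max):
    """Return [f(1), ..., f(n_max)] using the characteristic-polynomial recursion."""
    m = Lam_star.shape[0]
    # np.poly returns [1, c_1, ..., c_m] with
    # Lam_star^m + c_1 Lam_star^(m-1) + ... + c_m I = 0.
    c = np.poly(Lam_star)
    # The first m values are seeded directly from Eq. 6.
    f = [pmf_direct(pi1, Lam_star, h, n) for n in range(1, m + 1)]
    for n in range(m + 1, n_max + 1):
        # f(n) = -(c_1 f(n-1) + ... + c_m f(n-m)); f[-k] is f(n-k) here.
        f.append(float(-sum(c[k] * f[-k] for k in range(1, m + 1))))
    return f

# Toy transient block (an assumption): signal after two consecutive hits, p = 0.5.
Lam_star = np.array([[0.5, 0.5],
                     [0.5, 0.0]])
h = 1.0 - Lam_star.sum(axis=1)
pi1 = np.array([1.0, 0.0])

recursive = pmf_by_recursion(pi1, Lam_star, h, 20)
direct = [pmf_direct(pi1, Lam_star, h, n) for n in range(1, 21)]
print(max(abs(a - b) for a, b in zip(recursive, direct)))
```

For this toy block the recursion reduces to $f(n) = 0.5 f(n-1) + 0.25 f(n-2)$, so each new probability costs a fixed number of multiplications regardless of $n$.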

Appendix B: Derivation of the Joint Probability of $T_{B,1}$ and $T_{B,2}$/$T_{B,3}$

We have

$$\begin{array}{@{}rcl@{}} \text{Pr}(T_{B,2}>c_{2},T_{B,1}>c_{1}) &=& \text{Pr}(\{T_{B,2}>c_{2}\} \cap \{ T_{B,1}=c_{1}+ 1 \cup T_{B,1}=c_{1}+ 2 \cup {\cdots} \\ && \cup T_{B,1}=n\}) \\ &=& \sum\limits_{i=c_{1}+ 1}^{n} \text{Pr}(T_{B,2}>c_{2} , T_{B,1}=i) \\ &=& \sum\limits_{i=c_{1}+ 1}^{n} \text{Pr} (\{ T_{B,2}=c_{2}+ 1 \cup T_{B,2}=c_{2}+ 2 \cup {\cdots} \\ && \cup T_{B,2}=n\}, T_{B,1}=i) \\ &=& \sum\limits_{i=c_{1}+ 1}^{n} \sum\limits_{j=c_{2}+ 1}^{n} \text{Pr}(T_{B,2}=j, T_{B,1}=i) \\ &=& \sum\limits_{i=c_{1}+ 1}^{n} \sum\limits_{j=c_{2}+ 1}^{n} \text{Pr}(T_{B,2}=j\mid T_{B,1}=i) \text{Pr}(T_{B,1}=i). \end{array} $$

(7)

By Theorem 4.2 of Antzoulakos et al. (2003) we have

$$\begin{array}{@{}rcl@{}} \text{Pr}(T_{B,2}=j\mid T_{B,1}=i) &=&\binom{n}{n-i}^{-1} \sum\limits_{\ell= 0}^{n-i + 1} \sum\limits_{i_{1}= 0}^{\ell} \sum\limits_{j_{1}= 0}^{\ell-i_{1}} \sum\limits_{j_{2}= 0}^{i_{1}} (-1)^{\ell+i_{1}+j_{1}-j_{2}} \\ && \times \binom{n-i + 1}{\ell} \binom{\ell}{i_{1}} \binom{\ell-i_{1}}{j_{1}} \binom{i_{1}}{j_{2}} \binom{\ell+a-1}{a} \\ && \times \binom{n-i+b}{b}, \end{array} $$

where $a = j - i_{1} + j_{2} - w_{p}(j_{1} + j_{2})$ and $b = i - w_{p}\ell - i_{1} - a$, while $\text{Pr}(T_{B,1} = i)$ is given by the binomial distribution with probability of success 0.5. Substituting these into Eq. 7 yields the required formula (3).
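The structure of the double sum in Eq. 7 can be sketched as follows. The conditional probability $\text{Pr}(T_{B,2}=j\mid T_{B,1}=i)$ is passed in as a callable, `cond_pmf`, which is a hypothetical placeholder: the closed form from Theorem 4.2 is not implemented here. The example call uses an independent Binomial($n$, 0.5) stand-in purely to check the arithmetic, which is an assumption and not the dependence structure of the statistics in the paper.

```python
from math import comb

def binomial_pmf(i, n, p=0.5):
    """Pr(T_B1 = i) for a Binomial(n, p) count; p = 0.5 as stated in the text."""
    return comb(n, i) * p**i * (1 - p)**(n - i)

def joint_tail(n, c1, c2, cond_pmf):
    """Pr(T_B2 > c2, T_B1 > c1) as the double sum in Eq. 7.

    cond_pmf(j, i) must return Pr(T_B2 = j | T_B1 = i); it is supplied by the
    caller and is a placeholder in this sketch.
    """
    total = 0.0
    for i in range(c1 + 1, n + 1):
        p_i = binomial_pmf(i, n)
        for j in range(c2 + 1, n + 1):
            total += cond_pmf(j, i) * p_i
    return total

# Placeholder conditional pmf (assumption): T_B2 independent of T_B1.
n = 4
independent = lambda j, i: binomial_pmf(j, n)
value = joint_tail(n, c1=1, c2=2, cond_pmf=independent)
print(value)
```

With the independent stand-in, the result factorises into the product of the two marginal tail probabilities, which gives a simple check that the summation limits are handled correctly.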

About this article

Cite this article

Bersimis, S., Triantafyllopoulos, K. Dynamic Non-parametric Monitoring of Air-Pollution. Methodol Comput Appl Probab 22, 1457–1479 (2020). https://doi.org/10.1007/s11009-018-9661-0

Received: 09 October 2017
Revised: 10 June 2018
Accepted: 12 August 2018
Published: 06 September 2018
Issue Date: December 2020
DOI: https://doi.org/10.1007/s11009-018-9661-0
