1 Introduction

Urbanization in Iran has driven large-scale migration from rural areas to the capital, Tehran. While this is beneficial for the country’s economy, the sudden population expansion in Tehran has increased air pollution and threatened human health, agricultural productivity, and the ecosystem. Short-lived climate-forced ozone, the major photochemical oxidant, is one of the primary hazardous pollutants (Bell et al., 2004; Borhani et al., 2022a, 2022b, 2022c; Ghahremanloo et al., 2021; Pierrehumbert, 2014; Stohl et al., 2015). Tropospheric ozone (O3) concentrations are mostly controlled by photochemical-oxidant precursors, such as nitrogen monoxide (NO), nitrogen dioxide (NO2), and carbon monoxide (CO), as well as by solar radiation and temperature (Cooper et al., 2012; Frost et al., 2006). Ozone precursor emissions (i.e., NOx (NO + NO2) and CO) are generated from various sources, including power plants, industrial boilers, cement kilns, turbines, cars, trucks, and off-road vehicles (including boats, construction equipment, etc.) (Borhani & Noorpoor, 2017, 2020; Borhani et al., 2016a, 2017b, 2019, 2023; Cheraghi & Borhani, 2016a, 2016b; Hoveidi et al., 2017; Maddah et al., 2022; Mazzeo et al., 2005; Motesaddi Zarandi et al., 2015). Ozone concentrations are usually measured using ground-based and satellite systems (Chang et al., 2022; Massagué et al., 2022; Reshi et al., 2022; Wang et al., 2021). For example, Borhani et al. (2022c) investigated the changes in tropospheric ozone and its relation to ozone precursors (i.e., CO, NO2, and NO) and meteorological conditions observed at 22 ground-based stations of the Air Quality Control Company (AQCC) in Tehran from 2001 to 2020. Their results showed that the region southwest of Tehran had the highest average ozone levels. Furthermore, the ozone concentration in Tehran decreased by 25% in the second decade compared to the first decade.
In another study, the authors found that the average tropospheric ozone concentration during the COVID-19 crisis in 2020 was lower than that observed in 2019 (Borhani et al., 2021). One of the main reasons for the decrease in ozone concentration was the decrease in industrial and traffic activities in Tehran due to the lockdown during the COVID-19 pandemic. Gheshlaghpoor and Abedi (2022) also used Sentinel-5P satellite images to examine the connection between various land-use patterns and air pollutants (CO, NO2, SO2, and O3) in Tehran. Moreover, Bencherif et al. (2020) studied the trend of ozone in Irene Station, South Africa using both ground-based and satellite observations collected between 1998 and 2017. They compared ground-based and satellite ozone data and obtained a good correlation between them. Finally, Toihir et al. (2014) showed a good agreement between the ground-based and satellite data (R2 > 0.92) for all stations over the southern subtropic Irene from 1998 to 2012.

High levels of tropospheric ozone and its impact on air quality have emerged as a global issue in recent years. Therefore, identifying the influencing factors and predicting the concentration of this pollutant using machine learning methods could help to establish effective methods for better air quality. Many studies have so far employed both ground-based and satellite observations to predict air quality parameters. For instance, ground-level ozone data in Amman, Jordan, was forecasted using machine learning techniques by Aljanabi et al. (2020). The root mean square error (RMSE), mean absolute error (MAE), and R2 of the model were 1.016 ppb, 0.8 ppb, and 98.7%, respectively (Aljanabi et al., 2020). Kumar and Jain (2010) also forecasted ozone concentration and its precursors (NO, NO2, and CO) in Delhi, India, using an Autoregressive Integrated Moving Average (ARIMA) model. Their findings demonstrated that the suggested forecasting method could be successfully applied to provide short-term air quality warnings.

In this study, the temporal and spatial changes of short-lived climate-forced ozone in Tehran, Iran, were investigated and its concentration was predicted. To this end, the concentrations of ground-based tropospheric ozone and ozone precursors (NO, NO2, and CO) from 21 stations were collected from January 1 to December 31, 2021. The spatial distribution of average tropospheric ozone concentration in 2021 was examined using the Inverse Distance Weighting (IDW) interpolation technique. Additionally, the effect of ozone precursors on variations in tropospheric ozone concentration was analyzed using Spearman’s rank correlation. Moreover, changes in the concentrations of ozone and its precursors over the 12 months were investigated and the corresponding heatmaps were produced. The results obtained from ground-based observations were also compared with those derived from Sentinel-5P satellite products in the Google Earth Engine (GEE) cloud computing platform. Finally, we used the Seasonal Autoregressive Integrated Moving Average (SARIMA) model to predict the concentrations of tropospheric ozone and its precursors in 2022. Figure 1 shows a flowchart describing the research process step by step.

Fig. 1 Flowchart of research methodology

2 Materials and Methods

Figure 1 represents the schematic view of the approach used in this study for air quality monitoring and prediction. More details about each step are also described in the following subsections.

2.1 Study Area

The study area was the city of Tehran, Iran, extended from 35° 37′ N to 35° 83′ N and from 51° 09′ E to 51° 60′ E with an area of approximately 751 km2 and 1200 m above sea level (Fig. 2). Iran’s capital, Tehran, has a population of roughly 13.2 million people. Tehran has a temperate climate and approximately 333 mm of precipitation falls on average each year. The temperature values in Tehran can vary between − 15 and 43 °C. The average annual percentage of humidity is about 40%. The average wind speed is also 5.5 m/s, and the prevailing wind direction is frequently from the west. The most significant air quality issues Tehran is currently dealing with are the oppressive traffic jams, enormous dust storms, and high levels of air pollution that are causing respiratory issues for the city’s residents.

Fig. 2 The study area (Tehran, Iran) and the locations of the air quality monitoring stations

2.2 Datasets and Preparation

2.2.1 Ground-Based Measurements

Ground-based air quality data were collected by AQCC (2021). CO, NO2, NO, and O3 concentration data were collected from 21 stations (see Fig. 2 and Table 1) in 2021. The monitoring stations of District 4, District 10, and District 16 were inactive during data recording of NO, NO2, and CO concentrations in this study. The ground-based data were validated using the World Health Organization (WHO) standards. Data that had been distorted by local environmental problems (e.g., construction activities, fires, household, and municipal waste), incorrect data (i.e., zero and negative data), and data that were significantly inconsistent with other data sources were removed from the database. Furthermore, we only selected the stations that had more than 75% of the hourly concentration data. Based on the clean air standards provided by the United States (US) Environmental Protection Agency’s (EPA) Aerometric Information Retrieval System (AIRS) database, the data obtained was translated to standard concentration values (USEPA, 1997). This standard applied a maximum concentration for ozone of 1 h and 8 h, a maximum concentration for nitrogen dioxide of 1 h, and a maximum concentration for carbon monoxide of 8 h. Ozone was detected using an analyzer O342 model at the air quality control company’s monitoring stations (Environnement S.A., Poissy Cedex, France). This instrument measures ozone concentration based on ultraviolet (UV) absorption and is a continuous ozone analyzer.
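The screening rules described above (removing zero and negative readings, then keeping only stations with more than 75% valid hourly data) can be sketched as follows. This is a minimal illustration, not the authors' processing code: the function names, the station labels, and the sample readings are all hypothetical, and the screening for locally distorted or inconsistent data is not modeled.

```python
def clean_hourly(values):
    """Drop missing (None), zero, and negative readings."""
    return [v for v in values if v is not None and v > 0]


def has_enough_data(values, expected_hours, min_fraction=0.75):
    """True when valid readings cover more than min_fraction of expected hours."""
    return len(clean_hourly(values)) > min_fraction * expected_hours


# Hypothetical hourly ozone readings (ppb) for two made-up stations.
stations = {
    "A": [12.0, 0.0, -1.0, 15.5, None, 14.2, 13.1, 16.0],
    "B": [10.0, 11.0, 12.5, 9.8, 13.0, 12.2, 11.1, 10.4],
}

# Keep only stations that pass the completeness threshold.
kept = {
    name: clean_hourly(values)
    for name, values in stations.items()
    if has_enough_data(values, expected_hours=len(values))
}
```

In this toy input, station "A" has only 5 valid readings out of 8 (62.5%) and is dropped, while station "B" is retained.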

Table 1 The annual average concentrations of O3 recorded by ground-based stations and total column density of ozone recorded by Sentinel-5P satellite in Tehran from January 1 to December 31, 2021

The amounts of nitrogen oxides were measured using the chemiluminescence technique (instruments APNA-370 of Horiba, Japan; AC 32 M of Environment SA, France; and EC 9841 of Ecotech, Australia). Non-dispersive infrared (NDIR) analyzers (Teledyne API type 300/300E, San Diego, CA) were used to detect CO concentrations. Ground-based ozone and its precursors (NO, NO2, and CO) are analyzed at the air quality control stations in accordance with the 2008/50/EC and 2015/1480/EC international standards (Bugarski et al., 2020). According to international standards, periodic services and multiple calibrations of the analyzers are carried out based on a set schedule and at predetermined intervals (every 2 weeks) during the sampling process.

2.2.2 Sentinel-5P Data

The mean daily total column density of ozone measured by the Tropospheric Monitoring Instrument (TROPOMI) onboard Sentinel-5P from January 1 to December 31, 2021, was also used in this study. TROPOMI has a spectral resolution of 0.25–0.55 nm and global daily coverage with a spatial resolution of 5.5 km × 3.5 km. TROPOMI has a suitable spatial sensitivity and resolution to be used for O3 monitoring (Cofano et al., 2021; Garane et al., 2019). The Sentinel-5P data were initially converted from level 2 to level 3 with a pixel size of 0.01 arc degrees using the harpconvert tool’s bin_spatial operation (Gorelick et al., 2017). O3 products for the study area were then generated by applying the geographical and temporal filters. It is important to note that the products were filtered to eliminate pixels with quality assurance (QA) values for O3 below 70% (< 0.7). Two sorts of outputs, maps and statistical reports of the tropospheric ozone, were subsequently produced. Table 2 shows the monthly total column density of ozone measured by the Sentinel-5P satellite at each ground monitoring station in 2021.

Table 2 Average monthly ozone pollution values observed at the ground stations and by Sentinel-5P satellite

2.3 Method

2.3.1 Heat Mapping of Concentrations

A heatmap uses colors to show the intensity of a variable across two dimensions simultaneously (Netek et al., 2018). In this study, the heatmaps show how concentrations changed over 12 months at the 21 ground-based monitoring stations. The heatmap plots were generated using Python’s seaborn heatmap functionality (Aurachman, 2021).
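A station-by-month heatmap of this kind could be produced roughly as follows. This is a sketch under stated assumptions rather than the authors' script: the record layout `(station, month, value)`, the helper names, the `coolwarm` color map, and the sample values are all illustrative choices. Only `seaborn.heatmap` itself is named in the text.

```python
from collections import defaultdict


def monthly_matrix(records):
    """Average (station, month, value) records into a station x 12-month grid.

    Cells with no data are filled with NaN.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for station, month, value in records:
        cell = sums[(station, month)]
        cell[0] += value
        cell[1] += 1
    stations = sorted({station for station, _, _ in records})
    matrix = []
    for station in stations:
        row = []
        for month in range(1, 13):
            total, count = sums.get((station, month), (0.0, 0))
            row.append(total / count if count else float("nan"))
        matrix.append(row)
    return stations, matrix


def plot_heatmap(stations, matrix, out_path):
    """Draw the grid with seaborn (imported lazily; requires seaborn and matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to file without a display
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(8, 4))
    sns.heatmap(matrix, cmap="coolwarm", ax=ax,
                xticklabels=list(range(1, 13)), yticklabels=stations,
                cbar_kws={"label": "concentration"})
    ax.set_xlabel("Month")
    ax.set_ylabel("Station")
    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    plt.close(fig)


# Hypothetical records: (station name, month number, concentration).
records = [("S1", 1, 10.0), ("S1", 1, 20.0), ("S2", 7, 40.0)]
stations, matrix = monthly_matrix(records)
```

Calling `plot_heatmap(stations, matrix, "heatmap.png")` would then write the figure. With a diverging map such as `coolwarm`, low values appear blue and high values red.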

2.3.2 IDW Technique

In this study, the IDW spatial interpolation technique was utilized to obtain zoning maps of air pollution concentration. Using this method, a spatial analysis of the distribution of monthly ground-based O3 values in 2021 was conducted. The interpolation estimate can be written as (Guan & Wu, 2008):

$${Z}_{j}= \frac{\sum_{i=1}^{n}{w}_{ij} {z}_{i}}{{\sum }_{i=1}^{n}{w}_{ij}}$$
(1)

where \({Z}_{j}\) is the interpolated value of ozone at pixel j, \({z}_{i}\) is the ozone concentration at ground station i, and \({w}_{ij}\) is the weight, set to be \({w}_{ij}\propto \frac{1}{{d}_{ij}}\) (\({d}_{ij}\) is the distance between pixel j and ground station i).
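As an illustration of Eq. (1), the following sketch evaluates the IDW estimate at a single location. It assumes planar coordinates and Euclidean distance, and exposes the distance exponent as a hypothetical `power` parameter (1 by default, matching the inverse-distance weight stated above). The station coordinates and values are invented for the example.

```python
import math


def idw(points, target, power=1.0):
    """Inverse distance weighted estimate at `target`.

    points: iterable of (x, y, z) with z the measured value at (x, y).
    target: (x, y) location of the pixel to interpolate.
    """
    numerator = 0.0
    denominator = 0.0
    for x, y, z in points:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return z  # the pixel coincides with a station
        w = 1.0 / d ** power
        numerator += w * z
        denominator += w
    return numerator / denominator


# Two hypothetical stations on a line, 2 units apart.
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 30.0)]
midpoint_value = idw(stations, (1.0, 0.0))
```

At the midpoint both weights are equal, so the estimate is the plain average of the two station values. Closer to one station, that station's value dominates.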

2.3.3 Prediction Model

In this study, SARIMA models were applied to forecast the ozone concentration in 2022. SARIMA models, with orders p (autoregressive), d (differencing), and q (moving average), combine an autoregressive (AR) model and a moving average (MA) model (Suhartono, 2011). We used auto-correlation function (ACF) and partial auto-correlation function (PACF) plots to find the values of the p, d, and q parameters of the ARIMA model. A lag is the time offset between an observation and an earlier observation in the same series. Both the ACF and PACF start at a lag of 0. In this study, we implemented a simple AR process and found its order using the ACF and PACF plots (ArunKumar et al., 2021; Borhani et al., 2022d; Dettling, 2013; Salvi, 2019). A time series is said to be autoregressive when its present value can be obtained from previous values of the same series, i.e., the present value is a weighted sum of the past values.

$${y}_{t}=C+{\phi }_{1}{y}_{t-1}+{\phi }_{2}{y}_{t-2}+\dots +{\phi }_{k}{y}_{t-k}+{\varepsilon }_{t}$$
(2)
$${\varepsilon }_{t}={y}_{t}-{y}_{t-1}$$
(3)
$$ACF(y_t,y_{t-k})=\frac{Covariance\;(y_t,y_{t-k})}{variance\;(y_t)}$$
(4)
$$PACF(y_t,y_{t-2})=\frac{Covariance\;(y_t,\left(y_{t-2}\vert y_{t-1}\right))}{\sqrt{variance\left(y_t\vert y_{t-1}\right)}\sqrt{variance\left(y_{t-2}\vert y_{t-1}\right)}}$$
(5)

where C is an intercept, \({\varepsilon }_{t}\) is a random error, \({\phi }_{i}\) (i = 1, 2, …, k) indicates the auto-regressive model parameters, \({y}_{t}\) is the current time-series value of \(y\) in period t, and \({y}_{t-1}\), \({y}_{t-2}\), …, \({y}_{t-k}\) are past values. The partial ACF between two observations \({y}_{t}\) and \({y}_{t-2}\) (assuming k = 2) can be written as Eq. (5).
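To make the ACF and PACF quantities concrete, the sketch below computes the sample ACF of Eq. (4), the lag-2 partial autocorrelation through the standard identity (r2 − r1²)/(1 − r1²), and a one-step AR(1) forecast. The AR(1) coefficient is taken as the lag-1 autocorrelation (a Yule-Walker estimate), which is a simplification chosen for illustration and not the SARIMA fitting procedure used in the study. The function names and the toy series are assumptions.

```python
def acf(series, lag):
    """Sample autocorrelation: lagged covariance divided by the variance."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((v - mean) ** 2 for v in series)
    covariance = sum((series[t] - mean) * (series[t - lag] - mean)
                     for t in range(lag, n))
    return covariance / variance


def pacf_lag2(series):
    """Partial autocorrelation at lag 2, removing the effect of lag 1."""
    r1, r2 = acf(series, 1), acf(series, 2)
    return (r2 - r1 ** 2) / (1 - r1 ** 2)


def ar1_forecast(series):
    """One-step AR(1) forecast y_t = C + phi * y_{t-1} with phi = ACF(1)."""
    mean = sum(series) / len(series)
    phi = acf(series, 1)
    intercept = mean * (1 - phi)  # keeps the process mean unchanged
    return intercept + phi * series[-1]


# Hypothetical short series, used only to exercise the functions.
toy = [1.0, 2.0, 3.0, 4.0, 5.0]
r1 = acf(toy, 1)
next_value = ar1_forecast(toy)
```

In practice, the ACF and PACF are inspected over many lags, and the model order is chosen from the lags at which they fall inside the confidence band.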

2.4 Assessment Method

In this study, Spearman’s rank correlation coefficient (\(\rho\)), the determination coefficient (R2), and RMSE (see Eqs. (6)–(8)) were applied to assess the accuracy of the proposed models.

$$\rho =1-\frac{6\sum_{i=1}^{n}{d}_{i}^{2}}{n ({n}^{2}-1)}$$
(6)
$${R}^{2}=1-\frac{\sum_{i=1}^{n}{({y}_{i}-{\widehat{y}}_{i})}^{2}}{\sum_{i=1}^{n}{({y}_{i}-\overline{y })}^{2}}$$
(7)
$$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{{(y}_{i}-{\widehat{y}}_{i })}^{2}}$$
(8)

where \({y}_{i}\) and \({\widehat{y}}_{i}\) are the i-th observed and forecasted values, respectively, \(\overline{y }\) represents the mean of the observed values in the tested sample set, and \({d}_{i}\) is the difference between the ranks of the two variables for observation \(i\). Moreover, \(n\) indicates the number of observations.

A single asterisk was used to indicate \(\rho\) values that were significant at the 0.05 level (P-value ≤ 0.05). \({R}^{2}\) was used to evaluate the strength of a linear relationship between pairs of variables. The RMSE was used to evaluate the accuracy of the estimates produced by the proposed regression equations. All the statistical analyses were performed using the SPSS software (Agrawal et al., 2017; Beckerman et al., 2013; Goap et al., 2018; Hussainy et al., 2018; Özbay, 2012).
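For readers who want to reproduce the three metrics outside SPSS, a minimal Python sketch follows. It is an illustration only: the function names are hypothetical, Spearman's coefficient is computed in its rank-difference form (which assumes no tied values), and y denotes observed and ŷ forecasted values.

```python
import math


def ranks(values):
    """Rank values from 1 (smallest) to n; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation from squared rank differences."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def r_squared(observed, forecast):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, forecast))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def rmse(observed, forecast):
    """Root mean square error between observed and forecasted values."""
    squared = sum((o - f) ** 2 for o, f in zip(observed, forecast))
    return math.sqrt(squared / len(observed))
```

A perfectly reversed ordering, such as `spearman_rho([1, 2, 3, 4], [4, 3, 2, 1])`, gives a coefficient of −1, which mirrors the negative ozone-precursor relationships reported later in the paper.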

3 Results and Discussion

3.1 Data Analysis

In 2021, the average annual tropospheric ozone concentration at the 21 ground-based air quality monitoring stations was 21.16 ppb. Figure 3 shows several examples of the heatmaps produced from ground-based observations of ozone and its precursors (NO, NO2, and CO) in 2021. High concentrations are shown in red, while low concentrations are shown in blue. The ground-based ozone concentration increased from January to July and decreased from July to December. The highest ozone concentrations occurred in summer, and the lowest in winter (Fig. 3d).

Fig. 3 Heatmap of pollutants concentration based on the data recorded at 21 ground-based monitoring stations in 2021: (a) NO, (b) NO2, (c) CO, and (d) O3

Climate change has been increasing ozone concentrations in many regions of the world, including the city of Tehran, by fostering atmospheric conditions that are favorable for ozone generation (Weaver et al., 2009; Zhang et al., 2019). Hence, the air was severely polluted in Tehran in summer due to high ozone levels (Mazaheri Tehrani et al., 2015; Mosadegh et al.). Figure 4 also showed that from January 1 to December 31, 2021, the areas around the District 11 and Setad Bohran stations had the highest tropospheric ozone levels, while the lowest tropospheric ozone levels were observed in regions near the District 10 and District 4 stations.

Fig. 4 Distribution of the average monthly tropospheric ozone concentration at air quality monitoring stations in Tehran from January to December 2021

To evaluate the spatial patterns of the total column density of ozone, the monthly average O3 column values were obtained from the Sentinel-5P TROPOMI datasets acquired from January 1 to December 31, 2021 (see Fig. 5 and Table 2). The highest O3 column values were observed in February, March, and April (Fig. 5b–d).

Fig. 5 The spatial distribution of the average monthly values of the total column density of ozone in Tehran from January to December 2021

Figure 6 illustrates the changes in satellite-based total column density of ozone and ground-based ozone values in Tehran from January 1 to December 31, 2021. The results showed that the satellite-derived spatial distribution of ozone differed from that obtained by IDW modeling. An earlier investigation by Brogniez et al. (2005) found fair consistency between ground-based measurements collected at six European sites and satellite ozone data. Generally, Total Ozone Mapping Spectrometer (TOMS) ozone values appeared to be slightly higher (by less than 3%) than those observed at the ground-based stations. Surface ozone is part of the total atmospheric ozone. The total column density of ozone is the sum of tropospheric and stratospheric ozone in a vertical column with a cross-sectional area of one square centimeter from the atmosphere boundary to the ground (Danielsen, 1968; Fishman et al., 2003; Junge, 1962). It should be noted that concentrations throughout the entire city of Tehran cannot be adequately modeled with only 21 monitoring stations.

Fig. 6 Comparison of monthly average of tropospheric ozone and total column density of ozone in Tehran from January 1 to December 31, 2021

Investigation of the trend of annual changes in total ozone values indicated that the greatest change occurred in winter and spring. The relative decrease in average surface ozone concentration in winter, despite high levels of total ozone, could be due to increasing rainfall and decreasing temperature and, subsequently, attenuation of the overnight temperature inversion and reduction of high pollutant concentrations (Fig. 6). This result is consistent with similar investigations (Bray et al., 2021; Chambers, 2021; Doak et al., 2021; Fan et al., 2020; Filonchyk et al., 2021; Vîrghileanu et al., 2020). Regarding the spatial distribution of Sentinel-5P ozone values in 2021, the highest concentrations were observed at the Shad Abad, Mahallati, District 19, and Ray stations (see Fig. 5 and Table 2). Overall, the highest levels of ozone in both the station and satellite observations were over the district stations in the west of Tehran.

The annual mean concentrations of CO, NO, and NO2 ranged from 1.05781 to 2.32084 ppm, 33.58269 to 130.52431 ppb, and 38.61158 to 71.35404 ppb, respectively. The highest concentration of ozone precursors was detected at the Sadr station (see Fig. 3 and Table 3). The heatmaps showed maximum ozone precursor concentrations to have occurred in the winter’s coldest months (i.e., January, November, and December) (Fig. 3 and Table 3). Thus, O3 and its precursors had different patterns.

Table 3 Average monthly values of tropospheric ozone precursors observed at the ground stations from January 1 to December 31, 2021

3.2 Correlation Analysis

The \(\rho\) values between the precursors and the ground-based tropospheric ozone are provided in Table 4. There is a negative relationship between the ground-based O3 and CO, NO2, and NO. The strongest negative value was found for NO. This result is consistent with previous studies (Afonso & Pires, 2017; Mao & Talbot, 2004; Ridley et al., 1992; Wang et al., 2021; Yu et al., 2021). Additionally, there is a strong positive correlation between ozone precursors, indicating that CO, NO, and NO2 might come from the same sources or one might result from the chemical transformation of another (Gong et al., 2015; Olaguer et al., 2009).

Table 4 Spearman’s rank correlations coefficient for the ground-based ozone and its precursors in Tehran from January 1 to December 31, 2021

3.3 Results of the Prediction Model

Figure 7 shows the PACF and ACF plots used for the prediction of tropospheric ozone and its precursors (i.e., ground-based ozone, Sentinel-5P ozone, NO, NO2, and CO) with 95% confidence intervals from 2021 to 2022. The measured data trend for each pollutant from January 1 to December 31, 2021, and the predicted trend from January 1 to December 31, 2022, are illustrated in Fig. 8. SARIMA (2, 0, 0), SARIMA (2, 0, 0), SARIMA (1, 0, 0), SARIMA (1, 0, 0), and SARIMA (1, 0, 0) were the most accurate models for forecasting ground-based ozone, Sentinel-5P ozone, NO, NO2, and CO, respectively. The corresponding (R2, RMSE) values were (0.89, 2.9638), (0.80, 0.0005), (0.91, 20.3980), (0.78, 5.6325), and (0.81, 0.2244). According to the forecast models, the concentrations of ground-based ozone, Sentinel-5P ozone, and NO decreased by 3.36%, 1.07%, and 0.54%, respectively, while those of NO2 and CO increased by 0.63% and 0.57%, respectively, in 2022 compared to 2021 (see Fig. 8 and Table 5). Moreover, according to the actual values from January 2022 to August 2022, the concentrations of ground-based ozone were 4.98% higher than forecast, while those of Sentinel-5P ozone, NO, NO2, and CO were 0.46%, 9.31%, 1.95%, and 4.71% lower than forecast, respectively. For example, for ground-based O3 in April 2022, the actual value was 22.1000 ppb, while the predicted value was 22.6831 ppb, a difference of 0.5831 ppb. Thus, the relative errors are lower than 10% (see Table 6). Overall, the results showed that the SARIMA models were effective in predicting air quality parameters.
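As a worked check of the April 2022 example for ground-based O3, and assuming the relative error is defined as the absolute difference divided by the actual value (the definition is not stated in the text), the calculation is:

$$\text{relative error}=\frac{\left|22.6831-22.1000\right|}{22.1000}\times 100\%=\frac{0.5831}{22.1000}\times 100\%\approx 2.64\%$$

This is well below the 10% bound quoted above.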

Fig. 7 The auto-correlation function (ACF) and partial auto-correlation function (PACF) plots to predict the concentration of (a) NO, (b) NO2, (c) CO, (d) Ground-based Ozone, and (e) Sentinel-5P Ozone

Fig. 8 Time-series analysis of the observed and predicted tropospheric ozone and its precursors concentration in Tehran: (a) Ground-based Ozone, (b) Sentinel-5P Ozone, (c) NO, (d) NO2 and (e) CO

Table 5 Comparison of the predicted and actual values for tropospheric ozone values and its precursors in Tehran
Table 6 Comparison of the predicted and actual values for tropospheric ozone values and its precursors in Tehran from January 2022 to August 2022 (evaluation forecast algorithm)

4 Conclusions

This research presents the findings of a comparative analysis of detecting and monitoring the tropospheric ozone concentration based on variations in the precursor (i.e., NO, NO2, and CO) concentrations in Tehran from January 1 to December 31, 2021. The heatmaps showed that concentrations of ground-based ozone and its precursors changed over the 12 months. The highest and lowest ozone concentrations occurred in summer and winter, respectively. The highest ozone concentration was observed at District 11 (Station 13). Strong negative correlations were also observed between ground-based O3 and NO, NO2, and CO. The modeled ozone concentration was also compared to the average monthly change of the total column density of ozone derived from Sentinel-5P satellite data in GEE. The results showed that integrating ground-based station data with Sentinel-5P satellite data can produce more reasonable results for spatiotemporal air quality monitoring, because satellite data can be used to fill gaps in the ground-based data. Finally, the SARIMA modeling approach was used for forecasting tropospheric ozone and its precursors in 2022. Testing showed that the SARIMA model was able to predict tropospheric ozone and its precursors for the 12 months from January 1 to December 31, 2022, relatively accurately, with only 1 year (2021) of training data.

It should be noted that this study only considered ozone and ozone precursor data, while other factors (e.g., meteorological conditions) could also influence air pollution levels. Therefore, the results of this study could be further improved by considering meteorological conditions in future studies.