Introduction

Ambient air pollution has posed a significant and persistent threat to global public health over several decades1. The Global Burden of Disease (GBD) Project estimated that in 2019, air pollution, as the 4th highest-ranking driver of mortality worldwide, caused approximately 6.67 million premature deaths per year2. A large number of population-based studies suggested the associations between prolonged air pollution exposure and heightened risk of total and cause-specific mortality, but these accumulating evidences have been dominated by modeling associations rather than establishing causality3. Some scientists advocated for a shift in focus when updating and revising air quality standards, emphasizing the importance of the evidence from causal inference methods rather than relying largely on that of traditional association analysis particularly in cases where the causality of confounding factors was not clear4. Estimated effects of exposure from traditional association assessment models often suffered from residual biases introduced by unmeasured confounders5, but causal inference models, exemplified by the difference-in-differences (DID) approach, could be recognized as a viable approach to address this issue of residual confounding that frequently bedevils observational investigations6.

DID analysis, characterized by its quasi-experimental nature, has gradually gained prominence in the scientific field of environmental epidemiology for estimating causality. Standard DID methods could leverage the assumption of parallel trends and well allow controlling for unobserved confounders, thereby facilitating the identification of causal associations between exposures and outcomes of interest within a counterfactual framework7. Building upon this traditional framework, Wang and colleagues developed a variant of DID analysis to quantify the causal effects of particulate air pollutants on mortality through comparing annual fluctuations in mortality and exposure levels across designated geographical units8. The revised DID design could not only enhance the capacity to capture the temporal and spatial effects of environmental exposure, but also well account for the potential impacts engendered by socioeconomic factors and unobserved variables9.

Several emerging DID studies provided causal evidence linking particulate air pollution with mortality7,9,10,11, while causal effects of gaseous air pollutants remained extensively lacking12,13. Moreover, given that individuals were generally exposed to increasingly complex and ever-changing environmental mixtures, there have been surprisingly few studies examining the combined associations of prolonged air pollutants exposure with total and cardiorespiratory mortality14. To fill these research gaps, we designed a variant of DID analysis to assess the causal effects of four criteria gaseous air pollutants on mortality based on 108 municipalities in the United States (US) covering the years 1987–2000. The primary objective of this investigation was to examine the individual and joint associations between prolonged exposure to gaseous air pollutants and four mortality endpoints. A secondary objective was to explore the potential effect modifications by age in these causal associations. The investigation into age-related vulnerabilities concerning air pollution exposure could provide a valuable basis for the development of targeted public health interventions, a matter of increasing importance in an aging global population.

Methods

Data collection

The principal data for our analysis were sourced from National Morbidity, Mortality, and Air Pollution Study (NMMAPS) database, and the NMMAPS incorporated data from the US National Center for Health Statistics, the 2000 US Census, US Environmental Protection Agency (EPA), and the National Climatic Data Centre15,16. The NMMAPS consists of daily time-series data of mortality, air pollution, and weather conditions for 108 municipalities across the continental US over the period 1987–2000 (Fig. S1). Daily counts of deaths from total and nonaccidental causes, as well as cardiovascular disease (CVD) and respiratory disease (RD) were collected from US National Center for Health Statistics and subdivided into three age categories (≤64, 65–74, ≥75 years). Additionally, total population data was sourced from the 2000 U.S. Census. Daily average concentrations of particulate (PM2.5 and PM10) and gaseous (NO2, O3, SO2, CO) pollutants were averaged across all monitoring sites within each of the metropolitan areas, provided by the US EPA, and daily meteorological data including mean temperatures and relative humidity were derived from the National Climatic Data Centre. A detailed description of the NMMAPS dataset can be found in prior publications17,18. In our analytic stage, for each year (1987–2000) and each municipality, daily time-series data of total and cause-specific deaths were aggregated into annual sums, and daily air pollutants and temperatures were summarized as annual averages and seasonal averages or deviations in summers (June to August) and winters (December to February), respectively19.

Statistical analysis

Descriptive characteristics were summarized as means (standard deviations, SDs) for continuous variables or frequencies (proportions) for categorical variables. Spearman rank correlation coefficient (rs) was used to measure the pairwise correlations among gaseous air pollutants.

DID design

A developed variant of the DID approach, a causal modeling method that could control for unmeasured confounders and capture broader temporal trends over the period 1987 to 2000, was utilized to estimate the effect of long-term exposure to each gaseous air pollutant on total and cause-specific mortality8. This modified DID analysis was primarily designed in this study through comparing year-to-year (e.g., 1988 vs. 1987, 1989 vs. 1988) changes in annual concentrations of the four gaseous air pollutants (NO2, CO, SO2, and O3) with concurrent changes in total and cause-specific (i.e., nonaccidental, CVD, RD) death number in given 108 municipality units9. Under this DID framework, some factors such as socio-economic status, demographic characteristics, and behavioral influences may vary rarely between adjacent years so their confounding effects could be counteracted. However, other time-variant factors that were associated with exposure remained to be adjusted. In accordance to recent DID studies based on aggregated times-series data9,10,20, it is hypothesized that such potential confounders might only be seasonal temperatures in our DID analysis on air pollution-mortality associations. The parallel trend assumption is key to causal estimation in DID design, which means that differences in mortality between spatial units in the absence of exposure effects remained stable over time12.

Individual analysis

We used conditional Poisson regression (CPR) model with time-varying exposure at a 1-year scale, to estimate the causal effects of prolonged individual exposure to gaseous pollutants on mortality. CPR model could be a good solution to address the issues of overdispersion and autocorrelation of time series data20. We fitted the following DID model to analyze the mortality effects of individual gaseous air pollutants, considering spatial and temporal factors, as well as seasonal variation and population-level characteristics:

$$\text{ln}\left[E\left({Y}_{s,t}\right)\right]={\beta }_{0}+{\beta }_{1}{I}_{s}+{\beta }_{2}{I}_{t}+{\beta }_{3}{Temp}_{sum}+{\beta }_{4}{Temp}_{win}+{\beta }_{5}SD({Temp}_{sum})+{\beta }_{6}SD({Temp}_{win})+{\beta }_{7}{AP}_{s,t}+\text{ln}({Pop}_{s,t})$$

where \({Y}_{s,t}\) represents the number of total and cause-specific deaths in spatial unit s (i.e., 108 US municipalities), calendar year t (14-y period, 1987–2000). \({AP}_{s,t}\) refers to the annual mean concentrations of air pollutants in spatial unit s, calendar year t. \({\text{ln}(Pop}_{s,t})\) is an offset term representing the natural logarithm of the population in spatial unit s, which controlled for the confounding effect due to city-level population variations. \({I}_{s}\) is a dummy variable for each spatial unit s.\({I}_{t}\) is a dummy variable for each calendar year t. \({\beta }_{0}\) is the fixed intercept term. \({\beta }_{1}\), \({\beta }_{2}\) are regression coefficients of the spatial and temporal effects. \({\beta }_{3},{\beta }_{4},{\beta }_{5},{\beta }_{6}\) are regression coefficients for the effects of mean summer and winter temperatures and their standard deviations, respectively. \({\beta }_{7}\) represents the independent effect of each gaseous pollutant.

The estimated associations were expressed as relative risks (RRs) and their 95% confidence intervals (CIs) per interquartile range (IQR) increase in individual exposure to gaseous air pollutant, and statistical significance was defined as P <0.05 (two-tailed). Given the potential departure of linear assumptions in modeling the relationships between independent exposure to four gaseous air pollutants and total and cause-specific mortality, a natural cubic spline (NCS) term of each pollutant with 3 degrees of freedom was incorporated into the CPR model to derive the exposure–response (E-R) association curves10. Likelihood ratio test was employed to examine the nonlinearity in associations between gaseous pollutants and mortality via comparing goodness of fit between linear and NCS smoothed models21.

Several sensitivity analyses were carried out to testify the robustness of the estimated associations. Firstly, age-stratified analysis was conducted to investigate potentially differential susceptibility to gaseous air pollutants across distinct age groups (≤64, 65–74, and ≥75 years), and meta-regression approach was adopted to examine effect heterogeneity between groups. Secondly, we re-ran the CPR models by extending exposure lag periods, from the current year (lag 0) up to previous 3 years (lag 3) and moving averages of current and previous 1–3 year’s exposure (lag 01, lag 02, and lag 03). Thirdly, bi-, tri-, and quad-pollutant models were fitted to check the influence of co-pollutant adjustments. Finally, since reliable PM2.5 monitoring did not begin until around 1999 in the US, we performed co-pollutant analyses by additionally adjusting for PM10 in the DID model given the potential confounding effect of particulate air pollution on the mortality effect of gaseous pollutants.

gWQS regression analysis

To measure the overall joint effect of NO2, CO, SO2, and O3 along with their relative contributions to the targeted mortality outcomes, we introduced the generalized weighted quantile sum regression (gWQS) by constructing a gWQS index to facilitate mixture-exposure analysis6. Pollutant exposures were classified into quartiles separately and combined into a gWQS index by determining weights according to their relevance to the overall associations with the outcome, which could thus reduce dimensionality and avoid covariance issues22. In this study, we incorporated the gWQS approach into main CPR analysis to estimate the combined effect and relative contributions of individual gaseous pollutants to the overall mortality effect. The formula of gWQS model was specified as follows:

$$\text{ln}\left[E\left({Y}_{s,t}\right)\right]={\beta }_{0}+{\beta }_{1}{I}_{s}+{\beta }_{2}{I}_{t}+{\beta }_{3}{Temp}_{sum}+{\beta }_{4}{Temp}_{win}+{\beta }_{5}SD({Temp}_{sum})+{\beta }_{6}SD({Temp}_{win})+{\beta }_{7}{AP}_{s,t}+{\beta }_{8}\sum_{i=1}^{4}{w}_{i}{q}_{i}+\text{ln}({Pop}_{s,t}){|}_{b}$$

where \({\beta }_{8}\) represents the regression coefficient of the gWQS index; \({q}_{i}\) (0, 1, 2, 3) denotes the quantile (1st, 2nd, 3rd, or 4th) of component \(i\). \({w}_{i}\) was the estimated weight (0–1) of the \({i}^{th}\) component. b denotes the \({b}^{th}\) bootstrap step. \(\sum_{i=1}^{4}{w}_{i}{q}_{i}\) refers to the weighted index for the set of four targeted air contaminants. The final weights for air pollutants were obtained by averaging all the weights with significant coefficient of overall mixture effect23. The sum of all weights is equal to 1, with larger weights indicating greater contributions in the overall effects. We randomly split the data into training set (40%) for estimating empirical weights and validation set (60%) for deriving statistical significance. A total of 1000 bootstrap steps were performed in the training set to increase the sensitivity in detecting significant predictors and obtain stable weights. We finally set the direction of the regression coefficient effect to be positive based on the results of single-pollutant DID model24. The combined effect was presented as RR (95% CI) of total and cause-specific mortality per IQR-equivalent increase of exposure in mixture pollutants.

All statistical analyses were conducted using the R version 4.3.1 (R Foundation for Statistical Computing, Vienna, Austria), with the package “gnm” (version 1.1-2) for the CPR model analysis, “mvmeta” (version 1.0.3) for comparing effect differences between subgroups, “splines” (version 4.3.1) for smoothing NCS term, and “gWQS” (version 3.0.4) for assessing the joint associations.

Results

Table 1 describes the summary statistics of deaths, gaseous air pollutants, and climate variables in the 108 US municipalities from 1987 to 2000. We observed 10,794,553 total and 10,395,583 nonaccidental deaths, including 4,546,977 and 894,157 deaths from cardiovascular and respiratory diseases, respectively. Annual mean (SD) concentrations of gaseous air pollutants were 20.88 (6.65) ppb for NO2, 5.43 (3.59) ppb for SO2, 26.45 (5.28) ppb for O3, and 0.94 (0.37) ppm for CO, respectively (Fig. S1). NO2, CO, and SO2 exhibited low-to-moderate correlations (0.25 < rs < 0.52) with each other, but were inversely and lowly correlated with O3 (− 0.25 < rs < − 35, Fig. S2).

Table 1 Summary statistics of annual deaths, air pollution levels, and meteorological factors in 108 US municipalities over the period 1987–2000.

Figure 1 estimates the associations between mortality endpoints and independent and joint exposure to four criteria gaseous air pollutants. Single exposures to NO2, CO, and SO2 were consistently associated with increased total, nonaccidental, and CVD mortality, while no evidence for O3-mortality associations was seen in the entire population. For instance, the RR estimates of total mortality were 1.056 (95% CI: 1.039–1.073), 1.082 (1.067–1.097), 1.004 (0.992–1.016), and 1.028 (1.017–1.038) for each IQR increase in NO2, SO2, O3, and CO exposure, respectively. Only SO2 was found to be significantly associated with elevated RD mortality, with an excess risk of 11.0% (8.3%–13.8%) for each IQR rise in exposure. These findings were largely robust to sensitivity analyses via using various lag periods and fitting co-pollutant models (Table 2, Tables S1, S2). Traditional and DID analyses based on NMMAPS provided consistent associations between gaseous pollution and mortality, and the association magnitudes were found to be comparable in both estimates. Overall, approximately J-shaped curves with steeper slopes under higher exposure conditions were observed for the exposure–response relationships of mortality risks with NO2, CO, and SO2 exposure, while O3-associated risk curves tended to be inverted U-shaped (Fig. 2).

Figure 1
figure 1

Risk estimates (with 95% CIs) of total and cause-specific mortality associated with per IQR increase in O3, SO2, NO2, CO, and an IQR-equivalent increase in mixture exposure. NO2 nitrogen dioxide, CO carbon monoxide, SO2 sulfur dioxide, O3 ozone, CVD cardiovascular disease, RD respiratory disease, CI confidence interval, IQR interquartile range.

Table 2 Risk estimates (with 95% CIs) of total and cause-specific mortality associated with per IQR increase in individual and mixture exposure to gaseous air pollutants across different lag years.
Figure 2
figure 2

Estimated exposure–response curves for the associations of gaseous pollutant concentrations with total, nonaccidental, cardiovascular, and respiratory mortality. The solid black line represents the effect estimates and the shaded area represents its 95% confidence interval. NO2 nitrogen dioxide, CO carbon monoxide, SO2 sulfur dioxide, O3 ozone, CVD cardiovascular disease, RD respiratory disease, RR relative risk, CI confidence interval, ppb part per billion, ppm part per million.

In the multi-pollutant analysis via gWQS regression, one IQR-equivalent increase in mixture exposure was associated with an estimated RR of 1.067 (1.010–1.126) for total mortality, 1.067 (1.009–1.128) for nonaccidental mortality, 1.125 (1.060–1.193) for CVD mortality, and 0.986 (0.898–1.083) for RD mortality, respectively (Fig. 1). Figure 3 displays relative contributions (weights) of each gaseous air pollutant in the overall effects on mortality estimated by gWQS models. The combined mortality effects of exposure to four gaseous pollutants were mostly contributed by NO2 but least contributed by CO. Specifically, NO2 was responsible for 38.9%–49.3% of the increased mortality risks related to overall mixture exposure, while CO accounted for only 0.7%–2.4% of the joint associations with total and cause-specific mortality. It is noteworthy that the relative importance measured by gWQS weights did not match the sequence of effect sizes estimated by the single-pollutant DID models.

Figure 3
figure 3

gWQS-derived mean weights of each gaseous air pollutants in the joint effects of mixture exposure on total and cause-specific mortality. NO2 nitrogen dioxide, CO carbon monoxide, SO2 sulfur dioxide, O3 ozone, CVD cardiovascular disease, RD respiratory disease.

Figure 4 illustrates age-stratified risk estimates for mortality associated with individual exposure to gaseous pollutants. A mixed pattern was detected for age modification in pollutants-mortality associations. Our stratified analysis revealed evidently higher CO-related risks in younger group aged ≤64 years, with estimated RRs of 1.041 (1.029–1.053) for total mortality, 1.044 (1.031–1.057) for nonaccidental mortality, 1.037 (1.022–1.052) for CVD mortality, and 1.022 (0.992–1.053) for RD mortality associated with one IQR increase in CO exposure, respectively (Table S3). Despite no O3-related excess risks in the entire population, O3-mortality associations were found to be more pronounced in individuals aged 65–74 years. For total mortality, per IQR increase in O3 was associated with a risk of 1.020 (1.008–1.033) in those aged 65–74 years, 0.999 (0.985–1.013) in younger group (≤64 years) and 1.002 (0.989–1.016) in older group (≥75 years), respectively. In terms of NO2 and SO2, no evidence for age modification was identified in this analysis (P >0.05 for heterogeneity).

Figure 4
figure 4

Age-stratified associations between total and cause-specific mortality and gaseous air pollution exposure. NO2 nitrogen dioxide, CO carbon monoxide, SO2 sulfur dioxide, O3 ozone, CVD cardiovascular disease, RD respiratory disease, RR relative risk, CI confidence interval; *P <0.05, **P <0.01.

Discussion

In the context of decades of rapid industrial growth in the United States, coupled with challenges within environmental legislation, air pollution and mortality have been on a parallel upward trend over the 14-year period 1987–2000. This nationwide DID study provided compelling evidence for elevated risks for total, nonaccidental, and CVD mortality in the US population causally associated with independent and combined exposures to NO2, CO, SO2, and O3. Mixture effects associated with co-exposure to four gaseous pollutants were predominantly attributed to NO2. Our study identified approximately J-shaped curves for mortality outcomes in association with NO2, CO, and SO2, while O3-associated risk curves tended to be inverted U-shaped. Age-stratified analysis revealed mixed patterns of effect modification in exposure-mortality associations. This present study is essential in furnishing supplementary causal evidence derived from observational data, thereby substantiating the revision of air quality guidelines.

Our adapted DID analysis demonstrated robustly positive associations of long-term NO2 exposure with mortality of US population, echoing with a growing body of epidemiological studies relying on several causal modelling frameworks3. Causal evidence for excess mortality risks due to long-term NO2 exposure were reported in an Australian DID study of 2193 statistical areas level-212, instrumental variable (IV)-based analysis of 7.3 million deaths in 135 US cities25, and US national Medicare cohort analyses based on inverse probability weighting (IPW)26 and generalized propensity score (GPS)27. Similarly, in a pooled analysis from 8 European cohorts28, consistently positive NO2-mortality nexuses were seen through parallel Cox modelling strategies weighted by IPW and adjusted for GPS, wherein both approaches estimated largely comparable effect sizes. Also, our study generally supported the causal evidence for elevated mortality risk associated with long-term exposure to SO2 and CO. While SO2-associated risks were reported in another two DID analyses in China13 and Germany29, such causal evidence for CO in the general population remains extensively sparse to date. Collectively, mortality relationships with SO2 and CO were between suggestive and causal, as summarized by US Environmental Protection Agency30,31.

For O3-mortality associations, given the heterogeneous findings from population-based cohorts in the USA and Europe32, we believed that the evidence suggested "likely to be causal" for respiratory effects and “suggestive of a causal relationship” for cardiovascular effects33. Inconsistent with one Korean IV analysis34 and two US cohort studies (using GPS and IPW)26,27, our DID analysis did not identify causally positive relationship between annual O3 exposure and total and cardiopulmonary mortality. These null associations were also reported in a prior review of 20 traditional cohort analyses focusing on annual O3 metric35. Overall, current cohort evidence on O3-mortality association was largely mixed and exhibited considerable regional heterogeneity36. On the whole, national cohorts in North America37,38,39 and China40,41,42 linked elevated mortality risks (e.g., all-cause, CVD, and RD) with annual average or peak-season O3 exposure. This was in marked contrast to the large volume of epidemiological cohort evidence from South Korea43 and European countries such as Denmark44, France45, and England46, which mostly reported null or protective effects on long-term survival. Several factors such as demographic characteristics, control of potential confounders, air pollution levels, different exposure metrics and analytic methodology might be responsible for regional variations in observed mortality effects44,47.

As suggested in our mixture exposure analysis, NO2 constituted the most prominent contributor to the joint effects of four criteria gaseous air pollutants on excess risk of mortality. In a 22-year cohort study in Northern China, PM2,5 was reported as the principal contributor to the conjoined cardiopulmonary mortality effects of exposure mixture (PM10, PM2.5, SO2, and NO2)14. Generally consistent evidence was seen in two investigations48,49 from developed countries (Canada and the US), suggesting PM2.5-dominated mortality effects under co-existing exposure scenario with NO2. This further could attest to a discernible corollary that the detrimental effects of NO2 could be possibly masked to a considerable extent by particulate matter, particularly PM2.550. It has been evidenced that the presence of PM2.5 may exert an attenuating influence upon the fatality effects of NO2, given that both PM2.5 and NO2 are atmospheric pollutants emanating from common sources, such as vehicular emissions and industrial activities51. Additonally, the greatest association estimates of mixture pollutant exposure were seen at lag 02 year (Table 2), suggesting a cumulative effect of gaseous pollutants on mortality risk. In accordance with prior DID analyses9,52, the risk effects for multi-year lags were marginally larger and more stable than those for single-year lags, which well underscored the importance of considering accumulated exposure in environmental health-risk assessments. Currently available investigations pertaining to the co-exposure of air pollutants remain relatively lacking, and the complexity arising from variations in geographical locations, analytic methodologies, and combinations of pollutants may introduce great challenges in achieving comparative evidence on mixed exposures. Therefore, national and global initiatives are pressingly needed to illuminate the intricate interplay of combined exposure to multiple atmospheric pollutants and its multifaceted implications for human health.

There have been mixed findings on effect modification by age concerning the associations of gaseous pollutants with mortality. In this study, individuals aged ≤64 years showed higher CO-associated risks of total and cardiorespiratory mortality, echoing with two multi-city time-series studies in China53,54. Additionally, we observed higher O3-associated mortality risks among individuals aged 65–74 years. In two Chinese national cohort studies, more pronounced O3-mortality associations were reported in older adults aged ≥65 years40,42. By comparison, evidence from mega cohorts suggested no age difference among older people (65 years and older) in US55 and higher risk in younger (age ≤60 years) group in Rome56. Age disparities in O3-mortality associations could be possibly related to geographical and demographic variations between studies, as well as differential population exposure levels and time-activity patterns39,44. In accordance with three preceding investigations from developed countries (i.e., US and Canada)37,57 and China58, we found no evidence of effect modification by age on association between SO2/NO2 and mortality. Notably, both pollutants demonstrated a positive linkage with total and cardiopulmonary mortality across diverse age strata. As such, fostering a synergistic approach to govern and mitigate the mortality risks of these air pollutants should become an indispensable imperative, in order to achieve a state of collective health benefits in the general population.

A major feature of this study is the DID causal analysis which effectively controls for unmeasured confounders in the causal links between gaseous air pollutants and total as well as cause-specific mortality. Expanding upon the conventional DID approach, the DID “variant” well controls for additionally potential confounders (e.g., seasonal factors) that fluctuate across locations and over time, thereby mitigating potential biases in estimating exposure effects and capturing well long-term trends in gaseous pollution-morality associations. Furthermore, the application of the gWQS model in our study provides an enhanced exploration on the joint mortality effects of combined exposure to gaseous pollutants and its individual relative contributions. Nevertheless, this study has several limitations that should be acknowledged. First, although the variant DID method counteracts the effects of temporally stable individual and behavioral factors (including unmeasured ones), it remains plausible that there exist additional latent confounding factors, including socioeconomic status, health behaviors (e.g., smoking), and changes in residential characteristics, which could vary across location and time9. Additionally, DID model has the potential power for causal inference and advantages over traditional analytical approaches by reducing confounding, but established causality on the basis of data analysis still need to be validated physiologically by high-quality human and animal experiments. Second, exposure misclassifications may arise from our exclusive utilization of population-level air pollution data rather than incorporating more granular data sources such as individual-level measurements11. Third, some missing records of daily time series air pollution data in NMMAPS database, may introduce some uncertainty into the effect estimates but its impact on the association is deemed to be relatively inconspicuous59. Forth, one of the fundamental assumptions of the gWQS method is unidirectional exposure effect, encompassing either a positive or negative trend. However, this assumption may potentially lead to issues such as non-convergence of the model or biased estimation60. Fifth, the NMMAPS database comprised predominantly outdated monitoring data, and therefore the results may not be representative of current exposures and pollution sources.

Conclusions

In summary, our DID analysis provided compelling evidence for elevated risks of total, nonaccidental, and CVD mortality causally associated with both combined and independent exposure to gaseous air pollutants in US population. Age-specific analyses produced very mixed results of effect modification regarding long-term effects of gaseous pollutants on mortality, emphasizing the necessity for further investigation of age-related vulnerabilities with the aim of pinpointing specific susceptible populations at higher risks and ultimately contributed to more effective public health. Our study added national evidence of causality and highlighted that targeted policies should be formulated to strengthen the response to the health threats of gaseous pollutant exposure.