Method comparison and estimation of causal effects of insomnia on health outcomes in a survey sampled population

Shahu, Anja; Chung, Joon; Tarraf, Wassim; Ramos, Alberto R.; González, Hector M.; Redline, Susan; Cai, Jianwen; Sofer, Tamar

doi:10.1038/s41598-023-36927-2

Published: 17 June 2023

Volume 13, article number 9831, (2023)

Scientific Reports

Anja Shahu^1,2,
Joon Chung²,
Wassim Tarraf³,
Alberto R. Ramos⁴,
Hector M. González⁵,
Susan Redline²,
Jianwen Cai⁶ &
…
Tamar Sofer^1,2,7

Abstract

Applying causal inference methods, such as weighting and matching methods, to a survey sampled population requires properly incorporating the survey weights and design to obtain effect estimates that are representative of the target population and correct standard errors (SEs). With a simulation study, we compared various approaches for incorporating the survey weights and design into weighting and matching-based causal inference methods. When the models were correctly specified, most approaches performed well. However, when a variable was treated as an unmeasured confounder and the survey weights were constructed to depend on this variable, only the matching methods that used the survey weights in causal estimation and as a covariate in matching continued to perform well. If unmeasured confounders are potentially associated with the survey sample design, we recommend that investigators include the survey weights as a covariate in matching, in addition to incorporating them in causal effect estimation. Finally, we applied the various approaches to the Hispanic Community Health Study/Study of Latinos (HCHS/SOL) and found that insomnia has a causal association with both mild cognitive impairment (MCI) and incident hypertension 6–7 years later in the US Hispanic/Latino population.

Introduction

Modifiable lifestyle behaviors, such as sleep, are essential to health, and are therefore targets for intervention to mitigate or prevent adverse health outcomes. While randomized controlled trials (RCTs) are the gold standard for causal inference, they can also be impractical and expensive and lack generalizability when using specific inclusion and exclusion criteria^1,2. RCTs may also be unethical if they withhold treatment for some individuals when one is available³. Thus, researchers have called for greater use of causal inference methods in observational sleep studies to assess the potential impact of treatment effects⁴.

Using multiple causal inference methods can establish more robust causal associations than application of a single approach^5,6. With the growing availability of complex health surveys conducted on racial and ethnic minorities, who have been historically underrepresented in research despite having higher disease burdens, investigators have more opportunities to make inferences on these populations and ensure that research is more representative of the world’s diversity^7,8. However, complex health surveys––which use multi-stage probability sampling and include survey weights that contain information on the sampling design and adjustments for issues, such as non-response––present unique challenges. Survey weights and design must be incorporated into statistical models to obtain estimates representative of the target population and to provide correct standard errors (SEs)⁹. However, since causal inference methods were developed under the assumption of a simple random sample (SRS), incorporating the survey weights and design in a way that limits confounding while maintaining representativeness is not straightforward.

Motivated by the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), the largest longitudinal cohort study with multiple sleep measures at baseline and the only study with comprehensive sleep measures in a large, diverse sample of US Hispanics/Latinos, we aimed to investigate how to apply matching and weighting-based causal inference methods to complex health survey data. Both weighting and matching methods estimate the causal effect by balancing the distribution of covariates between the exposed and unexposed groups, relying on the three assumptions of exchangeability, positivity and the Stable Unit Treatment Value Assumption (SUTVA)¹⁰. We conducted a simulation study to compare various approaches for incorporating the survey weights and design into weighting and matching methods^{11,12,13,14,15}. We used the simulation results to inform our use of the HCHS/SOL for estimating the effect of insomnia on prevalent mild cognitive impairment (MCI) and incident hypertension in the US Hispanic/Latino population.

Potential outcomes framework and causal estimands

Relying on a potential outcomes framework, suppose that a study has $n$ individuals sampled from a population of size $N$. An individual $i$ has two potential outcomes ${Y}_{i}\left(a\right)$, for exposure $a=0$ (unexposed) and $a=1$ (exposed)¹⁶. Let ${Z}_{i}$ be the indicator for observed exposure, with ${Z}_{i}=0$ if unexposed and ${Z}_{i}=1$ if exposed¹⁶. The individual’s observed outcome is then ${Y}_{i}\left({Z}_{i}\right)={Z}_{i}\times {Y}_{i}\left(1\right)+\left(1-{Z}_{i}\right)\times {Y}_{i}\left(0\right)$¹⁶.

At the population level, the average potential outcomes are represented by $E\left[Y\left(1\right)\right]$ and $E\left[Y\left(0\right)\right]$ when all individuals in the population are exposed and unexposed, respectively¹⁷. For binary outcomes, these values are represented by probabilities: $Pr\left[Y\left(1\right)=1\right]$ and $Pr\left[Y\left(0\right)=1\right]$, respectively¹⁷. Causal effects of interest include the risk difference $Pr\left[Y\left(1\right)=1\right]-Pr\left[Y\left(0\right)=1\right]$, the risk ratio $\frac{Pr\left[Y\left(1\right)=1\right]}{Pr\left[Y\left(0\right)=1\right]}$ and the odds ratio $\frac{\left(Pr\left[Y\left(1\right)=1\right]/Pr\left[Y\left(1\right)=0\right]\right)}{\left(Pr\left[Y\left(0\right)=1\right]/Pr\left[Y\left(0\right)=0\right]\right)}$¹⁷.
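As a small worked illustration of these three contrasts, the following Python sketch computes them from the two potential-outcome probabilities. The function name `effect_measures` and the input values are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def effect_measures(p1: float, p0: float) -> dict:
    """Contrast p1 = Pr[Y(1)=1] with p0 = Pr[Y(0)=1]."""
    if not (0.0 < p1 < 1.0 and 0.0 < p0 < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    return {
        # Difference of the two probabilities.
        "difference": p1 - p0,
        # Ratio of the two probabilities.
        "risk_ratio": p1 / p0,
        # Ratio of the odds under exposure to the odds under no exposure.
        "odds_ratio": (p1 / (1.0 - p1)) / (p0 / (1.0 - p0)),
    }

# Hypothetical probabilities, for illustration only.
measures = effect_measures(p1=0.12, p0=0.08)
```

With a rare outcome, as in this example, the odds ratio and the risk ratio are close but not equal.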

Common causal estimands (i.e., defined quantities that one can estimate from data) of interest include the average treatment effect (ATE), average treatment effect for the treated (ATT), conditional ATE (CATE) and conditional ATT (CATT)¹⁶. The marginal estimands, ATE and ATT, define the exposure effect on the entire population and on those individuals who are observed as exposed, respectively¹⁶, and are obtained from analysis that is not adjusted for any covariates. The conditional estimands, CATE and CATT, align with the ATE and ATT definitions but are additionally conditional on the sampling distribution of the covariates, ${X}_{i}$¹⁶, i.e., they are obtained from analysis that adjusts for covariates. For a continuous outcome, we define ATE as $E\left[Y\left(1\right)-Y\left(0\right)\right]$, ATT as $E\left[Y\left(1\right)-Y\left(0\right)|Z=1\right]$, CATE as $E\left[Y\left(1\right)-Y\left(0\right)|X\right]$ and CATT as $E\left[Y\left(1\right)-Y\left(0\right)|Z=1, X\right]$¹⁶. Like the population causal effects, these definitions can be modified to apply to a binary outcome. In observational studies of exposures, the term “exposed” is used, while in clinical trials and observational studies in which individuals are treated with a specific intervention, the term “treatment” is used. Henceforth we use “ATT” and “CATT” rather than “average exposure effect on the exposed” and “conditional average exposure effect on the exposed” for consistency with the causal inference literature.

The ATE and the ATT may coincide in a randomized controlled trial (RCT) due to randomization, but will not generally coincide in an observational study because the exposed and unexposed groups will not be comparable, i.e., they do not have the same characteristics and covariate distributions¹⁸. In an RCT with a continuous outcome, the ATE coincides with the CATE and the ATT coincides with the CATT, i.e., the difference in continuous outcome means across treatment groups is “collapsible”. However, when the outcome is binary, these estimands may not coincide due to non-collapsibility¹⁰. Table 1 provides an overview of the causal inference methods that we compare and describe below, including information on the target estimand of each approach (ATE or ATT; and CATE or CATT if covariate adjusted).

Table 1 Comparison of weighting and matching-based causal inference methods.

Implementation of causal inference methods in a survey study

We study the application of two categories of causal inference approaches: matching and weighting methods. Briefly, matching methods typically identify sets (or minimally, pairs) of exposed and unexposed individuals who have similar characteristics and use these individuals in the regression analysis. Weighting methods perform weighted regression analysis, where each observation is weighted according to its probability of being exposed. Notably, this is an analogue of survey regression which weights each observation according to its sampling probability into the study (survey weight). A challenge of applying both matching and weighting-based causal inference methods to a survey-sampled population is in using the survey weights, which we call “original survey weights” (OSW), to obtain causal effect estimates that are representative of the target population.

Both matching and weighting methods may rely on both the OSW and on propensity score-based weights¹⁰. The propensity score for individual $i$ is defined as the probability of exposure, conditional on measured covariates: ${e}_{i}=P\left({Z}_{i}=1|{X}_{i1},\dots ,{X}_{ip}\right)$¹⁰. A popular method to calculate propensity scores is to use a logistic model given by ${\text{logit}}\left({e}_{i}\right)={\beta }_{0}+{\beta }_{1}{X}_{i1}+\dots +{\beta }_{p}{X}_{ip}$ where $p$ is the number of measured covariates¹⁰. For both the weighting and matching methods, we consider estimating the propensity scores in two ways: (1) OSW-weighted logistic regression, and (2) logistic regression with OSW as a covariate. In the weighting and matching methods sections below, we describe propensity score-based weights and additional method-specific weights.

Matching methods

Matching methods are generally implemented in three steps: (1) matching exposed and unexposed; (2) assessing covariate balance between the exposure groups and (3) estimating causal effect¹⁰. We studied both propensity score and coarsened exact matching (PSM and CEM) implemented using the “MatchIt” package in R. Generally, PSM matches individuals by ensuring that their propensity scores are similar; CEM first “coarsens” variables used for matching, with coarsening being the process of creating bins of values of continuous variables, followed by matching, i.e. ensuring that the coarsened variables are the same in matched individuals. We considered a few approaches, outlined in Fig. 1, to incorporating the survey weights and design in steps 1 and 3.

Matching exposed and unexposed

In PSM, we calculated the distance between individuals $i$ and $j$ as the absolute difference in their estimated propensity scores, ${D}_{ij}=\left|{\widehat{e}}_{i}-{\widehat{e}}_{j}\right|$¹⁰. We then used greedy 1:1 nearest neighbor matching without replacement. This algorithm matches every exposed individual $i$ to the unexposed individual with the smallest distance from individual $i$ and discards any unmatched unexposed individuals¹⁰. In CEM, we sorted individuals into bins based on coarsened variables¹⁹. We considered matching based on coarsened covariates only and matching based on both coarsened covariates and coarsened OSW. We coarsened the continuous covariates manually, choosing meaningful cut points when available or otherwise choosing quantiles as our cut points. We then pruned individuals from any bin that did not contain at least one exposed and one unexposed individual¹⁹. The remaining matched individuals receive CEM weights (CEMW). Specifically, the CEMW ${w}_{i}$ for individual $i$ is given by: ${w}_{i}={Z}_{i}+\left(1-{Z}_{i}\right)\left[\frac{{n}_{\mathrm{unexposed}}}{{n}_{\mathrm{exposed}}}\times \frac{{n}_{{b}_{i},\mathrm{exposed}}}{{n}_{{b}_{i},\mathrm{unexposed}}}\right]$, where ${b}_{i}$ is the bin that individual $i$ has been sorted into, ${n}_{{b}_{i},\mathrm{exposed}}$ and ${n}_{{b}_{i},\mathrm{unexposed}}$ are the numbers of exposed and unexposed individuals in that bin, and ${n}_{\mathrm{unexposed}}$ and ${n}_{\mathrm{exposed}}$ are the numbers of unexposed and exposed individuals in the matched sample, respectively²⁰. Thus, for matched individuals, the algorithm yielded CEMW that “equalize” the two groups of matched individuals by up- and down-weighting the number of exposed and unexposed individuals within each bin, and that weight individuals in both groups so that both groups have similar characteristics to the exposed group^19,20.
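The CEM weight formula can be illustrated with a short Python sketch. This is not the MatchIt implementation; the function name `cem_weights`, the toy bin labels and the convention of assigning weight 0 to pruned individuals are assumptions made here for illustration.

```python
from collections import Counter

def cem_weights(exposed, bins):
    """Return one CEM weight per individual (0.0 for pruned individuals).

    exposed: list of 0/1 exposure indicators Z_i
    bins:    list of hashable bin labels b_i (e.g. tuples of coarsened values)
    """
    n_exp_bin = Counter(b for z, b in zip(exposed, bins) if z == 1)
    n_unexp_bin = Counter(b for z, b in zip(exposed, bins) if z == 0)
    # Keep only bins with at least one exposed and one unexposed individual.
    kept = {b for b in n_exp_bin if n_unexp_bin[b] > 0}
    # Group sizes in the matched (post-pruning) sample.
    n_exp = sum(n_exp_bin[b] for b in kept)
    n_unexp = sum(n_unexp_bin[b] for b in kept)
    weights = []
    for z, b in zip(exposed, bins):
        if b not in kept:
            weights.append(0.0)   # pruned (assumed convention)
        elif z == 1:
            weights.append(1.0)   # exposed individuals keep weight 1
        else:
            weights.append((n_unexp / n_exp) * (n_exp_bin[b] / n_unexp_bin[b]))
    return weights

# Toy data: bins "c" and "d" lack one of the two groups and are pruned.
exposed = [1, 0, 0, 1, 1, 0, 0, 1]
bins = ["a", "a", "a", "b", "b", "b", "c", "d"]
w = cem_weights(exposed, bins)
```

In this toy example the weights of the matched unexposed individuals sum to the number of matched unexposed individuals, which is a property of the formula.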

Estimating causal effects

For both PSM and CEM, we used the matched samples to fit Poisson regressions with a “log” link to estimate incident rate ratios (for incident outcomes) and logistic regressions to estimate odds ratios (for prevalent outcomes). We used both unadjusted and multivariable-adjusted regressions to estimate the marginal and conditional causal effects, respectively, incorporating the sampling design using the “survey” package in R for any weighted analysis. For PSM, we fit: (1) unweighted regression; (2) weighted with OSW and (3) weighted with inherited survey weights (ISW), in which unexposed individuals “inherit” the survey weight of the exposed individual that they are matched with. For CEM, we fit weighted regressions with: (1) CEMW and (2) CEMW $\times$ OSW.
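The inherited survey weights can be sketched together with the matching step that produces the pairs. The Python sketch below is illustrative only and is not the MatchIt algorithm: it assumes that each exposed individual, processed in input order, is paired with the closest not-yet-used unexposed individual on the estimated propensity score. The function names and toy values are hypothetical.

```python
def greedy_match(ps, exposed):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    ps:      estimated propensity scores
    exposed: 0/1 exposure indicators
    Returns a list of (exposed_index, unexposed_index) pairs.
    """
    available = [i for i, z in enumerate(exposed) if z == 0]
    pairs = []
    for i, z in enumerate(exposed):
        if z != 1 or not available:
            continue
        # Closest remaining unexposed individual by absolute score difference.
        j = min(available, key=lambda k: abs(ps[i] - ps[k]))
        available.remove(j)  # without replacement
        pairs.append((i, j))
    return pairs

def inherited_weights(pairs, osw):
    """ISW: the matched unexposed individual takes the exposed partner's OSW."""
    isw = {}
    for i, j in pairs:
        isw[i] = osw[i]
        isw[j] = osw[i]
    return isw

# Toy data, for illustration only.
ps = [0.30, 0.32, 0.70, 0.65, 0.10]
exposed = [1, 0, 1, 0, 0]
osw = [2.0, 1.0, 0.5, 3.0, 1.5]
pairs = greedy_match(ps, exposed)
isw = inherited_weights(pairs, osw)
```

Individual 4 is never matched and therefore receives no inherited weight; in an analysis it would be dropped from the matched sample.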

Weighting methods

We studied two types of propensity score-based weighting methods: (1) inverse probability of treatment weighting (IPTW), weighting both the exposed and unexposed individuals using their estimated exposure probabilities with ${w}_{i}=\frac{{Z}_{i}}{{\widehat{e}}_{i}}+\frac{1-{Z}_{i}}{1-{\widehat{e}}_{i}}$, and (2) weighting by the odds using ${w}_{i}={Z}_{i}+\left(1-{Z}_{i}\right)\frac{{\widehat{e}}_{i}}{1-{\widehat{e}}_{i}}$, where the unexposed are weighted by their odds of being exposed.

When estimating the causal effect, we fit Poisson regressions with a “log” link to estimate incident rate ratios (for incident outcomes) and logistic regressions to estimate odds ratios (for prevalent outcomes) on the full sample. These were weighted using: (1) propensity score weights (PSW) and (2) PSW $\times$ OSW, where PSW were either the IPTW or odds-weights above. We used both unadjusted and multivariable-adjusted weighted regressions, incorporating the sampling design using the “survey” package in R, to estimate the marginal and conditional causal effects, respectively.
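The two propensity score weights, and their combination with the OSW, can be written as a few lines of Python. This is a sketch of the formulas only; the function names and the toy propensity score and survey weight are hypothetical.

```python
def iptw(z: int, e: float) -> float:
    """Inverse probability of treatment weight for exposure z and score e."""
    return z / e + (1 - z) / (1 - e)

def odds_weight(z: int, e: float) -> float:
    """Weighting by the odds: exposed get 1, unexposed get e / (1 - e)."""
    return z + (1 - z) * e / (1 - e)

def combined_weight(psw: float, osw: float) -> float:
    """PSW x OSW, the weight used in the survey-weighted analyses."""
    return psw * osw

# Toy values: estimated propensity score 0.25, survey weight 2.0.
w_iptw_exposed = iptw(1, 0.25)
w_iptw_unexposed = iptw(0, 0.25)
w_odds_unexposed = odds_weight(0, 0.25)
w_final = combined_weight(w_iptw_unexposed, 2.0)
```

Because the odds weights leave the exposed group unweighted and reweight only the unexposed group toward it, they correspond to an effect among the exposed, whereas IPTW reweights both groups toward the full population.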

Assessment of matching and weighting

Metrics, such as the absolute standardized mean difference (SMD), can be compared before and after implementing weighting or matching methods to assess improvement in balance of covariates across the exposure groups^10,18. We define the absolute SMD of a covariate as $\frac{\left|{\overline{x}}_{\mathrm{exposed}}-{\overline{x}}_{\mathrm{unexposed}}\right|}{{s}_{\mathrm{exposed}}}$, where ${\overline{x}}_{\mathrm{exposed}}$ and ${\overline{x}}_{\mathrm{unexposed}}$ are the means of covariate $x$ in the exposed and unexposed groups, and ${s}_{\mathrm{exposed}}$ is the standard deviation of $x$ in the full exposed group. In other words, the standard deviation ${s}_{\mathrm{exposed}}$ is computed using the full exposed group—before potentially sampling individuals for matching purposes—while accounting for survey design using weighting with OSW¹⁰. We similarly use OSW for weighting when estimating ${\overline{x}}_{\mathrm{exposed}}$ and ${\overline{x}}_{\mathrm{unexposed}}$. For categorical (including ordinal) variables, the absolute SMD for each level of the covariate is calculated, where now the mean of the covariate (at a given level) is the proportion of individuals with that level of the covariate, rather than treating the covariate as continuous^10,21.
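A weighted version of the absolute SMD can be sketched as follows. The weighted standard deviation here divides by the sum of the weights, which is one of several possible conventions and is an assumption of this sketch; function names and toy values are hypothetical.

```python
import math

def weighted_mean(x, w):
    """Weighted mean of x with weights w (e.g. OSW)."""
    return sum(wi * xi for xi, wi in zip(x, w)) / sum(w)

def weighted_sd(x, w):
    """Weighted standard deviation, dividing by the sum of weights (assumed)."""
    m = weighted_mean(x, w)
    return math.sqrt(sum(wi * (xi - m) ** 2 for xi, wi in zip(x, w)) / sum(w))

def abs_smd(x_exp, w_exp, x_unexp, w_unexp, sd_full_exposed):
    """Absolute SMD; the denominator comes from the full exposed group."""
    diff = weighted_mean(x_exp, w_exp) - weighted_mean(x_unexp, w_unexp)
    return abs(diff) / sd_full_exposed

# Toy covariate values and survey weights, for illustration only.
x_exp, w_exp = [30.0, 34.0], [1.0, 3.0]
x_unexp, w_unexp = [28.0, 32.0], [1.0, 1.0]
sd_exp = weighted_sd(x_exp, w_exp)
smd = abs_smd(x_exp, w_exp, x_unexp, w_unexp, sd_exp)
```

The standard deviation is passed in separately so that the same pre-matching value can be reused when the SMD is recomputed on a matched sample.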

Simulation study

Sampling design

We simulated complex health survey data with a nested structure, where the population was segmented into block groups (BGs), with equal-sized households (HHs) nested within the BGs. We used a stratified two-stage probability sampling design to draw 1000 independent samples from this population. This design mimicked the sampling design of the Bronx site in the HCHS/SOL²². Figure 2 provides an overview of the sampling design. The population contained 752 BGs split unevenly across 8 strata. We assigned the BGs strata-specific sampling probabilities. The BG sampling probability was 25% for BGs in strata 1–4 and 60% for BGs in strata 5–8. We sampled entire BGs without replacement from the population based on these strata-specific BG sampling probabilities.

In the primary scenario 1 (Fig. 2), we let the number of HHs vary across BGs, drawing it from an exponential distribution with a mean of 450. Within each HH, we generated 2 individuals and their ages, and set the HH sampling probabilities to depend on the maximum age of the HH. First, we sampled a mean age for the HH from $N\left(40, {15}^{2}\right)$, truncated to a range of 23 to 69. Second, we sampled the ages of the first and second individuals from a discrete uniform distribution that ranged within 10 years of the mean age. For each HH, the HH sampling probability was calculated as ${\text{expit}}\left(-8+0.1\times \text{max\_HH\_age}\right)$, where ${\text{expit}}\left(x\right)=\frac{\mathrm{exp}\left(x\right)}{1+\mathrm{exp}\left(x\right)}$. From the BGs that were selected in stage 1, we sampled equal-sized HHs without replacement based on these HH sampling probabilities. In a secondary scenario 2, we did not use age in the sampling design (Supplementary Fig. 1).

We calculated survey weights for each sample in three steps. We let $i$ designate the BG, $j$ designate the HH and $k$ designate the individual. First, we calculated the individual sampling probability as ${p}_{ijk}={p}_{i}{p}_{ij}$, where ${p}_{i}$ is the BG sampling probability and ${p}_{ij}$ is the HH sampling probability. Second, we calculated the base weights as ${w}_{ijk}=\frac{1}{{p}_{ijk}}$. Third, we calculated the final weights to use in our analyses as ${W}_{ijk}=\frac{{w}_{ijk}}{\frac{1}{n}\sum_{i,j,k}{w}_{ijk}}$.
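The three weight-construction steps, together with the age-dependent HH sampling probability of scenario 1, can be sketched in Python as follows. The function names and the toy probabilities are hypothetical; the sketch only mirrors the formulas in the text.

```python
import math

def expit(x: float) -> float:
    """Inverse logit, algebraically equal to exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + math.exp(-x))

def hh_sampling_prob(max_hh_age: float) -> float:
    """Scenario 1 HH sampling probability: expit(-8 + 0.1 * max_HH_age)."""
    return expit(-8.0 + 0.1 * max_hh_age)

def final_weights(bg_probs, hh_probs):
    """Normalized survey weights, one per sampled individual.

    bg_probs: BG sampling probability p_i for each individual
    hh_probs: HH sampling probability p_ij for each individual
    """
    # Steps 1 and 2: individual sampling probability and base weight.
    base = [1.0 / (p_i * p_ij) for p_i, p_ij in zip(bg_probs, hh_probs)]
    # Step 3: divide by the mean base weight so the weights average to 1.
    mean_base = sum(base) / len(base)
    return [w / mean_base for w in base]

# Toy sample of three individuals, for illustration only.
bg_probs = [0.25, 0.60, 0.60]
hh_probs = [hh_sampling_prob(age) for age in (45, 60, 72)]
W = final_weights(bg_probs, hh_probs)
```

After normalization the weights sum to the sample size $n$, so individuals from rarely sampled BGs and younger HHs carry weights above 1 and the others below 1.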

Generating variables and association models

According to the description below, we generated the following variables: BMI and years between visits as predictors; insomnia as the exposure of interest; hypertension status in visits 1 and 2 and MCI in visit 2 as outcomes. In brief, we generated the outcomes for a visit using a potential outcomes framework, i.e. by simulating the outcomes under two (observed and unobserved) exposure values, to allow estimation of both the true marginal and conditional population causal effects.

In detail, in addition to age, we generated two other predictors, baseline BMI and years between visits. BMI and years between visits were generated independently for all individuals using $N\left(29, {9}^{2}\right)$, truncated to the range of 15 to 63, and using $N\left(6, {0.5}^{2}\right)$, truncated to the range of 3 to 9, respectively.

We generated the binary exposure, insomnia, independently for all individuals in two steps. First, we calculated the probability that an individual has insomnia using the following logistic model:

$${\text{logit}}\left(\mathrm{Pr}\left({Z}_{i}=1\right)\right)={\alpha }_{0}+{\alpha }_{1}{\mathrm{bmi}}_{i}+{\alpha }_{2}{\mathrm{age}}_{i},$$

where ${\alpha }_{0}=\mathrm{log}\left(0.109\right)$, ${\alpha }_{1}=\mathrm{log}\left(1.025\right)$ and ${\alpha }_{2}=\mathrm{log}\left(1.019\right)$, inferred from the HCHS/SOL data. Second, we used $\mathrm{Pr}\left({Z}_{i}=1\right)$ to sample the observed insomnia status, ${Z}_{i}$, from a Bernoulli distribution.
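The two-step exposure generation can be sketched in Python using the coefficients stated above. The function names, the seed and the example covariate values are hypothetical, and the sketch is not the authors' simulation code.

```python
import math
import random

# Coefficients of the insomnia model as stated in the text.
ALPHA0 = math.log(0.109)
ALPHA1 = math.log(1.025)
ALPHA2 = math.log(1.019)

def insomnia_probability(bmi: float, age: float) -> float:
    """Step 1: Pr(Z = 1) from the logistic model."""
    linear = ALPHA0 + ALPHA1 * bmi + ALPHA2 * age
    return 1.0 / (1.0 + math.exp(-linear))

def draw_insomnia(bmi: float, age: float, rng: random.Random) -> int:
    """Step 2: Bernoulli draw of the observed insomnia status."""
    return 1 if rng.random() < insomnia_probability(bmi, age) else 0

rng = random.Random(2023)  # arbitrary seed, for reproducibility only
p = insomnia_probability(bmi=29.0, age=40.0)
z = draw_insomnia(bmi=29.0, age=40.0, rng=rng)
```

Because the coefficients are logs of odds ratios, each additional BMI unit multiplies the odds of insomnia by 1.025 and each additional year of age by 1.019.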

For the binary outcomes, we generated prevalent MCI that was measured at visit 2 only and incident hypertension that was measured at both visit 1 and 2. Both outcomes were generated based on the HCHS/SOL data so that the prevalence of hypertension at each visit was relatively high ($\approx$ 40%), while the prevalence of MCI was low ($\approx$ 8%).

We generated the outcomes for a visit using a potential outcomes framework that consisted of three steps to allow estimation of both the true marginal and conditional population causal effects. For an individual, let ${Y}_{ijk1}$ designate the outcome at visit 1 and ${Y}_{ijk2}$ designate the outcome at visit 2. Let ${h}_{ij}$ be the HH clustering effect generated using $N\left(0, 1\right)$ and ${b}_{i}$ be the BG clustering effect generated using $N\left(0, {0.5}^{2}\right)$. First, for a visit, we calculated the potential probabilities of the outcome under $a=1$ (insomnia) and $a=0$ (no insomnia) using logistic regression models.

For prevalent MCI at visit 2, we used the following model:

$$\mathrm{logit}\left(\mathrm{Pr}\left[{Y}_{ijk2}\left(a\right)=1\right]\right)={\beta }_{0}+{\beta }_{1}a+{\beta }_{2}{\mathrm{bmi}}_{ijk}+{\beta }_{3}{\mathrm{age}}_{ijk}+{h}_{ij}+{b}_{i},$$

where ${\beta }_{0}=\mathrm{log}\left(0.003\right)$, ${\beta }_{1}=\mathrm{log}\left(1.560\right)$, ${\beta }_{2}=\mathrm{log}\left(1.018\right)$ and ${\beta }_{3}=\mathrm{log}\left(1.056\right)$, based on the HCHS/SOL data.

For hypertension status at visit 1 and visit 2, we used the following models:

$$\mathrm{logit}\left(\mathrm{Pr}\left[{Y}_{ijk1}\left(a\right)=1\right]\right)={\gamma }_{0}+{\gamma }_{1}a+{\gamma }_{2}{\mathrm{bmi}}_{ijk}+{\gamma }_{3}{\mathrm{age}}_{ijk}+{h}_{ij}+{b}_{i},$$

$$\mathrm{logit}\left(\mathrm{Pr}\left[{Y}_{ijk2}\left(a\right)=1\right]\right)={\phi }_{0}+{\phi }_{1}a+{\phi }_{2}{\mathrm{bmi}}_{ijk}+{\phi }_{3}{\mathrm{age}}_{ijk}+{\phi }_{4}{\mathrm{years}}_{ijk}+{h}_{ij}+{b}_{i},$$

where ${\gamma }_{0}=\mathrm{log}\left(0.002\right)$, ${\gamma }_{1}=\mathrm{log}\left(1.065\right)$, ${\gamma }_{2}=\mathrm{log}\left(1.088\right)$, ${\gamma }_{3}=\mathrm{log}\left(1.082\right)$, ${\phi }_{0}=\mathrm{log}\left(0.001\right)$, ${\phi }_{1}=\mathrm{log}\left(1.247\right)$, ${\phi }_{2}=\mathrm{log}\left(1.082\right)$, ${\phi }_{3}=\mathrm{log}\left(1.092\right)$ and ${\phi }_{4}=\mathrm{log}\left(1.098\right)$, based on the HCHS/SOL data.

Second, we used the respective probabilities to sample ${Y}_{ijk1}\left(\mathrm{a}\right)$ and ${Y}_{ijk2}\left(\mathrm{a}\right)$ from Bernoulli distributions under $a=1$ and $a=0$. Third, we identified the outcomes that were observed under ${Z}_{i}$.
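For the MCI model, the three steps can be sketched in Python as below. The coefficients are those stated in the text; drawing $Y(1)$ and $Y(0)$ as independent Bernoulli variables is an assumption of this sketch, and the function names, seed and covariate values are hypothetical.

```python
import math
import random

# Coefficients of the prevalent MCI model as stated in the text.
B0, B1 = math.log(0.003), math.log(1.560)
B2, B3 = math.log(1.018), math.log(1.056)

def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def mci_probability(a, bmi, age, h, b):
    """Step 1: Pr[Y(a) = 1] given covariates, HH effect h and BG effect b."""
    return expit(B0 + B1 * a + B2 * bmi + B3 * age + h + b)

def draw_outcomes(z, bmi, age, h, b, rng):
    """Steps 2 and 3: draw both potential outcomes, then reveal one."""
    # Independent Bernoulli draws for Y(1) and Y(0) (assumed here).
    y1 = 1 if rng.random() < mci_probability(1, bmi, age, h, b) else 0
    y0 = 1 if rng.random() < mci_probability(0, bmi, age, h, b) else 0
    # The observed outcome is the potential outcome under the observed exposure.
    y_obs = z * y1 + (1 - z) * y0
    return y1, y0, y_obs

rng = random.Random(2023)  # arbitrary seed, for reproducibility only
y1, y0, y_obs = draw_outcomes(z=1, bmi=29.0, age=60.0, h=0.0, b=0.0, rng=rng)
p1 = mci_probability(1, 29.0, 60.0, 0.0, 0.0)
p0 = mci_probability(0, 29.0, 60.0, 0.0, 0.0)
```

Conditional on the covariates and clustering effects, the odds ratio between `p1` and `p0` equals $\exp(\beta_1)=1.560$ by construction.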

In a sensitivity simulation analysis, we generated a new variable which we named education. We replaced age with education in the data generating models for insomnia, MCI, and hypertension. Education was generated for an individual in two steps, while ensuring that it is correlated with age. First, we drew from $\mathrm{Unif}\left(\mathrm{min}\left(\mathrm{age}\right), \mathrm{max}\left(\mathrm{age}\right)\right)$. Then, we drew from a Bernoulli distribution to decide if that value should be replaced with the individual’s age. The Bernoulli probability was chosen such that education would be correlated with age with correlation $\rho \in \{0.25, 0.5, 0.75\}$.

Calculating true causal effects

We estimated the true marginal and conditional causal effects for the population of size $N$ in two steps. First, we created a new data frame with $2N$ observations, in which every individual has an observation for each potential outcome. Second, using the new data frame, we fit multiple regression models, each targeting a separate causal estimand. Specifically, we estimated the ATE and the CATE using the complete new data frame, as well as the ATT and CATT using only the observations where ${Z}_{i}=1$. For prevalent MCI, we fit marginal logistic regressions (regressing MCI on insomnia; estimating ATE and ATT) and conditional logistic regressions (regressing MCI on insomnia, BMI and age; estimating CATE and CATT). For incident hypertension, using a “log” link, we fit marginal Poisson regressions (regressing hypertension on insomnia with log of years between visits included as an offset; estimating ATE and ATT) and conditional Poisson regressions (regressing hypertension on insomnia, BMI and age with the log of years between visits included as an offset; estimating CATE and CATT) on the observations that did not have hypertension at baseline. For both outcomes, we used the exponentiated coefficient estimates on insomnia as the true causal effects.
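For the marginal odds ratios, the stacked-data approach can be illustrated without fitting a model: a logistic regression of a binary outcome on a single binary exposure reproduces the ratio of the observed odds in the two exposure groups. The Python sketch below uses that equivalence; the function names and toy potential outcomes are hypothetical, and the conditional estimands, which require covariate-adjusted regression, are not covered.

```python
def odds(p: float) -> float:
    return p / (1.0 - p)

def true_marginal_or(y1, y0, z=None):
    """Marginal OR from potential outcomes for every individual.

    y1, y0: potential outcomes under exposure and no exposure
    z:      observed exposure; if given, restrict to Z = 1 (ATT-type effect)
    """
    if z is not None:
        keep = [i for i, zi in enumerate(z) if zi == 1]
        y1 = [y1[i] for i in keep]
        y0 = [y0[i] for i in keep]
    # In the stacked data, rows with a = 1 carry Y(1) and rows with a = 0
    # carry Y(0), so the group proportions are the means of y1 and y0.
    p1 = sum(y1) / len(y1)
    p0 = sum(y0) / len(y0)
    return odds(p1) / odds(p0)

# Toy population of N = 8 individuals, for illustration only.
y1 = [1, 1, 0, 0, 1, 0, 0, 0]
y0 = [1, 0, 0, 0, 0, 0, 0, 1]
z = [1, 1, 1, 1, 0, 0, 0, 0]
or_ate = true_marginal_or(y1, y0)
or_att = true_marginal_or(y1, y0, z)
```

In this toy population the whole-population and exposed-only odds ratios differ, which illustrates why the ATE and the ATT need not coincide outside a randomized setting.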

Performance measures

We used bias and 95% confidence interval (CI) coverage to compare the different approaches to using the survey weights and design on the simulated data. We calculated bias as $\frac{1}{1000}\sum_{i=1}^{1000}\left({\widehat{TE}}_{i}-TE\right)$ where $1000$ was the number of samples that were drawn from our simulated population, $TE$ was the true causal effect and ${\widehat{TE}}_{i}$ was the estimated causal effect for the $i$th sample. We calculated 95% CI coverage as the percentage of simulated samples with a 95% CI that contained the true causal effect: $100\times \frac{1}{1000}\sum_{i=1}^{1000}I\left(TE\in C{I}_{i}\right)$ where $C{I}_{i}$ was the 95% CI for the $i$th sample. An approach performs well when it has low bias and coverage near 95%.
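Both performance measures are simple averages over the simulated samples, as the following Python sketch shows. The function names and the toy estimates and intervals are hypothetical and use far fewer than 1000 samples.

```python
def bias(estimates, truth):
    """Mean difference between the estimated and the true causal effect."""
    return sum(est - truth for est in estimates) / len(estimates)

def coverage(intervals, truth):
    """Percentage of confidence intervals that contain the true effect."""
    hits = sum(1 for lower, upper in intervals if lower <= truth <= upper)
    return 100.0 * hits / len(intervals)

# Toy results from a handful of simulated samples, for illustration only.
truth = 1.56
estimates = [1.50, 1.60, 1.70]
intervals = [(1.2, 1.9), (1.6, 2.0), (1.3, 1.8), (1.0, 1.5)]
b = bias(estimates, truth)
cov = coverage(intervals, truth)
```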

Sensitivity analyses

We performed three types of sensitivity analyses. One, for both scenarios 1 and 2, we treated age as an unmeasured confounder and re-ran the analyses to assess sensitivity to omission of confounding variables that are correlated with the survey weights. Two, we then focused on the analysis methods that had good performance in this scenario 1 sensitivity analysis, generated another confounding variable named (without loss of generality) education, and used it instead of age in the data generating models for insomnia and for the outcomes (MCI and hypertension). We generated this variable so that it is correlated with age with varying degrees of correlation ($\rho \in \{0.25, 0.5, 0.75\}$). In this setting, age was still a design variable. Thus, we assessed the degree to which correlation of an unmeasured confounder with a design variable may help recover the underlying causal effect size. Three, for scenario 1, we re-generated insomnia, MCI and hypertension multiple times by varying the model intercepts and re-ran the analyses to assess sensitivity to changes in the prevalence of the exposure and outcomes. The intercepts were chosen so that the prevalence of the exposure and outcome varied from 5% to 35% in increments of 10 percentage points.

Results

Tables 2 and 3 and Supplementary Tables 1 and 2 provide the simulation results of the various approaches to incorporating the survey weights and design into the matching and weighting methods, respectively. Under correct specification of the matching and weighting approaches, all approaches, excluding the PSM approaches using ISW, performed well for prevalent MCI and incident hypertension in both scenarios 1 and 2 (without age in the sampling design). When age was omitted from the matching and effect estimation models (i.e. under-specification), most approaches experienced increases in bias and poor coverage. In scenario 2, no approach performed well. However, in scenario 1, methods that used OSW as a covariate in matching or the propensity score calculation, in addition to incorporating OSW during causal effect estimation, continued to perform well.

Table 2 Simulation results for estimating effect of insomnia on prevalent MCI using various matching methods in the two compared scenarios.

Table 3 Simulation results for estimating effect of insomnia on incident hypertension using various matching methods in the two compared scenarios.

Highlighted in Tables 2 and 3 are four matching approaches identified as robust based on two subjective criteria: (1) coverage between 93 and 97% for scenarios 1 and 2 under correct specification; and (2) coverage between 93 and 97% for scenario 1 under under-specification. The robust PSM methods matched on propensity scores computed via logistic regression with OSW as a covariate, and then fitted regressions weighted using OSW. The robust CEM methods conducted matching using both coarsened covariates and coarsened OSW, followed by regressions weighted using CEMW × OSW.

Table 4 provides results from the sensitivity analysis in which a confounder (education) was correlated with one of the survey design variables (age) and compares estimation results with and without including education in the analysis (correct specification and under-specification, respectively), by degree of the correlation between age and education. This sensitivity analysis focuses on the four robust matching methods identified in the first sensitivity analysis above. When education is not incorporated in the analysis, we see that the higher its correlation is with the design variable, the better the robust methods are able to recover the underlying causal effect.

Table 4 Simulation results from the second sensitivity analysis using the four robust matching methods to assess the degree to which correlation of an unmeasured confounder with a design variable may help recover the underlying causal effect size.

Figures 3 and 4 provide the results of the sensitivity analysis to assess the effect of changing the exposure and outcome prevalences on the identified robust matching methods. Both bias and coverage appear robust to changes in the exposure and outcome prevalences as long as the prevalences are not rare (i.e. > 5%).

Data analysis

Hispanic Community Health Study/Study of Latinos

The HCHS/SOL is a community-based, multi-center, longitudinal cohort study of Hispanics/Latinos in the US²³. A goal of the study was to investigate causal risk factors of diseases in Hispanic/Latino individuals²³. In 2008, the study recruited 16,415 men and women, aged 18–74, who self-identified as Hispanic/Latino, from four communities: Bronx, NY; Chicago, IL; Miami, FL and San Diego, CA²³. HCHS/SOL is a complex health survey with a stratified three-stage probability sample²⁴. Investigators used unequal sampling probabilities in each stage, selecting census BGs in stage 1, households in stage 2 and individuals in stage 3, and prioritized sampling of households more likely to have adults ages 45–74²⁴.

The HCHS/SOL was approved by the institutional review boards (IRBs) at each field center, where all participants gave written informed consent in their preferred language (Spanish/English), and by the Non-Biomedical IRB at the University of North Carolina at Chapel Hill for the HCHS/SOL Data Coordinating Center. The IRBs that approved the study are: Non-Biomedical IRB at the University of North Carolina at Chapel Hill, Chapel Hill, NC; Einstein IRB at the Albert Einstein College of Medicine of Yeshiva University, Bronx, NY; IRB at Office for the Protection of Research Subjects (OPRS), University of Illinois at Chicago, Chicago, IL; Human Subject Research Office, University of Miami, Miami, FL; Institutional Review Board of San Diego State University, San Diego, CA. The study reported here was approved by the Mass General Brigham IRB under protocol #2022P001237. All methods were carried out in accordance with relevant guidelines and regulations.

Exposure and predictors

Insomnia was defined as a score ≥ 9 on the Women’s Health Initiative Insomnia Rating Scale (WHIIRS)²⁵. The other included predictors were: time between visits; Hispanic/Latino background (Dominican, Central American, Cuban, Mexican, Puerto Rican, South American, more than one/other heritage); alcohol use (never, former, current); smoking (never, former, current); age; gender (female, male); marital status (married or living with partner, single, separated, divorced or widowed); education (no high school diploma or GED, at most a high school diploma or GED, greater than high school diploma or GED); BMI; and employment (retired and not currently employed or missing on employment, not retired or missing on retirement and not currently employed, employed part-time at < 35 h/week, employed full-time at > 35 h/week). Table 5 provides a summary of the predictors stratified by insomnia status.

Table 5 Demographics and BMI of HCHS/SOL stratified by insomnia status.

Outcomes

Outcomes of interest are incident hypertension, assessed an average of 6 years after the baseline exam, and prevalent MCI, assessed an average of 7 years after the baseline exam. Hypertension (≥ Stage 1) was operationalized as systolic blood pressure (SBP) ≥ 130 mmHg, diastolic blood pressure (DBP) ≥ 80 mmHg or use of antihypertensive medications. MCI was defined according to the National Institute on Aging-Alzheimer’s Association criteria and included individuals with severe impairment/suspect dementia²⁶.
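The hypertension definition above is a simple three-part rule, sketched below for illustration. The function and argument names are hypothetical and do not correspond to HCHS/SOL variable names.

```python
def has_hypertension(sbp_mmhg, dbp_mmhg, on_antihypertensives):
    """Stage 1 or higher hypertension under the thresholds stated in the text.

    True if systolic BP >= 130 mmHg, diastolic BP >= 80 mmHg, or the person
    uses antihypertensive medication.
    """
    return sbp_mmhg >= 130 or dbp_mmhg >= 80 or bool(on_antihypertensives)


# Hypothetical readings: normal systolic but elevated diastolic pressure
# still meets the definition.
print(has_hypertension(128, 82, False))
print(has_hypertension(120, 75, False))
```

Because medication use alone satisfies the rule, a treated individual with controlled blood pressure is still classified as hypertensive.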

Analyses

For each outcome, we removed any individuals with missing values on the predictors or outcome (at baseline or visit 2). For incident hypertension, we additionally removed individuals with hypertension at baseline. Our final sample sizes for the prevalent MCI and incident hypertension samples were 6,086 and 6,097, respectively. We applied all the weighting and matching-based causal inference approaches to both samples.
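The exclusion steps described above amount to a complete-case filter plus, for the incident outcome, removal of individuals who already had the condition at baseline. A minimal sketch follows, assuming records are stored as dictionaries with missing values coded as `None`; all field names and records are hypothetical.

```python
def analytic_sample(records, predictors, outcome, baseline_outcome=None):
    """Complete-case filter, optionally dropping prevalent cases at baseline.

    records: list of dicts, with missing values coded as None (an assumption).
    baseline_outcome: if given, rows where this field is missing or truthy are
    removed, leaving only individuals at risk of an incident event.
    """
    required = list(predictors) + [outcome]
    if baseline_outcome is not None:
        required.append(baseline_outcome)
    kept = []
    for rec in records:
        if any(rec.get(field) is None for field in required):
            continue  # missing predictor or outcome value
        if baseline_outcome is not None and rec[baseline_outcome]:
            continue  # condition already present at baseline
        kept.append(rec)
    return kept


# Hypothetical toy records.
people = [
    {"id": 1, "age": 50, "bmi": 27.0, "htn_v1": 0, "htn_v2": 1},
    {"id": 2, "age": 61, "bmi": None, "htn_v1": 0, "htn_v2": 0},  # missing BMI
    {"id": 3, "age": 47, "bmi": 31.5, "htn_v1": 1, "htn_v2": 1},  # prevalent case
    {"id": 4, "age": 39, "bmi": 24.2, "htn_v1": 0, "htn_v2": 0},
]
incident_sample = analytic_sample(people, ["age", "bmi"], "htn_v2", "htn_v1")
print([p["id"] for p in incident_sample])
```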

Results

Supplementary Tables 3 and 4 provide the HCHS/SOL analysis results across all weighting and matching-based causal inference approaches, while Table 6 highlights the results among the robust matching methods only. Comparing individuals with and without insomnia, Table 6 provides the estimated odds ratios for prevalent MCI an average of 7 years after the baseline assessment and the estimated incidence rate ratios for incident hypertension an average of 6 years after the baseline assessment. Based on the robust PSM method, insomnia has a causal effect on both MCI (marginal OR 1.402, CI [1.095, 1.794]; conditional OR 1.432, CI [1.108, 1.850]) and hypertension (marginal IRR 1.184, CI [1.002, 1.400]; conditional IRR 1.174, CI [1.012, 1.360]). Figure 5 provides a plot of the absolute SMD before and after implementing the robust PSM method for each outcome. The robust PSM method appears to improve covariate balance. Unlike in the simulations, the estimates from the CEM methods diverge substantially from, and have wider CIs than, the estimates from the PSM and weighting methods. This is due to the small number of individuals who were ultimately used in the analysis after conducting CEM.
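As a reading aid for the ratio estimates above, the sketch below backs out the log-scale standard error implied by each reported interval and checks that the point estimate sits near the geometric midpoint of its bounds. It assumes the intervals are 95% Wald intervals constructed on the log scale; that is an assumption made here for illustration, not something stated in this section.

```python
import math


def log_scale_summary(lower, upper, z=1.96):
    """Return (SE on the log scale, geometric midpoint) implied by a CI.

    Assumes a Wald interval of the form exp(log(estimate) +/- z * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    midpoint = math.exp((math.log(lower) + math.log(upper)) / 2.0)
    return se, midpoint


# Point estimates and intervals as reported in the text for the robust PSM method.
reported = {
    "MCI, marginal OR": (1.402, 1.095, 1.794),
    "MCI, conditional OR": (1.432, 1.108, 1.850),
    "Hypertension, marginal IRR": (1.184, 1.002, 1.400),
    "Hypertension, conditional IRR": (1.174, 1.012, 1.360),
}
for label, (estimate, lower, upper) in reported.items():
    se, midpoint = log_scale_summary(lower, upper)
    print(f"{label}: implied SE(log) = {se:.3f}, "
          f"midpoint = {midpoint:.3f}, reported = {estimate}")
```

If the assumption holds, the midpoint and the reported estimate agree up to rounding, and the implied standard errors give a rough sense of the relative precision of the four estimates.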

Table 6 HCHS/SOL data analysis results for both prevalent MCI and incident hypertension across the robust matching-based causal inference approaches.

Discussion

Motivated by our interest in applying matching and weighting-based causal inference methods to complex health survey data, we conducted a simulation study to compare various approaches to incorporating the survey weights and design into these methods. We found that most weighting and matching methods performed well under correct specification. However, when a variable (age, in our simulations) was treated as an unmeasured confounder and not included in the matching and effect estimation models (i.e., under-specification) and the survey weights were constructed to depend on this variable, only the matching methods that used the survey weights both in the causal effect estimation and as a covariate in the matching step continued to perform well. Although age was specifically modelled in simulating the survey weights, our analysis was motivated by the potential for unmeasured variables that are related to demographic or socioeconomic status. The HCHS/SOL survey sampling design accounted for socioeconomic status, yet not all potential sociocultural variables were measured. Thus, it is plausible that an unmeasured variable influenced the sampling process and is nonetheless captured to some extent by the survey weights. As another assessment, we also considered a confounding variable (education, in our simulations) that is associated with a design variable (age, in our simulations). When education was treated as an unmeasured confounder, we saw that the higher its correlation with the design variable, the better the performance of the robust methods in estimating the causal effects (although confounding bias remains because the correlation between the unmeasured confounder and the design variable is imperfect). Therefore, the simulation results suggest that incorporating the survey weights as a covariate in the matching may provide some protection against unmeasured confounding. We further recommend that investigators subsequently incorporate the survey weights in causal effect estimation.
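The recommendation above can be sketched on simulated toy data: the survey weight enters the propensity score model as a covariate, and the same weights are then used when estimating the effect in the matched sample. This is a minimal illustration under stated assumptions, not the authors' implementation. The data-generating values, the crude gradient-ascent logistic fit, greedy 1:1 matching on the logit of the propensity score, and the use of each individual's original survey weight in the weighted 2x2 table are all choices made here for brevity. Design-based standard errors are not shown.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_logistic(X, y, lr=0.5, n_iter=300):
    """Crude gradient-ascent logistic regression; returns [intercept, slopes...]."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            resid = yi - expit(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
            grad[0] += resid
            for j, v in enumerate(xi):
                grad[j + 1] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def greedy_match(score, exposed):
    """1:1 nearest-neighbour matching without replacement on a scalar score."""
    pool = [i for i, a in enumerate(exposed) if a == 0]
    pairs = []
    for i in [i for i, a in enumerate(exposed) if a == 1]:
        if not pool:
            break
        j = min(pool, key=lambda k: abs(score[k] - score[i]))
        pool.remove(j)
        pairs.append((i, j))
    return pairs


def weighted_odds_ratio(exposed, outcome, weights, keep):
    """Marginal OR from a survey-weighted 2x2 table over the kept individuals."""
    cell = {(a, d): 0.0 for a in (0, 1) for d in (0, 1)}
    for i in keep:
        cell[(exposed[i], outcome[i])] += weights[i]
    return (cell[(1, 1)] * cell[(0, 0)]) / (cell[(1, 0)] * cell[(0, 1)])


# Simulated toy data; every value below is hypothetical.
random.seed(1)
n = 300
age = [random.gauss(0, 1) for _ in range(n)]  # plays the unmeasured confounder
bmi = [random.gauss(0, 1) for _ in range(n)]  # a measured covariate
svy_w = [math.exp(0.5 * z + random.gauss(0, 0.3)) for z in age]  # depends on age
exposed = [int(random.random() < expit(-1.0 + 0.8 * z + 0.3 * b))
           for z, b in zip(age, bmi)]
outcome = [int(random.random() < expit(-1.0 + 0.5 * a + 0.7 * z))
           for a, z in zip(exposed, age)]

# Step 1: propensity score model with the log survey weight as a covariate
# (age itself is left out, mimicking an unmeasured confounder).
X = [[b, math.log(w)] for b, w in zip(bmi, svy_w)]
beta = fit_logistic(X, exposed)
logit_ps = [beta[0] + sum(c * v for c, v in zip(beta[1:], xi)) for xi in X]

# Step 2: match each exposed individual to the nearest unexposed individual.
pairs = greedy_match(logit_ps, exposed)
matched = [i for pair in pairs for i in pair]

# Step 3: estimate the marginal OR in the matched sample using the survey weights.
or_hat = weighted_odds_ratio(exposed, outcome, svy_w, matched)
print(f"matched pairs: {len(pairs)}, survey-weighted marginal OR: {or_hat:.2f}")
```

Because the weights in this toy example are a noisy function of age, matching on them balances age indirectly, which mirrors the mechanism by which the weights may offer partial protection against a confounder that was never measured.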

Previous studies have agreed that survey weights should be incorporated in the causal effect estimation step but have disagreed on whether and how to incorporate the survey weights in the matching step. Ridgeway et al. recommended using a survey-weighted propensity score model, while Dugoff et al. concluded that survey weights should instead be included as a covariate in the propensity score model, which aligns with our recommendation^11,13. In contrast, Austin et al. and Lenis et al. found that whether and how the survey weights were incorporated in matching did not impact performance of the method^12,15. Our study contributes to the existing literature in four ways. First, while previous studies have focused on continuous outcomes, our study focuses on binary outcomes, targeting both prevalent and incident population estimates of the OR and IRR, respectively. Second, our study is the first to consider the use of CEM in the context of complex survey data. Third, while other studies have used simple sampling designs that are not often employed in practice, our study uses a more complex sampling design and is the first to allow the survey weights to depend on a confounder. Fourth, our study assesses sensitivity both to the introduction of unmeasured confounding and to changes in the exposure and outcome prevalences.

When applying our robust PSM methods that consistently performed well in the simulation study to the HCHS/SOL data, we found that insomnia has a causal association with both prevalent MCI 7 years later and incident hypertension 6 years later in the US Hispanic/Latino population. Our incident hypertension results support those reported by Li et al.²⁷, who estimated the odds ratio for incident hypertension comparing individuals with and without insomnia via logistic regression. We also provide new evidence of an association between insomnia and prevalent MCI in US Hispanic/Latino adults. Our robust CEM methods performed poorly when applied to the HCHS/SOL data, despite consistently performing well in the simulation study, because of the large reductions in sample size incurred from matching on a large number of strata. This suggests that CEM may not be practical for small or medium sample sizes when there are many variables to match on.

Recent sleep research has prioritized using Mendelian Randomization (MR) to conduct causal inference for sleep exposures on downstream health outcomes^{28,29,30,31,32,33,34,35,36,37,38,39,40}, using genetic variants as instruments for modifiable exposures¹. However, MR has limitations that have been overshadowed in the wake of its popularity. Violations of MR’s assumptions (relevance, exchangeability, exclusion restriction, and homogeneous and linear associations) can result from issues such as residual pleiotropy, population stratification, linkage disequilibrium, weak instrumental variables (IVs) and heterogeneity^1,41. Additionally, lack of relevant genetic variants for the exposure may reduce power for finding causal associations⁵. The exposures available to MR studies are also restricted to the specific measures targeted by the genome-wide association studies (GWAS) that have been performed. Lastly, most MR studies conducted so far on sleep exposures have used genetic information from predominantly European populations, limiting their generalizability to racial and ethnic minority groups⁵. These limitations of MR underscore the importance of triangulating causal inference from multiple methods, including methods currently underutilized in sleep research.

Although we performed an extensive simulation study, there is still room for further investigation in applying causal inference methods to complex health survey data. Future work may focus on, among other topics: identifying the best approaches to incorporating the survey weights and design when assessing matching; evaluating robustness of the matching methods after introduction of different types of missingness; assessing the effectiveness of other propensity score estimation approaches and matching algorithms; studying the effect on inference of over-specifying the propensity score and causal effect estimation models by including unnecessary variables; and investigating other causal inference methods that are not based on weighting or matching.

Data availability

HCHS/SOL data are available on the National Heart Lung and Blood Institute’s BioLINCC (Biologic Specimen and Data Repository Information Coordinating Center) repository under accession number HLB01141422a. Alternatively, the data can also be obtained via a data use agreement with the HCHS/SOL Data Coordinating Center at the University of North Carolina at Chapel Hill, see collaborators website: https://sites.cscc.unc.edu/hchs/.

Code availability

Code used for simulations and data analysis is publicly available on the GitHub repository: https://github.com/anjashahu/causal_matching_paper.

Abbreviations

ATE: Average treatment effect
ATT: Average treatment effect for the treated
BG: Block group
CATE: Conditional average treatment effect
CATT: Conditional average treatment effect for the treated
CEM: Coarsened exact matching
CEMW: Coarsened exact matching weights
CI: Confidence interval
Cover: Coverage
HCHS/SOL: Hispanic Community Health Study/Study of Latinos
HH: Household
IPTW: Inverse probability of treatment weighting
IRR: Incidence rate ratio
ISW: Inherited survey weights
MCI: Mild cognitive impairment
MR: Mendelian Randomization
Obs: Observations
OR: Odds ratio
OSW: Original survey weights
PS: Propensity score
PSM: Propensity score matching
PSW: Propensity score weighting
RCT: Randomized controlled trial
SE: Standard error
SMD: Standardized mean difference
SUTVA: Stable unit treatment value assumption

References

1. Lawlor, D. A., Harbord, R. M., Sterne, J. A. C., Timpson, N. & Davey, S. G. Mendelian randomization: Using genes as instruments for making causal inferences in epidemiology. Stat. Med. 27(8), 1133–1163 (2008).
2. Rochon, P. A. et al. The inclusion of minority groups in clinical trials: Problems of under representation and under reporting of data. Account Res. 11(3–4), 215–223 (2004).
3. Faraoni, D. & Schaefer, S. T. Randomized controlled trials vs observational studies: Why not just live together? BMC Anesthesiol. 16(1), 102 (2016).
4. Pack, A. I. et al. Randomized clinical trials of cardiovascular disease in obstructive sleep apnea: Understanding and overcoming bias. Sleep 44(2), 229 (2021).
5. Sofer, T., Goodman, M. O., Bertisch, S. M. & Redline, S. Longer sleep improves cardiovascular outcomes: Time to make sleep a priority. Eur. Heart J. 42(34), 3358–3360 (2021).
6. Munafò, M. R. & Davey, S. G. Robust research needs many lines of evidence. Nature 553(7689), 399–401 (2018).
7. Smart, A. & Harrison, E. The under-representation of minority ethnic groups in UK medical research. Ethn. Health 22(1), 65–82 (2017).
8. McGrath, R. P. et al. The burden of health conditions across race and ethnicity for aging Americans: Disability-adjusted life years. Medicine 98(46), e17964 (2019).
9. Lohr, S. Sampling: Design and Analysis 2nd edn. (CRC Press, 2010).
10. Stuart, E. A. Matching methods for causal inference: A review and a look forward. Stat. Sci. 25(1), 1–21 (2010).
11. Dugoff, E. H., Schuler, M. & Stuart, E. A. Generalizing observational study results: Applying propensity score methods to complex surveys. Health Serv. Res. 49(1), 284–303 (2014).
12. Austin, P. C., Jembere, N. & Chiu, M. Propensity score matching and complex surveys. Stat. Methods Med. Res. 27(4), 1240–1257 (2018).
13. Ridgeway, G., Kovalchik, S. A., Griffin, B. A. & Kabeto, M. U. Propensity score analysis with survey weighted data. J. Causal Inference 3(2), 237–249 (2015).
14. Lenis, D., Ackerman, B. & Stuart, E. A. Measuring model misspecification: Application to propensity score methods with complex survey data. Comput. Stat. Data Anal. 128, 48–57 (2018).
15. Lenis, D., Nguyen, T. Q., Dong, N. & Stuart, E. A. It’s all about balance: Propensity score matching in the context of complex survey data. Biostatistics 20(1), 147–163 (2019).
16. Imbens, G. W. Nonparametric estimation of average treatment effects under exogeneity: A review. Rev. Econ. Stat. 86(1), 4–29 (2004).
17. Hernán, M. A. A definition of causal effect for epidemiological research. J. Epidemiol. Community Health 58(4), 265–271 (2004).
18. Austin, P. C. & Stuart, E. A. Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Stat. Med. 34(28), 3661–3679 (2015).
19. Iacus, S. M., King, G. & Porro, G. cem: Software for coarsened exact matching. J. Stat. Softw. 30, 9 (2009).
20. King, G. An Explanation for CEM Weights. https://docs.google.com/document/d/1xQwyLt_6EXdNpA685LjmhjO20y5pZDZYwe2qeNoI5dE/edit (2012) (Accessed 3 July 2021).
21. Harder, V. S., Stuart, E. A. & Anthony, J. C. Propensity score techniques and the assessment of measured covariate balance to test causal associations in psychological research. Psychol. Methods 15(3), 234–249 (2010).
22. Cai, J. et al. Comparisons of statistical methods for handling attrition in a follow-up visit with complex survey sampling. Stat. Med. 42(11), 1641–1668 (2023).
23. Sorlie, P. D. et al. Design and implementation of the Hispanic Community Health Study/Study of Latinos. Ann. Epidemiol. 20(8), 629–641 (2010).
24. Lavange, L. M. et al. Sample design and cohort selection in the Hispanic Community Health Study/Study of Latinos. Ann. Epidemiol. 20(8), 642–649 (2010).
25. Levine, D. W. et al. Reliability and validity of the Women’s Health Initiative Insomnia Rating Scale. Psychol. Assess. 15(2), 137–148 (2003).
26. González, H. M. et al. A research framework for cognitive aging and Alzheimer’s disease among diverse US Latinos: Design and implementation of the Hispanic Community Health Study/Study of Latinos-Investigation of Neurocognitive Aging (SOL-INCA). Alzheimers Dement. 15(12), 1624–1632 (2019).
27. Li, X. et al. Associations of sleep-disordered breathing and insomnia with incident hypertension and diabetes. The Hispanic Community Health Study/Study of Latinos. Am. J. Respir. Crit. Care Med. 203(3), 356–365 (2021).
28. Ai, S. et al. Causal associations of short and long sleep durations with 12 cardiovascular diseases: Linear and nonlinear Mendelian randomization analyses in UK Biobank. Eur. Heart J. 42(34), 3349–3357 (2021).
29. Liao, L.-Z. et al. Causal assessment of sleep on coronary heart disease. Sleep Med. 67, 232–236 (2020).
30. van Oort, S., Beulens, J. W. J., van Ballegooijen, A. J., Handoko, M. L. & Larsson, S. C. Modifiable lifestyle factors and heart failure: A Mendelian randomization study. Am. Heart J. 227, 64–73 (2020).
31. Zhuang, Z. et al. Association of physical activity, sedentary behaviours and sleep duration with cardiovascular diseases and lipid profiles: A Mendelian randomization analysis. Lipids Health Dis. 19(1), 86 (2020).
32. Daghlas, I. et al. Sleep duration and myocardial infarction. J. Am. Coll. Cardiol. 74(10), 1304–1314 (2019).
33. Richmond, R. C. et al. Investigating causal relations between sleep traits and risk of breast cancer in women: Mendelian randomisation study. BMJ 365, l2327 (2019).
34. Titova, O. E. et al. Sleep duration and risk of overall and 22 site-specific cancers: A Mendelian randomization study. Int. J. Cancer 148(4), 914–920 (2021).
35. Gao, X.-L. et al. Obstructive sleep apnea syndrome and causal relationship with female breast cancer: A Mendelian randomization study. Aging (Albany, NY) 12(5), 4082–4092 (2020).
36. Henry, A. et al. The relationship between sleep duration, cognition and dementia: A Mendelian randomization study. Int. J. Epidemiol. 48(3), 849–860 (2019).
37. Anderson, E. L. et al. Is disrupted sleep a risk factor for Alzheimer’s disease? Evidence from a two-sample Mendelian randomization analysis. Int. J. Epidemiol. 50, 817 (2020).
38. Gao, X. et al. Investigating causal relations between sleep-related traits and risk of type 2 diabetes mellitus: A Mendelian randomization study. Front. Genet. 11, 607865 (2020).
39. Dashti, H. S. et al. Genetic determinants of daytime napping and effects on cardiometabolic health. Nat. Commun. 12(1), 900 (2021).
40. Daghlas, I. et al. Habitual sleep disturbances and migraine: A Mendelian randomization study. Ann. Clin. Transl. Neurol. 7(12), 2370–2380 (2020).
41. Burgess, S. et al. Guidelines for performing Mendelian randomization investigations. Wellcome Open Res. 8(4), 186 (2020).

Acknowledgements

This work was supported by the National Institute on Aging (R01AG048642, RF1AG054548, RF1AG061022, R21AG070644, and R21AG056952) and by the National Heart, Lung, and Blood Institute (R35HL135818, R01HL161012). Dr. González also receives additional support from P30AG062429 and P30AG059299. The Hispanic Community Health Study/Study of Latinos is a collaborative study supported by contracts from the National Heart, Lung, and Blood Institute (NHLBI) to the University of North Carolina (HHSN268201300001I/N01-HC-65233), University of Miami (HHSN268201300004I/N01-HC-65234), Albert Einstein College of Medicine (HHSN268201300002I/N01-HC-65235), University of Illinois at Chicago (HHSN268201300003I/N01-HC-65236 Northwestern Univ), and San Diego State University (HHSN268201300005I/N01-HC-65237). The following Institutes/Centers/Offices have contributed to the HCHS/SOL through a transfer of funds to the NHLBI: National Institute on Minority Health and Health Disparities, National Institute on Deafness and Other Communication Disorders, National Institute of Dental and Craniofacial Research, National Institute of Diabetes and Digestive and Kidney Diseases, National Institute of Neurological Disorders and Stroke, NIH Institution-Office of Dietary Supplements. The authors thank the staff and participants of HCHS/SOL for their important contributions.

Author information

Authors and Affiliations

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
Anja Shahu & Tamar Sofer
Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women’s Hospital, 221 Longwood Avenue, Boston, MA, 02115, USA
Anja Shahu, Joon Chung, Susan Redline & Tamar Sofer
Institute of Gerontology, Wayne State University, Detroit, MI, USA
Wassim Tarraf
Department of Neurology, University of Miami Miller School of Medicine, Miami, FL, USA
Alberto R. Ramos
Department of Neurosciences and Shiley-Marcos Alzheimer’s Disease Center, University of California, San Diego, La Jolla, CA, USA
Hector M. González
Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jianwen Cai
CardioVascular Institute (CVI), Beth Israel Deaconess Medical Center, Boston, MA, USA
Tamar Sofer

Contributions

A.S., J.C., and T.S. conceptualized the manuscript. A.S. performed all analyses, prepared tables and figures. A.S., J.C., and T.S. drafted the manuscript. All authors critically reviewed the manuscript.

Corresponding author

Correspondence to Tamar Sofer.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Shahu, A., Chung, J., Tarraf, W. et al. Method comparison and estimation of causal effects of insomnia on health outcomes in a survey sampled population. Sci Rep 13, 9831 (2023). https://doi.org/10.1038/s41598-023-36927-2

Received: 17 November 2022
Accepted: 12 June 2023
Published: 17 June 2023
DOI: https://doi.org/10.1038/s41598-023-36927-2
Springer Nature Limited
