Log in

Nonparametric estimation of population average dose-response curves using entropy balancing weights for continuous exposures

  • Published:
Health Services and Outcomes Research Methodology Aims and scope Submit manuscript

Abstract

Weighted estimators are commonly used for estimating exposure effects in observational settings to establish causal relations. These estimators have a long history of development when the exposure of interest is binary and where the weights are typically functions of an estimated propensity score. Recent developments in optimization-based estimators for constructing weights in binary exposure settings, such as those based on entropy balancing, have shown more promise in estimating treatment effects than those methods that focus on the direct estimation of the propensity score using likelihood-based methods. This paper explores recent developments of entropy balancing methods to continuous exposure settings and the estimation of population dose-response curves using nonparametric estimation combined with entropy balancing weights, focusing on factors that would be important to applied researchers in medical or health services research. The methods developed here are applied to data from a study assessing the effect of non-randomized components of an evidence-based substance use treatment program on emotional and substance use clinical outcomes.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
EUR 32.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or Ebook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Similar content being viewed by others

Data Availability Statement

The development of this manuscript was supported by the Center for Substance Abuse Treatment (CSAT), Substance Abuse and Mental Health Services Administration (SAMHA) [#270-2003-00006 and #270-2007-00004C] using data provided by the following grantees: TI-15413, TI-15415, TI-15421, TI-15433, TI-15438, TI-15446, TI-15447, TI-15458, TI-15461, TI-15466, TI-15467, TI-15469, TI-15475, TI-15478, TI-15479, TI-15481, TI-15483 TI-15485, TI-15486, TI-15489, TI-15511, TI-15514, TI-15524, TI-15527, TI-15545, TI-15562, TI-15577, TI-15584, TI-15586, TI-15670, TI-15671, TI-15672, TI-15674, TI-15677, TI-15678, TI-15682, TI-15686, TI-17589, TI-17604, TI-17605, TI-17638, TI-17646, TI-17648, TI-17673, TI-17702, TI-17719, TI-17728, TI-17742, TI-17744, TI-17751, TI-17755, TI-17761, TI-17763, TI-17765, TI-17769, TI-17775, TI-17779, TI-17786, TI-17788, TI-17812, TI-17817, TI-17821, TI-17825, TI-17830, TI-17831, TI-17847, TI-17864, TI-20759, TI-20781, TI-20798, TI-20806, TI-20827, TI-20828, TI-20847, TI-20852, TI-20865, TI-20870, TI-20910, TI-20946, TI-23174, TI-23186, TI-23188, TI- 23195, TI-23196, TI-23197, TI-23200, TI-23202, TI-23204, TI-23206, TI-23224, TI-23244, TI-23247, TI-23265, TI-23270, TI-23276, TI-23278, TI-23279, TI-23296, TI-23298, TI-23304, TI-23310, TI-23311, TI-23312, TI-23316, TI-23322, TI-23323, TI-23325, TI-23336, TI-23345, TI-23346, TI-23348. The authors thank these agencies, grantees, and their participants for agreeing to share their data to support this secondary analysis. The opinions about these data are those of the authors and do not reflect official positions of the government or individual agencies. Please direct correspondence to Brian Vegetabile, bvegetab@rand.org.

References

  • Austin, P.C., Stuart, E.A.: Estimating the effect of treatment on binary outcomes using full matching on the propensity score. Stat. Methods Med. Res. 26(6), 2505–2525 (2017). https://doi.org/10.1177/0962280215601134

    Article  PubMed  Google Scholar 

  • Cleveland, W.S., Devlin, S.J.: Locally weighted regression: an approach to regression analysis by local fitting. J Am Stat Assoc 83(403), 596–610 (1988)

    Article  Google Scholar 

  • Dennis, M.L., Titus, J.C., White, M.K., Unsicker, J.I., Hodgkins, D.: Global Appraisal of Individual Needs: Administration Guide for the Gain and Related Measures. Chestnut Health Systems, Bloomington, IL (2003)

    Google Scholar 

  • Deville, J.C., Särndal, C.E.: Calibration estimators in survey sampling. J. Am. Stat. Assoc. 87(418), 376–382 (1992)

    Article  Google Scholar 

  • Deville, J.C., Särndal, C.E., Sautory, O.: Generalized raking procedures in survey sampling. J. Am. Stat. Assoc. 88(423), 1013–1020 (1993)

    Article  Google Scholar 

  • Efron, B., Tibshirani, R.J.: An Introduction to the Bootstrap. CRC Press, Boca Raton (1994)

    Book  Google Scholar 

  • Fong, C., Hazlett, C., Imai, K., et al.: Covariate balancing propensity score for a continuous treatment: application to the efficacy of political advertisements. Ann. Appl. Stat. 12(1), 156–177 (2018)

    Article  Google Scholar 

  • Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning. Springer series in statistics, vol. 1. Springer, New York (2001)

    Google Scholar 

  • Godley, S.H., Garner, B.R., Smith, J.E., Meyers, R.J., Godley, M.D.: A large-scale dissemination and implementation model for evidence-based treatment and continuing care. Clinical Psychology Science and Practice 18(1), 67–83 (2011). https://doi.org/10.1111/j.1468-2850.2011.01236.x

    Article  PubMed  Google Scholar 

  • Godley, S.H., Smith, J,E., Meyers, R.J., Godley, M.D.: The Adolescent Community Reinforcement Approach: A Clinical Guide for Treating Sustance Use Disorders. Chestnut Health Systems (2016)

  • Grant, S., Hunter, S.B., Pedersen, E.R., Griffin, B.A.: Practical factors determining adolescent substance use treatment settings: results from four online stakeholder panels. J. Subst. Abuse Treat. 109, 34–40 (2020)

    Article  Google Scholar 

  • Griffin, B.A., Ramchand, R., Edelen, M.O., McCaffrey, D.F., Morral, A.R.: Associations between abstinence in adolescence and economic and educational outcomes seven years later among high-risk youth. Drug Alcohol Depend. 113(2–3), 118–124 (2011)

    Article  Google Scholar 

  • Griffin, B.A., McCaffrey, D.F., Ramchand, R., Hunter, S.B., Booth, M.S.: Assessing the sensitivity of treatment effect estimates to differential follow-up rates: implications for translational research. Health Serv. Outcomes Res. Method. 12(2–3), 84–103 (2012)

    Article  Google Scholar 

  • Griffin, B.A., Ramchand, R., Almirall, D., Slaughter, M.E., Burgette, L.F., McCaffery, D.F.: Estimating the causal effects of cumulative treatment episodes for adolescents using marginal structural models and inverse probability of treatment weighting. Drug Alcohol Depend. 136, 69–78 (2014)

    Article  Google Scholar 

  • Griffin, B.A., Ayer, L., Pane, J., Vegetabile, B.G., Burgette, L., McCaffrey, D., Coffman, D.L., Cefalu, M., Funk, R., Godley, M.: Expanding outcomes when considering the relative effectiveness of two evidence-based outpatient treatment programs for adolescents. J. Subst. Abuse Treatment (2020). https://doi.org/10.1016/j.jsat.2020.108075

  • Haberman, S.J.: Adjustment by minimum discriminant information. Ann. Stat. 1, 971–988 (1984)

    Article  Google Scholar 

  • Hainmueller, J.: Entropy balancing for causal effects: a multivariate reweighting method to produce balanced samples in observational studies. Polit. Anal. 20(1), 25–46 (2012)

    Article  Google Scholar 

  • Hirano, K., Imbens, G.W.: The propensity score with continuous treatments. Appl. Bayes. Model. Causal Inference Incomplete-Data Perspect 226164, 73–84 (2004)

    Google Scholar 

  • Holland, P.W.: Statistics and causal inference. J. Am. Stat. Assoc. 81(396), 945–960 (1986)

    Article  Google Scholar 

  • Imai, K., van Dyk, D.A.: Causal inference with general treatment regimes. J. Am. Stat. Assoc. 99(467), 854–866 (2004). https://doi.org/10.1198/016214504000001187

    Article  Google Scholar 

  • Imai, K., Ratkovic, M.: Covariate balancing propensity score. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 76(1), 243–263 (2014). https://doi.org/10.1111/rssb.12027

    Article  Google Scholar 

  • Imbens, G.W.: The role of the propensity score in estimating dose-response functions. Biometrika 87(3), 706–710 (2000)

    Article  Google Scholar 

  • Imbens, G.W., Rubin, D.B. (2015) Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9781139025751

  • Kallus, N., Santacatterina, M.: Kernel optimal orthogonality weighting: A balancing approach to estimating effects of continuous treatments. ar**v preprint ar**v:191011972 (2019)

  • Kang, J.D., Schafer, J.L.: Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data. Stat. Sci. 22(4), 523–539 (2007)

    Article  Google Scholar 

  • Kennedy, E.H., Ma, Z., McHugh, M.D., Small, D.S.: Non-parametric methods for doubly robust estimation of continuous treatment effects. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 79(4), 1229–1245 (2017). https://doi.org/10.1111/rssb.12212

    Article  Google Scholar 

  • Kish, L.: Survey Sampling. Wiley, New York (1965)

    Google Scholar 

  • Li, F., Morgan, K.L., Zaslavsky, A.M.: Balancing covariates via propensity score weighting. J. Am. Stat. Assoc. 113(521), 390–400 (2018)

    Article  CAS  Google Scholar 

  • Neyman, J., Dabrowska, D.M., Speed, T.P.: On the application of probability theory to agricultural experiments. Essay on Principles 5(4), 465–472 (1923). Section 9. Reprinted in Statistical Science [1990]

    Google Scholar 

  • Ramchand, R., Griffin, B.A., Suttorp, M., Harris, K.M., Morral, A.: Using a cross-study design to assess the efficacy of motivational enhancement therapy-cognitive behavioral therapy 5 (met/cbt5) in treating adolescents with cannabis-related disorders. J. Stud. Alcohol Drugs 72(3), 380–389 (2011)

    Article  Google Scholar 

  • Ramchand, R., Griffin, B.A., Slaughter, M.E., Almirall, D., McCaffrey, D.F.: Do improvements in substance use and mental health symptoms during treatment translate to long-term outcomes in the opposite domain? J. Subst. Abuse Treat. 47(5), 339–346 (2014)

    Article  Google Scholar 

  • Ramchand, R., Griffin, B.A., Hunter, S.B., Booth, M.S., McCaffrey, D.F.: Provision of mental health services as a quality indicator for adolescent substance abuse treatment facilities. Psychiatric Serv. 66(1), 41–48 (2015)

    Article  Google Scholar 

  • Robbins, M.W., Saunders, J., Kilmer, B.: A framework for synthetic control methods with high-dimensional, micro-level data: evaluating a neighborhood-specific crime intervention. J. Am. Stat. Assoc. 112(517), 109–126 (2017)

    Article  CAS  Google Scholar 

  • Robbins, M.W., Griffin, B.A., Shih, R.A., Slaughter, M.E.: Robust estimation of the causal effect of time-varying neighborhood factors on health outcomes. Stat. Med. 39(5), 544–561 (2020)

    Article  Google Scholar 

  • Robins, J.M., Hernan, M.A., Brumback, B.: Marginal structural models and causal inference in epidemiology. Epidemiology 11(5), 550–560 (2000)

    Article  CAS  Google Scholar 

  • Rosenbaum, P.R.: Design of Observational Studies: Springer Series in Statistics. Springer, New York (2010)

    Book  Google Scholar 

  • Rosenbaum, P.R., Rubin, D.B.: The central role of the propensity score in observational studies for causal effects. Biometrika 70(1), 41–55 (1983). https://doi.org/10.1093/biomet/70.1.41

    Article  Google Scholar 

  • Rubin, D.B.: Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educ. Psychol. 66(5), 688 (1974)

    Article  Google Scholar 

  • Scholkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Oxford (2001)

    Google Scholar 

  • Schuler, M.S., Griffin, B.A., Ramchand, R., Almirall, D., McCaffrey, D.F.: Effectiveness of treatment for adolescent substance use: is biological drug testing sufficient? J. Stud. Alcohol Drugs 75(2), 358–370 (2014)

    Article  Google Scholar 

  • Tübbicke, S.: Entropy balancing for continuous treatments. ar**v preprint ar**v:200106281 (2020)

  • Vegetabile, B.G., Gillen, D.L., Stern, H.S.: Optimally balanced gaussian process propensity scores for estimating treatment effects. J. R. Stat. Soc.: Ser. A (Stat. Soc.) 183(1), 355–377 (2020). https://doi.org/10.1111/rssa.12502

    Article  Google Scholar 

  • Yiu, S., Su, L.: Covariate association eliminating weights: a unified weighting framework for causal effect estimation. Biometrika 105(3), 709–722 (2018). https://doi.org/10.1093/biomet/asy015

    Article  PubMed  PubMed Central  Google Scholar 

  • Zhao, Q., Percival, D.: Entropy balancing is doubly robust. J. Causal Inference 5(1), 1 (2017)

    CAS  Google Scholar 

  • Zhu, Y., Coffman, D.L., Ghosh, D.: A boosting algorithm for estimating generalized propensity scores with continuous treatments. J. Causal Inference 3(1), 25–40 (2015)

    Article  Google Scholar 

  • Zubizarreta, J.R.: Stable weights that balance covariates for estimation with incomplete outcome data. J. Am. Stat. Assoc. 110(511), 910–922 (2015)

    Article  CAS  Google Scholar 

Download references

Acknowledgements

The authors would like to thank many researchers that helped us to think critically about these ideas. Specifically, Lane Burgette, Megan Schuler, and Joseph Pane for their advice and support in finishing this draft. Additionally, we would like to thank our collaborative partners who have provided insights during our understanding of the applied example, specifically, Mark Godley, Lynsay Ayer, and Rod Funk. Finally, we would like to thank the editors and reviewers for their helpful comments that have served to strengthen this manuscript.

Funding

Research reported in this manuscript was supported by the National Institute on Drug Abuse of the National Institutes of Health under award number R01DA045049 (Vegetabile, Griffin, Coffman, Cefalu, McCaffrey) and the National Institute on Aging award number R21AG058123 (Robbins). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Brian G. Vegetabile.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Understanding weights and optimal covariate balance

Optimal balancing algorithms attempt to set their constraints by exploiting observable characteristics of weighted distributions and choosing a subset of these characteristics as constraints to target in their algorithms (as it may be necessary to balance an infinite number of moments to obtain weights that represent the target distribution exactly). In this appendix, we motivate the balancing constraints of these algorithms by demonstrating the effect that the weights defined earlier have on the observed distributions under weighting.

Recall the notation and setting of Sect. 2.1: a set of random variables are observed such that Y is an outcome variable, A is an exposure variable, and \(\varvec{X}\) is a vector of covariates that are responsible for assigning the exposure variable; both exposure and covariates are believed to be related to the outcome Y. In the Neyman/Rubin potential outcome framework one estimation strategy attempts to find a set of weights w such that we can estimate the expectation of the potential outcome in the population represented by the distribution of \(\varvec{X}\).

$$\begin{aligned} E_{\varvec{X}|A}[w Y(a) | A = a]&= E_{\varvec{X}}[Y(a)] \end{aligned}$$

It has been shown in many places (e.g., Zhu et al. (2015) and Li et al. (2018)) that one set of weights that identify the above relationship when there are non-trivial relationships among \((Y,A,\varvec{X})\) were given in Equation 1 and are of the form,

$$\begin{aligned} w \equiv w(a, \varvec{x}) = \frac{p_{A}(a)}{p_{A|\varvec{X}}(a|\varvec{x})} = \frac{p_{A}(a) }{p_{A|\varvec{X}}(a|\varvec{x})} \frac{p_{\varvec{X}}(\varvec{x})}{p_{\varvec{X}}(\varvec{x})}= \frac{p_{A}(a) p_{\varvec{X}}(\varvec{x})}{p_{A,\varvec{X}}(a,\varvec{x})} \end{aligned}$$

These weights imply certain observable characteristics in the weighted joint distribution of the treatment and covariates and provide a rationale for why an analyst should check “covariate balance” in an applied analysis. In particular, it follows from the expression above that when the weights \(w(a, \varvec{x})\) are multiplied by the original joint distribution \(p_{A,\varvec{X}}(a,\varvec{x})\) (under appropriate assumptions) that

$$\begin{aligned} w(a, \varvec{x}) p_{A,\varvec{X}}(a,\varvec{x})&= p_{A}(a) p_{\varvec{X}}(\varvec{x}) \end{aligned}$$

implying that the weights provide independence between A and \(\varvec{X}\) in the weighted distribution.

A consequence of this identity is that for general functions \(h(\cdot )\) and \(g(\cdot )\), the weights yield:

$$\begin{aligned} E [ w(a, \varvec{X}) h(A) g(\varvec{X})]&= \int _{{\mathcal {A}}} \int _{{\mathcal {X}}} w(a, \varvec{x}) h(a) g(\varvec{x}) p_{A,\varvec{X}}(a,\varvec{x}) \partial a \partial \varvec{x} \\&= \int _{{\mathcal {A}}} \int _{{\mathcal {X}}} h(a) g(\varvec{x}) p_{A}(a) p_{\varvec{X}}(\varvec{x}) \partial a \partial \varvec{x} \\&= \int _{{\mathcal {A}}} h(a) p_{A}(a) d a \int _{{\mathcal {X}}} g(\varvec{x}) p_{\varvec{X}}(\varvec{x}) d \varvec{x} \\&= E[h(A)] E[g(\varvec{X})]. \end{aligned}$$

This expression yields the equations in (3) and (4) in Sect. 2.2. Specifically, to derive the expressions in (3), note that setting \(h(a) = 1\) with \(g(x) = x^p\) establishes \(E[wX_j^p]=\mu _j^p\). The facts that \(E[wA^q]=\mu _A^q\) and \(E[w]=1\) are established similarly. Likewise, setting \(g(x) = x^p\) and \(h(a)=a^q\) yields \(E[wX_j^pA^q]=\mu _j^p\mu _A^q\). This, in combination with the earlier identities, is used to established the fact that weighting eliminates covariances (including those of higher order) between \(\varvec{X}\) and A as seen in (4). We have shown that weighting effectively preserves the (marginal) moment generating functions of both \(\varvec{X}\) and A, and as such the marginal distributions are also maintained following weighting.

As stated at the outset, optimal balancing algorithms attempt to set their constraints by exploiting these observable characteristics in the weighted distributions and choosing a subset of constraints to target in an analysis.

Appendix 2: Derivation of optimization procedure

Consider the optimization procedure as it was defined in Sect. 2.2:

$$\begin{aligned} \begin{array}{rll} \displaystyle \min _{w} H(w,\nu ) &= \displaystyle \sum _{i =1}^N w_i \log \left( \frac{w_i}{\nu _i} \right) & \hbox {subject to} \\ 1) &\displaystyle \sum _{i =1}^N w_i (X_{ij}^p - {\hat{\mu }}_{j}^p)(A_i^q - {\hat{\mu }}_A^q) = 0 &\hbox { for each }j\hbox { and specified }p=1,2,\dots \hbox { and }q=1,2,\dots \\ 2) &\displaystyle \sum _{i =1}^N w_i (A_i^r - {\hat{\mu }}_A^r) = 0 &\hbox { for specified }r=1,2,\dots \\ 3) &\displaystyle \sum _{i =1}^N w_i (X_{ij}^s - {\hat{\mu }}_{j}^s) = 0 &\hbox { for each }j\hbox { and specified }s = 1,2,\dots \\ 4) &\displaystyle \sum _{i =1}^N w_i = 1, \qquad w_i \ge 0 \end{array} \end{aligned}$$

Here we provide a general derivation of the equations of that section.

It is possible to reformulate the above by first defining \(\varvec{Z_i} = (A_i \ \varvec{X}_i)\) and letting \(h_k(\varvec{Z}_i, {\hat{\mu }}_{z})\) be arbitrary functions of \(A_i\) and \(\varvec{X}_i\) and where \({\hat{\mu }}_{z}\) represent a vector of targets for a population of interest, e.g., \({\hat{\mu }}_z\) would be a sample estimator for the parameter \(\mu _z = E_Z[\varvec{Z}]\). For example, one function from the above optimization procedure is, \(h_{k'}(\varvec{Z}_i, {\hat{\mu }}_{z}) = (X_{ij}^p - {\hat{\mu }}_{j}^p)(A_i^q - {\hat{\mu }}_A^q)\). Without loss generality we can rewrite the above optimization as the following,

$$\begin{aligned} \begin{array}{rll} \displaystyle \min _{w} H(w,\nu ) = & \displaystyle \sum _{i =1}^N w_i \log \left( \frac{w_i}{\nu _i} \right) & \quad \hbox { subject to }\\ 1) &\displaystyle \sum _{i =1}^N w_i h_k(\varvec{Z}_i, {\hat{\mu }}_{z}) = 0 & \quad \hbox { for each defined function indexed by }k \\ 2) &\displaystyle \sum _{i =1}^N w_i = 1, \qquad w_i \ge 0 \end{array} \end{aligned}$$

Using the method of Lagrange multipliers, we can augment the optimization procedure into the following new optimization,

$$\begin{aligned} \min _{w,\lambda } {\mathcal {L}}(w, \lambda )&= \sum _{i} w_i \log \left( \frac{w_i}{\nu _i} \right) + \sum _k \lambda _k \sum _{i} w_i h_k(\varvec{Z}_i, {\hat{\mu }}_{z}) + (\lambda _0 - 1)\left( \sum _{i} w_i - 1\right) \nonumber \\&= \sum _{i} w_i \log \left( \frac{w_i}{\nu _i} \right) + \sum _{i} w_i \sum _k \lambda _k h_k(\varvec{Z}_i, {\hat{\mu }}_{z}) + (\lambda _0 - 1)\left( \sum _{i} w_i - 1\right) \end{aligned}$$
(6)

Taking derivatives with respect to each weight \(w_i\), it follows that,

$$\begin{aligned} \frac{\partial }{\partial w_i} {\mathcal {L}}(w, \lambda )&= \left( 1 + \log \left( \frac{w_i}{\nu _i}\right) \right) + \sum _k \lambda _k h_k(\varvec{Z}_i, {\hat{\mu }}_{z}) + \lambda _0 - 1 \\&= \log \left( \frac{w_i}{\nu _i}\right) + \sum _k \lambda _k h_k(\varvec{Z}_i, {\hat{\mu }}_{z}) + \lambda _0 \end{aligned}$$

Setting to zero we have

$$\begin{aligned} w_i&\propto \nu _i \exp \left( -\left[ \sum _k \lambda _k h_k(\varvec{Z}_i, {\hat{\mu }}_{z}) + \lambda _0\right] \right) \end{aligned}$$

Enforcing the constraint that the weights must sum to one, it follows that

$$\begin{aligned} w_i&= \frac{\nu _i \exp \left( -\left[ \sum _k \lambda _k h_k(\varvec{Z}_i, {\hat{\mu }}_{z}) + \lambda _0\right] \right) }{\sum _j \nu _j \exp \left( -\left[ \sum _k \lambda _k h_k(\varvec{Z}_j, {\hat{\mu }}_{z}) + \lambda _0\right] \right) } \nonumber \\&= \frac{\nu _i \exp \left( -\left[ \sum _k \lambda _k h_k(\varvec{Z}_i, {\hat{\mu }}_{z})\right] \right) }{\sum _j \nu _j \exp \left( -\left[ \sum _k \lambda _k h_k(\varvec{Z}_j, {\hat{\mu }}_{z}) \right] \right) } \end{aligned}$$
(7)

resulting in the closed form solution for the weights for fixed \(\lambda\).

Next, to simplify notation, we redefine the weights as a parameter \(\zeta _i\) divided by a normalizing constant \(\zeta \equiv \sum _j \zeta _j\), i.e.,

$$\begin{aligned} w_i = \frac{\nu _i \exp \left( -\left[ \sum _k \lambda _k h_k(\varvec{Z}_i, {\hat{\mu }}_{z})\right] \right) }{\sum _j \nu _j \exp \left( -\left[ \sum _k \lambda _k h_k(\varvec{Z}_j, {\hat{\mu }}_{z}) \right] \right) } \equiv \frac{\zeta _i}{\zeta } \end{aligned}$$

where \(\zeta = \sum _j \nu _j \exp \left( -\left[ \sum _k \lambda _k h_k(\varvec{Z}_j, {\hat{\mu }}_{z}) \right] \right)\). We can also demonstrate that

$$\begin{aligned} \log (\zeta _i)&= \log (\nu _i) -\sum _k \lambda _k h_k(\varvec{Z}_i, {\hat{\mu }}_{z})\quad \text{ implying }\\ \sum _k \lambda _k h_k(\varvec{Z}_i, {\hat{\mu }}_{z})&= \log (\nu _i) - \log (\zeta _i). \end{aligned}$$

Using these two relations, we can plug the weights into the right hand side of Equation (6) (noting that \((\lambda _0 - 1)\left( \sum _{i} w_i - 1\right) = 0)\)) and obtain,

$$\begin{aligned}&\sum _{i} w_i \log \left( \frac{w_i}{\nu _i} \right) + \sum _{i} w_i \sum _k \lambda _k h_k(\varvec{Z}_i, {\hat{\mu }}_{z}) + (\lambda _0 - 1)\left( \sum _{i} w_i - 1\right) \\&\quad = \frac{1}{\zeta }\left( \sum _i \zeta _i [\log (\zeta _i) - \log (\nu _i) - \log (\zeta )] + \sum _i \zeta _i [\log (\nu _i) - \log (\zeta _i)]) \right) \\&\quad = \frac{\log (\zeta )}{\zeta } \left( \sum _i \zeta _i\right) \\&\quad = \log (\zeta ) = \log \left( \sum _i \nu _i \exp \left( -\left[ \sum _k \lambda _k h_k(\varvec{Z}_i, {\hat{\mu }}_{z}) \right] \right) \right) . \end{aligned}$$

This final line is no longer a function of the weights and therefore can be optimized only as a function of the coefficients \(\lambda _k\), the observed variables \(\varvec{Z}\) and the targets \({\hat{\mu }}_z\). Finally, inserting the appropriate functions \(h_k(\cdot , \cdot )\) would yield the objective function presented in Sect. 2.2.

Appendix 3: Supplemental simulation: assessing the effect of including higher marginal moment constraints for the exposure variable

This appendix provides a visualization and demonstration of the effect of enforcing constraints of higher order moments of the exposure variable of the form \(\sum _{i =1}^N w_i (A_i^r - \mu _A^r) = 0\) for specified moments \(r>1\). We then demonstrate how not including these constraints would affect our simulation results from the setting described in Sect. 3.3.2.

1.1 Demonstration of the role of higher exposure moments

Consider a single covariate \(X\sim {\mathcal {N}}(0,1)\) and define an exposure variable such that \(A = X + \epsilon\), where \(\epsilon \sim {\mathcal {N}}(0,1)\). Under this generative process it follows that the marginal distribution of the exposure variable is \(A\sim {\mathcal {N}}(0,2)\) and the joint distribution of the variables is \(p_{A,X}(a,x) = p_{A|X}(a|x) p_{X}(x)\), which is not equal to the product distribution \(p_{A}(a)p_{X}(x)\) over which inference is desired under our weighting scheme (i.e., when the weights are defined as \(w = p_A(a) p_{X}(x) / p_{A|X}(a|x) p_{X}(x)\) our goal is estimating expectations over the product distribution \(p_A(a) p_{X}(x)\)). This is demonstrated below for a sample of \(N=10,000\) drawn from this joint distribution (Fig. 7).

Fig. 7
figure 7

Comparing the joint distribution of the observed data points and the product distribution where inference is desired. Panel on the left plots the sample against the contours of the joint distribution. Panel on the right plots the sample against the contours of the product distribution

In essence, the goal of the weights is to “fill-in” the underrepresented portions of the product distribution, while not distorting the marginal distributions of A, the exposure, or X, the covariate, and providing independence between A and X. Figure (8) shows weighted bivariate histograms where each bin is the sum of the weights in that bin and the colors have been normalized such that the max across all bins is the darkest color and lighter areas represent less density. The figure demonstrates the effect of weights on the distributions, where the left panel is the empirical distribution of the observations, the center panel uses estimated weights only including first-order constraints on the distribution of the exposure (\(r=1\)) and the right panel uses weights that have been estimated using higher-order moment constraints for the exposure distribution in the optimization procedure (in this case \(r=1,2,3\)).

Fig. 8
figure 8

Comparing the contours of the joint distribution over which inference is desired and weighted bivariate histograms representing the weighted joint density of A and X. The bins represent the sum of the weights in that region and the color has been normalized such that each bin has been divided by the max of the sum of weights across all bins. The darkest blue represents the bin with the largest sum of weights and the smooth transition to white represents no weight in that bin (Color figure online)

Figure 8 shows that the true joint distribution has a strong relationship between A and X that is largely removed in the right two panels of the figure. However, the weighted histogram in the center panel does not completely match the target distribution (appearing to have less variance than would be necessary to fill out the contours of the plot) when only the constraint \(r=1\) (i.e., no higher moments) is enforced.

To further explore this behavior, we simulated this generative process for 1,000 replications and assessed the empirical distribution of the marginal moments among the target distribution, the weighted distribution with only the first-moment of the exposure balanced, and the distribution with higher-moments of the exposure balanced. Table 5 presents these results and demonstrates that the weighting scheme that focuses only on the first moment under estimates the variances of the product distribution, while the higher moments are able to capture this feature.

Table 5 Simulation results comparing the estimated characteristics of the empirical distribution (i.e., the targets of the optimization procedure) and two weighted distributions: (1) weights estimated using entropy balancing using only the first moment constraint for the exposure variable and higher moments for X, and (2) weights estimated using entropy balancing using higher moment constraints for both the exposure variable and X

The key take-away is that by focusing only on the first moment, representation in the tails of the exposure variable in the weighted distribution is lost and may restrict the ability to estimate the entire dose-response curve (particularly when employing nonparametric estimation methods). We note that it may be clinically important to increase representation in this region of the exposure variable as these cases may be where clinicians are most interested.

1.2 Simulation: comparison of balance conditions on the exposure variable

Next, we supplement the simulation results of Sect. 3.3.2 by considering the effect of only balancing on the first moment compared to additionally balancing higher moments of the exposure.

Table 6 Results comparing the effect of including higher moments of the exposure within the balancing constraints. The left column provides the method of estimation (as described previously in the manuscript), the next column represents the number of moment constraints on the covariates that were included in the optimization, the next two columns are results balancing all higher moments of the exposure variable (up to the number of covariate moments balanced, i.e., if the max \(s=2\), then the max \(r = 2\)). The final two columns are results restricting the algorithm to only balance the first moment of the exposure variable

Table 6 demonstrates that including higher-order moments of the exposure variable (\(r>1\)) reduced bias and MSE in almost all conditions. The only notable case where the MSE was lowered was when the outcome model was correctly specified (an unlikely situation in many observational settings) and four covariate moments were balanced. Finally, it is notable that balance on higher moments of the exposure also reduced bias and MSE for a mispecified linear model, which will likely be a common case in practice if using parametric modeling.

Appendix 4: Supplemental simulation: reducing sample size

In this appendix, we repeat the simulation of Sect. 3.1, but with a smaller sample size of \(N=200\). Figure 9 again demonstrates the true curve and two unweighted curves in the data (a quadratic model and an unweighted linear smoother).

Fig. 9
figure 9

Visualization of the distribution of exposure variable (left) and relationship between exposure and outcome (right) when \(N=200\). In the left figure, the high-density region of A is highlighted and generally lies between 1.5 and 45. In the right figure, the true marginal relationship is shown in red, a simple unweighted linear smoother is shown in blue, and an unweighted quadratic estimation is provided in green (Color figure online)

The next sections summarize results.

1.1 Covariate balance

Table 7 presents the correlation and marginal balance summaries, summarized by the average and maximum value across replications, as was done in Table 2, for \(N=200\). When compared with the results of Sect. 3.3.1, the balance metrics are still adequate (most correlation balance measures are small), but the magnitudes of the maximum KS-statistics are higher. This may indicate that there are difficulties in estimating dose-response curves in some cases as it is more difficult to obtain balance with smaller sample sizes.

Table 7 Summaries of balance assessments across replications. For Entropy Balancing and CBPS methods, the number to the right, i.e., (m), denotes the number of moments balanced. Correlation balance is assessed using the absolute value of the weighted correlation between each variable and the exposure variable A. The marginal balance is assessed using the Kolmogorov–Smirnov statistic between the unweighted empirical CDF and the weighted CDF. The top portion of the table are the averages of these statistics across replications and the bottom portion are the maximum across replications

1.2 Effect estimation

As noted throughout, while “perfect” covariate balance is often desired, the true goal is estimation of the outcome measure. Table 8 summarizes results of using the estimated weights in four different regression strategies; 1) a misspecified linear regression model (assuming a regression model where \({\texttt {Y}} \sim {\texttt { A}}\)), 2) a correctly specified linear model (i.e., a quadratic polynomial), 3) flexible modeling using local linear regression (LOESS), and 4) flexible modeling using a basis expansion of the exposure variable A using cubic B-splines.

Table 8 Comparisons of the average bias and mean squared error between the estimated dose-response curve and the true marginal relationships on a grid \(a_0 \in [1.5,45]\) and the average of the effective samples sizes for each method

The most interesting result of Table 8 is that while entropy balancing and CBPS generally reduce the effective sample size to low numbers when balancing higher moments, the performance does not dramatically suffer as a result. In fact, among the methods using LOESS to estimate the dose-response curve, the best performing weights in terms of MSE are the entropy balancing weights with three moments balanced despite an effective sample size of 62 (approximately 70 percent reduction compared with the original sample size). Future work should explore the extent to which this performance holds for entropy balancing in small samples.

1.3 Bootstrap simulation results

As in Sect. 3, we construct a simulation to explore the performance of bootstrap confidence intervals. In this section though we depart slightly from the earlier described procedure for constructing the bootstrap confidence intervals as there were computational issues with performing the split-sample procedure on bootstrapped samples with small N. In particular, in many cases the entropy balancing algorithm failed to converge due to the small sample sizes and the bootstrap requires sampling with-replacement. To overcome this, for each bootstrap sample we use the span \(\zeta\) that was found on the whole sample. The results are summarized in Fig. 10.

Fig. 10
figure 10

Simulation results for bootstrap confidence intervals. The upper left panel shows a single visualization of a bootstrapped confidence interval for the curve. Throughout the red curves represent the “truth” and the blue solid line represents the estimated curve; the dotted blue lines represent estimated 95% confidence intervals. The upper right panel contains the point-wise coverage of the 95% bootstrapped confidence intervals. Lower panels contain the magnitude (left) and ratio (right) of the average point-wise bootstrap standard error across the bootstrap simulations as compared to the standard error of the estimated curves obtained in this simulation. In all figures the vertical lines represent the 1st, 5th, 95th, 99th quantiles of the distribution of the exposure variable A in the high-density region (Color figure online)

The main finding is that the standard errors with this bootstrap procedure are underestimated, which is likely due to being unable to computationally perform the entire model selection procedure within each bootstrap sample. The performance of the bootstrap confidence intervals should be further investigated to fully understand their performance when combined with entropy balancing; particularly best practices for incorporating nonparametric model selection procedures with the weighting algorithm and performance in small samples.

Appendix 5: Supplemental simulation: no effect of exposure

In this Appendix, we reconsider the simulation of Sect. 3, but modify the outcome variable so that there is no effect of the exposure variable on the outcome variable, Y. To accomplish this, the data generation procedure is modified such that

$$\begin{aligned} Y&= X_1 + X_1^2 + X_2 + X_2^2 + X_1 X_2 + X_5 + \epsilon , \\ \epsilon&\sim {\mathcal {N}}(0,1) \end{aligned}$$

and all other data-generating parameters remain the same (recall that only the set of covariates \(\varvec{Z}\) are observed). This implies that the marginal effect is \(E[Y(a)] = 3.55\) for all \(a \in {\mathcal {A}}\) and thus A has no population-level effects. Within the simulation we again use the models described in Sect. 3, but now the linear model is the correctly specified model.

A visualization of the induced relationship between exposure and outcome is shown in Fig. 11.

Fig. 11
figure 11

Visualization of the distribution of exposure variable (left) and relationship between exposure and outcome (right). In the left panel, the high-density region of A is highlighted. In the right panel, the true marginal relationship is shown in red, a simple unweighted linear smoother is shown in blue, and an unweighted linear estimation is shown in green (Color figure online)

1.1 Results

1.1.1 Covariate balance

The covariate balance results are unchanged as the relationships among A and X in the data-generation are the same and the seed used for the random number generator is the same (Figs. 12, 13, 14).

1.1.2 Performance in estimating dose-response curves

The performance here is similar to the results of Sect. 3. Both entropy balancing and CBPS perform well when a high enough number of moments is balanced and all other methods perform less than satisfactory. The bootstrap confidence intervals again remain conservative, but in this case, it is observed that at certain points of the exposure distribution there is under-coverage. Future work should further investigate confidence intervals for these methods (Table 9).

Table 9 Comparisons of the average bias and mean squared error between the estimated dose-response curve and the true marginal relationships on a grid \(a_0 \in [1.5,45]\) and the average of the effective samples sizes for each method
Fig. 12
figure 12

Performance estimating the dose-response curve across repeated simulated samples: Local Linear Regression

Fig. 13
figure 13

Performance estimating the dose-response curve across repeated simulated samples: Linear Regression - outcome is correctly modeled using linear relationship with exposure

Fig. 14
figure 14

Simulation results for bootstrap confidence intervals. The upper left panel contains a single visualization of a bootstrapped confidence interval for the curve. The upper right panel contains the point-wise coverage of the 95% bootstrapped confidence intervals. Lower panels contain the magnitude (left) and ratio (right) of the average point-wise bootstrap standard error across the bootstrap simulations as compared to the standard error of the estimated curves obtained in the simulation. In all figures, the vertical lines represent the 1st, 5th, 95th, 99th quantiles of the distribution of A in the high-density region

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Vegetabile, B.G., Griffin, B.A., Coffman, D.L. et al. Nonparametric estimation of population average dose-response curves using entropy balancing weights for continuous exposures. Health Serv Outcomes Res Method 21, 69–110 (2021). https://doi.org/10.1007/s10742-020-00236-2

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10742-020-00236-2

Keywords

Navigation