Generalizing Univariate Predictive Mean Matching to Impute Multiple Variables Simultaneously

Cai, Mingyang; van Buuren, Stef; Vink, Gerko

doi:10.1007/978-3-031-10461-9_5

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 506)

Included in the following conference series:

Science and Information Conference

Abstract

Predictive mean matching (PMM) is an easy-to-use and versatile univariate imputation approach. It is robust against transformations of the incomplete variable and violations of the normal model. However, univariate imputation methods cannot directly preserve multivariate relations in the imputed data. We extend PMM to a multivariate method so that imputations remain consistent with known relations among derived data (e.g., data transformations, interactions, sum restrictions, range restrictions, and polynomials). This paper proposes multivariate predictive mean matching (MPMM), which imputes several incomplete variables simultaneously. Instead of the normal linear model, we apply canonical regression analysis to calculate the predicted value used for donor selection. To evaluate the performance of MPMM, we compared it with other imputation approaches under four scenarios: (1) multivariate normally distributed data; (2) linear regression with quadratic terms; (3) linear regression with interaction terms; and (4) incomplete data with inequality restrictions. The simulation study shows that, with moderate missingness patterns, MPMM provides plausible imputations at the univariate level and preserves relations in the data.
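
As a rough illustration of the univariate starting point, the following Python sketch shows donor selection by predicted value. It is a simplified version under stated assumptions: the function name `pmm_impute`, the choice of `k` nearest donors, and the plain least-squares fit (with no draw of the regression parameters) are illustrative choices, not the authors' implementation.

```python
import numpy as np

def pmm_impute(x, y, k=5, rng=None):
    """Simplified univariate PMM: fill NaNs in y using predictor(s) x.

    Assumption: a plain least-squares fit is used; a full PMM
    implementation would also draw the regression parameters.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    design = np.column_stack([np.ones(len(y)), x])
    obs = ~np.isnan(y)
    # Fit the linear model on the observed cases only.
    beta, *_ = np.linalg.lstsq(design[obs], y[obs], rcond=None)
    pred = design @ beta
    donors_pred = pred[obs]
    donors_val = y[obs]
    k = min(k, donors_val.size)
    out = y.copy()
    for i in np.flatnonzero(~obs):
        # Distance between predicted values, not between raw values.
        dist = np.abs(donors_pred - pred[i])
        nearest = np.argpartition(dist, k - 1)[:k]
        # Impute the observed value of one randomly chosen near donor.
        out[i] = donors_val[rng.choice(nearest)]
    return out
```

Because every imputed value is copied from an observed case, the imputations stay within the observed range of the variable even when the linear model is a poor fit, which is one way to read the robustness claim above.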

Notes

1. Under a left-tailed (MARleft), centered (MARmid), both-tailed (MARtail), or right-tailed (MARright) missingness mechanism, a higher probability of \(\boldsymbol{X}\) being missing is assigned to units with low, centered, extreme, or high values of \(Y\), respectively.
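
The four mechanisms in this note can be sketched in Python as follows. This is only an illustration: the logistic form, the offset of 0.75, and the names `mar_probabilities` and `ampute_rows` are assumptions made for the example and are not taken from the paper.

```python
import numpy as np

def mar_probabilities(y, kind):
    """Probability that X is missing, as a function of standardized Y."""
    y = np.asarray(y, dtype=float)
    z = (y - np.mean(y)) / np.std(y)
    shift = 0.75  # assumed offset separating "centered" from "extreme"
    score = {
        "left": -z,                 # low Y -> more missingness
        "mid": shift - np.abs(z),   # centered Y -> more missingness
        "tail": np.abs(z) - shift,  # extreme Y -> more missingness
        "right": z,                 # high Y -> more missingness
    }[kind]
    return 1.0 / (1.0 + np.exp(-score))

def ampute_rows(x, y, kind, rng=None):
    """Return a copy of x with rows set to NaN according to Y."""
    rng = np.random.default_rng(rng)
    x = np.array(x, dtype=float)  # copy, so the input is left intact
    miss = rng.random(len(y)) < mar_probabilities(y, kind)
    x[miss] = np.nan
    return x
```

For example, `ampute_rows(X, y, "right")` removes rows of `X` more often where `y` is large, so the missingness depends only on the observed `y`.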

Author information

Authors and Affiliations

Utrecht University, Padualaan 14, 3584 CH, Utrecht, Netherlands
Mingyang Cai, Stef van Buuren & Gerko Vink

Corresponding author

Correspondence to Mingyang Cai.

Editor information

Editors and Affiliations

Saga University, Saga, Japan
Kohei Arai

Appendix

The MPMM algorithm with multiple missing patterns:

1. Sort the rows of \(\boldsymbol{Y}\) into \(S\) missing data patterns \(\boldsymbol{Y}_{[s]}, s=1,\cdots ,S\).
2. Initialize \(\boldsymbol{Y}_{mis}\) with reasonable starting values.
3. Repeat for \(t=1,\cdots ,T\):
4. Repeat for \(s=1,\cdots ,S\):
5. Impute the missing values in \(\boldsymbol{Y}_{[s]}\) by steps 1–8 of the PMM-CRA algorithm proposed in Sect. 2.3.
6. Repeat steps 1–5 \(m\) times and save the \(m\) completed datasets.
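
The loop structure of the steps above might be sketched in Python as follows. Only the outer iteration is illustrated. The PMM-CRA steps from Sect. 2.3 are not reproduced here, so the per-pattern step is a hypothetical stand-in (`nearest_donor_block`) that copies a whole block of values from one nearby donor row. All function names, the mean initialization, and the choice of three candidate donors are assumptions.

```python
import numpy as np

def nearest_donor_block(data, mask, rows, cols, rng):
    """Stand-in for step 5; this is NOT the PMM-CRA algorithm.

    Assumes at least one row has all columns in `cols` observed.
    """
    donors = np.flatnonzero(~mask[:, cols].any(axis=1))
    out = np.empty((rows.sum(), cols.sum()))
    for i, r in enumerate(np.flatnonzero(rows)):
        # Distance on the columns that are observed in this pattern.
        dist = np.linalg.norm(data[donors][:, ~cols] - data[r, ~cols], axis=1)
        nearest = donors[np.argsort(dist)[:3]]
        # Copy all missing variables jointly from a single donor row.
        out[i] = data[rng.choice(nearest), cols]
    return out

def mpmm_outer_loop(Y, impute_pattern, n_iter=5, m=3, seed=0):
    Y = np.asarray(Y, dtype=float)
    mask = np.isnan(Y)
    # Step 1: group rows by missing data pattern.
    patterns, row_pattern = np.unique(mask, axis=0, return_inverse=True)
    row_pattern = row_pattern.reshape(-1)
    incomplete = [s for s in range(len(patterns)) if patterns[s].any()]
    completed = []
    # Step 6: repeat to obtain m completed datasets. The grouping in
    # step 1 is deterministic, so it is computed only once here.
    for j in range(m):
        rng = np.random.default_rng(seed + j)
        # Step 2: initialize missing cells with observed column means.
        data = np.where(mask, np.nanmean(Y, axis=0), Y)
        for _ in range(n_iter):          # Step 3: iterations
            for s in incomplete:         # Step 4: missing data patterns
                rows = row_pattern == s
                cols = patterns[s]
                # Step 5: impute all missing variables of this pattern.
                data[np.ix_(rows, cols)] = impute_pattern(
                    data, mask, rows, cols, rng
                )
        completed.append(data)
    return completed
```

Calling `mpmm_outer_loop(Y, nearest_donor_block)` returns a list of `m` completed arrays. Replacing the stand-in with an implementation of the PMM-CRA steps would leave the outer loop unchanged.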

About this paper

Cite this paper

Cai, M., van Buuren, S., Vink, G. (2022). Generalizing Univariate Predictive Mean Matching to Impute Multiple Variables Simultaneously. In: Arai, K. (eds) Intelligent Computing. SAI 2022. Lecture Notes in Networks and Systems, vol 506. Springer, Cham. https://doi.org/10.1007/978-3-031-10461-9_5

DOI: https://doi.org/10.1007/978-3-031-10461-9_5
Published: 07 July 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-10460-2
Online ISBN: 978-3-031-10461-9
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
