Generalizing Univariate Predictive Mean Matching to Impute Multiple Variables Simultaneously

  • Conference paper
Intelligent Computing (SAI 2022)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 506)

Abstract

Predictive mean matching (PMM) is an easy-to-use and versatile univariate imputation approach. It is robust against transformations of the incomplete variable and violations of the normal model. However, univariate imputation methods cannot directly preserve multivariate relations in the imputed data. We wish to extend PMM to a multivariate method that produces imputations consistent with knowledge of derived data (e.g., data transformations, interactions, sum restrictions, range restrictions, and polynomials). This paper proposes multivariate predictive mean matching (MPMM), which can impute incomplete variables simultaneously. Instead of the normal linear model, we apply canonical regression analysis to calculate the predicted value used for donor selection. To evaluate the performance of MPMM, we compared it with other imputation approaches under four scenarios: 1) multivariate normally distributed data; 2) linear regression with quadratic terms; 3) linear regression with interaction terms; and 4) incomplete data with inequality restrictions. The simulation study shows that, with moderate missingness patterns, MPMM provides plausible imputations at the univariate level and preserves relations in the data.

Notes

  1. With a left-tailed (MARleft), centered (MARmid), both-tailed (MARtail), or right-tailed (MARright) missingness mechanism, a higher probability of \(\boldsymbol{X}\) being missing is assigned to the units with low, central, extreme, or high values of Y, respectively.
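
    As an illustration of these four mechanisms, the following Python sketch assigns each unit a probability of \(\boldsymbol{X}\) being missing as a function of Y. The logistic form, the standardization of Y, the shift of 0.75, and the function name `missingness_prob` are assumptions made for illustration only; the note above does not specify how the probabilities were computed.

```python
import numpy as np

def missingness_prob(y, kind):
    """Hypothetical helper: probability that X is missing, given Y."""
    z = (y - y.mean()) / y.std()      # standardized Y (assumed weighting score)

    def logistic(v):
        return 1.0 / (1.0 + np.exp(-v))

    if kind == "MARright":            # high Y -> higher missingness probability
        return logistic(z)
    if kind == "MARleft":             # low Y -> higher missingness probability
        return logistic(-z)
    if kind == "MARmid":              # central Y -> higher missingness probability
        return logistic(0.75 - np.abs(z))
    if kind == "MARtail":             # extreme Y -> higher missingness probability
        return logistic(np.abs(z) - 0.75)
    raise ValueError(f"unknown mechanism: {kind}")
```

    Missingness indicators could then be drawn with, for example, `np.random.default_rng(0).random(len(y)) < missingness_prob(y, "MARright")`.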

Author information

Corresponding author

Correspondence to Mingyang Cai.

Appendix

The MPMM algorithm with multiple missing patterns:

  1. Sort the rows of \(\boldsymbol{Y}\) into S missing data patterns \(\boldsymbol{Y_{[s]}}, s=1,\cdots ,S\).

  2. Initialize \(\boldsymbol{Y_{mis}}\) with reasonable starting values.

  3. Repeat for \(t=1,\cdots ,T\):

  4. Repeat for \(s=1,\cdots ,S\):

  5. Impute the missing values by steps 1–8 of the PMM-CRA algorithm proposed in Sect. 2.3.

  6. Repeat steps 1–5 m times and save the m completed datasets.
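
The loop structure of the steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the PMM-CRA steps of Sect. 2.3 are not reproduced here, so `impute_pattern` is a hypothetical stand-in that uses an ordinary least-squares prediction and nearest-donor matching instead of canonical regression analysis. The function names, the mean starting values, and the parameters `T`, `m`, `k`, and `seed` are likewise illustrative choices.

```python
import numpy as np

def impute_pattern(Y, mask, rows, miss_cols, rng, k=3):
    """Hypothetical stand-in for step 5; NOT the PMM-CRA steps of Sect. 2.3."""
    # Donors: rows in which every variable missing in this pattern was observed
    donors = np.flatnonzero(~mask[:, miss_cols].any(axis=1))
    if donors.size == 0:
        return  # no donors available: keep the current values
    # Predict the missing block from the block observed in this pattern
    X = np.column_stack([np.ones(Y.shape[0]), Y[:, ~miss_cols]])
    beta, *_ = np.linalg.lstsq(X[donors], Y[np.ix_(donors, miss_cols)], rcond=None)
    pred = X @ beta
    for i in rows:
        # Pick one of the k donors whose predicted values are closest
        dist = np.linalg.norm(pred[donors] - pred[i], axis=1)
        nearest = donors[np.argsort(dist)[:k]]
        # Copy all missing variables jointly from the same donor
        Y[i, miss_cols] = Y[rng.choice(nearest), miss_cols]

def mpmm_sketch(Y, T=5, m=3, seed=0):
    """Outer loops of the appendix algorithm; Y holds np.nan where missing."""
    rng = np.random.default_rng(seed)
    mask = np.isnan(Y)
    # Step 1: group rows by missing data pattern (patterns are fixed,
    # so this is done once rather than inside every repetition)
    patterns, inverse = np.unique(mask, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    col_means = np.nanmean(Y, axis=0)
    completed = []
    for _ in range(m):                                  # step 6
        Yc = Y.copy()
        # Step 2: starting values (observed column means, an assumption)
        Yc[mask] = np.take(col_means, np.nonzero(mask)[1])
        for _ in range(T):                              # step 3
            for s, miss_cols in enumerate(patterns):    # step 4
                if miss_cols.any():
                    rows = np.flatnonzero(inverse == s)
                    impute_pattern(Yc, mask, rows, miss_cols, rng)  # step 5
        completed.append(Yc)
    return completed
```

Because donor values are copied jointly for all variables missing in a pattern, every imputed row in this sketch reuses an observed combination of values, which is the property that lets a multivariate matching approach preserve relations among the imputed variables.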

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Cai, M., van Buuren, S., Vink, G. (2022). Generalizing Univariate Predictive Mean Matching to Impute Multiple Variables Simultaneously. In: Arai, K. (eds) Intelligent Computing. SAI 2022. Lecture Notes in Networks and Systems, vol 506. Springer, Cham. https://doi.org/10.1007/978-3-031-10461-9_5
