Abstract
Predictive mean matching (PMM) is an easy-to-use and versatile univariate imputation approach. It is robust against transformations of the incomplete variable and against violations of the normal model. However, univariate imputation methods cannot directly preserve multivariate relations in the imputed data. We extend PMM to a multivariate method that produces imputations consistent with knowledge of derived data (e.g., data transformations, interactions, sum restrictions, range restrictions, and polynomials). This paper proposes multivariate predictive mean matching (MPMM), which imputes multiple incomplete variables simultaneously. Instead of the normal linear model, MPMM applies canonical regression analysis to calculate the predicted value used for donor selection. To evaluate the performance of MPMM, we compared it with other imputation approaches under four scenarios: (1) multivariate normally distributed data; (2) linear regression with quadratic terms; (3) linear regression with interaction terms; and (4) incomplete data with inequality restrictions. The simulation study shows that, with moderate missingness patterns, MPMM provides plausible imputations at the univariate level and preserves relations in the data.
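The key idea of the multivariate extension is that the incomplete variables are imputed as one block. The method matches on a single predicted value and then copies one donor's observed values for every variable in the block. The sketch below illustrates only that donor step, and several parts of it are assumptions rather than the authors' method. The helper name `match_block` is hypothetical. The choice to draw among the k = 5 closest donors is assumed, not taken from the paper. The predicted scores are simple stand-ins: in MPMM they would come from canonical regression analysis. Because the whole row is copied, a derived relation inside the block (here, a variable and its square) holds exactly in the imputed data.

```python
import random


def match_block(pred_obs, rows_obs, pred_mis, k=5, rng=None):
    """Hypothetical multivariate PMM donor step.

    pred_obs: predicted scores of units whose incomplete block is observed.
    rows_obs: the observed block values (tuples), aligned with pred_obs.
    pred_mis: predicted scores of units whose block is missing.
    k:        number of closest donors to draw from (assumed default).
    """
    rng = rng or random.Random()
    imputed = []
    for p in pred_mis:
        # Rank candidate donors by distance between predicted scores.
        order = sorted(range(len(pred_obs)), key=lambda i: abs(pred_obs[i] - p))
        donor = rng.choice(order[:k])
        # Copy the donor's entire row: all block variables come from one unit.
        imputed.append(rows_obs[donor])
    return imputed


# Toy block (x, x^2) in which both entries are missing together.
rng = random.Random(1)
x_obs = [rng.gauss(0, 1) for _ in range(50)]
rows_obs = [(x, x ** 2) for x in x_obs]
pred_obs = [0.8 * x for x in x_obs]  # stand-in predicted scores (assumption)
pred_mis = [rng.gauss(0, 0.8) for _ in range(10)]
imputed = match_block(pred_obs, rows_obs, pred_mis, k=5, rng=rng)
print(all(sq == x ** 2 for x, sq in imputed))  # True
```

A univariate PMM that imputed x and x² separately could pair an x from one donor with a square from another. Block-wise copying rules that out by construction.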
Notes
1. Under the left-tailed (MARleft), centered (MARmid), two-tailed (MARtail), and right-tailed (MARright) missingness mechanisms, units with low, centered, extreme, and high values of Y, respectively, are assigned a higher probability of \(\boldsymbol{X}\) being missing.
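The four mechanisms in the note differ only in which region of Y receives the higher missingness probability for X. Below is a minimal sketch, assuming a logistic function of the standardized Y value. The function name `mar_probabilities` and the `slope` and `cut` values are illustrative assumptions. The shapes are not necessarily those used in the simulation study. They are only loosely in the spirit of the multivariate amputation approach of Schouten, Lugtig and Vink (2018).

```python
import math
import random


def mar_probabilities(y, mechanism, slope=2.0, cut=0.67):
    """Probability that X is missing for each unit, driven by the observed Y.

    Hypothetical helper: a logistic function of the standardized Y value;
    slope and cut (about the median of |Z| for a standard normal Z) are
    assumed values, not taken from the paper.
    """
    n = len(y)
    mean = sum(y) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in y) / (n - 1))
    z = [(v - mean) / sd for v in y]

    def logistic(t):
        return 1.0 / (1.0 + math.exp(-t))

    if mechanism == "MARleft":    # low values of Y
        return [logistic(-slope * zi) for zi in z]
    if mechanism == "MARright":   # high values of Y
        return [logistic(slope * zi) for zi in z]
    if mechanism == "MARmid":     # values near the center of Y
        return [logistic(slope * (cut - abs(zi))) for zi in z]
    if mechanism == "MARtail":    # extreme values in either tail of Y
        return [logistic(slope * (abs(zi) - cut)) for zi in z]
    raise ValueError(f"unknown mechanism: {mechanism}")


# Draw missingness indicators for X under the right-tailed mechanism.
rng = random.Random(7)
y = [rng.gauss(0, 1) for _ in range(200)]
probs = mar_probabilities(y, "MARright")
x_missing = [rng.random() < p for p in probs]
```

With this form, MARmid and MARtail are complements of each other: for every unit the two probabilities sum to one.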
Appendix
The MPMM algorithm with multiple missing data patterns:
1. Sort the rows of \(\boldsymbol{Y}\) into \(S\) missing data patterns \(\boldsymbol{Y}_{[s]}, s=1,\cdots ,S\).
2. Initialize \(\boldsymbol{Y}_{mis}\) with reasonable starting values.
3. Repeat steps 4–5 for iterations \(t=1,\cdots ,T\).
4. Within each iteration, repeat step 5 for patterns \(s=1,\cdots ,S\).
5. Impute the missing values in \(\boldsymbol{Y}_{[s]}\) using steps 1–8 of the PMM-CRA algorithm proposed in Sect. 2.3.
6. Repeat steps 1–5 \(m\) times and save the \(m\) completed datasets.
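The loop structure above can be sketched in Python with NumPy. This is a hypothetical sketch, not the authors' code. Steps 1–8 of PMM-CRA (Sect. 2.3) are not reproduced in this section, so step 5 is replaced by an assumed stand-in with four parts:

- a bootstrap draw of the donor rows, to reflect parameter uncertainty;
- a least-squares regression of the missing block on the observed columns;
- projection of the fitted values onto the first canonical direction of the missing block, which gives the predicted value;
- random selection among the k donors with the closest predicted values, whose whole observed block is copied.

The names `fit_cra`, `cra_score` and `mpmm` are hypothetical, as are the column-mean starting values and all default settings.

```python
import numpy as np


def add_intercept(x):
    return np.column_stack([np.ones(len(x)), x])


def fit_cra(x, y):
    """Least-squares regression of block y on x, plus the first canonical
    direction b of y (maximizes the correlation between y @ b and its fit)."""
    coef, *_ = np.linalg.lstsq(add_intercept(x), y, rcond=None)
    yhat = add_intercept(x) @ coef
    yc = y - y.mean(axis=0)
    hc = yhat - yhat.mean(axis=0)
    s_yy = yc.T @ yc / (len(y) - 1)
    s_hh = hc.T @ hc / (len(y) - 1)  # equals S_yx S_xx^{-1} S_xy
    # Solve s_hh b = rho^2 s_yy b by whitening with the Cholesky factor of s_yy.
    inv_low = np.linalg.inv(np.linalg.cholesky(s_yy))
    _, vecs = np.linalg.eigh(inv_low @ s_hh @ inv_low.T)
    b = inv_low.T @ vecs[:, -1]  # eigh sorts eigenvalues in ascending order
    return coef, b


def cra_score(x, coef, b):
    """Predicted value used for matching: the fitted block projected on b."""
    return add_intercept(x) @ coef @ b


def mpmm(data, n_iter=5, m=5, k=5, seed=None):
    """Hypothetical sketch of the appendix loop; data uses np.nan for missing."""
    rng = np.random.default_rng(seed)
    miss = np.isnan(data)
    # Step 1: group incomplete rows by their missing data pattern.
    patterns = {}
    for i, row in enumerate(miss):
        if row.any():
            patterns.setdefault(tuple(row), []).append(i)
    col_means = np.nanmean(data, axis=0)
    completed = []
    for _ in range(m):  # step 6: m completed datasets
        # Step 2: start from observed column means (an assumed starting value).
        filled = np.where(miss, col_means, data)
        for _ in range(n_iter):  # step 3: iterations t = 1..T
            for pattern, rows in patterns.items():  # step 4: patterns s = 1..S
                mis_cols = np.flatnonzero(pattern)
                obs_cols = np.flatnonzero(~np.array(pattern))
                donors = np.flatnonzero(~miss[:, mis_cols].any(axis=1))
                # Step 5 stand-in: bootstrap donors for parameter uncertainty.
                boot = rng.choice(donors, size=len(donors), replace=True)
                coef, b = fit_cra(filled[np.ix_(boot, obs_cols)],
                                  filled[np.ix_(boot, mis_cols)])
                s_don = cra_score(filled[np.ix_(donors, obs_cols)], coef, b)
                s_rec = cra_score(filled[np.ix_(rows, obs_cols)], coef, b)
                # Draw one of the kk donors with the closest predicted values.
                kk = min(k, len(donors))
                dist = np.abs(s_rec[:, None] - s_don[None, :])
                nearest = np.argsort(dist, axis=1)[:, :kk]
                pick = nearest[np.arange(len(rows)), rng.integers(0, kk, len(rows))]
                # Copy the donor's whole block so within-block relations hold.
                filled[np.ix_(rows, mis_cols)] = data[np.ix_(donors[pick], mis_cols)]
        completed.append(filled)
    return completed


# Toy data with two patterns: (y, y^2) missing together, or x missing.
rng = np.random.default_rng(2022)
n = 300
x = rng.normal(size=n)
y = x + rng.normal(scale=0.5, size=n)
data = np.column_stack([x, y, y ** 2])
u = rng.random(n)
data[u < 0.25, 1:] = np.nan
data[u > 0.85, 0] = np.nan
imps = mpmm(data, n_iter=5, m=3, seed=1)
ym = np.isnan(data[:, 1])
print(all(np.allclose(c[ym, 2], c[ym, 1] ** 2) for c in imps))  # True
```

When the missing block holds a single variable, the canonical direction is just a rescaling. Matching then reduces to ordinary univariate PMM on the regression prediction.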
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cai, M., van Buuren, S., Vink, G. (2022). Generalizing Univariate Predictive Mean Matching to Impute Multiple Variables Simultaneously. In: Arai, K. (eds) Intelligent Computing. SAI 2022. Lecture Notes in Networks and Systems, vol 506. Springer, Cham. https://doi.org/10.1007/978-3-031-10461-9_5
DOI: https://doi.org/10.1007/978-3-031-10461-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-10460-2
Online ISBN: 978-3-031-10461-9
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)