Abstract
A generalized varying-coefficient model is proposed to estimate the size of an open population at a specific time from multiple lists. The datasets studied contain millions of records spanning a long period (38 years), which makes computation challenging. To overcome this difficulty, the authors develop an iterative regularization algorithm, and they derive the asymptotic distribution of the proposed estimators. Simulation studies show that the procedure performs well. The method is applied to estimate the number of drug abusers in Hong Kong, China, over the period 1977–2014.
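The abstract does not spell out the estimator, so the following is only a generic illustration under stated assumptions, not the authors' method. In covariate-based capture-recapture, once each observed individual's probability of appearing on each list has been estimated, the population size is commonly estimated by a Horvitz–Thompson sum, N̂(t) = Σ_i 1 / (1 − Π_j (1 − p_ij(t))), over individuals observed at time t. Here each list's capture probability is assumed to be logistic in a single covariate, with coefficients that vary smoothly in time (the "varying-coefficient" idea). The names `capture_prob`, `horvitz_thompson` and the coefficient functions `beta_j(t)` are hypothetical, and the regularized fitting of those coefficients is not shown.

```python
import math
from typing import Callable, Sequence, Tuple

# Hypothetical coefficient function for one list: maps time t to (intercept, slope).
CoefFn = Callable[[float], Tuple[float, float]]


def capture_prob(x: float, t: float, beta: CoefFn) -> float:
    """Assumed logistic capture probability with time-varying coefficients."""
    b0, b1 = beta(t)
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def horvitz_thompson(covariates: Sequence[float], t: float,
                     betas: Sequence[CoefFn]) -> float:
    """Sum of 1 / P(individual appears on at least one list) at time t.

    `covariates` holds one covariate value per individual observed
    at time t (an assumption about how the open population is handled).
    """
    total = 0.0
    for x in covariates:
        miss_all = 1.0  # probability of being missed by every list
        for beta in betas:
            miss_all *= 1.0 - capture_prob(x, t, beta)
        total += 1.0 / (1.0 - miss_all)
    return total


if __name__ == "__main__":
    # Two hypothetical lists; the first list's intercept drifts with time.
    list1: CoefFn = lambda t: (-1.0 + 0.05 * t, 0.3)
    list2: CoefFn = lambda t: (0.0, 0.0)
    print(horvitz_thompson([0.5, 1.2, -0.4], t=10.0, betas=[list1, list2]))
```

With two lists that each capture with probability 0.5, every observed individual is weighted by 1 / (1 − 0.25) = 4/3, so three observed individuals give N̂ = 4. In practice the coefficient functions would be estimated (for example, with splines and a penalty), which is where a regularized iterative fit would enter.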
Acknowledgements
The authors are grateful to the Narcotics Bureau for providing the data for analysis.
Additional information
This research was supported by the National Natural Science Foundation of China under Grant Nos. 11731015 and 11571148, the Natural Science Foundation of Chongqing under Grant No. cstc2019jcyj-msxmX0709, and the Science and Technology Research Program of Chongqing Municipal Education Commission under Grant No. KJQN201901436.
This paper was recommended for publication by Editor LI Qizhai.
About this article
Cite this article
Li, H., Li, Y. Estimating the Size of an Open Population with Massive Datasets Based on a Generalized Varying-Coefficient Model. J Syst Sci Complex 35, 1116–1136 (2022). https://doi.org/10.1007/s11424-021-0224-z