Estimating the Size of an Open Population with Massive Datasets Based on a Generalized Varying-Coefficient Model

Li, Haoqi; Li, Yuan

doi:10.1007/s11424-021-0224-z

Published: 23 October 2021

Volume 35, pages 1116–1136, (2022)

Journal of Systems Science and Complexity

Haoqi Li^1,2 & Yuan Li^1

Abstract

A generalized varying-coefficient model is proposed to estimate the size of an open population at a specific time from multiple lists. The research datasets contain millions of records spanning a very long period (38 years), which makes computation challenging. The authors develop a regularization iterative algorithm to overcome this difficulty and derive the asymptotic distribution of the proposed estimators. Simulation studies show that the procedure works well. The method is applied to estimate the number of drug abusers in Hong Kong, China over the period 1977–2014.

Acknowledgements

The authors are grateful to the Narcotics Bureau for providing the data for analysis.

Author information

Authors and Affiliations

School of Economics and Statistics, Guangzhou University, Guangzhou, 510006, China
Haoqi Li & Yuan Li
School of Mathematics and Statistics, Yangtze Normal University, Chongqing, 408100, China
Haoqi Li

Corresponding author

Correspondence to Yuan Li.

Additional information

This research was supported by the National Natural Science Foundation of China under Grant Nos. 11731015 and 11571148, the Natural Science Foundation of Chongqing under Grant No. cstc2019jcyj-msxmX0709, and the Science and Technology Research Program of Chongqing Municipal Education Commission under Grant No. KJQN201901436.

This paper was recommended for publication by Editor LI Qizhai.

About this article

Cite this article

Li, H., Li, Y. Estimating the Size of an Open Population with Massive Datasets Based on a Generalized Varying-Coefficient Model. J Syst Sci Complex 35, 1116–1136 (2022). https://doi.org/10.1007/s11424-021-0224-z

Received: 17 September 2020
Revised: 07 January 2021
Published: 23 October 2021
Issue Date: June 2022
DOI: https://doi.org/10.1007/s11424-021-0224-z
