Estimating the Size of an Open Population with Massive Datasets Based on a Generalized Varying-Coefficient Model

Journal of Systems Science and Complexity

Abstract

A generalized varying-coefficient model is proposed to estimate the size of an open population at a specific time from multiple lists. The research datasets contain millions of records spanning a very long period (38 years), which makes computation challenging. To overcome this difficulty, the authors develop an iterative regularization algorithm. The asymptotic distribution of the proposed estimators is derived, and simulation studies show that the procedure performs well. The method is applied to estimate the number of drug abusers in Hong Kong, China, over the period 1977–2014.
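The abstract does not give the estimator itself, so the following is only a hedged sketch of a common multiple-list approach, not the authors' algorithm: per-list capture probabilities follow a logistic model whose coefficients vary with time, and the population size at time t is estimated with a Horvitz-Thompson sum over the individuals recorded at t. The model form, the single covariate `x`, and the functions `capture_probs` and `horvitz_thompson_size` are all illustrative assumptions; the coefficient curves are hypothetical placeholders for fitted smooth functions, and the regularized fitting step is not shown.

```python
import math
from typing import Callable, List, Sequence

# A coefficient function maps a time t (e.g. calendar year) to a real number.
CoefFn = Callable[[float], float]


def logistic(eta: float) -> float:
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def capture_probs(t: float, x: float,
                  intercepts: Sequence[CoefFn], slope: CoefFn) -> List[float]:
    """Per-list capture probabilities for one individual at time t.

    Assumed (hypothetical) model: logit p_j(t, x) = beta_0j(t) + beta_1(t) * x,
    i.e. each list j has its own time-varying intercept and all lists share
    one time-varying covariate effect.
    """
    return [logistic(b0(t) + slope(t) * x) for b0 in intercepts]


def horvitz_thompson_size(t: float, covariates: Sequence[float],
                          intercepts: Sequence[CoefFn], slope: CoefFn) -> float:
    """Horvitz-Thompson estimate of the population size at time t.

    `covariates` holds x_i for every individual recorded on at least one
    list at time t; each one contributes 1 / P(caught by at least one list).
    """
    total = 0.0
    for x in covariates:
        probs = capture_probs(t, x, intercepts, slope)
        p_missed = math.prod(1.0 - p for p in probs)  # missed by every list
        total += 1.0 / (1.0 - p_missed)
    return total


if __name__ == "__main__":
    # Hypothetical smooth coefficient curves for two lists, indexed by year.
    b01 = lambda t: -1.0 + 0.02 * (t - 1977)
    b02 = lambda t: -0.5
    b1 = lambda t: 0.3
    n_hat = horvitz_thompson_size(2000.0, [0.0, 1.0, 2.5], [b01, b02], b1)
    print(f"Estimated population size in 2000: {n_hat:.2f}")
```

Under these assumptions each recorded individual stands in for 1 / (1 - P(missed by all lists)) members of the population, so the estimate is never smaller than the number of people actually observed at time t. Letting the coefficients depend on t is what allows the capture probabilities, and hence the correction for unobserved individuals, to drift over a long study period.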

Acknowledgements

The authors are grateful to the Narcotics Bureau for providing the data for analysis.

Author information

Corresponding author

Correspondence to Yuan Li.

Additional information

This research was supported by the National Natural Science Foundation of China under Grant Nos. 11731015 and 11571148, the Natural Science Foundation of Chongqing under Grant No. cstc2019jcyj-msxmX0709, and the Science and Technology Research Program of Chongqing Municipal Education Commission under Grant No. KJQN201901436.

This paper was recommended for publication by Editor LI Qizhai.

About this article

Cite this article

Li, H., Li, Y. Estimating the Size of an Open Population with Massive Datasets Based on a Generalized Varying-Coefficient Model. J Syst Sci Complex 35, 1116–1136 (2022). https://doi.org/10.1007/s11424-021-0224-z

  • DOI: https://doi.org/10.1007/s11424-021-0224-z
