Abstract
Félix-Medina and Thompson proposed a variant of link-tracing sampling to estimate the size of a hidden population, such as drug users or sex workers. In their variant, a sampling frame of sites where members of the population tend to gather is constructed. The frame is not assumed to cover the whole population, only a portion of it. A simple random sample of sites is selected; the people in the sampled sites are identified and asked to name other members of the population, who are then added to the sample. Those authors proposed maximum likelihood estimators (MLEs) of the population size derived from two models: a multinomial model for the numbers of people found in the sampled sites, and a model in which the probability that a person is named by any element of a particular sampled site (the link-probability) does not depend on the named person, that is, the link-probabilities are homogeneous. Later, Félix-Medina et al. proposed unconditional and conditional MLEs of the population size, derived from a model that accounts for heterogeneity of the link-probabilities. In this work we consider this sampling design and set conditions on a general model for the link-probabilities that guarantee the consistency and asymptotic normality of the estimators of the population size and of the estimators of the parameters of the link-probability model. We show that the unconditional and conditional MLEs of the population size are consistent, that they have different asymptotic normal distributions, and that the unconditional estimators are more efficient than the conditional ones.
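The sampling design described above can be illustrated with a small simulation. The following is a minimal sketch under simplifying assumptions, not the authors' implementation: the function name `simulate_design` and all of its parameters are hypothetical, people covered by the frame are allocated to sites with equal cell probabilities, and each sampled site names each person outside the initial sample independently with a single homogeneous link-probability. It covers only the sample-selection step and does not compute any of the estimators discussed in the article.

```python
import random


def simulate_design(tau_frame, tau_outside, n_sites, n_sampled, link_prob, seed=0):
    """Simulate one draw of the site-based link-tracing design (illustrative only).

    tau_frame   : number of people covered by the frame of sites (assumed)
    tau_outside : number of people not covered by the frame (assumed)
    n_sites     : number of sites in the frame
    n_sampled   : number of sites drawn by simple random sampling
    link_prob   : homogeneous probability that a sampled site names a given person
    """
    rng = random.Random(seed)

    # Allocate frame-covered people to sites (multinomial, equal cell probabilities).
    site_of = [rng.randrange(n_sites) for _ in range(tau_frame)]

    # Simple random sample of sites, without replacement.
    sampled_sites = set(rng.sample(range(n_sites), n_sampled))

    # Initial sample: everyone found in a sampled site.
    initial = {i for i, s in enumerate(site_of) if s in sampled_sites}

    # People outside the frame get indices tau_frame .. tau_frame + tau_outside - 1.
    total = tau_frame + tau_outside

    # Link-tracing step: a person outside the initial sample is added
    # if at least one sampled site names them.
    named = set()
    for person in range(total):
        if person in initial:
            continue
        if any(rng.random() < link_prob for _ in sampled_sites):
            named.add(person)

    return initial, named


if __name__ == "__main__":
    initial, named = simulate_design(200, 50, 20, 5, 0.1, seed=1)
    print(len(initial), "people in sampled sites;", len(named), "added by link-tracing")
```

Replacing the constant `link_prob` with a person-specific value would give a simple heterogeneous variant, which is the situation the later estimators are designed to handle.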
References
Agresti, A. 2002. Categorical data analysis, 2nd ed. New York, NY: Wiley.
Birch, M. W. 1964. A new proof of the Pearson-Fisher theorem. Annals of Mathematical Statistics 35:817–24. doi:10.1214/aoms/1177703581.
Bishop, Y. M. M., S. E. Fienberg, and P. W. Holland. 1975. Discrete multivariate analysis: Theory and practice. Cambridge, MA: MIT Press.
Coull, B. A., and A. Agresti. 1999. The use of mixed logit models to reflect heterogeneity in capture-recapture studies. Biometrics 55:294–301. doi:10.1111/biom.1999.55.issue-1.
Ding, Y. 1996. On the asymptotic normality of multinomial population size estimators with application to backcalculation of AIDS epidemic. Biometrika 83:695–99. doi:10.1093/biomet/83.3.695.
Farcomeni, A., and L. Tardella. 2012. Identifiability and inferential issues in capture-recapture experiments with heterogeneous detection probabilities. Electronic Journal of Statistics 6:2602–26. doi:10.1214/12-EJS758.
Félix-Medina, M. H., P. E. Monjardin, and A. N. Aceves-Castro. 2015. Combining link-tracing sampling and cluster sampling to estimate the size of a hidden population in presence of heterogeneous link-probabilities. Survey Methodology 41:349–76.
Félix-Medina, M. H., and S. K. Thompson. 2004. Combining cluster sampling and link-tracing sampling to estimate the size of hidden populations. Journal of Official Statistics 20:19–38.
Feller, W. 1968. An introduction to probability theory and its applications, Vol. 1, 3rd ed. New York, NY: Wiley.
Fewster, R. M., and P. E. Jupp. 2009. Inference on population size in binomial detectability models. Biometrika 96:805–20. doi:10.1093/biomet/asp051.
Harville, D. A. 1997. Matrix algebra from a statistician’s perspective. New York, NY: Springer.
Holzmann, H., A. Munk, and W. Zucchini. 2006. On identifiability in capture-recapture models. Biometrics 62:934–36. doi:10.1111/j.1541-0420.2006.00637_1.x.
Johnston, L. G., and K. Sabin. 2010. Sampling hard-to-reach populations with respondent driven sampling. Methodological Innovations Online 5 (2):38.1–48. doi:10.4256/mio.2010.0017.
Kalton, G. 2009. Methods for oversampling rare populations in social surveys. Survey Methodology 35:125–41.
Link, W. A. 2003. Nonidentifiability of population size from capture-recapture data with heterogeneous detection probabilities. Biometrics 59:1123–30. doi:10.1111/biom.2003.59.issue-4.
Magnani, R. K., K. Sabin, T. Saidel, and D. Heckathorn. 2005. Review of sampling hard-to-reach populations for HIV surveillance. AIDS 19:S67–S72. doi:10.1097/01.aids.0000172879.20628.e1.
Rao, C. R. 1958. Maximum likelihood estimation for the multinomial distribution with infinite number of cells. Sankhyā: The Indian Journal of Statistics 20:211–18.
Rao, C. R. 1973. Linear statistical inference and its applications, 2nd ed. New York, NY: Wiley.
Sanathanan, L. 1972. Estimating the size of a multinomial population. Annals of Mathematical Statistics 43:142–52. doi:10.1214/aoms/1177692709.
Serfling, R. J. 1980. Approximation theorems of mathematical statistics. New York, NY: Wiley.
Serfling, R. J. 2011. Asymptotic relative efficiency in estimation. In International encyclopedia of statistical science, ed. M. Lovric, 68–72. Berlin, Germany: Springer.
Spreen, M. 1992. Rare populations, hidden populations and link-tracing designs: What and why? Bulletin de Méthodologie Sociologique 36:34–58. doi:10.1177/075910639203600103.
Thompson, S. K., and O. Frank. 2000. Model-based estimation with link-tracing sampling designs. Survey Methodology 26:87–98.
Varadhan, S. R. S. 2008. Large deviations. Annals of Probability 36:397–419. doi:10.1214/07-AOP348.
Funding
This work was supported by Universidad Autónoma de Sinaloa: PIFI-2013-25-73-1.4.3-8.
Cite this article
Félix-Medina, M.H. Combining cluster sampling and link-tracing sampling to estimate the size of a hidden population: Asymptotic properties of the estimators. J Stat Theory Pract 12, 463–496 (2018). https://doi.org/10.1080/15598608.2017.1405374
DOI: https://doi.org/10.1080/15598608.2017.1405374
Keywords
- Asymptotic normality
- capture-recapture
- chain-referral sampling
- hard-to-detect population
- maximum likelihood estimator
- snowball sampling