Assessing the Impact of Differential Privacy on Population Uniques in Geographically Aggregated Data: The Case of the 2020 U.S. Census

  • Original Research
  • Published in Population Research and Policy Review

Abstract

Geographically aggregated demographic, social, and economic data are valuable for research and practical applications, but their use and sharing can compromise individual privacy. The U.S. Census Bureau has responded to this issue by introducing a new privacy protection method, the TopDown Algorithm (TDA), in the 2020 Census. The TDA is based on a privacy definition known as differential privacy and is primarily designed to reduce the risk of reconstruction-abetted disclosure, a type of privacy violation in which individual identities are revealed by reconstructing confidential census responses and linking them to publicly available data. However, the impact of the TDA on direct disclosure, another common type of privacy violation in which individuals are distinguished directly from public census tables to reveal their identities, has not been systematically explored. To address this gap, this paper examines the effectiveness of the TDA in protecting against direct disclosure, focusing on how information from public census tables can be used to identify population uniques: individuals who can be uniquely distinguished from census tables. Our study reveals that while the TDA provides a reasonable level of differential privacy, it does not necessarily prevent the direct identification of population uniques from public census tables. This finding is important for policymakers to consider when selecting parameters for the TDA during its implementation.
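To illustrate the notion of direct disclosure described above, the following sketch (not the paper's code; all records are hypothetical) flags population uniques in a toy cross-tabulation: a person whose attribute combination appears exactly once in a published table for an area can be singled out from that table alone.

```python
# Illustrative sketch: a "population unique" is a person whose combination of
# published attributes appears exactly once in a geographic area's table.
from collections import Counter

# Hypothetical microdata for one census block: (age group, sex, race).
records = [
    ("18-24", "F", "White"),
    ("18-24", "F", "White"),
    ("65+",   "M", "Asian"),   # the only person with this combination
    ("25-44", "F", "Black"),
    ("25-44", "F", "Black"),
]

# A published cross-tabulation is simply the count of each combination.
table = Counter(records)

# Any cell with count 1 corresponds to a population unique: that person can
# be distinguished directly from the public table, with no linkage needed.
uniques = [cell for cell, n in table.items() if n == 1]
print(uniques)  # [('65+', 'M', 'Asian')]
```

The paper's question is whether the TDA's noise-infused tables still permit this kind of identification.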

Data Availability

The data and code supporting the findings of this study are openly accessible on GitHub at https://github.com/linyuehzzz/tda-eval and on Figshare at https://doi.org/10.6084/m9.figshare.23553096.

Notes

  1. 13 U.S.C. §9.

  2. Pub. L. No. 104-191.

  3. CA Civ Code §1798.192 (2018).

  4. The conclusions regarding the total \(\rho\) budget apply only under the specific parameter settings used in this study.

  5. HVER is exactly the DETAILED query in the redistricting code base (Abowd et al., 2022).

  6. Baldrige v. Shapiro, 455 U.S. 345 (1982).
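For context on the \(\rho\) budget in note 4: the TDA accounts for privacy loss using zero-concentrated differential privacy (zCDP; Bun & Steinke, 2016), under which a randomized mechanism \(M\) satisfies \(\rho\)-zCDP when the Rényi divergence between its outputs on any two neighboring datasets grows at most linearly in the divergence order:

```latex
% rho-zCDP (Bun & Steinke, 2016): for all neighboring datasets x, x'
% and every order \alpha in (1, \infty),
\[
  D_{\alpha}\bigl(M(x) \,\|\, M(x')\bigr) \;\le\; \rho\,\alpha ,
\]
% where D_\alpha denotes the Renyi divergence of order \alpha.
```

Smaller \(\rho\) means stronger protection, which is why the conclusions in note 4 are tied to the specific budget allocation tested.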

References

  • Abowd, J. (2018). The US Census Bureau adopts differential privacy. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2867–2867.

  • Abowd, J. (2021). 2010 Supplemental Declaration of John M. Abowd, State of Alabama v. United States Department of Commerce. Case No. 3:21-CV-211-RAH-ECM-KCN.

  • Abowd, J., Ashmead, R., Cumings-Menon, R., Garfinkel, S., Heineck, M., Heiss, C., Johns, R., Kifer, D., Leclerc, P., Machanavajjhala, A., Moran, B., Sexton, W., Spence, M., & Zhuravlev, P. (2022). The 2020 census disclosure avoidance system TopDown algorithm. Harvard Data Science Review, (Special Issue 2).

  • Abowd, J., & Hawes, M. (2023). Confidentiality protection in the 2020 US Census of population and housing. Annual Review of Statistics and Its Application, 10, 119–144.

  • Bethlehem, J. G., Keller, W. J., & Pannekoek, J. (1990). Disclosure control of microdata. Journal of the American Statistical Association, 85(409), 38–45.

  • Bowen, C., Williams, A. R., & Pickens, M. (2022). Decennial disclosure: An explainer on formal privacy and the TopDown algorithm. Technical report, Urban Institute.

  • Bramer, W. M., Giustini, D., & Kramer, B. M. (2016). Comparing the coverage, recall, and precision of searches for 120 systematic reviews in Embase, MEDLINE, and Google Scholar: A prospective study. Systematic Reviews, 5, 1–7.

  • Bun, M., & Steinke, T. (2016). Concentrated differential privacy: Simplifications, extensions, and lower bounds. Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics) (pp. 635–658). Springer.

  • Christ, M., Radway, S., & Bellovin, S. M. (2022). Differential privacy and swapping: Examining de-identification’s impact on minority representation and privacy preservation in the US Census. 2022 IEEE Symposium on Security and Privacy (SP) (pp. 1564–1564). IEEE Computer Society.

  • Ding, B., Kulkarni, J., & Yekhanin, S. (2017). Collecting telemetry data privately. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, pp. 3574–3583.

  • Dinur, I., & Nissim, K. (2003). Revealing information while preserving privacy. In Proceedings of the Twenty-Second ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 202–210.

  • Domingo-Ferrer, J., Sánchez, D., & Blanco-Justicia, A. (2021). The limits of differential privacy (and its misuse in data release and machine learning). Communications of the ACM, 64(7), 33–35.

  • Duncan, G. T., Elliot, M., & Salazar-González, J.-J. (2011). Concepts of statistical disclosure limitation (pp. 27–47). Springer.

  • Dwork, C., McSherry, F., Nissim, K., & Smith, A. (2006). Calibrating noise to sensitivity in private data analysis. Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics) (pp. 265–284). Springer.

  • Dwork, C., & Roth, A. (2014). The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4), 211–407.

  • Dwork, C., Smith, A., Steinke, T., & Ullman, J. (2017). Exposed! a survey of attacks on private data. Annual Review of Statistics and Its Application, 4, 61–84.

  • El Emam, K., Brown, A., & AbdelMalik, P. (2009). Evaluating predictors of geographic area population size cut-offs to manage re-identification risk. Journal of the American Medical Informatics Association, 16(2), 256–266.

  • Fefferman, N. H., O’Neil, E. A., & Naumova, E. N. (2005). Confidentiality and confidence: Is data aggregation a means to achieve both? Journal of Public Health Policy, 26, 430–449.

  • Garfinkel, S., Abowd, J., & Martindale, C. (2019). Understanding database reconstruction attacks on public data. Communications of the ACM, 62(3), 46–53.

  • Ghosh, A., & Kleinberg, R. (2016). Inferential privacy guarantees for differentially private mechanisms. Cryptography and Security. https://doi.org/10.48550/arXiv.1603.01508

  • Hawes, M. (2022). Reconstruction and re-identification of the Demographic and Housing Characteristics File (DHC) disclosure avoidance for the 2010 Census.

  • Keller, S. A., & Abowd, J. (2023). Database reconstruction does compromise confidentiality. Proceedings of the National Academy of Sciences, 120(12), e2300976120.

  • Kenny, C. T., Kuriwaki, S., McCartan, C., Rosenman, E. T., Simko, T., & Imai, K. (2021). The use of differential privacy for census data and its impact on redistricting: The case of the 2020 US census. Science Advances, 7(41), eabk3283.

  • Kifer, D., Abowd, J. M., Ashmead, R., Cumings-Menon, R., Leclerc, P., Machanavajjhala, A., Sexton, W., & Zhuravlev, P. (2022). Bayesian and frequentist semantics for common variations of differential privacy: Applications to the 2020 census. Methodology. https://doi.org/10.48550/arXiv.2209.03310

  • Lin, Y. (2023). Privacy and Utility of Geographic Data: Revealing, Evaluating, and Mitigating the Externalities of Geographic Privacy Protection [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1681120766846778.

  • Lin, Y., & Xiao, N. (2022). Developing synthetic individual-level population datasets: The case of contextualizing maps of privacy-preserving census data. In AutoCarto 2022, The 24th International Research Symposium on Cartography and GIScience, Redlands.

  • Lin, Y., & Xiao, N. (2023a). A computational framework for preserving privacy and maintaining utility of geographically aggregated data: A stochastic spatial optimization approach. Annals of the American Association of Geographers, 113(5), 1035–1056.

  • Lin, Y., & Xiao, N. (2023b). Generating small areal synthetic microdata from public aggregated data using an optimization method. The Professional Geographer.

  • Loi, M., & Christen, M. (2020). Two concepts of group privacy. Philosophy & Technology, 33(2), 207–224.

  • Muralidhar, K. (2022). A re-examination of the Census Bureau reconstruction and reidentification attack. In International Conference on Privacy in Statistical Databases, pp. 312–323. Springer.

  • Muralidhar, K., & Domingo-Ferrer, J. (2023). Database reconstruction is not so easy and is different from reidentification. Cryptography and Security. https://doi.org/10.48550/arXiv.2301.10213

  • Roessler, B. (2004). The value of privacy. Polity Press.

  • Ruggles, S., & Van Riper, D. (2021). The role of chance in the Census Bureau database reconstruction experiment. Population Research and Policy Review, 41, 781–788.

  • Sánchez, D., Domingo-Ferrer, J., & Muralidhar, K. (2023). Confidence-ranked reconstruction of census records from aggregate statistics fails to capture privacy risks and reidentifiability. Proceedings of the National Academy of Sciences, 120(18), e2303890120.

  • Santos-Lozada, A. R., Howard, J. T., & Verdery, A. M. (2020). How differential privacy will affect our understanding of health disparities in the United States. Proceedings of the National Academy of Sciences, 117(24), 13405–13412.

  • Singer, E., Mathiowetz, N. A., & Couper, M. P. (1993). The impact of privacy and confidentiality concerns on survey participation: The case of the 1990 US census. Public Opinion Quarterly, 57(4), 465–482.

  • Singer, E., Van Hoewyk, J., & Neugebauer, R. J. (2003). Attitudes and behavior: The impact of privacy and confidentiality concerns on participation in the 2000 census. Public Opinion Quarterly, 67(3), 368–384.

  • Skinner, C., & Shlomo, N. (2008). Assessing identification risk in survey microdata using log-linear models. Journal of the American Statistical Association, 103(483), 989–1001.

  • Su, L. T. (1994). The relevance of recall and precision in user evaluation. Journal of the American Society for Information Science, 45(3), 207–217.

  • Thakurta, A. G., Vyrros, A. H., Vaishampayan, U. S., Kapoor, G., Freudiger, J., Sridhar, V. R., & Davidson, D. (2017). Learning new words. US Patent 9,594,741.

  • U.S. Census Bureau (2011). 2010 United States Census Summary File 1 dataset.

  • U.S. Census Bureau (2016). Everybody counts!

  • U.S. Census Bureau (2019). The 2020 census and confidentiality.

  • U.S. Census Bureau (2020). DAS 2010 demonstration data products disclosure avoidance system release.

  • U.S. Census Bureau (2021). Privacy-loss budget allocation 2021-06-08.

  • Van Riper, D., Kugler, T., & Ruggles, S. (2020). Disclosure avoidance in the census bureau’s 2010 demonstration data product. Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics) (pp. 353–368). Springer.

  • Winkler, R. L., Butler, J. L., Curtis, K. J., & Egan-Robertson, D. (2021). Differential privacy and the accuracy of county-level net migration estimates. Population Research and Policy Review, 41, 417–435.

  • Young, C., Martin, D., & Skinner, C. (2009). Geographically intelligent disclosure control for flexible aggregation of census data. International Journal of Geographical Information Science, 23(4), 457–482.

  • Zayatz, L. (2007). Disclosure avoidance practices and research at the US Census Bureau: An update. Journal of Official Statistics, 23(2), 253.

Author information

Corresponding author

Correspondence to Yue Lin.

Ethics declarations

Conflicts of interest

The authors report that there are no competing interests to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lin, Y., Xiao, N. Assessing the Impact of Differential Privacy on Population Uniques in Geographically Aggregated Data: The Case of the 2020 U.S. Census. Popul Res Policy Rev 42, 81 (2023). https://doi.org/10.1007/s11113-023-09829-4
