Abstract
Geographically aggregated demographic, social, and economic data are valuable for research and practical applications, but their use and sharing can compromise individual privacy. The U.S. Census Bureau has responded to this concern by introducing a new privacy protection method, the TopDown Algorithm (TDA), in the 2020 Census. The TDA is based on a privacy definition known as differential privacy and is primarily designed to reduce the risk of reconstruction-abetted disclosure, a type of privacy violation in which individual identities are revealed by reconstructing confidential census responses and linking them to publicly available data. However, there has been no systematic exploration of the impact of the TDA on direct disclosure, another common type of privacy violation in which individuals are distinguished directly from public census tables to reveal their identities. To address this gap, this paper examines the effectiveness of the TDA in protecting against direct disclosure, focusing on how information in public census tables can be used to identify population uniques, the individuals who can be uniquely distinguished from census tables. Our study reveals that although the TDA provides a reasonable level of differential privacy, it does not necessarily prevent the direct identification of population uniques from public census tables. This finding is crucial for policymakers to consider when making informed decisions about parameter selection for the TDA during its implementation.
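To make the notion concrete, population uniques can be illustrated with a toy cross-tabulation (our own sketch, not the paper's data or code): in a published aggregated table, any cell with a count of 1 corresponds to exactly one individual, so that person is directly distinguishable from the table alone.

```python
from collections import Counter

# Toy illustration (hypothetical data): each record is a combination of
# published attributes (block, age group, sex). A "population unique" is
# an individual whose attribute combination occurs exactly once, so the
# published cell count of 1 points directly at that person.
records = [
    ("block A", "30-34", "F"),
    ("block A", "30-34", "F"),
    ("block A", "65-69", "M"),  # unique in its cell
    ("block B", "30-34", "F"),  # unique in its cell
]

table = Counter(records)  # the published cross-tabulation: cell -> count
uniques = [cell for cell, n in table.items() if n == 1]
print(uniques)  # cells that identify a single individual
```

A privacy mechanism such as the TDA perturbs these published counts, but as the paper argues, noisy counts of 1 can still flag population uniques.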
(Figures 1–6: see the published article at https://doi.org/10.1007/s11113-023-09829-4)
Data Availability
The data and code supporting the findings of this study are openly accessible on GitHub at https://github.com/linyuehzzz/tda-eval and on Figshare at https://doi.org/10.6084/m9.figshare.23553096.
Notes
13 U.S.C. §9.
Pub. L. No. 104-191.
CA Civ Code §1798.192 (2018).
The conclusions regarding the total \(\rho\) budget apply only under the specific parameter settings considered in this study.
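The TDA's privacy accounting is stated in terms of zero-concentrated differential privacy (zCDP; Bun & Steinke, 2016), with \(\rho\) as the total privacy-loss budget. As a minimal sketch (ours, not the Census Bureau's implementation, which uses a discrete Gaussian mechanism and a more elaborate budget allocation), a count query with sensitivity 1 satisfies \(\rho\)-zCDP when perturbed with Gaussian noise of variance \(1/(2\rho)\):

```python
import random

# Hedged sketch, not the production TDA code: under rho-zCDP, answering a
# sensitivity-1 count query with Gaussian noise of variance 1 / (2 * rho)
# satisfies the rho budget. A smaller rho means a larger sigma and hence
# noisier published counts.
def noisy_count(true_count: int, rho: float) -> float:
    sigma = (1.0 / (2.0 * rho)) ** 0.5  # standard deviation of the noise
    return true_count + random.gauss(0.0, sigma)
```

The noise is unbiased, so repeated noisy answers average back toward the true count; the privacy guarantee comes from the uncertainty in any single release.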
HVER is exactly the DETAILED query in the redistricting code base (Abowd et al., 2022).
Baldrige v. Shapiro, 455 U.S. 345 (1982).
References
Abowd, J. (2018). The U.S. Census Bureau adopts differential privacy. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2867–2867.
Abowd, J. (2021). 2010 Supplemental Declaration of John M. Abowd, State of Alabama v. United States Department of Commerce. Case No. 3:21-CV-211-RAH-ECM-KCN.
Abowd, J., Ashmead, R., Cumings-Menon, R., Garfinkel, S., Heineck, M., Heiss, C., Johns, R., Kifer, D., Leclerc, P., Machanavajjhala, A., Moran, B., Sexton, W., Spence, M., & Zhuravlev, P. (2022). The 2020 census disclosure avoidance system TopDown algorithm. Harvard Data Science Review, (Special Issue 2).
Abowd, J., & Hawes, M. (2023). Confidentiality protection in the 2020 US Census of population and housing. Annual Review of Statistics and Its Application, 10, 119–144.
Bethlehem, J. G., Keller, W. J., & Pannekoek, J. (1990). Disclosure control of microdata. Journal of the American Statistical Association, 85(409), 38–45.
Bowen, C., Williams, A. R., & Pickens, M. (2022). Decennial disclosure: An explainer on formal privacy and the topdown algorithm. Urban Institute: Technical report.
Bramer, W. M., Giustini, D., & Kramer, B. M. (2016). Comparing the coverage, recall, and precision of searches for 120 systematic reviews in Embase, MEDLINE, and Google Scholar: A prospective study. Systematic Reviews, 5, 1–7.
Bun, M., & Steinke, T. (2016). Concentrated differential privacy: Simplifications, extensions, and lower bounds. Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics) (pp. 635–658). Springer.
Christ, M., Radway, S., & Bellovin, S. M. (2022). Differential privacy and swapping: Examining de-identification’s impact on minority representation and privacy preservation in the US Census. 2022 IEEE symposium on security and privacy (SP) (pp. 1564–1564). IEEE Computer Society.
Ding, B., Kulkarni, J., & Yekhanin, S. (2017). Collecting telemetry data privately. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, pp. 3574–3583.
Dinur, I., & Nissim, K. (2003). Revealing information while preserving privacy. In Proceedings of the Twenty-Second ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 202–210.
Domingo-Ferrer, J., Sánchez, D., & Blanco-Justicia, A. (2021). The limits of differential privacy (and its misuse in data release and machine learning). Communications of the ACM, 64(7), 33–35.
Duncan, G. T., Elliot, M., & Salazar-González, J.-J. (2011). Concepts of statistical disclosure limitation (pp. 27–47). Springer.
Dwork, C., McSherry, F., Nissim, K., & Smith, A. (2006). Calibrating noise to sensitivity in private data analysis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pages 265–284. Springer.
Dwork, C., & Roth, A. (2014). The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4), 211–407.
Dwork, C., Smith, A., Steinke, T., & Ullman, J. (2017). Exposed! a survey of attacks on private data. Annual Review of Statistics and Its Application, 4, 61–84.
El Emam, K., Brown, A., & AbdelMalik, P. (2009). Evaluating predictors of geographic area population size cut-offs to manage re-identification risk. Journal of the American Medical Informatics Association, 16(2), 256–266.
Fefferman, N. H., O’Neil, E. A., & Naumova, E. N. (2005). Confidentiality and confidence: Is data aggregation a means to achieve both? Journal of Public Health Policy, 26, 430–449.
Garfinkel, S., Abowd, J., & Martindale, C. (2019). Understanding database reconstruction attacks on public data. Communications of the ACM, 62(3), 46–53.
Ghosh, A., & Kleinberg, R. (2016). Inferential privacy guarantees for differentially private mechanisms. Cryptography and Security. https://doi.org/10.48550/arxiv.1603.01508
Hawes, M. (2022). Reconstruction and re-identification of the Demographic and Housing Characteristics File (DHC) disclosure avoidance for the 2010 Census.
Keller, S. A., & Abowd, J. (2023). Database reconstruction does compromise confidentiality. Proceedings of the National Academy of Sciences, 120(12), e2300976120.
Kenny, C. T., Kuriwaki, S., McCartan, C., Rosenman, E. T., Simko, T., & Imai, K. (2021). The use of differential privacy for census data and its impact on redistricting: The case of the 2020 US census. Science Advances, 7(41), eabk3283.
Kifer, D., Abowd, J. M., Ashmead, R., Cumings-Menon, R., Leclerc, P., Machanavajjhala, A., Sexton, W., & Zhuravlev, P. (2022). Bayesian and frequentist semantics for common variations of differential privacy: Applications to the 2020 census. Methodology. https://doi.org/10.48550/arxiv.2209.03310
Lin, Y. (2023). Privacy and Utility of Geographic Data: Revealing, Evaluating, and Mitigating the Externalities of Geographic Privacy Protection [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1681120766846778.
Lin, Y., & Xiao, N. (2022). Mapping synthetic individual-level population datasets: The case of contextualizing maps of privacy-preserving census data. In AutoCarto 2022, The 24th International Research Symposium on Cartography and GIScience, Redlands.
Lin, Y., & Xiao, N. (2023a). A computational framework for preserving privacy and maintaining utility of geographically aggregated data: A stochastic spatial optimization approach. Annals of the American Association of Geographers, 113(5), 1035–1056.
Lin, Y., & Xiao, N. (2023b). Generating small areal synthetic microdata from public aggregated data using an optimization method. The Professional Geographer.
Loi, M., & Christen, M. (2020). Two concepts of group privacy. Philosophy & Technology, 33(2), 207–224.
Muralidhar, K. (2022). A re-examination of the census bureau reconstruction and reidentification attack. In International Conference on Privacy in Statistical Databases, pages 312–323. Springer.
Muralidhar, K., & Domingo-Ferrer, J. (2023). Database reconstruction is not so easy and is different from reidentification. Cryptography and Security. https://doi.org/10.48550/arxiv.2301.10213
Roessler, B. (2004). The value of privacy. Polity Press.
Ruggles, S., & Van Riper, D. (2021). The role of chance in the census bureau database reconstruction experiment. Population Research and Policy Review, 41, 781–788.
Sánchez, D., Domingo-Ferrer, J., & Muralidhar, K. (2023). Confidence-ranked reconstruction of census records from aggregate statistics fails to capture privacy risks and reidentifiability. Proceedings of the National Academy of Sciences, 120(18), e2303890120.
Santos-Lozada, A. R., Howard, J. T., & Verdery, A. M. (2020). How differential privacy will affect our understanding of health disparities in the United States. Proceedings of the National Academy of Sciences, 117(24), 13405–13412.
Singer, E., Mathiowetz, N. A., & Couper, M. P. (1993). The impact of privacy and confidentiality concerns on survey participation: The case of the 1990 U.S. census. Public Opinion Quarterly, 57(4), 465–482.
Singer, E., Van Hoewyk, J., & Neugebauer, R. J. (2003). Attitudes and behavior: The impact of privacy and confidentiality concerns on participation in the 2000 census. Public Opinion Quarterly, 67(3), 368–384.
Skinner, C., & Shlomo, N. (2008). Assessing identification risk in survey microdata using log-linear models. Journal of the American Statistical Association, 103(483), 989–1001.
Su, L. T. (1994). The relevance of recall and precision in user evaluation. Journal of the American Society for Information Science, 45(3), 207–217.
Thakurta, A. G., Vyrros, A. H., Vaishampayan, U. S., Kapoor, G., Freudiger, J., Sridhar, V. R., & Davidson, D. (2017). Learning new words. US Patent 9,594,741.
U.S. Census Bureau (2011). 2010 United States Census Summary File 1 dataset.
U.S. Census Bureau (2016). Everybody counts!
U.S. Census Bureau (2019). The 2020 census and confidentiality.
U.S. Census Bureau (2020). DAS 2010 demonstration data products disclosure avoidance system release.
U.S. Census Bureau (2021). Privacy-loss budget allocation 2021-06-08.
Van Riper, D., Kugler, T., & Ruggles, S. (2020). Disclosure avoidance in the census bureau’s 2010 demonstration data product. Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics) (pp. 353–368). Springer.
Winkler, R. L., Butler, J. L., Curtis, K. J., & Egan-Robertson, D. (2021). Differential privacy and the accuracy of county-level net migration estimates. Population Research and Policy Review, 41, 417–435.
Young, C., Martin, D., & Skinner, C. (2009). Geographically intelligent disclosure control for flexible aggregation of census data. International Journal of Geographical Information Science, 23(4), 457–482.
Zayatz, L. (2007). Disclosure avoidance practices and research at the U.S. Census Bureau: An update. Journal of Official Statistics, 23(2), 253.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflicts of interest
The authors report that there are no competing interests to declare.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lin, Y., Xiao, N. Assessing the Impact of Differential Privacy on Population Uniques in Geographically Aggregated Data: The Case of the 2020 U.S. Census. Popul Res Policy Rev 42, 81 (2023). https://doi.org/10.1007/s11113-023-09829-4