Validating Privacy Requirements in Large Survey Rating Data

Sun, Xiaoxun; Wang, Hua; Li, Jiuyong

doi:10.1007/978-3-642-20344-2_17

Part of the book series: Studies in Computational Intelligence (SCI, volume 352)

Abstract

A recent study shows that supposedly anonymous movie rating records can be re-identified using only a little auxiliary information. In this chapter, we study the problem of protecting the privacy of individuals in large public survey rating data. Such data usually contains ratings of both sensitive and non-sensitive issues, and the ratings of sensitive issues are private to the individual. Even when survey participants do not reveal any of their ratings, their survey records are potentially identifiable using information from other public sources. To address this, we propose a novel (k, ε, l)-anonymity model for large survey rating data, in which each survey record is required to be “similar” to at least k − 1 others based on the non-sensitive ratings, where the similarity is controlled by ε, and the standard deviation of the sensitive ratings is at least l. We study the interesting yet nontrivial satisfaction problem of the proposed model: deciding whether a survey rating data set satisfies the privacy requirements given by the user. We investigate the inherent properties of this problem theoretically and devise a novel slicing technique to solve it. We also discuss how the result of the satisfaction problem can be used to anonymize data. Finally, we conduct extensive experiments on two real-life data sets. The results show that the slicing technique is fast, scales with data size, and is much more efficient in execution time and space overhead than the heuristic pairwise method.
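To make the satisfaction problem concrete, the following is a minimal sketch of a brute-force pairwise check, in the spirit of the pairwise baseline mentioned above rather than the slicing technique. The abstract does not define the dissimilarity measure or the exact scope of the standard deviation, so several choices here are assumptions: two records are treated as "similar" when every non-sensitive rating differs by at most ε, each record carries a single sensitive rating, and the population standard deviation is computed over a record's similar group (including the record itself). The names `is_close` and `satisfies` are hypothetical.

```python
from statistics import pstdev


def is_close(a, b, eps):
    """Assumed similarity test: all non-sensitive ratings differ by at most eps."""
    return all(abs(x - y) <= eps for x, y in zip(a, b))


def satisfies(records, k, eps, l):
    """Brute-force check of an assumed reading of (k, eps, l)-anonymity.

    records: list of (non_sensitive_ratings, sensitive_rating) pairs.
    Returns True when every record has at least k - 1 other eps-close
    records and the sensitive ratings of that group (the record itself
    included) have population standard deviation of at least l.
    """
    for ratings, _ in records:
        # Sensitive ratings of all records similar to this one (itself included).
        group = [s for other, s in records if is_close(ratings, other, eps)]
        if len(group) < k:
            return False  # fewer than k - 1 similar neighbours
        if pstdev(group) < l:
            return False  # sensitive ratings are not diverse enough
    return True


# Hypothetical toy data: two non-sensitive ratings and one sensitive rating.
records = [((5, 4), 1), ((5, 5), 5), ((4, 4), 3), ((1, 1), 2)]
print(satisfies(records[:3], k=3, eps=1, l=1.0))
print(satisfies(records, k=3, eps=1, l=1.0))
```

This check compares every pair of records, so its cost grows quadratically with the number of records, which is the kind of overhead the chapter's slicing technique is reported to avoid.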

Author information

Authors and Affiliations

Australian Council for Educational Research, Australia
Xiaoxun Sun
University of Southern Queensland, Australia
Hua Wang
University of South Australia, Australia
Jiuyong Li

Editor information

Editors and Affiliations

School of Computing & Maths, University of Derby, DE22 1GB, Derby, United Kingdom (UK)
Nik Bessis
Dept de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, Barcelona, Spain
Fatos Xhafa

About this chapter

Cite this chapter

Sun, X., Wang, H., Li, J. (2011). Validating Privacy Requirements in Large Survey Rating Data. In: Bessis, N., Xhafa, F. (eds) Next Generation Data Technologies for Collective Computational Intelligence. Studies in Computational Intelligence, vol 352. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20344-2_17

DOI: https://doi.org/10.1007/978-3-642-20344-2_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20343-5
Online ISBN: 978-3-642-20344-2
eBook Packages: Engineering, Engineering (R0)
