Abstract
A recent study shows that supposedly anonymous movie rating records can be re-identified using only a little auxiliary information. In this chapter, we study the problem of protecting the privacy of individuals in large public survey rating data. Such data usually contains ratings of both sensitive and non-sensitive issues, and the ratings of sensitive issues are private. Even when survey participants do not reveal any of their ratings, their survey records are potentially identifiable using information from other public sources. To address this, we propose a novel (k, ε, l)-anonymity model for protecting privacy in large survey rating data, in which each survey record is required to be “similar” to at least k − 1 others based on the non-sensitive ratings, where the similarity is controlled by ε, and the standard deviation of the sensitive ratings is at least l. We study an interesting yet nontrivial satisfaction problem for the proposed model: deciding whether a survey rating data set satisfies the privacy requirements given by the user. For this problem, we investigate its inherent properties theoretically and devise a novel slicing technique to solve it. We also discuss how the result of the satisfaction problem can be used to anonymize data. Finally, we conduct extensive experiments on two real-life data sets; the results show that the slicing technique is fast, scales with data size, and is much more efficient in execution time and space overhead than the heuristic pairwise method.
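To make the satisfaction problem concrete, the following is a minimal sketch of a naive pairwise check, in the spirit of the pairwise baseline mentioned above rather than the slicing technique itself. Several details are assumptions not given in the abstract: each record carries a single numeric sensitive rating, two records count as “similar” when every non-sensitive rating differs by at most ε, the standard deviation is the population standard deviation taken over a record's similar group, and missing ratings are not handled. The names `is_close` and `satisfies` are hypothetical.

```python
from statistics import pstdev


def is_close(a, b, eps):
    """Assumed similarity test: every non-sensitive rating differs by at most eps."""
    return all(abs(x - y) <= eps for x, y in zip(a, b))


def satisfies(records, k, eps, l):
    """Naive O(n^2) check of a (k, eps, l)-style requirement.

    records: list of (non_sensitive_ratings, sensitive_rating) pairs.
    Returns True only if every record has at least k similar records
    (itself included, i.e. k - 1 others) whose sensitive ratings have
    a population standard deviation of at least l.
    """
    for ns_i, _ in records:
        # Sensitive ratings of all records similar to record i (including i).
        group = [s for ns_j, s in records if is_close(ns_i, ns_j, eps)]
        if len(group) < k:
            return False
        if pstdev(group) < l:
            return False
    return True


# Illustrative data only: three mutually similar records plus one outlier.
sample = [((5, 4), 1), ((5, 5), 5), ((4, 4), 3), ((1, 1), 2)]
print(satisfies(sample[:3], k=3, eps=1, l=1.0))
print(satisfies(sample, k=3, eps=1, l=1.0))
```

The quadratic number of comparisons in this sketch is the kind of cost that motivates a faster technique on large data sets.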
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Sun, X., Wang, H., Li, J. (2011). Validating Privacy Requirements in Large Survey Rating Data. In: Bessis, N., Xhafa, F. (eds) Next Generation Data Technologies for Collective Computational Intelligence. Studies in Computational Intelligence, vol 352. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20344-2_17
DOI: https://doi.org/10.1007/978-3-642-20344-2_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20343-5
Online ISBN: 978-3-642-20344-2
eBook Packages: Engineering, Engineering (R0)