Part of the book series: Studies in Computational Intelligence (SCI, volume 352)

Abstract

A recent study shows that supposedly anonymous movie rating records can be re-identified using only a little auxiliary information. In this chapter, we study the problem of protecting the privacy of individuals in large public survey rating data. Such data usually contains ratings of both sensitive and non-sensitive issues, and the ratings of sensitive issues are private. Even when survey participants do not reveal any of their ratings, their survey records are potentially identifiable using information from other public sources. To address this, we propose a novel (k, ε, l)-anonymity model to protect privacy in large survey rating data, in which each survey record is required to be “similar” to at least k − 1 others based on the non-sensitive ratings, where the similarity is controlled by ε, and the standard deviation of the sensitive ratings is at least l. We study an interesting yet nontrivial satisfaction problem for the proposed model: deciding whether a survey rating data set satisfies the privacy requirements given by the user. For this problem, we investigate its inherent properties theoretically and devise a novel slicing technique to solve it. We also discuss how to anonymize data using the result of the satisfaction problem. Finally, we conduct extensive experiments on two real-life data sets; the results show that the slicing technique is fast, scales with data size, and is much more efficient in execution time and space overhead than the heuristic pairwise method.
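To make the satisfaction problem concrete, the following is a minimal brute-force sketch of a pairwise check, not the slicing technique the chapter proposes. The abstract does not define the similarity measure or the exact group over which the standard deviation is taken, so several choices here are assumptions for illustration: two records count as similar when every non-sensitive rating differs by at most ε, each record's group consists of itself plus all records similar to it, each record carries a single sensitive rating, and the population standard deviation is used. The function names `is_close` and `satisfies` are hypothetical.

```python
from statistics import pstdev


def is_close(a, b, eps):
    """Assumed similarity test: every non-sensitive rating differs by at most eps."""
    return all(abs(x - y) <= eps for x, y in zip(a, b))


def satisfies(records, k, eps, l):
    """Brute-force pairwise check of a (k, eps, l)-style requirement.

    records: list of (non_sensitive_ratings, sensitive_rating) pairs,
    where non_sensitive_ratings is a tuple of numbers (assumed layout).
    Runs in O(n^2) comparisons, so it is only suitable for small inputs.
    """
    for ns, _ in records:
        # Sensitive ratings of all records similar to this one (itself included).
        group = [s for other_ns, s in records if is_close(ns, other_ns, eps)]
        if len(group) < k:
            return False  # fewer than k - 1 similar records besides itself
        if pstdev(group) < l:
            return False  # sensitive ratings in the group are too uniform
    return True


# Three records whose non-sensitive ratings are all within eps = 1 of each other.
sample = [((5, 4), 1), ((5, 5), 5), ((4, 4), 3)]
ok = satisfies(sample, k=3, eps=1, l=1.5)
```

Each record is compared against every other record, which is why a pairwise approach of this kind becomes costly on large data sets and why a faster technique is of interest.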

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Sun, X., Wang, H., Li, J. (2011). Validating Privacy Requirements in Large Survey Rating Data. In: Bessis, N., Xhafa, F. (eds) Next Generation Data Technologies for Collective Computational Intelligence. Studies in Computational Intelligence, vol 352. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20344-2_17

  • DOI: https://doi.org/10.1007/978-3-642-20344-2_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20343-5

  • Online ISBN: 978-3-642-20344-2

  • eBook Packages: Engineering, Engineering (R0)
