Self-adaptive Privacy Concern Detection for User-Generated Content

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2018)

Abstract

To protect user privacy in data analysis, a state-of-the-art strategy is differential privacy, in which noise is injected into the true analysis output. The noise masks individuals' sensitive information contained in the dataset. However, determining the amount of noise is a key challenge: too much noise destroys data utility, while too little increases privacy risk. Although previous work has designed mechanisms to protect data privacy in different scenarios, most existing studies assume uniform privacy concerns for all individuals. Consequently, applying an equal amount of noise to all individuals under-protects some users while over-protecting others. To address this issue, we propose a self-adaptive approach for privacy concern detection based on user personality. Our experimental studies demonstrate that the approach is effective at providing suitable personalized privacy protection for cold-start users (i.e., users whose privacy-concern information is absent from the training data).
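
The noise trade-off described in the abstract can be illustrated with the standard Laplace mechanism for a counting query. This is only a minimal sketch under stated assumptions, not the method proposed in the paper: the function names, the sensitivity of 1, and the mapping from privacy-concern level to the privacy budget epsilon are all hypothetical values chosen for illustration.

```python
import random

# Hypothetical mapping from a user's privacy-concern level to a privacy
# budget (epsilon). Smaller epsilon means more noise and stronger protection.
# These values are illustrative assumptions, not taken from the paper.
EPSILON_BY_CONCERN = {"high": 0.1, "medium": 0.5, "low": 1.0}


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: float, epsilon: float, rng: random.Random,
                  sensitivity: float = 1.0) -> float:
    """Laplace mechanism: add noise with scale = sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(epsilon: float, trials: int = 5000, seed: int = 0) -> float:
    """Average absolute error of the noisy count, a rough proxy for utility loss."""
    rng = random.Random(seed)
    true_count = 100.0
    errors = [abs(private_count(true_count, epsilon, rng) - true_count)
              for _ in range(trials)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    for level, eps in EPSILON_BY_CONCERN.items():
        print(f"concern={level:6s} epsilon={eps:.1f} "
              f"mean |error| = {mean_abs_error(eps):.2f}")
```

Under these assumptions, a user with a high privacy concern receives a small epsilon and therefore a noisier, less useful answer, while a user with a low concern receives a more accurate one. A single uniform epsilon would force one of these settings on everyone.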

Notes

  1. http://www.imdb.com/

  2. http://myPersonality.org

  3. https://code.google.com/archive/p/word2vec/

Author information

Corresponding author

Correspondence to Xuan-Son Vu.

Copyright information

© 2023 Springer Nature Switzerland AG

About this paper

Cite this paper

Vu, X.-S., Jiang, L. (2023). Self-adaptive Privacy Concern Detection for User-Generated Content. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2018. Lecture Notes in Computer Science, vol 13396. Springer, Cham. https://doi.org/10.1007/978-3-031-23793-5_14

  • DOI: https://doi.org/10.1007/978-3-031-23793-5_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23792-8

  • Online ISBN: 978-3-031-23793-5

  • eBook Packages: Computer Science, Computer Science (R0)
