Log in

Scalable non-deterministic clustering-based k-anonymization for rich networks

  • Regular Contribution
  • Published:
International Journal of Information Security Aims and scope Submit manuscript

Abstract

In this paper, we tackle the problem of graph anonymization in the context of privacy-preserving social network mining. We present a greedy and non-deterministic algorithm to achieve k-anonymity on labeled and undirected networks. Our work aims to create a scalable algorithm for real-world big networks, which runs in parallel and uses biased randomization for improving the quality of the solutions. We propose new metrics that consider the utility of the clusters from a recommender system point of view. We compare our approach to SaNGreeA, a well-known state-of-the-art algorithm for k-anonymity generalization. Finally, we have performed scalability tests, with up to 160 machines within the Hadoop framework, for anonymizing a real-world dataset with around 830 K nodes and 63 M relationships, demonstrating our method’s utility and practical applicability.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
EUR 32.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or Ebook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Similar content being viewed by others

Notes

  1. http://www.xing.com.

  2. The Scala source code is available at: https://bitbucket.org/jcasasr/scan-algorithm/.

  3. Available at: http://igraph.org.

References

  1. Adamic, L.A., Glance, N.: The political blogosphere and the 2004 U.S. election. In: LinkKDD’05, pp. 36–43. ACM (2005)

  2. Barabási, A.-L., Albert, R.: Emergence of scaling in random networks. Science 286(5439), 509–512 (1999)

    Article  MathSciNet  MATH  Google Scholar 

  3. Blondel, V.D., Guillaume, J.-L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech.: Theory Exp. 10, 2008 (2008)

    Google Scholar 

  4. Byun, J.W., Kamra, A., Bertino, E., Li, N.: Efficient \(k\)-anonymization using clustering techniques. In: International Conference on Database Systems for Advanced Applications (DASFAA), pp. 188–200 (2007)

  5. Cai, B.J., Wang, H.Y., Zheng, H.R., Wang, H.: Evaluation repeated random walks in community detection of social networks. In: ICMLC’10, pp. 1849–1854. IEEE (2010)

  6. Campan, A., Truta, T.M.: Data and structural k-anonymity in social networks. In: Bonchi, F., Ferrari, E., Jiang, W., Malin, B. (eds.) Privacy, Security, and Trust in KDD, pp. 33–54. Springer, Berlin (2009)

    Chapter  Google Scholar 

  7. Campan, A., Alufaisan, Y., Truta, T.M.: Preserving communities in anonymized social networks. Trans. Data Privacy 8(1), 55–87 (2015)

    Google Scholar 

  8. Casas-Roma, J., Herrera-Joancomartí, J., Torra, V.: Anonymizing graphs: measuring quality for clustering. Knowl. Inf. Syst. 44(3), 507–528 (2014)

    Article  Google Scholar 

  9. Casas-Roma, J., Rousseau, F.: Community-preserving generalization of social networks. In: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 1465–1472. ACM, Paris, France (2015)

  10. Casas-Roma, J., Herrera-Joancomartí, J., Torra, V.: A survey of graph-modification techniques for privacy-preserving on networks. Artif. Intell. Rev. 47(3), 341366 (2017)

    Article  Google Scholar 

  11. Chester, S., Kapron, B., Srivastava, G., Venkatesh, S.: Complexity of social network anonymization. Soc. Netw. Anal. Min. 3(2), 151–166 (2012)

    Article  Google Scholar 

  12. Chester, S., Kapron, B.M., Ramesh, G., Srivastava, G., Thomo, A., Venkatesh, S.: Why Waldo befriended the dummy? k-anonymization of social networks with pseudo-nodes. Soc. Netw. Anal. Min. 3(3), 381–399 (2013)

    Article  Google Scholar 

  13. Clauset, A., Newman, M.E.J., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70(6), 1–6 (2004)

    Article  Google Scholar 

  14. Erdös, P., Rényi, A.: On random graphs I. Publ. Math. 6, 290–297 (1959)

    MathSciNet  MATH  Google Scholar 

  15. Hay, M., Miklau, G., Jensen, D., Towsley, D., Weis, P.: Resisting structural re-identification in anonymized social networks. Proc. VLDB Endow. 1(1), 102–114 (2008)

    Article  Google Scholar 

  16. Hay, M., Liu, K., Miklau, G., Pei, J., Terzi, E.: Privacy-aware data management in information networks. In: International Conference on Management of Data (SIGMOD), pp. 1201–1204. ACM Press, New York, USA (2011)

  17. He, J., Chu, W.W.: A social network-based recommender system (SNRS). Ann. Inf. Syst. 12, 47–74 (2010)

    Article  Google Scholar 

  18. Juan, A.A., Fauln, J., Ferrer, A., Lourenço, H.R., Barrios, B.: MIRHA: multi-start biased randomization of heuristics with adaptive local search for solving non-smooth routing problems. TOP 21(1), 109–132 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  19. Lichman, M.: UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine, CA (2013). http://archive.ics.uci.edu/ml

  20. Liu, K., Terzi, E.: Towards identity anonymization on graphs. In: ACM SIGMOD International Conference on Management of Data, pp. 93–106. ACM,NY, USA (2008)

  21. Lu, X., Song, Y., Bressan, S.: Fast identity anonymization on graphs. In: 23rd International Conference on Database and Expert Systems Applications, pp. 281–295. Springer, Berlin (2012)

  22. Machanavajjhala, A., Gehrke, J., Kifer, D., Venkitasubramaniam, M.: l-diversity: privacy beyond k-anonymity. In: 22nd IEEE International Conference on Data Engineering (2006)

  23. Nettleton, D.F., Salas, J.: A data driven anonymization system for information rich online social network graphs. Expert Syst. Appl. 55, 87–105 (2016)

    Article  Google Scholar 

  24. Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 1–16 (2003)

    Google Scholar 

  25. Rosvall, M., Bergstrom, C.T.: Maps of random walks on complex networks reveal community structure. PNAS 105(4), 1118–1123 (2008)

    Article  Google Scholar 

  26. Salas, J., Torra, V.: Graphic sequences, distances and k-degree anonymity. Disc. Appl. Math. 188, 2531 (2015)

    Article  MathSciNet  MATH  Google Scholar 

  27. Salas, J., Torra, V.: Improving the characterization of P-stability for applications in network privacy. Disc. Appl. Math. 206, 109114 (2016)

    Article  MathSciNet  MATH  Google Scholar 

  28. Salas, J.: Sampling and merging for graph anonymization. In: Torra, V., Narukawa, Y., Navarro-Arribas, G., Yañez, C. (eds.) Modeling Decisions for Artificial Intelligence, pp. 250–260. Springer, Andorra (2016)

    Chapter  Google Scholar 

  29. Samarati, P.: Protecting respondents identities in microdata release. IEEE Trans. Knowl. Data Eng. 13(6), 1010–1027 (2001)

    Article  Google Scholar 

  30. Sihag, V.K.: A clustering approach for structural \(k\)-anonymity in social networks using genetic algorithm. In: CUBE International Information Technology Conference, pp. 701–706. ACM, Pune, India (2012)

  31. Singh, L., Schramm, C.: Identifying similar neighborhood structures in private social networks. In: International Conference on Data Mining Workshops, pp. 507–516. IEEE, Sydney, NSW (2010)

  32. Skarkala, M.E., Maragoudakis, M., Gritzalis, S., Mitrou, L., Toivonen, H., Moen, P.: Privacy preservation by k-anonymization of weighted social networks. In: Proceedings of ASONAM. pp. 423–428. IEEE Computer Society (2012)

  33. Stokes, K., Torra, V.: On some clustering approaches for graphs. In: 2011 IEEE International Conference on Fuzzy Systems, pp. 409–415 (2011)

  34. Stokes k, K., Torra, V.: Reidentification and k-anonymity: a model for disclosure risk in graphs. Soft Comput. 16(10), 16571670 (2012)

    Article  MATH  Google Scholar 

  35. Sweeney, L.: k-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 10(5), 557–570 (2002)

    Article  MathSciNet  MATH  Google Scholar 

  36. Tassa, T., Cohen, D.J.: Anonymization of centralized and distributed social networks by sequential clustering. IEEE Trans. Knowl. Data Eng. 25(2), 311–324 (2013)

    Article  Google Scholar 

  37. Wu, X., Ying, X., Liu, K., Chen, L.: Managing and Mining Graph Data, Chapter A Survey of Privacy-Preservation of Graphs and Social Networks, pp. 421–453. Springer, Boston, MA (2010)

  38. Zachary, W.W.: An information flow model for conflict and fission in small groups. J. Anthropol. Res. 33(4), 452–473 (1977)

    Article  Google Scholar 

  39. Zhou, B., Pei, J.: Preserving privacy in social networks against neighborhood attacks. In: ICDE (2008)

  40. Zhou, B., Pei, J., Luk, W.S.: A brief survey on anonymization techniques for privacy preserving publishing of social network data. ACM SIGKDD Explor. Newsl. 10(2), 1222 (2008)

    Article  Google Scholar 

Download references

Acknowledgements

Jordi Casas-Roma was partially supported by the Spanish MCYT and the FEDER funds under Grants TIN2011-27076-C03 “CO-PRIVACY” and TIN2014-57364-C2-2-R “SMARTGLACIS.” Julián Salas acknowledges the support of a UOC postdoctoral fellowship.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jordi Casas-Roma.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Ros-Martín, M., Salas, J. & Casas-Roma, J. Scalable non-deterministic clustering-based k-anonymization for rich networks. Int. J. Inf. Secur. 18, 219–238 (2019). https://doi.org/10.1007/s10207-018-0409-1

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10207-018-0409-1

Keywords

Navigation