Abstract
In this paper, we tackle the problem of graph anonymization in the context of privacy-preserving social network mining. We present a greedy and non-deterministic algorithm to achieve k-anonymity on labeled and undirected networks. Our work aims to create a scalable algorithm for real-world big networks, which runs in parallel and uses biased randomization for improving the quality of the solutions. We propose new metrics that consider the utility of the clusters from a recommender system point of view. We compare our approach to SaNGreeA, a well-known state-of-the-art algorithm for k-anonymity generalization. Finally, we have performed scalability tests, with up to 160 machines within the Hadoop framework, for anonymizing a real-world dataset with around 830 K nodes and 63 M relationships, demonstrating our method’s utility and practical applicability.
Similar content being viewed by others
Notes
The Scala source code is available at: https://bitbucket.org/jcasasr/scan-algorithm/.
Available at: http://igraph.org.
References
Adamic, L.A., Glance, N.: The political blogosphere and the 2004 U.S. election. In: LinkKDD’05, pp. 36–43. ACM (2005)
Barabási, A.-L., Albert, R.: Emergence of scaling in random networks. Science 286(5439), 509–512 (1999)
Blondel, V.D., Guillaume, J.-L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech.: Theory Exp. 10, 2008 (2008)
Byun, J.W., Kamra, A., Bertino, E., Li, N.: Efficient \(k\)-anonymization using clustering techniques. In: International Conference on Database Systems for Advanced Applications (DASFAA), pp. 188–200 (2007)
Cai, B.J., Wang, H.Y., Zheng, H.R., Wang, H.: Evaluation repeated random walks in community detection of social networks. In: ICMLC’10, pp. 1849–1854. IEEE (2010)
Campan, A., Truta, T.M.: Data and structural k-anonymity in social networks. In: Bonchi, F., Ferrari, E., Jiang, W., Malin, B. (eds.) Privacy, Security, and Trust in KDD, pp. 33–54. Springer, Berlin (2009)
Campan, A., Alufaisan, Y., Truta, T.M.: Preserving communities in anonymized social networks. Trans. Data Privacy 8(1), 55–87 (2015)
Casas-Roma, J., Herrera-Joancomartí, J., Torra, V.: Anonymizing graphs: measuring quality for clustering. Knowl. Inf. Syst. 44(3), 507–528 (2014)
Casas-Roma, J., Rousseau, F.: Community-preserving generalization of social networks. In: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 1465–1472. ACM, Paris, France (2015)
Casas-Roma, J., Herrera-Joancomartí, J., Torra, V.: A survey of graph-modification techniques for privacy-preserving on networks. Artif. Intell. Rev. 47(3), 341366 (2017)
Chester, S., Kapron, B., Srivastava, G., Venkatesh, S.: Complexity of social network anonymization. Soc. Netw. Anal. Min. 3(2), 151–166 (2012)
Chester, S., Kapron, B.M., Ramesh, G., Srivastava, G., Thomo, A., Venkatesh, S.: Why Waldo befriended the dummy? k-anonymization of social networks with pseudo-nodes. Soc. Netw. Anal. Min. 3(3), 381–399 (2013)
Clauset, A., Newman, M.E.J., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70(6), 1–6 (2004)
Erdös, P., Rényi, A.: On random graphs I. Publ. Math. 6, 290–297 (1959)
Hay, M., Miklau, G., Jensen, D., Towsley, D., Weis, P.: Resisting structural re-identification in anonymized social networks. Proc. VLDB Endow. 1(1), 102–114 (2008)
Hay, M., Liu, K., Miklau, G., Pei, J., Terzi, E.: Privacy-aware data management in information networks. In: International Conference on Management of Data (SIGMOD), pp. 1201–1204. ACM Press, New York, USA (2011)
He, J., Chu, W.W.: A social network-based recommender system (SNRS). Ann. Inf. Syst. 12, 47–74 (2010)
Juan, A.A., Fauln, J., Ferrer, A., Lourenço, H.R., Barrios, B.: MIRHA: multi-start biased randomization of heuristics with adaptive local search for solving non-smooth routing problems. TOP 21(1), 109–132 (2011)
Lichman, M.: UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine, CA (2013). http://archive.ics.uci.edu/ml
Liu, K., Terzi, E.: Towards identity anonymization on graphs. In: ACM SIGMOD International Conference on Management of Data, pp. 93–106. ACM,NY, USA (2008)
Lu, X., Song, Y., Bressan, S.: Fast identity anonymization on graphs. In: 23rd International Conference on Database and Expert Systems Applications, pp. 281–295. Springer, Berlin (2012)
Machanavajjhala, A., Gehrke, J., Kifer, D., Venkitasubramaniam, M.: l-diversity: privacy beyond k-anonymity. In: 22nd IEEE International Conference on Data Engineering (2006)
Nettleton, D.F., Salas, J.: A data driven anonymization system for information rich online social network graphs. Expert Syst. Appl. 55, 87–105 (2016)
Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 1–16 (2003)
Rosvall, M., Bergstrom, C.T.: Maps of random walks on complex networks reveal community structure. PNAS 105(4), 1118–1123 (2008)
Salas, J., Torra, V.: Graphic sequences, distances and k-degree anonymity. Disc. Appl. Math. 188, 2531 (2015)
Salas, J., Torra, V.: Improving the characterization of P-stability for applications in network privacy. Disc. Appl. Math. 206, 109114 (2016)
Salas, J.: Sampling and merging for graph anonymization. In: Torra, V., Narukawa, Y., Navarro-Arribas, G., Yañez, C. (eds.) Modeling Decisions for Artificial Intelligence, pp. 250–260. Springer, Andorra (2016)
Samarati, P.: Protecting respondents identities in microdata release. IEEE Trans. Knowl. Data Eng. 13(6), 1010–1027 (2001)
Sihag, V.K.: A clustering approach for structural \(k\)-anonymity in social networks using genetic algorithm. In: CUBE International Information Technology Conference, pp. 701–706. ACM, Pune, India (2012)
Singh, L., Schramm, C.: Identifying similar neighborhood structures in private social networks. In: International Conference on Data Mining Workshops, pp. 507–516. IEEE, Sydney, NSW (2010)
Skarkala, M.E., Maragoudakis, M., Gritzalis, S., Mitrou, L., Toivonen, H., Moen, P.: Privacy preservation by k-anonymization of weighted social networks. In: Proceedings of ASONAM. pp. 423–428. IEEE Computer Society (2012)
Stokes, K., Torra, V.: On some clustering approaches for graphs. In: 2011 IEEE International Conference on Fuzzy Systems, pp. 409–415 (2011)
Stokes k, K., Torra, V.: Reidentification and k-anonymity: a model for disclosure risk in graphs. Soft Comput. 16(10), 16571670 (2012)
Sweeney, L.: k-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 10(5), 557–570 (2002)
Tassa, T., Cohen, D.J.: Anonymization of centralized and distributed social networks by sequential clustering. IEEE Trans. Knowl. Data Eng. 25(2), 311–324 (2013)
Wu, X., Ying, X., Liu, K., Chen, L.: Managing and Mining Graph Data, Chapter A Survey of Privacy-Preservation of Graphs and Social Networks, pp. 421–453. Springer, Boston, MA (2010)
Zachary, W.W.: An information flow model for conflict and fission in small groups. J. Anthropol. Res. 33(4), 452–473 (1977)
Zhou, B., Pei, J.: Preserving privacy in social networks against neighborhood attacks. In: ICDE (2008)
Zhou, B., Pei, J., Luk, W.S.: A brief survey on anonymization techniques for privacy preserving publishing of social network data. ACM SIGKDD Explor. Newsl. 10(2), 1222 (2008)
Acknowledgements
Jordi Casas-Roma was partially supported by the Spanish MCYT and the FEDER funds under Grants TIN2011-27076-C03 “CO-PRIVACY” and TIN2014-57364-C2-2-R “SMARTGLACIS.” Julián Salas acknowledges the support of a UOC postdoctoral fellowship.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Ros-Martín, M., Salas, J. & Casas-Roma, J. Scalable non-deterministic clustering-based k-anonymization for rich networks. Int. J. Inf. Secur. 18, 219–238 (2019). https://doi.org/10.1007/s10207-018-0409-1
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10207-018-0409-1