Abstract
Graph sampling extracts a small, representative subgraph from a large graph. Sampling algorithms use different strategies to replicate the properties of the original graph in the sampled graph. In this study, we provide a comprehensive empirical characterization of five graph sampling algorithms on six graph properties: degree, clustering coefficient, path length, global clustering coefficient, assortativity, and modularity. We extract samples from fifteen graphs grouped into five categories: collaboration, social, citation, technological, and synthetic graphs. We provide both qualitative and quantitative results. We find that no single method extracts samples that are truly representative of a given graph with respect to all the properties tested in this work. Our results show that the sampling algorithm that aggressively explores the neighborhood of a sampled node performs better than the others.
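To make the setup concrete, the following minimal sketch samples a subgraph and compares two of the properties named above (degree and clustering coefficient) between the original and the sample. It is an illustration under stated assumptions, not the code used in this study: the breadth-first sampler is a generic stand-in rather than one of the five evaluated algorithms, the ring lattice is a toy input, and all function names (`ring_lattice`, `bfs_sample`, `average_degree`, `average_clustering`) are hypothetical.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Toy undirected graph: each node links to its k nearest neighbors on each side."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k + 1):
            u = (v + step) % n
            adj[v].add(u)
            adj[u].add(v)
    return adj


def bfs_sample(adj, size, rng):
    """Generic breadth-first sampler (assumption: a stand-in, not a method from the study).

    Visits every unseen neighbor of each sampled node until `size` nodes are collected,
    then returns the subgraph induced on those nodes.
    """
    start = rng.choice(sorted(adj))
    seen = {start}
    queue = deque([start])
    while queue and len(seen) < size:
        v = queue.popleft()
        for u in sorted(adj[v]):
            if u not in seen:
                seen.add(u)
                queue.append(u)
                if len(seen) == size:
                    break
    return {v: adj[v] & seen for v in seen}


def average_degree(adj):
    """Mean number of neighbors per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree below 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            continue
        # Count edges between pairs of neighbors (each pair once).
        links = sum(1 for u in nbrs for w in nbrs if u < w and w in adj[u])
        total += 2.0 * links / (d * (d - 1))
    return total / len(adj)


graph = ring_lattice(60, 2)
sample = bfs_sample(graph, 15, random.Random(0))
for name, g in (("original", graph), ("sample", sample)):
    print(name, len(g), round(average_degree(g), 3), round(average_clustering(g), 3))
```

Comparing the printed values for the original and the sample gives a rough sense of how well a sampler preserves each property; a fuller comparison would cover the remaining properties and average over many samples.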
Data availability
The datasets used in the experiments are publicly available.
Code availability
The source code is available on request.
Funding
There is no funding source to declare.
Author information
Contributions
MIY and RA proposed the main idea, implemented the sampling algorithms, and performed the experiments. MIY and IA collected the data and prepared the graphs and tables. All authors wrote their respective parts and reviewed the manuscript.
Ethics declarations
Conflict of interest
There are no competing interests to declare.
About this article
Cite this article
Yousuf, M.I., Anwer, I. & Anwar, R. Empirical characterization of graph sampling algorithms. Soc. Netw. Anal. Min. 13, 66 (2023). https://doi.org/10.1007/s13278-023-01060-5
DOI: https://doi.org/10.1007/s13278-023-01060-5