Empirical characterization of graph sampling algorithms

  • Original Article
Social Network Analysis and Mining

Abstract

Graph sampling allows mining a small representative subgraph from a big graph. Sampling algorithms deploy different strategies to replicate the properties of a given graph in the sampled graph. In this study, we provide a comprehensive empirical characterization of five graph sampling algorithms on six properties of a graph including degree, clustering coefficient, path length, global clustering coefficient, assortativity, and modularity. We extract samples from fifteen graphs grouped into five categories including collaboration, social, citation, technological, and synthetic graphs. We provide both qualitative and quantitative results. We find that there is no single method that extracts true samples from a given graph with respect to the properties tested in this work. Our results show that the sampling algorithm that aggressively explores the neighborhood of a sampled node performs better than the others.
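The comparison described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' code: the abstract does not name the five algorithms, so a uniform random node sampler and a breadth-first, neighborhood-exploring sampler stand in for them, and the toy ring-lattice graph, the function names, and the sample size are hypothetical. Two of the six properties (degree and local clustering coefficient) are measured on the original graph and on each sample.

```python
import random
from collections import deque

def induced_subgraph(adj, nodes):
    """Keep only the sampled nodes and the edges among them."""
    keep = set(nodes)
    return {u: adj[u] & keep for u in keep}

def random_node_sample(adj, k, rng):
    """Baseline (assumed): pick k nodes uniformly at random."""
    return induced_subgraph(adj, rng.sample(sorted(adj), k))

def neighborhood_sample(adj, k, rng):
    """Stand-in for a neighborhood-exploring sampler (assumed, k <= |V|):
    breadth-first visit of all neighbors of each sampled node, restarting
    from a random unseen node if the frontier runs empty."""
    seen, queue = set(), deque()
    order = sorted(adj)
    while len(seen) < k:
        if not queue:
            start = rng.choice([u for u in order if u not in seen])
            seen.add(start)
            queue.append(start)
            continue
        u = queue.popleft()
        for v in sorted(adj[u]):
            if v not in seen and len(seen) < k:
                seen.add(v)
                queue.append(v)
    return induced_subgraph(adj, seen)

def average_degree(adj):
    return sum(len(nb) for nb in adj.values()) / len(adj)

def average_clustering(adj):
    """Mean local clustering coefficient; nodes of degree < 2 count as 0."""
    total = 0.0
    for u, nb in adj.items():
        d = len(nb)
        if d < 2:
            continue
        # Each edge among the neighbors of u is counted twice, hence / 2.
        links = sum(len(adj[v] & nb) for v in nb) / 2
        total += links / (d * (d - 1) / 2)
    return total / len(adj)

# Hypothetical toy graph: ring lattice, each node linked to its 2 nearest
# neighbors on either side.
n = 60
full = {i: {(i + d) % n for d in (-2, -1, 1, 2)} for i in range(n)}

rng = random.Random(0)
rand = random_node_sample(full, 15, rng)
bfs = neighborhood_sample(full, 15, rng)

for name, g in [("original", full), ("random node", rand), ("neighborhood", bfs)]:
    print(name, average_degree(g), average_clustering(g))
```

On a graph like this one, the neighborhood-exploring sample stays connected and so retains far more of the local structure than a uniform node sample of the same size, which is the kind of difference the study quantifies across real datasets.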

Data availability

The datasets used in the experiments are publicly available.

Code availability

The source code is available on request.

Funding

There is no funding source to declare.

Author information

Contributions

MIY and RA proposed the main idea, implemented the sampling algorithms, and performed the experiments. MIY and IA collected the data and prepared the plots and tables. All authors wrote their respective parts and reviewed the manuscript.

Corresponding author

Correspondence to Muhammad Irfan Yousuf.

Ethics declarations

Conflict of interest

There are no competing interests to declare.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yousuf, M.I., Anwer, I. & Anwar, R. Empirical characterization of graph sampling algorithms. Soc. Netw. Anal. Min. 13, 66 (2023). https://doi.org/10.1007/s13278-023-01060-5

  • DOI: https://doi.org/10.1007/s13278-023-01060-5