Fast Construction of Compressed Web Graphs

  • Conference paper
String Processing and Information Retrieval (SPIRE 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10508)

Abstract

Several compressed graph representations have been proposed in the last 15 years. Today, these representations are highly relevant in practice, since they make it possible to keep large-scale web and social graphs in the main memory of a single machine and consequently facilitate fast random access to nodes and edges.

While much effort has been spent on finding space-efficient and fast representations, one issue has been only partially addressed: developing resource-efficient construction algorithms. In this paper, we engineer the construction of regular and hybrid \(k^2\)-trees. We show that algorithms based on Z-order sorting reduce the memory footprint significantly and are at the same time faster than previous approaches. We also engineer a parallel version, which fully utilizes all CPUs and caches. We show the practicality of the latter version by constructing partitioned hybrid \(k^2\)-trees for Web graphs on the scale of a billion nodes and up to 100 billion edges.
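To illustrate what Z-order sorting of edges means, the following is a minimal sketch for the special case \(k = 2\). It is not the paper's implementation: the names `spread_bits`, `z_order_key`, `Edge`, and `sort_edges_z_order` are hypothetical, and the choice of 32-bit node identifiers with row bits in the higher position of each bit pair is an assumption made for the example.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Spread the 32 bits of v so that bit i of v ends up at bit 2*i of the result.
inline std::uint64_t spread_bits(std::uint32_t v) {
    std::uint64_t x = v;
    x = (x | (x << 16)) & 0x0000FFFF0000FFFFULL;
    x = (x | (x << 8))  & 0x00FF00FF00FF00FFULL;
    x = (x | (x << 4))  & 0x0F0F0F0F0F0F0F0FULL;
    x = (x | (x << 2))  & 0x3333333333333333ULL;
    x = (x | (x << 1))  & 0x5555555555555555ULL;
    return x;
}

// Z-order (Morton) key of an adjacency-matrix cell: row bits occupy the odd
// bit positions and column bits the even ones (an assumed convention).
inline std::uint64_t z_order_key(std::uint32_t row, std::uint32_t col) {
    return (spread_bits(row) << 1) | spread_bits(col);
}

// Hypothetical edge type: (source node, target node).
using Edge = std::pair<std::uint32_t, std::uint32_t>;

// Sort edges by their Z-order key. For k = 2, each pair of key bits selects
// one of the four quadrants on one tree level, so edges that fall into the
// same submatrix become contiguous after sorting.
inline void sort_edges_z_order(std::vector<Edge>& edges) {
    std::sort(edges.begin(), edges.end(), [](const Edge& a, const Edge& b) {
        return z_order_key(a.first, a.second) < z_order_key(b.first, b.second);
    });
}
```

In this sketch the key is recomputed inside the comparator to keep the code short; a construction routine that cares about speed would more plausibly compute each key once and sort the keys directly.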

Notes

  1. We use the notation and precedence and associativity rules of the C programming language for shift left “\(\ll\)”, shift right “\(\gg\)”, bitwise AND “\(\mathrel{\&}\)”, and bitwise OR “\(\mathrel{|}\)”.

  2. See e.g. http://www.agner.org/optimize/instruction_tables.pdf.

  3. Via a combination of XOR and count leading zeros (clz).

  4. The code is available at https://github.com/Jabro/sdsl-lite.
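Note 3 does not say here what the XOR and clz combination computes. One common use, assumed in the sketch below, is to find how many leading bit pairs two Z-order keys share, which for \(k = 2\) is the number of tree levels (counted from the root) on which the two cells lie in the same submatrix. The function names are hypothetical, and the portable loop stands in for a hardware clz instruction or compiler intrinsic.

```cpp
#include <cstdint>

// Number of leading zero bits of a 64-bit value; returns 64 for x == 0.
// A portable loop is used here; compilers usually offer an intrinsic that
// maps to a single instruction.
inline int count_leading_zeros(std::uint64_t x) {
    if (x == 0) return 64;
    int n = 0;
    while ((x >> 63) == 0) {
        x <<= 1;
        ++n;
    }
    return n;
}

// Number of leading 2-bit groups on which two 64-bit Z-order keys agree.
// XOR marks the differing bits, clz locates the highest one, and dividing
// by 2 converts a bit position into a level (two bits per level for k = 2).
inline int shared_levels(std::uint64_t key_a, std::uint64_t key_b) {
    return count_leading_zeros(key_a ^ key_b) / 2;
}
```

With 64-bit keys there are 32 levels, so two identical keys yield 32 and two keys that already differ in the top bit pair yield 0.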

Corresponding author

Correspondence to Jan Broß.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Broß, J., Gog, S., Hauck, M., Paradies, M. (2017). Fast Construction of Compressed Web Graphs. In: Fici, G., Sciortino, M., Venturini, R. (eds) String Processing and Information Retrieval. SPIRE 2017. Lecture Notes in Computer Science, vol 10508. Springer, Cham. https://doi.org/10.1007/978-3-319-67428-5_11

  • DOI: https://doi.org/10.1007/978-3-319-67428-5_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67427-8

  • Online ISBN: 978-3-319-67428-5

  • eBook Packages: Computer Science (R0)
