MaiterStore: A Hot-Aware, High-Performance Key-Value Store for Graph Processing

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8505)


Abstract

Recently, many cloud-based graph computation frameworks have been proposed, such as Pregel, GraphLab and Maiter. Most of them rely on in-memory storage to obtain the fast random access that many graph computations require. However, the exponential growth in the scale of large graphs and the limited capacity of main memory pose great challenges to the scalability of these systems.

In this work, we present a high-performance key-value storage system, called MaiterStore, which addresses the scalability challenge by using solid state drives (SSDs). We treat SSDs as an extension of memory and optimize the data structures for fast queries on large graphs stored on SSDs. Furthermore, observing that the hot-spot property and skewed power-law degree distributions are widespread in real graphs, we propose a hot-aware caching (HAC) policy to effectively manage hot vertices (frequently accessed vertices). HAC can substantially accelerate iterative graph execution. We evaluate MaiterStore through extensive experiments on real large graphs and validate the high performance of our system as graph storage.
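To make the caching idea concrete, here is a minimal sketch of a frequency-based, hot-aware vertex cache placed in front of slower storage. It is not MaiterStore's HAC policy, whose details are not given here; the class `HotAwareCache`, its methods, and the plain dict standing in for the SSD tier are all hypothetical. The sketch assumes that access count is a reasonable proxy for hotness and that, given a power-law degree distribution, high-degree vertices are good candidates for warming the cache.

```python
from collections import Counter


class HotAwareCache:
    """Hypothetical two-tier vertex store (illustration only, not MaiterStore's HAC):
    a small in-memory tier in front of a larger backing store that stands in
    for the SSD. Residency is decided by access frequency."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity        # max vertices held in memory
        self.backing = backing_store    # dict standing in for SSD storage
        self.memory = {}                # vertex id -> value (hot tier)
        self.hits = Counter()           # vertex id -> access count so far
        self.ssd_reads = 0              # reads that missed the memory tier

    def warm_by_degree(self, degrees):
        """Pre-load the highest-degree vertices, assuming that in a
        power-law graph they are the ones accessed most often."""
        top = sorted(degrees, key=degrees.get, reverse=True)[: self.capacity]
        for v in top:
            self.memory[v] = self.backing[v]

    def get(self, v):
        self.hits[v] += 1
        if v in self.memory:
            return self.memory[v]
        self.ssd_reads += 1             # miss: fetch from the slow tier
        value = self.backing[v]
        self._maybe_admit(v, value)
        return value

    def put(self, v, value):
        self.hits[v] += 1
        self.backing[v] = value         # write-through to the backing store
        if v in self.memory:
            self.memory[v] = value
        else:
            self._maybe_admit(v, value)

    def _maybe_admit(self, v, value):
        if self.capacity == 0:
            return
        if len(self.memory) < self.capacity:
            self.memory[v] = value
            return
        # Admit v only if it is now hotter than the coldest resident vertex,
        # so a single cold access cannot push out a hot vertex.
        coldest = min(self.memory, key=lambda u: self.hits[u])
        if self.hits[v] > self.hits[coldest]:
            del self.memory[coldest]
            self.memory[v] = value
```

Usage might look like `cache = HotAwareCache(capacity=2, backing_store=ssd)` followed by `cache.warm_by_degree(degrees)` and repeated `cache.get(v)` calls during an iteration. The linear scan for the coldest resident is kept for readability; a heap or frequency buckets would be the more usual choice for a large memory tier.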

This work was partially supported by National Natural Science Foundation of China (61300023, 61272179), Fundamental Research Funds for Central Universities (N120416001, N120816001), China Mobile Labs Fund (MCM20122051), and MOE-Intel Special Fund of Information Technology (MOE-INTEL-2012-06).

Author information

Correspondence to Dong Chang.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chang, D., Zhang, Y., Yu, G. (2014). MaiterStore: A Hot-Aware, High-Performance Key-Value Store for Graph Processing. In: Han, WS., Lee, M., Muliantara, A., Sanjaya, N., Thalheim, B., Zhou, S. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8505. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43984-5_9

  • DOI: https://doi.org/10.1007/978-3-662-43984-5_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-43983-8

  • Online ISBN: 978-3-662-43984-5

  • eBook Packages: Computer Science (R0)
