Abstract
Recently, many cloud-based graph computation frameworks have been proposed, such as Pregel, GraphLab, and Maiter. Most of them rely on in-memory storage to obtain the fast random access that many graph computations require. However, the exponential growth in the scale of graphs and the limited capacity of main memory pose great challenges to the scalability of these systems.
In this work, we present MaiterStore, a high-performance key-value storage system that addresses the scalability challenge by using solid state drives (SSDs). We treat SSDs as an extension of memory and optimize the data structures for fast queries on large graphs stored on SSDs. Furthermore, observing that the hot-spot property and skewed power-law degree distributions are widespread in real graphs, we propose a hot-aware caching (HAC) policy to effectively manage hot vertices (frequently accessed vertices). HAC can substantially accelerate iterative graph execution. We evaluate MaiterStore through extensive experiments on real large graphs and validate the high performance of our system as graph storage.
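The abstract does not say how HAC identifies or manages hot vertices, so the following is only a minimal Python sketch under stated assumptions, not MaiterStore's actual policy. It assumes that vertex degree is a usable proxy for access frequency (the power-law observation above), pins the highest-degree vertices in memory, lets the remaining vertices share a small LRU cache, and simulates the SSD tier with a dictionary that counts reads. `SimulatedSSD`, `HotAwareCache`, `hot_capacity`, and `lru_capacity` are hypothetical names introduced for illustration.

```python
from collections import OrderedDict


class SimulatedSSD:
    """Hypothetical stand-in for an SSD-resident key-value table: a dict plus a read counter."""

    def __init__(self, data):
        self._data = dict(data)
        self.reads = 0

    def get(self, key):
        self.reads += 1  # every lookup here models one SSD read
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value


class HotAwareCache:
    """Hypothetical hot-aware cache (assumption: hotness = vertex degree).

    The `hot_capacity` highest-degree vertices are pinned in memory; all other
    vertices share an LRU cache holding at most `lru_capacity` entries.
    """

    def __init__(self, ssd, degrees, hot_capacity, lru_capacity):
        self.ssd = ssd
        # Choose the hot set by degree, assumed to track access frequency in power-law graphs.
        hot_ids = sorted(degrees, key=degrees.get, reverse=True)[:hot_capacity]
        self.hot = {v: ssd.get(v) for v in hot_ids}  # preloaded once, never evicted
        self.lru = OrderedDict()
        self.lru_capacity = lru_capacity

    def get(self, v):
        if v in self.hot:
            return self.hot[v]
        if v in self.lru:
            self.lru.move_to_end(v)  # mark as most recently used
            return self.lru[v]
        value = self.ssd.get(v)  # miss: read from the SSD tier
        if self.lru_capacity > 0:
            self.lru[v] = value
            if len(self.lru) > self.lru_capacity:
                self.lru.popitem(last=False)  # evict the least recently used vertex
        return value

    def put(self, v, value):
        # Write-through: the SSD copy stays authoritative; refresh any cached copy.
        self.ssd.put(v, value)
        if v in self.hot:
            self.hot[v] = value
        elif v in self.lru:
            self.lru[v] = value
            self.lru.move_to_end(v)


if __name__ == "__main__":
    # Toy graph: vertex 0 is a hub, as a power-law graph has a few such vertices.
    degrees = {0: 50, 1: 3, 2: 2, 3: 1, 4: 1}
    ssd = SimulatedSSD({v: 1.0 for v in degrees})
    cache = HotAwareCache(ssd, degrees, hot_capacity=1, lru_capacity=2)
    preload_reads = ssd.reads  # the hub is read once at construction
    for v in [0, 1, 0, 2, 0, 1, 0, 3, 0, 4]:
        cache.get(v)
    print(ssd.reads - preload_reads)  # 4
```

In this toy run, ten lookups cost four SSD reads after the one-time preload of the hub, because every access to vertex 0 is served from memory. Pinning by static degree is just one simple way to exploit the skew; a policy that tracks observed access counts would be an equally plausible alternative.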
This work was partially supported by the National Natural Science Foundation of China (61300023, 61272179), the Fundamental Research Funds for the Central Universities (N120416001, N120816001), the China Mobile Labs Fund (MCM20122051), and the MOE-Intel Special Fund of Information Technology (MOE-INTEL-2012-06).
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chang, D., Zhang, Y., Yu, G. (2014). MaiterStore: A Hot-Aware, High-Performance Key-Value Store for Graph Processing. In: Han, WS., Lee, M., Muliantara, A., Sanjaya, N., Thalheim, B., Zhou, S. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8505. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43984-5_9
DOI: https://doi.org/10.1007/978-3-662-43984-5_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-43983-8
Online ISBN: 978-3-662-43984-5
eBook Packages: Computer Science, Computer Science (R0)