Abstract
Modern HPC systems and their associated middleware (such as MPI and parallel file systems) have been exploiting advances in HPC technologies (multi-/many-core architectures, RDMA-enabled networking, and SSDs) for many years. However, Big Data processing and management middleware have not fully taken advantage of such technologies. These disparities are taking HPC and Big Data processing onto divergent trajectories. This chapter provides an overview of popular Big Data processing middleware, high-performance interconnects, and storage architectures, and discusses the challenges in accelerating Big Data processing middleware by leveraging emerging technologies on modern HPC clusters. It then presents case studies of advanced designs, based on RDMA and heterogeneous storage architectures, that were proposed to address these challenges for multiple components of Hadoop (HDFS and MapReduce) and Spark. The designs presented in the case studies are publicly available as part of the High-Performance Big Data (HiBD) project, an overview of which is also provided in this chapter. All of this work aims to bring HPC and Big Data processing onto a convergent trajectory.
Acknowledgements
This research is supported in part by National Science Foundation grants #CNS-1419123, #IIS-1447804, and #ACI-1450440.
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Lu, X., Wasi-ur-Rahman, M., Islam, N., Shankar, D., Panda, D.K. (2016). Accelerating Big Data Processing on Modern HPC Clusters. In: Arora, R. (ed.) Conquering Big Data with High Performance Computing. Springer, Cham. https://doi.org/10.1007/978-3-319-33742-5_5
DOI: https://doi.org/10.1007/978-3-319-33742-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-33740-1
Online ISBN: 978-3-319-33742-5
eBook Packages: Computer Science, Computer Science (R0)