Abstract
The emergence of hybrid, heterogeneous large-scale platforms combining multiple GPUs and multiple CPUs offers new opportunities and poses new challenges for solving difficult optimization problems. This paper targets irregular tree search algorithms, in which the workload is unpredictable. We propose an adaptive distributed approach that balances the load dynamically at runtime while taking into account the computing power of each GPU or CPU. Using Branch-and-Bound and FlowShop as a case study, we deployed our approach on up to \(20\) GPUs and \(128\) CPUs. Through extensive experiments in different system configurations, we report near-optimal speedups, thus providing new insights into how to take full advantage of both GPU and CPU power in modern computing platforms.
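To make the idea of capacity-aware load distribution concrete, the following is a minimal sketch and not the paper's actual protocol. It assumes that each processing unit has a measured throughput (for example, tree nodes explored per second) and that an idle unit takes a share of a busy unit's pending work proportional to its relative throughput. The names `steal_share` and `split_work` and the rate parameters are hypothetical and introduced only for illustration.

```python
from collections import deque


def steal_share(victim_rate: float, thief_rate: float) -> float:
    """Fraction of the victim's pending work handed to the thief.

    Assumption: the share is proportional to the thief's throughput
    relative to the combined throughput of both units.
    """
    if victim_rate <= 0 or thief_rate <= 0:
        raise ValueError("rates must be positive")
    return thief_rate / (victim_rate + thief_rate)


def split_work(pending: deque, victim_rate: float, thief_rate: float) -> list:
    """Remove a rate-proportional number of items from `pending` and return them."""
    count = int(len(pending) * steal_share(victim_rate, thief_rate))
    # Take from the left end, i.e. the oldest items. If the victim pushes
    # and pops on the right (depth-first), these tend to be the shallowest
    # nodes and therefore the largest unexplored subtrees.
    return [pending.popleft() for _ in range(count)]


# Hypothetical usage: a unit three times faster than the victim asks for work.
victim_pool = deque(range(8))
stolen = split_work(victim_pool, victim_rate=1.0, thief_rate=3.0)
```

With equal rates the pool is split in half; a faster thief receives a larger share, so that both units would be expected to finish their portions at roughly the same time.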
Acknowledgments
This material is based on work supported by the INRIA HEMERA project. Experiments presented in this paper were carried out using the Grid5000 experimental testbed, which is being developed under the INRIA ALADDIN development action with support from CNRS, RENATER and several universities as well as other funding bodies (see https://www.grid5000.fr). Thanks also to Imen Chakroun for her valuable contributions to the code development of the GPU kernel.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vu, TT., Derbel, B., Melab, N. (2013). Adaptive Dynamic Load Balancing in Heterogeneous Multiple GPUs-CPUs Distributed Setting: Case Study of B&B Tree Search. In: Nicosia, G., Pardalos, P. (eds) Learning and Intelligent Optimization. LION 2013. Lecture Notes in Computer Science, vol 7997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-44973-4_11
DOI: https://doi.org/10.1007/978-3-642-44973-4_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-44972-7
Online ISBN: 978-3-642-44973-4
eBook Packages: Computer Science (R0)