DiGTreeS: a distributed resilient framework for generalized tree search

The Journal of Supercomputing

Abstract

Exact combinatorial search algorithms have applications in several areas, including computational algebra, AI, and discrete optimization. These problems are compute-intensive and have highly irregular search trees. Most earlier efforts to parallelize these algorithms used a fixed degree of parallelism throughout the run. We show that such an approach leads to poor resource utilization, because the parallel run-time efficiency of an irregular search application varies over time. We propose DiGTreeS, a distributed resilient framework for generalized tree search that supports elastic scaling. It features an easy-to-use API for expressing combinatorial search and hides system concerns such as load balancing, fault tolerance, and elastic scaling. We evaluate the DiGTreeS framework under different scaling strategies and show its effectiveness on four representative problem instances: the Traveling Salesman Problem, 0–1 Knapsack, N-queens, and a Generic State Space Search Application.
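The abstract describes an API that keeps problem-specific logic separate from system concerns. The Java sketch below illustrates that separation only under stated assumptions: a hypothetical `SearchProblem` interface (root, branching rule, goal test) and a sequential depth-first driver in which each stack entry is a task in the sense of Note 2, i.e. an unexplored subtree. The names `SearchProblem`, `countSolutions`, and `NQueens` are invented for this example and are not the DiGTreeS API. Distribution, load balancing, fault tolerance, and elastic scaling are deliberately left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of a generic tree-search API; it is NOT the DiGTreeS API.
public class TreeSearchSketch {

    // A search problem is described only by its root, a branching rule, and a goal test.
    interface SearchProblem<N> {
        N root();
        List<N> children(N node); // expands a node into the roots of its subtrees
        boolean isSolution(N node);
    }

    // Sequential depth-first driver: every entry on the stack is a pending task,
    // i.e. the root of a subtree that has not been explored yet.
    static <N> long countSolutions(SearchProblem<N> problem) {
        Deque<N> tasks = new ArrayDeque<>();
        tasks.push(problem.root());
        long solutions = 0;
        while (!tasks.isEmpty()) {
            N node = tasks.pop();
            if (problem.isSolution(node)) {
                solutions++;
                continue; // complete solutions are leaves; do not expand further
            }
            for (N child : problem.children(node)) {
                tasks.push(child);
            }
        }
        return solutions;
    }

    // N-queens: a node lists the column of the queen placed in each of rows 0..k-1.
    static final class NQueens implements SearchProblem<int[]> {
        private final int n;

        NQueens(int n) {
            this.n = n;
        }

        @Override
        public int[] root() {
            return new int[0]; // empty board
        }

        @Override
        public boolean isSolution(int[] cols) {
            return cols.length == n; // one non-attacking queen in every row
        }

        @Override
        public List<int[]> children(int[] cols) {
            List<int[]> next = new ArrayList<>();
            int row = cols.length;
            for (int c = 0; c < n; c++) {
                if (isSafe(cols, row, c)) {
                    int[] child = Arrays.copyOf(cols, row + 1);
                    child[row] = c;
                    next.add(child);
                }
            }
            return next;
        }

        // A square is safe if no earlier queen shares its column or a diagonal.
        private static boolean isSafe(int[] cols, int row, int c) {
            for (int r = 0; r < row; r++) {
                if (cols[r] == c || Math.abs(cols[r] - c) == row - r) {
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        for (int n = 4; n <= 8; n++) {
            System.out.println(n + "-queens: " + countSolutions(new NQueens(n)));
        }
    }
}
```

Running `main` prints the solution counts 2, 10, 4, 40, and 92 for 4 to 8 queens. Because pending subtrees are explicit objects rather than call-stack frames, a framework of this kind could in principle hand some of them to other workers; this sketch keeps them all in one thread.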

[Figs. 1–18 and Algorithms 1–3 not reproduced here]

Data availability

Data and/or code will be made available on request.

Notes

  1. Reactive scaling is a technique in which the elasticity controller reacts to changes in the system and makes scaling decisions accordingly [6].

  2. A task refers to an unexplored portion of the subtree.

  3. ThreadMXBean returns the user-level CPU time for the current thread if CPU time measurement is enabled; \(-1\) otherwise.

  4. https://zookeeper.apache.org/.

  5. http://kafka.apache.org/.

  6. https://hadoop.apache.org/.

  7. http://x10-lang.org/

  8. CPLEX is a commercial ILP solver by IBM.

  9. Hybrid scaling is the combination of both upscaling and downscaling.

  10. We calculate the percentage deviation as the absolute difference between the current value and the mean value, divided by the mean value and multiplied by 100, i.e., \(\text {deviation} = \frac{|\text {current\,value} - \text {mean\,value}|}{\text {mean\,value}} \times 100.\)

  11. http://dimacs.rutgers.edu/archive/Challenges/TSP/.

  12. Execution starts with 10 workers because, when the number of workers is small (i.e., fewer than 4), workers may fail at the very beginning of the execution.

  13. Proactive scaling is a technique that anticipates future changes in the system and acts on them before they occur [6].
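Note 3 refers to the standard `java.lang.management.ThreadMXBean` interface. The following minimal sketch shows how a thread might read its own user-mode CPU time with it and handle the `-1` case. The class and method names (`CpuTimeSketch`, `currentThreadUserTimeNanos`) are illustrative assumptions, and the paper does not state that DiGTreeS enables or samples CPU time in exactly this way.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Illustrative sketch (hypothetical names): reading the calling thread's user-mode CPU time.
public class CpuTimeSketch {

    // Returns the current thread's user-mode CPU time in nanoseconds, or -1 if unavailable.
    static long currentThreadUserTimeNanos() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported()) {
            return -1L; // this JVM cannot measure CPU time for the current thread
        }
        if (!bean.isThreadCpuTimeEnabled()) {
            // May throw SecurityException if a security manager forbids it.
            bean.setThreadCpuTimeEnabled(true);
        }
        // ThreadMXBean itself returns -1 here if CPU time measurement is disabled.
        return bean.getCurrentThreadUserTime();
    }

    public static void main(String[] args) {
        long before = currentThreadUserTimeNanos();
        long checksum = 0;
        for (int i = 0; i < 50_000_000; i++) {
            checksum += i % 7; // busy work so that some CPU time elapses
        }
        long after = currentThreadUserTimeNanos();
        if (before < 0 || after < 0) {
            System.out.println("CPU time measurement is not available");
        } else {
            System.out.printf("user CPU time: %.1f ms (checksum %d)%n",
                    (after - before) / 1e6, checksum);
        }
    }
}
```

Measuring user-mode CPU time rather than wall-clock time excludes periods in which a worker thread is blocked or descheduled, which is why it is a natural input for efficiency estimates of the kind the notes describe.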

References

  1. Paschos VT (2014) Applications of combinatorial optimization. Wiley, Hoboken

  2. Archibald B, Maier P, Stewart R, Trinder P (2019) Implementing YewPar: a framework for parallel tree search. In: Euro-Par 2019: Parallel Processing: 25th International Conference on Parallel and Distributed Computing, Göttingen, Germany, August 26–30, 2019, Proceedings 25. Springer, pp 184–196

  3. Goldreich O (2010) P, NP, and NP-completeness: the basics of computational complexity. Cambridge University Press, Cambridge

  4. Kehrer S, Blochinger W (2020) Equilibrium: an elasticity controller for parallel tree search in the cloud. J Supercomput 76:9211–9245

  5. Yasugi M, Muraoka D, Hiraishi T, Umatani S, Emoto K (2019) HOPE: a parallel execution model based on hierarchical omission. In: Proceedings of the 48th International Conference on Parallel Processing, pp 1–11

  6. Rampérez V, Soriano J, Lizcano D, Lara JA (2021) FLAS: a combination of proactive and reactive auto-scaling architecture for distributed services. Futur Gener Comput Syst 118:56–72

  7. Haussmann J, Blochinger W, Kuechlin W (2019) Cost-efficient parallel processing of irregularly structured problems in cloud computing environments. Clust Comput 22(3):887–909

  8. Rosa Righi R, Rodrigues VF, Rostirolla G, Costa CA, Roloff E, Navaux POA (2018) A lightweight plug-and-play elasticity service for self-organizing resource provisioning on parallel applications. Futur Gener Comput Syst 78:176–190

  9. Vizel Y, Weissenbacher G, Malik S (2015) Boolean satisfiability solvers and their applications in model checking. Proc IEEE 103(11):2021–2035

  10. Yang J, He Q (2018) Scheduling parallel computations by work stealing: a survey. Int J Parallel Prog 46:173–197

  11. Xie F, Davenport A (2010) Massively parallel constraint programming for supercomputers: Challenges and initial results. In: International Conference on Integration of Artificial Intelligence (AI) and Operations Research (OR) Techniques in Constraint Programming. Springer, pp 334–338

  12. Herbst NR, Kounev S, Reussner R (2013) Elasticity in cloud computing: what it is, and what it is not. In: 10th International Conference on Autonomic Computing (ICAC 13), pp 23–27

  13. Hunt P, Konar M, Junqueira FP, Reed B (2010) ZooKeeper: wait-free coordination for internet-scale systems. In: USENIX Annual Technical Conference, vol 8

  14. Kreps J, Narkhede N, Rao J et al. (2011) Kafka: a distributed messaging system for log processing. In: Proceedings of the NetDB, vol 11. Athens, Greece, pp 1–7

  15. Shvachko K, Kuang H, Radia S, Chansler R (2010) The Hadoop distributed file system. In: 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pp 1–10. https://doi.org/10.1109/MSST.2010.5496972

  16. Gupta A, Faraboschi P, Gioachin F, Kale LV, Kaufmann R, Lee B-S, March V, Milojicic D, Suen CH (2014) Evaluating and improving the performance and scheduling of HPC applications in cloud. IEEE Trans Cloud Comput 4(3):307–321

  17. Bui P, Rajan D, Abdul-Wahid B, Izaguirre J, Thain D (2011) Work Queue + Python: a framework for scalable scientific ensemble applications. In: Workshop on Python for High Performance and Scientific Computing at SC11

  18. Rosa Righi R, Rodrigues VF, Da Costa CA, Galante G, De Bona LCE, Ferreto T (2015) AutoElastic: automatic resource elasticity for high performance applications in the cloud. IEEE Trans Cloud Comput 4(1):6–19

  19. Archibald B, Maier P, Stewart R, Trinder P, De Beule J (2017) Towards generic scalable parallel combinatorial search. In: Proceedings of the International Workshop on Parallel Symbolic Computation, pp 1–10

  20. Poldner M, Kuchen H (2008) Algorithmic skeletons for branch and bound. In: Software and Data Technologies: First International Conference, ICSOFT 2006, Setúbal, Portugal, September 11–14, 2006, Revised Selected Papers 1. Springer, pp 204–219

  21. Bungart M, Fohry C (2017) A malleable and fault-tolerant task pool framework for X10. In: 2017 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, pp 749–757

  22. Johnson DS, McGeoch LA (1997) The traveling salesman problem: a case study in local optimization. Local Search Comb Optim 1(1):215–310

  23. Salkin HM, De Kluyver CA (1975) The knapsack problem: a survey. Naval Res Logist Q 22(1):127–144

  24. Bell J, Stevens B (2009) A survey of known results and research areas for n-queens. Discret Math 309(1):1–31

  25. Khaund A, Sharma AM, Tiwari A, Garg S, Kailasam S (2023) RD-FCA: a resilient distributed framework for formal concept analysis. J Parall Distrib Comput 179:104710

  26. Archibald B, Maier P, McCreesh C, Stewart R, Trinder P (2018) Replicable parallel branch and bound search. J Parall Distrib Comput 113:92–114

  27. Prim RC (1957) Shortest connection networks and some generalizations. Bell Syst Tech J 36(6):1389–1401

  28. Kizilateş G, Nuriyeva F (2013) On the nearest neighbor algorithms for the traveling salesman problem. In: Advances in Computational Science, Engineering and Information Technology: Proceedings of the Third International Conference on Computational Science, Engineering and Information Technology (CCSEIT-2013), KTO Karatay University, June 7–9, 2013, Konya, Turkey-Volume 1. Springer, pp 111–118

  29. Bersani MM, Bianculli D, Dustdar S, Gambi A, Ghezzi C, Krstić S (2014) Towards the formalization of properties of cloud-based elastic systems. In: Proceedings of the 6th International Workshop on Principles of Engineering Service-Oriented and Cloud Systems, pp 38–47

  30. David P (2005) Where are the hard knapsack problems? Comput Oper Res 32(9):2271–2284

  31. Zangeneh A, Jadid S, Rahimi-Kian A (2010) Normal boundary intersection and benefit-cost ratio for distributed generation planning. Eur Trans Electr Power 20(2):97–113

Author information

Contributions

MAJ helped in conceptualization, methodology, writing—original draft, writing—reviewing and editing, software. SK contributed to conceptualization, methodology, writing—reviewing and editing. BG and VS were involved in conceptualization and methodology.

Corresponding author

Correspondence to Md Arshad Jamal.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jamal, M.A., Kailasam, S., Goyal, B. et al. DiGTreeS: a distributed resilient framework for generalized tree search. J Supercomput 80, 15006–15037 (2024). https://doi.org/10.1007/s11227-024-06017-9

  • DOI: https://doi.org/10.1007/s11227-024-06017-9
