Resource allocation and aging priority-based scheduling of linear workflow applications with transient failures and selective imprecise computations

Cluster Computing

Abstract

A wide range of applications in distributed environments have a linear structure, varying priorities, and may experience transient software failures. As the computational demands of such linear workflow (LW) jobs continue to grow, their efficient, fair, and fault-tolerant resource allocation and scheduling is becoming more challenging. To address this problem, we propose a fair and efficient scheduling approach, which considers that the priorities of the jobs age with time. We jointly use this scheduling strategy with three practical routing techniques, as well as two variants of an application-directed checkpointing scheme. The first variant of this scheme incorporates imprecise computations in a selective manner, whereas the second one does not use imprecise computations at all. Our aim is to dynamically allocate and schedule LW jobs with different priorities and transient software failures in a distributed system. Through extensive experimentation, we evaluate the system performance under the considered routing methods and checkpointing schemes, utilizing various load cases and failure probabilities. The simulation results showcase the impact of selective imprecise computations on the system performance, while providing insights into how the examined routing strategies perform in each of the investigated scenarios.
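To make the idea of priorities that age with time concrete, the following is a minimal sketch and not the authors' algorithm. The abstract does not state the aging rule, so this example assumes a linear one: a job's effective priority is its base priority plus `aging_rate` times its waiting time. The names `Job`, `effective_priority`, `pick_next`, and `aging_rate` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    base_priority: float  # larger value = more urgent (assumed convention)
    arrival_time: float   # time at which the job entered the queue


def effective_priority(job: Job, now: float, aging_rate: float) -> float:
    """Base priority plus a bonus that grows linearly with waiting time.

    The linear form is an assumption made for this sketch only.
    """
    waiting = max(0.0, now - job.arrival_time)
    return job.base_priority + aging_rate * waiting


def pick_next(queue: list[Job], now: float, aging_rate: float) -> Job:
    """Return the queued job with the highest aged priority.

    Ties are broken in favour of the job that arrived first.
    """
    return max(
        queue,
        key=lambda j: (effective_priority(j, now, aging_rate), -j.arrival_time),
    )


queue = [
    Job("low_old", base_priority=1.0, arrival_time=0.0),
    Job("high_new", base_priority=4.0, arrival_time=8.0),
]

# Without aging, the high-priority job is always served first.
print(pick_next(queue, now=9.0, aging_rate=0.0).name)
# With aging, the long-waiting low-priority job overtakes it.
print(pick_next(queue, now=9.0, aging_rate=0.5).name)
```

With `aging_rate=0.0` the scheduler reduces to plain priority ordering, in which low-priority jobs can starve under sustained load. A positive rate bounds how long a low-priority job can be overtaken, which is the fairness property that aging is generally meant to provide.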

Author information

Contributions

All authors contributed to the study conception and the preparation of the manuscript. H. D. Karatza implemented the simulation program and conducted the simulation experiments. All authors read, reviewed, and approved the final manuscript.

Corresponding author

Correspondence to Georgios L. Stavrinides.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Karatza, H.D., Stavrinides, G.L. Resource allocation and aging priority-based scheduling of linear workflow applications with transient failures and selective imprecise computations. Cluster Comput (2024). https://doi.org/10.1007/s10586-023-04249-7

  • DOI: https://doi.org/10.1007/s10586-023-04249-7
