Abstract
Many applications in distributed environments have a linear structure and varying priorities, and may experience transient software failures. As the computational demands of such linear workflow (LW) jobs continue to grow, allocating resources to them and scheduling them in an efficient, fair, and fault-tolerant manner is becoming more challenging. To address this problem, we propose a fair and efficient scheduling approach in which job priorities age over time. We combine this scheduling strategy with three practical routing techniques and two variants of an application-directed checkpointing scheme. The first variant incorporates imprecise computations selectively, whereas the second does not use imprecise computations at all. Our aim is to dynamically allocate and schedule LW jobs that have different priorities and are subject to transient software failures in a distributed system. Through extensive simulation experiments, we evaluate system performance under the considered routing methods and checkpointing schemes across various load cases and failure probabilities. The simulation results show the impact of selective imprecise computations on system performance and provide insights into how the examined routing strategies perform in each of the investigated scenarios.
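To illustrate the general idea of priorities that age with time, the following is a minimal sketch and not the scheduling policy evaluated in the article. It assumes a linear aging rule, a "higher value is more urgent" convention, and hypothetical names (`Job`, `effective_priority`, `pick_next`, `aging_rate`), none of which are specified in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    base_priority: float  # assumption: a higher value means more urgent
    arrival_time: float   # time at which the job entered the queue


def effective_priority(job: Job, now: float, aging_rate: float = 0.1) -> float:
    """Base priority plus a bonus that grows linearly with waiting time.

    The linear rule and the aging_rate value are illustrative assumptions.
    """
    return job.base_priority + aging_rate * (now - job.arrival_time)


def pick_next(queue: list[Job], now: float, aging_rate: float = 0.1) -> Job:
    """Return the waiting job with the highest effective priority.

    Ties are broken in favour of the job that arrived earliest.
    """
    return max(
        queue,
        key=lambda j: (effective_priority(j, now, aging_rate), -j.arrival_time),
    )


# A low-priority job that has waited a long time competes with a
# high-priority job that has just arrived.
queue = [Job("low_old", 1.0, 0.0), Job("high_new", 3.0, 30.0)]

with_aging = pick_next(queue, now=30.0)                     # aging enabled
without_aging = pick_next(queue, now=30.0, aging_rate=0.0)  # static priorities
```

In this sketch the long-waiting low-priority job is selected when aging is enabled, while the newly arrived high-priority job is selected when priorities are static. This is the kind of fairness effect that aging is generally meant to provide, since it keeps low-priority jobs from waiting indefinitely.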
Author information
Contributions
All authors contributed to the study conception and the preparation of the manuscript. H. D. Karatza implemented the simulation program and conducted the simulation experiments. All authors read, reviewed, and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
About this article
Cite this article
Karatza, H.D., Stavrinides, G.L. Resource allocation and aging priority-based scheduling of linear workflow applications with transient failures and selective imprecise computations. Cluster Comput (2024). https://doi.org/10.1007/s10586-023-04249-7
DOI: https://doi.org/10.1007/s10586-023-04249-7