Abstract
Clouds are becoming an effective platform for scientific workflow applications. In the meantime, Cloud computing structures are moving towards being more heterogeneous. In heterogeneous service-oriented systems, managing the reliability of resources (e.g., processors and communication networks) is widely identified as a critical issue due to processor and communication failures affecting user quality of service requirements. Therefore, these types of failures should be taken into account when scheduling algorithms. The present paper proposes a scheduling approach which includes four algorithms for minimizing the workflow execution cost while also meeting the user-specified deadline and reliability. To meet the application’s requirements, the first algorithm partitions the workflow into several clusters based on a critical parent called CbCP. After that, the resource assignment algorithm, consisting of reliability and deadline distribution methods, satisfies the application’s constraints. Experimental outcomes on various workflows, generated at different scales in real and random fashion, demonstrate that the proposed heuristics meet the deadline and reliability. This ensures the minimal cost when performing a similar quality of service as opposed to the performance of the state-of-the-art DRR and QFEC+ algorithms.
Similar content being viewed by others
References
Cai Z, Li X, Gupta JND (2016) Heuristics for provisioning services to workflows in XaaS clouds. IEEE Trans Serv Comput 9(2):250–263
Zhu X, Wang J, Guo H, Zhu D, Yang LT, Liu L (2016) Fault-tolerant scheduling for real-time scientific workflows with elastic resource provisioning in virtualized clouds. IEEE Trans Parallel Distrib Syst 99:1
Zhou A et al (2016) Cloud service reliability enhancement via virtual machine placement optimization. IEEE Trans Serv Comput 10(6):902–913
Zhao L, Ren Y, Sakurai K (2013) Reliable workflow scheduling with less resource redundancy. Parallel Comput 39(10):567–585
Qiu W, Zheng Z, Wang X, Yang X, Lyu MR (2014) Reliability-based design optimization for cloud migration. IEEE Trans Serv Comput 7(2):223–236
Silic M, Delac G, Srbljic S (2015) Prediction of atomic web services reliability for QoS-aware recommendation. IEEE Trans Serv Comput 8(3):425–438
Bajaj R, Agrawal DP (2004) Improving scheduling of tasks in a heterogeneous environment. IEEE Trans Parallel Distrib Syst 15(2):107–118
Daoud MI, Kharma N (2008) A high performance algorithm for static task scheduling in heterogeneous distributed computing systems. J Parallel Distrib Comput 68(4):399–409
Wieczorek M, Hoheisel A, Prodan R (2009) Towards a general model of the multi-criteria workflow scheduling on the grid. Future Gener Comput Syst 25:237–256
Yu J, Kirley M, Buyya R (2007) Multi-objective planning for workflow execution on Grids. In: Proceedings of the 8th IEEE/ACM international conference on grid computing, pp 10–17
Benoit A, Hakem M, Robert Y (2008) Fault tolerant scheduling of precedence task graphs on heterogeneous platforms. In: IPDPS Miami 2008—proceedings of the 22nd IEEE International symposium on parallel and distributed processing CD-ROM, vol 33, no. December 2007
Benoit A, Hakem M, Robert Y (2009) Contention awareness and fault-tolerant scheduling for precedence constrained tasks in heterogeneous systems. Parallel Comput 35(2):83–108
Girault A, Kalla H (2009) A novel bicriteria scheduling heuristics providing a guaranteed global system failure rate. IEEE Trans Dependable Secur Comput 6(4):241–254
Zheng Q, Veeravalli B (2009) On the design of communication-aware fault-tolerant scheduling algorithms for precedence constrained tasks in grid computing systems with dedicated communication devices. J Parallel Distrib Comput 69(3):282–294
Zheng Q, Veeravalli B, Tham CK (2009) On the design of fault-tolerant scheduling strategies using primary-backup approach for computational grids with low replication costs. IEEE Trans Comput 58(3):380–393
**e G, Zeng G, Li R, Li K (2017) Quantitative fault-tolerance for reliable workflows on heterogeneous IaaS clouds. IEEE Trans Cloud Comput. https://doi.org/10.1109/TCC.2017.2780098
Arabnejad H, Barbosa JG (2014) A budget constrained scheduling algorithm for workflow applications. J Grid Comput 12(4):665–679
Sakellariou R, Zhao H, Tsiakkouri E, Dikaiakos MD (2007) Scheduling workflows with budget constraints. In: Integrated research in GRID computing CoreGRID integration workshop 2005 Selected Papers, pp 189–202
Su S, Li J, Huang Q, Huang X, Shuang K, Wang J (2013) Cost-efficient task scheduling for executing large programs in the cloud. Parallel Comput 39(4–5):177–188
Szabo C, Kroeger T (2012) Evolving multi-objective strategies for task allocation of scientific workflows on public clouds. In: 2012 IEEE congress on evolutionary computation, CEC 2012, pp 10–15
Kianpisheh S, Charkari NM (2014) A grid workflow quality-of-service estimation based on resource availability prediction. J Supercomput 67(2):496–527
**e G et al (2017) Minimizing redundancy to satisfy reliability requirement for a parallel application on heterogeneous service-oriented systems. IEEE Trans Serv Comput. https://doi.org/10.1109/TSC.2017.2665552
Qin X, Jiang H, Swanson DR (2002) An efficient fault-tolerant scheduling algorithm for real-time tasks with precedence constraints in heterogeneous systems. In: parallel processing. 2002. Proceedings international conference on, pp 360–368
Benoit A, Hakem M, Robert Y (2009) Optimizing the latency of streaming applications under throughput and reliability constraint. In: Proceedings international conference on parallel processing, pp 325–332
Zhao L, Ren Y, Sakurai K (2011) A resource minimizing scheduling algorithm with ensuring the deadline and reliability in heterogeneous systems. In: Proceedings of the international conference on advanced information networking and applications AINA, pp 275–282
Deelman E, Gannon D, Shields M, Taylor I (2009) Workflows and e-Science: an overview of workflow system features and capabilities. Future Gener Comput Syst 25(5):528–540
Naghibzadeh M (2016) Modeling and scheduling hybrid workflows of tasks and task interaction graphs on the cloud. Future Gener Comput Syst 65:33–45
Benoit A, Canon LC, Jeannot E, Robert Y (2012) Reliability of task graph schedules with transient and fail-stop failures: complexity and algorithms. J Sched 15(5):615–627
Deldari A, Naghibzadeh M, Abrishami S (2017) CCA: a deadline-constrained workflow scheduling algorithm for multicore resources on the cloud. J Supercomput 73:756–781
Bittencourt LF, Madeira ERM (2010) Towards the scheduling of multiple workflows on computational grids. J Grid Comput 8:419–441
Bittencourt LF, Madeira ERM (2008) A performance-oriented adaptive scheduler for dependent tasks on grids. Concurr Comput Pract Exp 20:1029–1049
Topcuoglu H, Hariri S (2002) Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Trans Parallel Distrib Syst 13:260–274
Girault A, Kalla H, Sighireanu M, Sorel Y (2003) An algorithm for automatically obtaining distributed and fault-tolerant static schedules. In: Proceeding of international conference on dependable systems and networks. pp 159–168
Ranaweera S, Agrawal DP (2000) A task duplication based scheduling algorithm for heterogeneous systems. In: Proceedings 14th international parallel and distributed processing symposium 2000. IPDPS 2000. pp 445–450
Bharathi S, Chervenak A, Deelman E, Mehta G, Su MH, Vahi K (2008) Characterization of scientific workflows. In: 2008 3rd Work. Work. Support large-scale sci. work. no. June 2014
Acknowledgements
The authors would like to express their gratitude to the anonymous reviewers for their constructive comments which have helped to improve the quality of the paper.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Mousavi Nik, S.S., Naghibzadeh, M. & Sedaghat, Y. Cost-driven workflow scheduling on the cloud with deadline and reliability constraints. Computing 102, 477–500 (2020). https://doi.org/10.1007/s00607-019-00740-5
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00607-019-00740-5