Improving fault tolerance in LinuX container-based distributed systems using blockchain

Cluster Computing

Abstract

With the growth of science and technology and the increasing volume of important data and transactions, maintaining these data and transactions has become a major challenge. On the one hand, maintenance cost is a significant concern for organizations and companies. On the other hand, security and safety are sensitive issues, because software faults (especially Byzantine faults), hardware faults, and cyber-attacks threaten data, transactions, and the safety of systems. Researchers are therefore seeking solutions that deliver the best service at the lowest cost under the pay-as-you-go model while preserving the security and integrity of data when a fault occurs. Replication is one of the most important techniques proposed to increase fault tolerance in distributed systems, but it is costly and has many other problems. In this article, blockchain technology is used to increase reliability and availability, reduce resource usage and cost, and improve fault tolerance, especially against Byzantine faults, and the proposed approach achieves very good results compared with other methods.

Author information

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Peyman Bayat, Mehdi Farrokhbakht Foumani and Masoum Farahmandian. The first draft of the manuscript was written by Masoum Farahmandian and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript. Idea of the article: Peyman Bayat. Literature search and data analysis: Masoum Farahmandian. Critical review of the work: Peyman Bayat, Mehdi Farrokhbakht Foumani. Draft of the work: Masoum Farahmandian.

Corresponding author

Correspondence to Mehdi Farrokhbakht Foumani.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Farahmandian, M., Foumani, M.F. & Bayat, P. Improving fault tolerance in LinuX container-based distributed systems using blockchain. Cluster Comput (2024). https://doi.org/10.1007/s10586-024-04279-9

  • DOI: https://doi.org/10.1007/s10586-024-04279-9
