Abstract
As science and technology advance and the volume of important data and transactions grows, maintaining that data has become a major challenge. Maintenance cost is a significant concern for organizations and companies, and security and safety are equally sensitive, because software faults (especially Byzantine faults), hardware faults, and cyber-attacks threaten data, transactions, and the safety of systems. Researchers are therefore seeking solutions that deliver the best service at the lowest cost under the pay-as-you-go model while preserving the security and integrity of data when a fault occurs. Replication is one of the most important techniques proposed for increasing fault tolerance in distributed systems, but it is costly and has many problems. This article uses blockchain technology to increase reliability and availability, reduce resource usage and cost, and improve fault tolerance, particularly against Byzantine faults, and reports very good results compared with other methods.
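To make the core idea concrete, the following is a minimal sketch of a hash-chained log, the basic data structure underlying a blockchain. It is not the scheme evaluated in this article; the function names (`block_hash`, `append_block`, `first_invalid_block`) and the block layout are assumptions chosen for illustration. It shows only why chaining hashes makes a corrupted or maliciously altered record detectable, and it omits consensus and distribution across nodes.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # assumed placeholder for the first block's predecessor


def block_hash(index, payload, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    record = json.dumps(
        {"index": index, "payload": payload, "prev": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a block that commits to the current tip of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append({
        "index": index,
        "payload": payload,
        "prev": prev_hash,
        "hash": block_hash(index, payload, prev_hash),
    })
    return chain


def first_invalid_block(chain):
    """Return the position of the first block that fails verification, or None."""
    prev_hash = GENESIS_PREV
    for position, block in enumerate(chain):
        # The block must point at the hash of the block before it.
        if block["prev"] != prev_hash:
            return position
        # The stored hash must match a fresh hash of the block's contents.
        if block["hash"] != block_hash(position, block["payload"], block["prev"]):
            return position
        prev_hash = block["hash"]
    return None
```

For example, after appending three payloads to an empty list, `first_invalid_block` returns `None`; changing the payload of the second block afterwards makes it return `1`, because the stored hash no longer matches the block's contents.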
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Peyman Bayat, Mehdi Farrokhbakht Foumani and Masoum Farahmandian. The first draft of the manuscript was written by Masoum Farahmandian and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript. Idea of the article: Peyman Bayat. Literature search and data analysis: Masoum Farahmandian. Critical review of the work: Peyman Bayat, Mehdi Farrokhbakht Foumani. Draft of the work: Masoum Farahmandian.
Ethics declarations
Competing interests
The authors declare no competing interests.
About this article
Cite this article
Farahmandian, M., Foumani, M.F. & Bayat, P. Improving fault tolerance in LinuX container-based distributed systems using blockchain. Cluster Comput (2024). https://doi.org/10.1007/s10586-024-04279-9
DOI: https://doi.org/10.1007/s10586-024-04279-9