Abstract
Hadoop is the industry's de facto tool for Big Data computation. Its native fault tolerance procedure, however, is slow and leads to performance degradation, and it fails to fully account for computational overhead and storage cost. Moreover, the dynamic nature and complexity of MapReduce are further parameters that affect job response time. A robust failure-handling technique is therefore essential. In this paper, we analyze notable reactive fault tolerance techniques and measure their impact under different performance metrics, variable datasets, and variable fault injections. The results show that, in terms of response time, the byzantine fault tolerance technique outperforms the retrying and checkpointing techniques when a single node is killed. In terms of throughput, task-level byzantine fault tolerance again ranks highest compared with checkpointing and retrying under network-disconnect failures. Overall, this comparative study highlights the strengths and weaknesses of the different fault-tolerant techniques and helps determine the best technique for a given environment.
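The two conventional reactive techniques compared above differ in what happens after a fault: retrying re-executes the failed task from scratch (as Hadoop does, governed by the real configuration properties `mapreduce.map.maxattempts` and `mapreduce.reduce.maxattempts`), while checkpointing resumes from the last persisted state. The following minimal Python sketch, with hypothetical helper names and an injected fault, illustrates the difference; it is a simplified model, not Hadoop's actual implementation:

```python
def run_with_retry(task, max_attempts=4):
    """Retrying: on failure, re-execute the whole task from the beginning.
    Mirrors Hadoop's default behaviour (cf. mapreduce.map.maxattempts)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except RuntimeError:
            if attempt == max_attempts:
                raise  # attempts exhausted: the job fails


def run_with_checkpoint(steps, initial_state, max_attempts=4):
    """Checkpointing: record progress after each step so a failed attempt
    resumes from the last completed step instead of restarting."""
    checkpoint = {"step": 0, "state": initial_state}  # stands in for persistent storage
    for _ in range(max_attempts):
        try:
            for i in range(checkpoint["step"], len(steps)):
                checkpoint["state"] = steps[i](checkpoint["state"])
                checkpoint["step"] = i + 1  # commit progress
            return checkpoint["state"]
        except RuntimeError:
            continue  # retry, but resume from checkpoint["step"]
    raise RuntimeError("task failed after all attempts")


if __name__ == "__main__":
    calls = {"inc1": 0, "flaky": 0}

    def inc1(s):
        calls["inc1"] += 1
        return s + 1

    def flaky(s):
        calls["flaky"] += 1
        if calls["flaky"] == 1:
            raise RuntimeError("injected fault")  # simulated node failure
        return s + 2

    result = run_with_checkpoint([inc1, flaky, lambda s: s + 3], 0)
    print(result, calls["inc1"])  # step inc1 is NOT re-executed after the fault
```

Under retrying, the fault would force `inc1` to run twice; with checkpointing it runs once, which is the recovery-cost difference the response-time comparison in this study quantifies. Byzantine fault tolerance goes further by running replicas and voting on outputs, which detects corrupted results rather than only crashes.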
[Figures 1–18 of the published article appear here; captions are not available in this extract.]
Change history
11 February 2021
A Correction to this paper has been published: https://doi.org/10.1007/s11227-021-03651-5
Cite this article
Asghar, H., Nazir, B. Analysis and implementation of reactive fault tolerance techniques in Hadoop: a comparative study. J Supercomput 77, 7184–7210 (2021). https://doi.org/10.1007/s11227-020-03491-9