Budget Constraint Scheduler for Big Data Using Hadoop MapReduce

Vinutha, D. C.; Raju, G. T.

doi:10.1007/s42979-021-00638-0

Original Research
Published: 30 April 2021

Volume 2, article number 250, (2021)
SN Computer Science

Abstract

Over the past few years, data production has increased significantly due to the growth of Internet-dependent technologies. Big data is driving a paradigm change in how data are discovered and used, and it is processed in a scalable, distributed manner using the MapReduce framework. Improving the performance of job scheduling across the nodes of a Hadoop cluster is an optimization problem. A scheduling algorithm is proposed that optimizes MapReduce jobs by reducing both the budget and the execution time in cloud models. Experiments with the proposed method were carried out on word count and sessionization applications over web server log files of different sizes. The results show that the proposed method reduces the average completion time compared with the FIFO and Fair schedulers.
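To make the comparison metric concrete, the following is a minimal sketch of how average completion time can be computed under FIFO and under idealized fair sharing. It is not the scheduler proposed in the article. It assumes a toy model with a single shared resource, all jobs arriving at time zero, and hypothetical job durations; the function names are illustrative only.

```python
def fifo_completion_times(durations):
    """FIFO: jobs run one at a time, in submission order (all arrive at t=0)."""
    finished, clock = [], 0.0
    for d in durations:
        clock += d
        finished.append(clock)
    return finished


def fair_completion_times(durations):
    """Idealized fair sharing: every unfinished job gets an equal capacity share."""
    order = sorted((d, i) for i, d in enumerate(durations))
    finished = [0.0] * len(durations)
    clock = 0.0       # wall-clock time elapsed so far
    served = 0.0      # work each still-active job has received so far
    active = len(durations)
    for d, i in order:
        # The shortest remaining job needs (d - served) more work, and the
        # capacity is split across `active` jobs, so it takes that many times longer.
        clock += (d - served) * active
        served = d
        finished[i] = clock
        active -= 1
    return finished


def average(values):
    return sum(values) / len(values)


# Hypothetical job durations, in arbitrary time units.
jobs = [8.0, 2.0, 4.0]
fifo = fifo_completion_times(jobs)
fair = fair_completion_times(jobs)
print("FIFO:", fifo, "average:", average(fifo))
print("Fair:", fair, "average:", average(fair))
```

In this toy instance the short jobs finish earlier under fair sharing, which lowers the average even though the last job completes at the same time under both policies. A real Hadoop cluster adds data locality, slot or container limits, and heterogeneous nodes, none of which this sketch models.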

Author information

Authors and Affiliations

Department of CS&E (Artificial Intelligence & Machine Learning), Vidyavardhaka College of Engineering, Mysore, India
D. C. Vinutha
Visvesvaraya Technological University, Belagavi, Karnataka, India
D. C. Vinutha & G. T. Raju
Department of Computer Science and Engineering, S. J. C. Institute of Technology, Chickballapur, India
G. T. Raju

Corresponding author

Correspondence to D. C. Vinutha.

Ethics declarations

Conflict of interest

The authors have not received grants from any company. D. C. Vinutha declares that he/she has no conflict of interest. G. T. Raju declares that he/she has no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Data Science and Communication” guest-edited by Kamesh Namudri, Naveen Chilamkurti, Sushma S J and S. Padmashree.

About this article

Cite this article

Vinutha, D.C., Raju, G.T. Budget Constraint Scheduler for Big Data Using Hadoop MapReduce. SN Comput. Sci. 2, 250 (2021). https://doi.org/10.1007/s42979-021-00638-0

Received: 15 December 2020
Accepted: 08 April 2021
Published: 30 April 2021
DOI: https://doi.org/10.1007/s42979-021-00638-0
