Abstract
Over the past few years, data production has increased significantly due to the growth of Internet-dependent technologies. Big data is driving a paradigm change in how data is discovered and used. Big data is processed with the MapReduce framework in a scalable and distributed manner. Improving the performance of job scheduling across the nodes of a Hadoop cluster is an optimization problem. A scheduling algorithm is proposed that optimizes MapReduce jobs by reducing the budget and execution time in cloud models. Experiments with the proposed method were carried out on word count and sessionization applications using web server log files of different sizes. The experimental results show that the proposed method reduces the average completion time compared with the FIFO and fair schedulers.
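To make the comparison metric concrete, the following is a minimal sketch of how average completion time can differ between a FIFO policy and an idealised fair-sharing policy. It is a toy single-slot model and not the scheduling algorithm proposed in the paper; the function names and the job durations are hypothetical and chosen only for illustration.

```python
def fifo_completion_times(durations):
    """Completion time of each job when jobs run one at a time in arrival order."""
    finished, clock = [], 0.0
    for d in durations:
        clock += d
        finished.append(clock)
    return finished


def fair_completion_times(durations):
    """Completion times under idealised fair sharing (assumption: all
    unfinished jobs split a single slot equally at every instant)."""
    remaining = {i: float(d) for i, d in enumerate(durations)}
    finished = [0.0] * len(durations)
    clock = 0.0
    while remaining:
        active = len(remaining)
        # The job with the least remaining work finishes next.
        work = min(remaining.values())
        # Each active job gets 1/active of the slot, so `work` units of
        # progress per job take `work * active` units of wall-clock time.
        clock += work * active
        for i in list(remaining):
            remaining[i] -= work
            if remaining[i] <= 1e-12:
                finished[i] = clock
                del remaining[i]
    return finished


def average(values):
    return sum(values) / len(values)


# Hypothetical job durations (arbitrary time units), in arrival order.
jobs = [8, 2, 4]
fifo_avg = average(fifo_completion_times(jobs))
fair_avg = average(fair_completion_times(jobs))
```

In this toy example the long first job delays every later job under FIFO, so the FIFO average (about 10.67) is higher than the fair-sharing average (10.0). Real Hadoop schedulers also have to account for multiple nodes, data locality, and task slots, which this sketch ignores.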
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs42979-021-00638-0/MediaObjects/42979_2021_638_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs42979-021-00638-0/MediaObjects/42979_2021_638_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs42979-021-00638-0/MediaObjects/42979_2021_638_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs42979-021-00638-0/MediaObjects/42979_2021_638_Fig4_HTML.png)
Ethics declarations
Conflict of interest
The authors have not received grants from any company. D. C. Vinutha declares that he/she has no conflict of interest. G. T. Raju declares that he/she has no conflict of interest.
Additional information
This article is part of the topical collection “Data Science and Communication” guest-edited by Kamesh Namudri, Naveen Chilamkurti, Sushma S J and S. Padmashree.