Budget Constraint Scheduler for Big Data Using Hadoop MapReduce

  • Original Research
SN Computer Science

Abstract

Over the past few years, data production has increased significantly due to the growth of Internet-dependent technologies. Big data is driving a paradigm change in how data is discovered and used. Such data is processed with the MapReduce framework in a scalable, distributed manner. Improving the performance of job scheduling across the nodes of a Hadoop cluster is an optimization problem. A scheduling algorithm is proposed that optimizes MapReduce jobs by reducing both the budget and the execution time in cloud models. The proposed method was evaluated on word count and sessionization applications, using web server log files of different sizes. Experimental results show that the proposed method reduces the average completion time compared with the FIFO and Fair schedulers.
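For readers unfamiliar with the word count workload named in the abstract, the following is a minimal single-process sketch of its map, shuffle, and reduce steps in plain Python. It does not use Hadoop and is not the scheduler proposed in the article; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample log lines are assumptions chosen for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated token."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce sequentially in one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical log-like input lines, not data from the article.
    sample = ["GET /index.html", "GET /about.html"]
    print(word_count(sample))
```

On a real cluster the map and reduce calls run as separate tasks on different nodes, and deciding which task runs where and when is the scheduling problem the abstract refers to.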

Author information

Corresponding author

Correspondence to D. C. Vinutha.

Ethics declarations

Conflict of interest

The authors have not received grants from any company. D. C. Vinutha declares that he/she has no conflict of interest. G. T. Raju declares that he/she has no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Data Science and Communication” guest-edited by Kamesh Namudri, Naveen Chilamkurti, Sushma S J and S. Padmashree.

About this article

Cite this article

Vinutha, D.C., Raju, G.T. Budget Constraint Scheduler for Big Data Using Hadoop MapReduce. SN COMPUT. SCI. 2, 250 (2021). https://doi.org/10.1007/s42979-021-00638-0

  • DOI: https://doi.org/10.1007/s42979-021-00638-0
