On MapReduce Scheduling in Hadoop Yarn on Heterogeneous Clusters

Meng Wang, Chase Q. Wu, Huiyan Cao, Yang Liu, Yongqiang Wang, Aiqin Hou

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

9 Scopus citations

Abstract

Hadoop is a distributed computing system widely used for big data processing in various domains. As the data volume continues to increase rapidly, Hadoop systems have become a critical contributor to the success of many big data applications. The MapReduce scheduler is a key component that determines the overall performance of a Hadoop cluster. In this paper, we formulate and investigate a task scheduling problem in a heterogeneous Hadoop cluster to minimize the completion time of a batch of MapReduce jobs. We first design a prediction model to predict the end time of a task, which is used for placing the corresponding data block on a node in advance to reduce the data transmission time and the overall job completion time. Based on this prediction model, we propose a task matching-based scheduling algorithm, referred to as TMSA, to schedule the tasks in the task queue in Hadoop, by taking into account the real-time performance of each node in the cluster and the matching degree between nodes and tasks. Experimental results show that the prediction model achieves high accuracy and TMSA significantly reduces the completion time of a batch of MapReduce jobs compared to existing schedulers.

Original language: English (US)
Title of host publication: Proceedings - 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications and 12th IEEE International Conference on Big Data Science and Engineering, Trustcom/BigDataSE 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1747-1754
Number of pages: 8
ISBN (Print): 9781538643877
State: Published - Sep 5 2018
Event: 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications and 12th IEEE International Conference on Big Data Science and Engineering, Trustcom/BigDataSE 2018 - New York, United States
Duration: Jul 31 2018 to Aug 3 2018

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality

Keywords

  • Hadoop
  • MapReduce
  • YARN
  • distributed computing
  • task scheduler
