Cross-layer Scheduling for MapReduce-based Big Data Workflows in Heterogeneous Hadoop Systems

Yijie Zhang, Chase Q. Wu, Aiqin Hou

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The performance of big data workflows depends on both the workflow mapping scheme, which determines task assignment and container allocation in Hadoop, and the on-node scheduling policy, which governs resource allocation and container provisioning. Most research on big data workflow scheduling focuses solely on workflow mapping, achieving only limited success. We conduct an in-depth investigation into the impact of node-level scheduling on overall workflow performance and explore the benefits of combining these two levels of scheduling (workflow- and node-level). We formulate a generic problem that considers cross-layer scheduling to minimize the end-to-end delay of MapReduce-based big data workflows in the Hadoop system. The efficacy of our proposed solution, compared with existing methods, is demonstrated through extensive simulations and proof-of-concept experiments using real-life big data workflows deployed on a real-life cluster.

Original language: English (US)
Title of host publication: 2025 International Conference on Computing, Networking and Communications, ICNC 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 350-355
Number of pages: 6
ISBN (Electronic): 9798331520960
DOIs
State: Published - 2025
Externally published: Yes
Event: 2025 International Conference on Computing, Networking and Communications, ICNC 2025 - Honolulu, United States
Duration: Feb 17 2025 to Feb 20 2025

Publication series

Name: 2025 International Conference on Computing, Networking and Communications, ICNC 2025

Conference

Conference: 2025 International Conference on Computing, Networking and Communications, ICNC 2025
Country/Territory: United States
City: Honolulu
Period: 2/17/25 to 2/20/25

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Computer Science Applications
  • Signal Processing
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality

Keywords

  • Big data
  • Hadoop
  • cross-layer scheduling
