Abstract
Many scientific applications feature large-scale workflows consisting of computing modules that must be strategically deployed and executed in distributed environments. The end-to-end performance of such scientific workflows depends on both the mapping scheme that determines module assignment, and the scheduling policy that determines resource allocation if multiple modules are mapped to the same node. These two aspects of workflow optimization are traditionally treated as two separated topics, and the interactions between them have not been fully explored by any existing efforts. As the scale of scientific workflows and the complexity of network environments rapidly increase, each individual aspect of performance optimization alone can only meet with limited success. We conduct an in-depth investigation into workflow execution dynamics in distributed environments and formulate a generic problem that considers both workflow mapping and task scheduling to minimize the end-to-end delay of workflows. We propose an integrated solution, referred to as Mapping and Scheduling Interaction (MSI), to improve the workflow performance. The efficacy of MSI is illustrated by both extensive simulations and proof-of-concept experiments using real-life scientific workflows for climate modeling on a PC cluster.
Original language | English (US) |
---|---|
Pages (from-to) | 51-64 |
Number of pages | 14 |
Journal | Journal of Parallel and Distributed Computing |
Volume | 84 |
DOIs | |
State | Published - Aug 8 2015 |
All Science Journal Classification (ASJC) codes
- Software
- Theoretical Computer Science
- Hardware and Architecture
- Computer Networks and Communications
- Artificial Intelligence
Keywords
- End-to-end delay
- On-node scheduling
- Scientific workflows
- Workflow mapping