A distributed workflow mapping algorithm for minimum end-to-end delay under fault-tolerance constraint

Qishi Wu, Yi Gu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Many large-scale scientific applications feature distributed computing workflows of complex structure that must be executed and transferred over shared wide-area networks consisting of unreliable nodes and links. Mapping these workflows onto such faulty network environments for optimal latency, while ensuring a given level of fault tolerance, is crucial to the success of eScience, which requires both performance and reliability. We construct analytical cost models and formulate workflow mapping as an optimization problem under a failure rate constraint. We propose a distributed heuristic mapping solution based on a recursive critical path to achieve minimum end-to-end delay while satisfying a pre-specified overall failure rate for a guaranteed level of fault tolerance. Extensive simulation-based comparisons with existing mapping algorithms illustrate the performance superiority of the proposed mapping solution.
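The abstract does not spell out the cost models or the recursive critical-path heuristic, so the following is only a rough illustration of the two quantities it optimizes and constrains, not the authors' algorithm. It assumes (hypothetically) that a mapped workflow is a DAG whose modules have a fixed execution time on their assigned nodes and whose data edges have a fixed transfer time, so the end-to-end delay is the length of the critical (longest) path. It also assumes components fail independently, so the overall failure rate is one minus the product of the component success probabilities. The names `critical_path` and `overall_failure_rate` are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from math import prod


def critical_path(node_delay, edge_delay):
    """Return (end-to-end delay, critical path) for a mapped DAG workflow.

    node_delay: {module: execution time on its assigned node} (assumed model)
    edge_delay: {(u, v): transfer time of data edge u -> v}    (assumed model)
    A module starts only after all of its inputs have arrived.
    """
    preds = {v: set() for v in node_delay}
    for u, v in edge_delay:
        preds[v].add(u)

    finish, best_pred = {}, {}
    # static_order() yields every module after all of its predecessors.
    for v in TopologicalSorter(preds).static_order():
        if preds[v]:
            # The latest-arriving input determines the module's start time.
            u = max(preds[v], key=lambda p: finish[p] + edge_delay[(p, v)])
            start, best_pred[v] = finish[u] + edge_delay[(u, v)], u
        else:
            start, best_pred[v] = 0.0, None
        finish[v] = start + node_delay[v]

    # Walk back from the last-finishing module to recover the critical path.
    end = max(finish, key=finish.get)
    path = [end]
    while best_pred[path[-1]] is not None:
        path.append(best_pred[path[-1]])
    return finish[end], path[::-1]


def overall_failure_rate(component_failure_rates):
    # Assumption: independent failures; the workflow fails if any mapped
    # node or link fails.
    return 1.0 - prod(1.0 - f for f in component_failure_rates)
```

For example, with modules `A`, `B`, `C`, `D` taking 2, 3, 1 and 2 time units and edges `A->B` (1), `A->C` (4), `B->D` (1), `C->D` (2), `critical_path` returns a delay of 11 along `A, C, D`, and `overall_failure_rate([0.01, 0.02])` is about 0.0298. A mapping would then count as feasible under the constraint if this rate does not exceed the pre-specified threshold.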

Original language: English (US)
Title of host publication: Proceedings - 16th International Conference on Parallel and Distributed Systems, ICPADS 2010
Pages: 508-515
Number of pages: 8
State: Published - Dec 1 2010
Externally published: Yes
Event: 16th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2010 - Shanghai, China
Duration: Dec 8 2010 to Dec 10 2010

Publication series

Name: Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
ISSN (Print): 1521-9097

Other

Other: 16th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2010
Country/Territory: China
City: Shanghai
Period: 12/8/10 to 12/10/10

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture

Keywords

  • Distributed algorithm
  • End-to-end delay
  • Fault tolerance
  • Scientific workflow
