Distributed Throughput Optimization for Large-Scale Scientific Workflows Under Fault-Tolerance Constraint

Yi Gu, Chase Qishi Wu, Xin Liu, Dantong Yu

Research output: Contribution to journal, Article, peer-review

13 Scopus citations

Abstract

With the advent of next-generation scientific applications, the workflow approach that integrates various computing and networking technologies has provided a viable solution to managing and optimizing large-scale distributed data transfer, processing, and analysis. This paper investigates a problem of mapping distributed scientific workflows for maximum throughput in faulty networks where nodes and links are subject to probabilistic failures. We formulate this problem as a bi-objective optimization problem to maximize both throughput and reliability. By adapting and modifying a centralized fault-free workflow mapping scheme, we propose a new mapping algorithm to achieve high throughput for smooth data flow in a distributed manner while satisfying a pre-specified bound of the overall failure rate for a guaranteed level of reliability. The performance superiority of the proposed solution is illustrated by both extensive simulation-based comparisons with existing algorithms and experimental results from a real-life scientific workflow deployed in wide-area networks.

Original language: English (US)
Pages (from-to): 361-379
Number of pages: 19
Journal: Journal of Grid Computing
Volume: 11
Issue number: 3
State: Published - Sep 2013

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Hardware and Architecture
  • Computer Networks and Communications

Keywords

  • Distributed algorithm
  • Fault tolerance
  • Throughput
  • Workflow mapping
