Improving throughput and reliability of distributed scientific workflows for streaming data processing

Yi Gu, Qishi Wu, Xin Liu, Dantong Yu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Scopus citations

Abstract

With the advent of next-generation scientific applications, the workflow-based computing technology has become an indispensable research method for managing and streamlining large-scale distributed data processing. This paper investigates a problem of mapping distributed workflows for streaming data processing in faulty networks where nodes and links are subject to probabilistic failures. We formulate this problem as a bi-objective optimization problem in terms of both throughput and reliability, and propose a decentralized layer-oriented method to achieve high throughput for smooth data flow while satisfying a prespecified overall failure rate bound for a guaranteed level of reliability. The superiority of the proposed mapping solution is illustrated by both extensive simulation-based performance comparisons with existing algorithms and experimental results from a real-life scientific workflow deployed in wide-area networks.
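As a rough illustration of the formulation in the abstract, the sketch below scores candidate mappings by throughput and keeps only those whose overall failure rate stays within a given bound. It is not the paper's decentralized layer-oriented method, only a brute-force selection over a hand-written candidate list. The names `frame_rate`, `overall_failure_rate`, and `pick_mapping`, the dictionary layout, and all numbers are hypothetical. It also assumes that throughput is the inverse of the slowest pipeline stage and that node and link failures are independent, neither of which is stated in the abstract.

```python
from math import prod


def frame_rate(stage_times):
    """Assumed throughput model: one frame per bottleneck (slowest stage) time."""
    return 1.0 / max(stage_times)


def overall_failure_rate(failure_probs):
    """Probability that at least one used node or link fails,
    assuming failures are independent (an assumption of this sketch)."""
    return 1.0 - prod(1.0 - p for p in failure_probs)


def pick_mapping(candidates, failure_bound):
    """Return the highest-throughput candidate whose overall failure rate
    does not exceed failure_bound, or None if no candidate is feasible."""
    feasible = [
        c for c in candidates
        if overall_failure_rate(c["failure_probs"]) <= failure_bound
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda c: frame_rate(c["stage_times"]))


# Hypothetical candidates: per-stage times in seconds, and failure
# probabilities of the nodes and links each mapping uses.
candidates = [
    {"name": "A", "stage_times": [0.2, 0.5, 0.1], "failure_probs": [0.01, 0.02]},
    {"name": "B", "stage_times": [0.2, 0.25, 0.1], "failure_probs": [0.05, 0.10]},
]

tight = pick_mapping(candidates, failure_bound=0.05)
loose = pick_mapping(candidates, failure_bound=0.20)
```

With the tight bound only mapping A is feasible, so the slower but more reliable mapping is selected. Relaxing the bound admits mapping B, which has the higher frame rate.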

Original language: English (US)
Title of host publication: Proc. - 2011 IEEE International Conference on HPCC 2011 - 2011 IEEE International Workshop on FTDCS 2011 - Workshops of the 2011 Int. Conf. on UIC 2011 - Workshops of the 2011 Int. Conf. ATC 2011
Pages: 347-354
Number of pages: 8
State: Published - 2011
Externally published: Yes
Event: 13th IEEE International Workshop on FTDCS 2011, the 8th International Conference on ATC 2011, the 8th International Conference on UIC 2011 and the 13th IEEE International Conference on HPCC 2011 - Banff, AB, Canada
Duration: Sep 2 2011 → Sep 4 2011

Publication series

Name: Proc. - 2011 IEEE International Conference on HPCC 2011 - 2011 IEEE International Workshop on FTDCS 2011 - Workshops of the 2011 Int. Conf. on UIC 2011 - Workshops of the 2011 Int. Conf. ATC 2011

Other

Other: 13th IEEE International Workshop on FTDCS 2011, the 8th International Conference on ATC 2011, the 8th International Conference on UIC 2011 and the 13th IEEE International Conference on HPCC 2011
Country: Canada
City: Banff, AB
Period: 9/2/11 → 9/4/11

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications

Keywords

  • Reliability
  • distributed computing
  • fault tolerance
  • frame rate
  • workflow mapping
