Tuple switching network - When slower may be better

Justin Y. Shi, Moussa Taifi, Abdallah Khreishah, Jie Wu

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

This paper reports an application-dependent network design for extreme-scale high performance computing (HPC) applications. Traditional scalable network designs focus on fast point-to-point transmission of generic data packets. The proposed network instead focuses on the sustainability of HPC applications through statistical multiplexing of semantic data objects. For HPC applications that use data-driven parallel processing, a tuple is a semantic object. We report the design and implementation of a tuple switching network for data-parallel HPC applications, with the goal of gaining performance and reliability at the same time as computing and communication resources are added. We describe a sustainability model and a simple computational experiment that demonstrate an extreme-scale application's sustainability under decreasing system mean time between failures (MTBF). Assuming a threefold slowdown from statistical multiplexing and a 35% time loss per checkpoint, a two-tier tuple switching framework would produce sustained performance and energy savings for extreme-scale HPC applications using more than 1024 processors or running with an MTBF of less than 6 hours. Higher processor counts or higher checkpoint overheads accelerate the benefits.

Original language: English (US)
Pages (from-to): 1521-1534
Number of pages: 14
Journal: Journal of Parallel and Distributed Computing
Volume: 72
Issue number: 11
State: Published - Nov 1 2012
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Networks and Communications
  • Artificial Intelligence

Keywords

  • Application dependent networking
  • Sustainable high performance computing
