SpotMPI: A framework for auction-based HPC computing using Amazon spot instances

Moussa Taifi, Justin Y. Shi, Abdallah Khreishah

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

23 Scopus citations

Abstract

The economy of scale gives cloud computing virtually unlimited, cost-effective processing potential. In theory, prices set under fair market conditions should reflect the most reasonable cost of computation. Fairness is ensured by mutual agreement between sellers and buyers, and resource-use efficiency is optimized automatically in the process. While cloud providers have ample incentive to offer auction-based computing platforms, using these volatile platforms for practical computing is a challenge for existing programming paradigms. This paper reports a methodology and a toolkit designed to tame these challenges for MPI applications. Unlike existing MPI fault-tolerance tools, we emphasize dynamically adjusted optimal checkpoint-restart (CPR) intervals. We introduce a formal model, then an HPC application toolkit named SpotMPI, to facilitate the practical execution of real MPI applications on volatile auction-based cloud platforms. Our models capture the intrinsic dependencies between critical time-consuming elements by leveraging instrumented performance parameters and publicly available resource bidding histories. We study algorithms with different computation vs. communication complexities. Our results show non-trivial insights into optimal bidding and application scaling strategies.
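The abstract does not spell out how the optimal CPR interval is computed. As a rough illustration only, the sketch below swaps in Young's classic first-order approximation, T_opt ≈ sqrt(2·C·M), where C is the time to write one checkpoint and M is the mean time between failures. Here a "failure" is assumed to be an out-of-bid termination, and M is estimated from a spot price history for a given bid. This is not SpotMPI's formal model; the function names `estimate_mtbf` and `young_interval`, the hourly sampling, and the example prices are all hypothetical.

```python
import math
from typing import Sequence


def estimate_mtbf(prices: Sequence[float], bid: float, step_hours: float = 1.0) -> float:
    """Hypothetical helper: mean uptime (hours) between out-of-bid events.

    Each price sample is assumed to last `step_hours`. The instance is "up"
    while price <= bid; every up -> down transition counts as one failure.
    Returns math.inf if the bid is never exceeded after an up period.
    """
    up_hours = 0.0
    failures = 0
    was_up = False
    for price in prices:
        is_up = price <= bid
        if is_up:
            up_hours += step_hours
        elif was_up:
            failures += 1  # price rose above the bid: instance would be reclaimed
        was_up = is_up
    return math.inf if failures == 0 else up_hours / failures


def young_interval(checkpoint_cost_hours: float, mtbf_hours: float) -> float:
    """Young's approximation of the optimal checkpoint interval (hours)."""
    if math.isinf(mtbf_hours):
        return math.inf  # no observed failures: checkpointing never pays off
    return math.sqrt(2.0 * checkpoint_cost_hours * mtbf_hours)


if __name__ == "__main__":
    # Hypothetical hourly spot prices (USD/hour) and a bid of 0.20.
    history = [0.10, 0.11, 0.30, 0.10, 0.10, 0.10, 0.35, 0.10]
    mtbf = estimate_mtbf(history, bid=0.20)
    interval = young_interval(checkpoint_cost_hours=0.1, mtbf_hours=mtbf)
    print(f"MTBF = {mtbf:.2f} h, checkpoint every {interval:.2f} h")
```

Recomputing `estimate_mtbf` over a sliding window of recent prices and feeding the result back into `young_interval` is one simple way to adjust the interval as market volatility changes: a higher bid yields fewer out-of-bid events, a longer M, and therefore less frequent checkpoints.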

Original language: English (US)
Title of host publication: Algorithms and Architectures for Parallel Processing - 11th International Conference, ICA3PP 2011, Proceedings
Pages: 109-120
Number of pages: 12
Edition: PART 2
State: Published - 2011
Externally published: Yes
Event: 11th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2011 - Melbourne, VIC, Australia
Duration: Oct 24 2011 to Oct 26 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 2
Volume: 7017 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 11th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2011
Country/Territory: Australia
City: Melbourne, VIC
Period: 10/24/11 to 10/26/11

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
