StoreRush: An Application-Level Approach to Harvesting Idle Storage in a Best Effort Environment

Qing Liu, Norbert Podhorszki, Jong Choi, Jeremy Logan, Matt Wolf, Scott Klasky, Tahsin Kurc, Xubin He

Research output: Contribution to journal › Conference article › peer-review

1 Scopus citations

Abstract

For a production HPC system where storage devices are shared among multiple applications and managed in a best-effort manner, contention is often a major problem: some storage devices become more heavily loaded than others, which significantly reduces I/O throughput. In this paper, we describe our latest effort, StoreRush, to resolve this practical issue at the application level without requiring modification to the file and storage system. The proposed scheme uses a two-level messaging system to harvest idle storage by re-routing I/O requests to a less congested storage location, so that write performance improves, while limiting the impact on reads by throttling re-routing when it becomes excessive. An analytical model is derived to guide the choice of the optimal throttling factor. The proposed scheme is verified with the production applications Pixie3D, XGC1 and QMCPack during production windows, and the results demonstrate the effectiveness (e.g., up to 1.8x improvement in write performance) and scalability (up to 131,072 cores) of our approach.
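The re-routing-with-throttling idea in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract does not describe the messaging protocol, how load is measured, or how the throttling factor is defined, so the class `Rerouter`, the `throttle` parameter (taken here to be the maximum fraction of writes that may be re-routed), and the per-target load dictionary are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Rerouter:
    """Toy write re-router. All names and semantics are hypothetical."""

    throttle: float      # assumed meaning: max fraction of writes re-routed (0..1)
    total: int = 0       # write requests seen so far
    rerouted: int = 0    # write requests sent away from their default target

    def choose_target(self, default: str, load: Dict[str, float]) -> str:
        """Return the storage target to use for one write request."""
        self.total += 1
        # Least-loaded target according to the (assumed) load report.
        idlest = min(load, key=load.get)
        # Re-routing this request must keep the re-routed share within budget.
        within_budget = (self.rerouted + 1) <= self.throttle * self.total
        if load[idlest] < load[default] and within_budget:
            self.rerouted += 1
            return idlest
        return default


# Example: the default target "ost0" is congested and "ost1" is mostly idle.
router = Rerouter(throttle=0.5)
load = {"ost0": 0.9, "ost1": 0.1}
choices = [router.choose_target("ost0", load) for _ in range(4)]
```

With a throttle of 0.5, at most half of the writes in this sketch leave their default target; a throttle of 0 disables re-routing entirely, which in this toy model corresponds to protecting later reads from scattered data at the cost of write throughput.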

Original language: English (US)
Pages (from-to): 475-484
Number of pages: 10
Journal: Procedia Computer Science
Volume: 108
State: Published - 2017
Event: International Conference on Computational Science ICCS 2017 - Zurich, Switzerland
Duration: Jun 12 2017 to Jun 14 2017

All Science Journal Classification (ASJC) codes

  • General Computer Science

Keywords

  • High Performance Computing
  • I/O
  • Storage
