Can I/O variability be reduced on QoS-Less HPC storage systems?

Dan Huang, Qing Liu, Jong Choi, Norbert Podhorszki, Scott Klasky, Jeremy Logan, George Ostrouchov, Xubin He, Matthew Wolf

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

For a production high-performance computing (HPC) system, where storage devices are shared between multiple applications and managed in a best-effort manner, I/O contention is often a major problem. In this paper, we propose balanced messaging-based re-routing in conjunction with throttling at the middleware level. This work tackles two key challenges that have not been fully resolved in the past: whether I/O variability can be reduced on a QoS-less HPC storage system, and how to design a runtime scheduling system that can scale up to a large number of cores. The proposed scheme uses a two-level messaging system to re-route I/O requests to a less congested storage location so that write performance is improved, while limiting the impact on reads by throttling re-routing. An analytical model is derived to guide the choice of the optimal throttling factor. We thoroughly analyze the overhead of the virtual messaging layer and explore whether in-transit buffering is effective in managing I/O variability. Contrary to intuition, in-transit buffering cannot completely solve the problem: it can reduce the absolute variability but not the relative variability. The proposed scheme is verified against a synthetic benchmark and has also been used by production applications.

Original language: English (US)
Article number: 8540017
Pages (from-to): 631-645
Number of pages: 15
Journal: IEEE Transactions on Computers
Volume: 68
Issue number: 5
State: Published - May 1 2019

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computational Theory and Mathematics

Keywords

  • High-performance computing
  • quality of service
  • storage
  • variability
