Taming i/o variation on qos-less hpc storage: What can applications do?

Zhenbo Qiao, Qing Liu, Norbert Podhorszki, Scott Klasky, Jieyang Chen

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

As high-performance computing (HPC) is being scaled up to exascale to accommodate new modeling and simulation needs, I/O has continued to be a major bottleneck in the end-to-end scientific processes. Nevertheless, prior work in this area mostly aimed to maximize the average performance, and there has been a lack of study and solutions that can manage I/O performance variation on HPC systems. This work aims to take advantage of the storage characteristics and explore application level solutions that are interference-aware. In particular, we monitor the performance of data analytics and estimate the state of shared storage resources using discrete fourier transform (DFT). If heavy I/O interference is predicted to occur at a given timestep, data analytics can dynamically adapt to the environment by lowering the accuracy and performing partial or no augmentation from the shared storage, dictated by an augmentation-bandwidth plot. We evaluate three data analytics, XGC, GenASiS, and Jet, on Chameleon, and quantitatively demonstrate that both the average and variation of I/O performance can be vastly improved using our dynamic augmentation, with the mean and variance improved by as much as 67% and 96%, respectively, while maintaining acceptable outcome of data analysis.

Original languageEnglish (US)
Title of host publicationProceedings of SC 2020
Subtitle of host publicationInternational Conference for High Performance Computing, Networking, Storage and Analysis
PublisherIEEE Computer Society
ISBN (Electronic)9781728199986
DOIs
StatePublished - Nov 2020
Event2020 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2020 - Virtual, Atlanta, United States
Duration: Nov 9 2020Nov 19 2020

Publication series

NameInternational Conference for High Performance Computing, Networking, Storage and Analysis, SC
Volume2020-November
ISSN (Print)2167-4329
ISSN (Electronic)2167-4337

Conference

Conference2020 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2020
Country/TerritoryUnited States
CityVirtual, Atlanta
Period11/9/2011/19/20

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Software

Keywords

  • High performance computing
  • data analysis
  • data storage

Fingerprint

Dive into the research topics of 'Taming i/o variation on qos-less hpc storage: What can applications do?'. Together they form a unique fingerprint.

Cite this