ISOBAR hybrid compression-I/O interleaving for large-scale parallel I/O optimization

Eric R. Schendel, Saurabh V. Pendse, John Jenkins, David A. Boyuka, Zhenhuan Gong, Sriram Lakshminarasimhan, Qing Liu, Hemanth Kolla, Jackie Chen, Scott Klasky, Robert Ross, Nagiza F. Samatova

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

28 Scopus citations

Abstract

Current peta-scale data analytics frameworks suffer from a significant performance bottleneck caused by the imbalance between their enormous computational power and limited I/O bandwidth. Using data compression to reduce the amount of I/O activity is a promising approach to this problem. In this paper, we propose a hybrid framework that interleaves I/O with data compression to improve I/O throughput while also reducing dataset size. We evaluate several interleaving strategies, present theoretical models, and assess the efficiency and scalability of our approach through comparative analysis. Using our theoretical model on 19 real-world scientific datasets, drawn from both the public domain and peta-scale simulations, we estimate that the hybrid method can yield a 12% to 46% increase in throughput on hard-to-compress scientific datasets. At the reported peak bandwidth of 60 GB/s of uncompressed data for a current leadership-class parallel I/O system, this translates into an effective gain of 7 to 28 GB/s in aggregate throughput.
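
The closing figures of the abstract follow from simple arithmetic: applying the estimated percentage increase to the reported 60 GB/s peak bandwidth. The sketch below is only a check of that conversion, not the paper's theoretical model; the function name `effective_gain` is a hypothetical helper introduced here for illustration.

```python
def effective_gain(peak_gb_per_s: float, pct_increase: float) -> float:
    """Return the extra throughput (GB/s) implied by a percentage increase.

    Hypothetical helper: it only scales the peak bandwidth by the
    percentage, which is the conversion the abstract appears to use.
    """
    return peak_gb_per_s * pct_increase / 100.0


PEAK_GB_PER_S = 60.0  # reported peak bandwidth from the abstract

low = effective_gain(PEAK_GB_PER_S, 12)   # lower bound of the estimate
high = effective_gain(PEAK_GB_PER_S, 46)  # upper bound of the estimate

print(f"{low:.1f} to {high:.1f} GB/s")  # prints "7.2 to 27.6 GB/s"
```

Rounded to whole numbers, these values match the 7 to 28 GB/s range quoted in the abstract.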

Original language: English (US)
Title of host publication: HPDC '12 - Proceedings of the 21st ACM Symposium on High-Performance Parallel and Distributed Computing
Pages: 61-72
Number of pages: 12
State: Published - 2012
Externally published: Yes
Event: 21st ACM Symposium on High-Performance Parallel and Distributed Computing, HPDC '12 - Delft, Netherlands
Duration: Jun 18 2012 to Jun 22 2012

Publication series

Name: HPDC '12 - Proceedings of the 21st ACM Symposium on High-Performance Parallel and Distributed Computing

Other

Other: 21st ACM Symposium on High-Performance Parallel and Distributed Computing, HPDC '12
Country/Territory: Netherlands
City: Delft
Period: 6/18/12 to 6/22/12

All Science Journal Classification (ASJC) codes

  • Software

Keywords

  • High Performance Computing
  • Hybrid Interleaving
  • I/O
  • ISOBAR
  • Lossless Compression
  • Staging
