Unbalanced Parallel I/O: An Often-Neglected Side Effect of Lossy Scientific Data Compression

Xinying Wang, Lipeng Wan, Jieyang Chen, Qian Gong, Ben Whitney, Jinzhen Wang, Ana Gainaru, Qing Liu, Norbert Podhorszki, Dongfang Zhao, Feng Yan, Scott Klasky

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Lossy compression techniques have demonstrated promising results in significantly reducing scientific data size while guaranteeing compression error bounds. However, one important yet often neglected side effect of lossy scientific data compression is its impact on the performance of parallel I/O. Our key observation is that the compressed data size is often highly skewed across processes in lossy scientific compression. To understand this behavior, we conduct extensive experiments in which we apply three lossy compressors designed and optimized for scientific data (MGARD, ZFP, and SZ) to three real-world scientific applications (Gray-Scott simulation, WarpX, and XGC). Our analysis demonstrates that the size of the compressed data is always skewed, even if the original data is evenly decomposed among processes. Such skewness exists widely across different scientific applications and compressors, as long as the information density of the data varies across processes. We then systematically study how this side effect impacts the performance of parallel I/O. We observe that the skewness in compressed data sizes often leads to I/O imbalance, which, if not properly handled, can significantly reduce the efficiency of I/O bandwidth utilization. In addition, writing data concurrently to a single shared file through the MPI-IO library is more sensitive to unbalanced I/O loads. Therefore, we believe our research community should pay more attention to the unbalanced parallel I/O caused by lossy scientific data compression.
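
The effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' experimental setup: it uses Python's lossless `zlib` as a stand-in for MGARD, ZFP, and SZ, and the blocks, the `noise_fractions` values, and the helper names (`make_block`, `imbalance_factor`, `shared_file_offsets`) are all hypothetical. Four simulated processes each hold a block of identical size but different information density; the sketch compresses each block, reports a max-over-mean imbalance factor, and computes the shared-file write offsets with an exclusive prefix sum, which is one common way such offsets are derived.

```python
import random
import zlib
from itertools import accumulate


def make_block(n_bytes, noise_fraction, seed):
    """Hypothetical per-process block: a random (dense) part followed by zeros."""
    rng = random.Random(seed)
    n_noisy = int(n_bytes * noise_fraction)
    noisy = bytes(rng.randrange(256) for _ in range(n_noisy))
    return noisy + bytes(n_bytes - n_noisy)


def imbalance_factor(sizes):
    """Max over mean of per-process sizes; 1.0 means perfectly balanced."""
    return max(sizes) / (sum(sizes) / len(sizes))


def shared_file_offsets(sizes):
    """Exclusive prefix sum: byte offset where each process starts writing."""
    return [0] + list(accumulate(sizes))[:-1]


# Assumption: even decomposition, 64 KiB per simulated process.
block_bytes = 1 << 16
# Assumption: information density differs from process to process.
noise_fractions = [0.0, 0.05, 0.25, 0.9]

blocks = [make_block(block_bytes, f, seed=rank)
          for rank, f in enumerate(noise_fractions)]

# zlib is a lossless stand-in here, not one of the compressors in the paper.
compressed_sizes = [len(zlib.compress(b)) for b in blocks]
offsets = shared_file_offsets(compressed_sizes)

print("original imbalance:  ", imbalance_factor([len(b) for b in blocks]))
print("compressed sizes:    ", compressed_sizes)
print("compressed imbalance:", round(imbalance_factor(compressed_sizes), 2))
print("shared-file offsets: ", offsets)
```

Although every simulated process starts with the same number of bytes, the process holding the densest block ends up with far more data to write than the others, so it would tend to dominate the duration of a collective write.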

Original language: English (US)
Title of host publication: Proceedings of DRBSD-7 2021
Subtitle of host publication: 7th International Workshop on Data Analysis and Reduction for Big Scientific Data, Held in conjunction with SC 2021: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 26-32
Number of pages: 7
ISBN (Electronic): 9781728186726
State: Published - 2021
Event: 7th International Workshop on Data Analysis and Reduction for Big Scientific Data, DRBSD-7 2021 - St. Louis, United States
Duration: Nov 14 2021 → …

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Information Systems
  • Information Systems and Management
  • Statistics, Probability and Uncertainty
  • Media Technology
