As simulation-based scientific discovery scales up on high-performance computing systems, the disparity between compute and I/O has widened, forcing domain scientists to save only a small fraction of simulation data to persistent storage. This can result in the loss of essential physics fields that are needed for data analysis. While error-bounded lossy compression has made tremendous progress in bridging the gap between compute and I/O, the lack of understanding of compression performance remains a key hurdle to its wide adoption. In this work, we present zPerf, a statistical gray-box performance modeling approach for scientific lossy compression. Our contributions are fourfold: 1) We develop zPerf to estimate the performance of lossy compression techniques, based on an in-depth understanding and statistical modeling of data features and core compression metrics; 2) We demonstrate the detailed implementation of zPerf through two case studies, in which we derive performance models for SZ and ZFP, two leading lossy compressors; 3) We evaluate zPerf on real-world datasets across various domains and, based on the evaluation, demonstrate the efficacy of the zPerf performance model; 4) We further discuss three case studies in which zPerf is applied to extrapolate the compression ratio of SZ and ZFP with alternative encoding schemes, as well as ZFP with an alternative transform scheme. Through these case studies, we demonstrate the potential of zPerf for exploring the design space of lossy compression, which has hardly been studied in the literature.
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Hardware and Architecture
- Computational Theory and Mathematics
- Lossy compression