TY - JOUR
T1 - Compression Ratio Modeling and Estimation across Error Bounds for Lossy Compression
AU - Wang, Jinzhen
AU - Liu, Tong
AU - Liu, Qing
AU - He, Xubin
AU - Luo, Huizhang
AU - He, Weiming
N1 - Funding Information:
The authors wish to acknowledge the support from the US NSF under Grant Nos. CCF-1718297 and CCF-1812861, and from the NJIT research startup fund. The work performed at Temple University is partially supported by the US NSF under Grant Nos. 1828363 and 1813081. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Publisher Copyright:
© 1990-2012 IEEE.
PY - 2020/7/1
Y1 - 2020/7/1
AB - Scientific simulations on high-performance computing (HPC) systems generate vast amounts of floating-point data that need to be reduced to lower the storage and I/O cost. Lossy compressors trade data accuracy for reduction performance and have been demonstrated to be effective in reducing data volume. However, a key hurdle to wide adoption of lossy compressors is that the trade-off between data accuracy and compression performance, particularly the compression ratio, is not well understood. Consequently, domain scientists often need to exhaust many possible error bounds before they can figure out an appropriate setup. The current practice of using lossy compressors to reduce data volume is, therefore, through trial and error, which is inefficient for large datasets that take a tremendous amount of computational resources to compress. This paper aims to analyze and estimate the compression performance of lossy compressors on HPC datasets. In particular, we predict the compression ratios of two modern lossy compressors that achieve superior performance, SZ and ZFP, on HPC scientific datasets at various error bounds, based upon the compressors' intrinsic metrics collected under a given base error bound. We evaluate the estimation scheme using twenty real HPC datasets and the results confirm the effectiveness of our approach.
KW - High-performance computing
KW - data reduction
KW - lossy compression
KW - performance modeling
UR - http://www.scopus.com/inward/record.url?scp=85081632664&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85081632664&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2019.2938503
DO - 10.1109/TPDS.2019.2938503
M3 - Article
AN - SCOPUS:85081632664
SN - 1045-9219
VL - 31
SP - 1621
EP - 1635
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 7
M1 - 8821342
ER -