TY - JOUR
T1 - High-Ratio Lossy Compression
T2 - Exploring the Autoencoder to Compress Scientific Data
AU - Liu, Tong
AU - Wang, Jinzhen
AU - Liu, Qing
AU - Alibhai, Shakeel
AU - Lu, Tao
AU - He, Xubin
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2023/2/1
Y1 - 2023/2/1
AB - Scientific simulations on high-performance computing (HPC) systems can generate large amounts of floating-point data per run. To mitigate the data storage bottleneck and lower the data volume, it is common for floating-point compressors to be employed. As compared to lossless compressors, lossy compressors, such as SZ and ZFP, can reduce data volume more aggressively while maintaining the usefulness of the data. However, a reduction ratio of more than two orders of magnitude is almost impossible without seriously distorting the data. In deep learning, the autoencoder technique has shown great potential for data compression, in particular with images. Whether the autoencoder can deliver similar performance on scientific data, however, is unknown. In this article, we for the first time conduct a comprehensive study on the use of autoencoders to compress real-world scientific data and illustrate several key findings on using autoencoders for scientific data reduction. We implement an autoencoder-based compression prototype to reduce floating-point data. Our study shows that the out-of-the-box implementation needs to be further tuned in order to achieve high compression ratios and satisfactory error bounds. Our evaluation results show that, for most of the test datasets, the tuned autoencoder outperforms SZ by up to 4X, and ZFP by up to 50X in compression ratios, respectively. Our practices and lessons learned in this work can direct future optimizations for using autoencoders to compress scientific data.
KW - Lossy data compression
KW - autoencoder
KW - machine learning
KW - scientific data
UR - http://www.scopus.com/inward/record.url?scp=85103163762&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85103163762&partnerID=8YFLogxK
U2 - 10.1109/TBDATA.2021.3066151
DO - 10.1109/TBDATA.2021.3066151
M3 - Article
AN - SCOPUS:85103163762
SN - 2332-7790
VL - 9
SP - 22
EP - 36
JO - IEEE Transactions on Big Data
JF - IEEE Transactions on Big Data
IS - 1
ER -