TY - JOUR
T1 - A Data-driven Approach to Harvesting Latent Reduced Models to Precondition Lossy Compression for Scientific Data
AU - Luo, Huizhang
AU - Wang, Junqi
AU - Qin, Zhenlu
AU - Huang, Dan
AU - Liu, Qing
AU - Zhou, Mengchu
AU - Jiang, Hong
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2023/6/1
Y1 - 2023/6/1
N2 - In this paper, we propose and evaluate the idea that data need to be preconditioned prior to compression, such that they can better match the design philosophies of lossy compressors for HPC scientific data. In particular, we aim to identify a reduced model that can be utilized to transform the original data into a more compressible form. We begin with two PDE applications as a proof of concept, in which we demonstrate that a reduced model can indeed reside in the full model output, and can be utilized to improve compression ratios. A mathematical proof is also presented to show how the compression ratio is improved by the reduced model. We further explore more general dimension reduction techniques to extract the reduced model, including principal component analysis, singular value decomposition, and discrete wavelet transform. After preconditioning, the reduced model is stored together with the difference between the reduced model and the full model, which results in higher compression ratios. We evaluate the reduced models on ten scientific datasets, and the results show the effectiveness of our approaches. Given that there is no single method that consistently achieves the best performance, we further propose a selection strategy that guides users to select the best reduced model prior to data reduction.
KW - Data reduction
KW - compressor selection
KW - data preconditioning
KW - high-performance computing
UR - http://www.scopus.com/inward/record.url?scp=85144048391&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85144048391&partnerID=8YFLogxK
U2 - 10.1109/TBDATA.2022.3225959
DO - 10.1109/TBDATA.2022.3225959
M3 - Article
AN - SCOPUS:85144048391
SN - 2332-7790
VL - 9
SP - 949
EP - 963
JO - IEEE Transactions on Big Data
JF - IEEE Transactions on Big Data
IS - 3
ER -