TY - GEN
T1 - Region-adaptive, Error-controlled Scientific Data Compression using Multilevel Decomposition
AU - Gong, Qian
AU - Whitney, Ben
AU - Zhang, Chengzhu
AU - Liang, Xin
AU - Rangarajan, Anand
AU - Chen, Jieyang
AU - Wan, Lipeng
AU - Ullrich, Paul
AU - Liu, Qing
AU - Jacob, Robert
AU - Ranka, Sanjay
AU - Klasky, Scott
N1 - Publisher Copyright: © 2022 ACM.
PY - 2022/7/6
Y1 - 2022/7/6
N2 - The increase in computer processing speed is significantly outpacing improvements in network and storage bandwidth, leading to the big data challenge in modern science, where scientific applications can quickly generate much more data than can be transferred and stored. As a result, big scientific data must be reduced by a few orders of magnitude while the accuracy of the reduced data needs to be guaranteed for further scientific explorations. Moreover, scientists are often interested in some specific spatial/temporal regions in their data, where higher accuracy is required. The locations of the regions requiring high accuracy can sometimes be prescribed based on application knowledge, while other times they must be estimated based on general spatial/temporal variation. In this paper, we develop a novel multilevel approach which allows users to impose region-wise compression error bounds. Our method utilizes the byproduct of a multilevel compressor to detect regions where details are rich and we provide the theoretical underpinning for region-wise error control. With spatially varying precision preservation, our approach can achieve significantly higher compression ratios than single-error-bounded compression approaches and control errors in the regions of interest. We conduct evaluations on two climate use cases: one targeting small-scale, node features and the other focusing on long, areal features. For both use cases, the locations of the features were unknown ahead of compression. By selecting approximately 16% of the data based on multi-scale spatial variations and compressing those regions with smaller error tolerances than the rest, our approach improves the accuracy of post-analysis by approximately 2× compared to single-error-bounded compression at the same compression ratio. Using the same error bound for the region of interest, our approach can achieve an increase of more than 50% in overall compression ratio.
AB - The increase in computer processing speed is significantly outpacing improvements in network and storage bandwidth, leading to the big data challenge in modern science, where scientific applications can quickly generate much more data than can be transferred and stored. As a result, big scientific data must be reduced by a few orders of magnitude while the accuracy of the reduced data needs to be guaranteed for further scientific explorations. Moreover, scientists are often interested in some specific spatial/temporal regions in their data, where higher accuracy is required. The locations of the regions requiring high accuracy can sometimes be prescribed based on application knowledge, while other times they must be estimated based on general spatial/temporal variation. In this paper, we develop a novel multilevel approach which allows users to impose region-wise compression error bounds. Our method utilizes the byproduct of a multilevel compressor to detect regions where details are rich and we provide the theoretical underpinning for region-wise error control. With spatially varying precision preservation, our approach can achieve significantly higher compression ratios than single-error-bounded compression approaches and control errors in the regions of interest. We conduct evaluations on two climate use cases: one targeting small-scale, node features and the other focusing on long, areal features. For both use cases, the locations of the features were unknown ahead of compression. By selecting approximately 16% of the data based on multi-scale spatial variations and compressing those regions with smaller error tolerances than the rest, our approach improves the accuracy of post-analysis by approximately 2× compared to single-error-bounded compression at the same compression ratio. Using the same error bound for the region of interest, our approach can achieve an increase of more than 50% in overall compression ratio.
KW - Climate Data Compression
KW - Error Control
KW - Region-adaptive Lossy Compression
UR - http://www.scopus.com/inward/record.url?scp=85137719226&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85137719226&partnerID=8YFLogxK
U2 - 10.1145/3538712.3538717
DO - 10.1145/3538712.3538717
M3 - Conference contribution
AN - SCOPUS:85137719226
T3 - ACM International Conference Proceeding Series
BT - Scientific and Statistical Database Management - 34th International Conference, SSDBM 2022 - Proceedings
A2 - Pourabbas, Elaheh
A2 - Zhou, Yongluan
A2 - Li, Yuchen
A2 - Yang, Bin
PB - Association for Computing Machinery
T2 - 34th International Conference on Scientific and Statistical Database Management, SSDBM 2022
Y2 - 6 July 2022 through 8 July 2022
ER -