Improving Progressive Retrieval for HPC Scientific Data using Deep Neural Network

Jinzhen Wang, Xin Liang, Ben Whitney, Jieyang Chen, Qian Gong, Xubin He, Lipeng Wan, Scott Klasky, Norbert Podhorszki, Qing Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

As the disparity between compute and I/O on high-performance computing systems continues to widen, it has become increasingly difficult to perform post-hoc data analytics on full-resolution scientific simulation data because of the high I/O cost. An error-bounded data decomposition and progressive data retrieval framework has recently been developed to address this challenge by decomposing data before storage and reading only part of the decomposed data when necessary. However, the performance of the progressive retrieval framework suffers from an overly pessimistic error control theory: the maximum error actually achieved in the recomposed data is significantly lower than the requested error bound. As a result, more data than required is fetched for recomposition, incurring additional I/O overhead. To tackle this issue, we propose a DNN-based progressive retrieval framework that can better identify the minimum amount of data to be retrieved. Our contributions are as follows: 1) we provide an in-depth investigation of the recently developed progressive retrieval framework; 2) we propose two designs of prediction models (named D-MGARD and E-MGARD) that estimate the amount of data to retrieve based on error bounds; 3) we evaluate our proposed solutions using scientific datasets generated by real-world simulations from two domains. Evaluation results demonstrate that our solution accurately predicts the amount of data to retrieve and reduces I/O overhead compared with the traditional approach. In our evaluation, our solution reads significantly less data (5% to 40% less with D-MGARD, 20% to 80% less with E-MGARD).
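The over-fetching described in the abstract can be illustrated with a toy sketch. It is not MGARD, not the paper's error control theory, and not the D-MGARD or E-MGARD models: the additive decomposition, the triangle-inequality bound, and every function name below are assumptions made only for illustration. The sketch compares how many components a reader would fetch when it trusts a conservative bound with how many it would need if the true error were known.

```python
import numpy as np


def actual_error(components, n):
    """True max-norm error after reading only the first n components."""
    if n == len(components):
        return 0.0
    residual = np.sum(components[n:], axis=0)
    return float(np.max(np.abs(residual)))


def pessimistic_bound(components, n):
    """Conservative bound: sum of the max magnitudes of the unread components.

    By the triangle inequality this is never smaller than actual_error.
    """
    return float(sum(np.max(np.abs(c)) for c in components[n:]))


def components_needed(errors, tolerance):
    """Smallest n whose error estimate errors[n] is within the tolerance."""
    for n, err in enumerate(errors):
        if err <= tolerance:
            return n
    return len(errors) - 1


# Hypothetical decomposition: 8 components with geometrically decaying scale.
rng = np.random.default_rng(0)
components = [rng.standard_normal(1000) * 2.0 ** -k for k in range(8)]
levels = range(len(components) + 1)

actual = [actual_error(components, n) for n in levels]
bound = [pessimistic_bound(components, n) for n in levels]

tolerance = 0.5
n_bound = components_needed(bound, tolerance)    # what a bound-driven reader fetches
n_actual = components_needed(actual, tolerance)  # what was really required

print("components fetched using the bound:", n_bound)
print("components actually required:", n_actual)
```

Because the bound never falls below the true error, the bound-driven reader can only fetch as many components as needed or more. A model that predicts the required retrieval size from the error bound is meant to narrow that gap.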

Original language: English (US)
Title of host publication: Proceedings - 2023 IEEE 39th International Conference on Data Engineering, ICDE 2023
Publisher: IEEE Computer Society
Pages: 2727-2739
Number of pages: 13
ISBN (Electronic): 9798350322279
State: Published - 2023
Event: 39th IEEE International Conference on Data Engineering, ICDE 2023 - Anaheim, United States
Duration: Apr 3 2023 - Apr 7 2023

Publication series

Name: Proceedings - International Conference on Data Engineering
Volume: 2023-April
ISSN (Print): 1084-4627

Conference

Conference: 39th IEEE International Conference on Data Engineering, ICDE 2023
Country/Territory: United States
City: Anaheim
Period: 4/3/23 - 4/7/23

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Information Systems

Keywords

  • deep learning
  • High-performance computing
  • lossy compression
  • scientific data management
