Error-controlled, progressive, and adaptable retrieval of scientific data with multilevel decomposition

Xin Liang, Qian Gong, Jieyang Chen, Ben Whitney, Lipeng Wan, Qing Liu, David Pugmire, Rick Archibald, Norbert Podhorszki, Scott Klasky

Research output: Chapter in Book/Report/Conference proceedingConference contribution

8 Scopus citations

Abstract

Extreme-scale simulations and high-resolution instruments have been generating an increasing amount of data, which poses significant challenges to not only data storage during the run, but also post-processing where data will be repeatedly retrieved and analyzed for a long period of time the challenges in satisfying a wide range of post-hoc analysis needs while minimizing the I/O overhead caused by inappropriate and/or excessive data retrieval should never be left unmanaged. In this paper, we propose a data refactoring, compressing, and retrieval framework capable of 1) fine-grained data refactoring with regard to precision; 2) incrementally retrieving and recomposing the data in terms of various error bounds; and 3) adaptively retrieving data in multi-precision and multi-resolution with respect to different analysis. With the progressive data re-composition and the adaptable retrieval algorithms, our framework significantly reduces the amount of data retrieved when multiple incremental precision are requested and/or the downstream analysis time when coarse resolution is used. Experiments show that the amount of data retrieved under the same progressively requested error bound using our framework is 64% less than that using state-of-The-Art single-error-bounded approaches. Parallel experiments with up to 1, 024 cores and 600 GB data in total show that our approach yields 1.36× and 2.52× performance over existing approaches in writing to and reading from persistent storage systems, respectively.

Original languageEnglish (US)
Title of host publicationProceedings of SC 2021
Subtitle of host publicationThe International Conference for High Performance Computing, Networking, Storage and Analysis: Science and Beyond
PublisherIEEE Computer Society
ISBN (Electronic)9781450384421
DOIs
StatePublished - Nov 14 2021
Event33rd International Conference for High Performance Computing, Networking, Storage and Analysis: Science and Beyond, SC 2021 - Virtual, Online, United States
Duration: Nov 14 2021Nov 19 2021

Publication series

NameInternational Conference for High Performance Computing, Networking, Storage and Analysis, SC
ISSN (Print)2167-4329
ISSN (Electronic)2167-4337

Conference

Conference33rd International Conference for High Performance Computing, Networking, Storage and Analysis: Science and Beyond, SC 2021
Country/TerritoryUnited States
CityVirtual, Online
Period11/14/2111/19/21

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Software

Keywords

  • Data compression
  • data retrieval
  • error control
  • storage and I/O

Fingerprint

Dive into the research topics of 'Error-controlled, progressive, and adaptable retrieval of scientific data with multilevel decomposition'. Together they form a unique fingerprint.

Cite this