TY - JOUR
T1 - SIRIUS
T2 - Enabling Progressive Data Exploration for Extreme-Scale Scientific Data
AU - Qiao, Zhenbo
AU - Lu, Tao
AU - Luo, Huizhang
AU - Liu, Qing
AU - Klasky, Scott
AU - Podhorszki, Norbert
AU - Wang, Jinzhen
N1 - Funding Information:
The authors wish to acknowledge the support from the US Department of Energy Advanced Scientific Computing Research, Oak Ridge Leadership Computing Facility, and the US National Science Foundation grants CCF-1718297 and CCF-1812861.
Publisher Copyright:
© 2015 IEEE.
PY - 2018/10/1
Y1 - 2018/10/1
N2 - Scientific simulations on high-performance computing (HPC) platforms generate large quantities of data. To bridge the widening gap between compute and I/O, and enable data to be more efficiently stored and analyzed, simulation outputs need to be refactored, reduced, and appropriately mapped to storage tiers. However, a systematic solution to support these steps has been lacking in the current HPC software ecosystem. To that end, this paper develops SIRIUS, a progressive JPEG-like data management scheme for storing and analyzing big scientific data. It co-designs data decimation, compression, and data storage, taking the hardware characteristics of each storage tier into consideration. With reasonably low overhead, our approach refactors simulation data, using either topological or uniform decimation, into a much smaller, reduced-accuracy base dataset and a series of deltas that are used to augment the accuracy if needed. The base dataset and deltas are compressed and written to multiple storage tiers. Data saved on different tiers can then be selectively retrieved to restore the level of accuracy that satisfies data analytics. Thus, SIRIUS provides a paradigm shift towards elastic data analytics and enables end users to make trade-offs between analysis speed and accuracy on the fly. This paper further develops algorithms to preserve statistics for data decimation, a common requirement for reducing data. We assess the impact of SIRIUS on unstructured triangular meshes, a pervasive data model used in scientific simulations. In particular, we evaluate two realistic use cases: blob detection in fusion and high-pressure area extraction in computational fluid dynamics.
AB - Scientific simulations on high-performance computing (HPC) platforms generate large quantities of data. To bridge the widening gap between compute and I/O, and enable data to be more efficiently stored and analyzed, simulation outputs need to be refactored, reduced, and appropriately mapped to storage tiers. However, a systematic solution to support these steps has been lacking in the current HPC software ecosystem. To that end, this paper develops SIRIUS, a progressive JPEG-like data management scheme for storing and analyzing big scientific data. It co-designs data decimation, compression, and data storage, taking the hardware characteristics of each storage tier into consideration. With reasonably low overhead, our approach refactors simulation data, using either topological or uniform decimation, into a much smaller, reduced-accuracy base dataset and a series of deltas that are used to augment the accuracy if needed. The base dataset and deltas are compressed and written to multiple storage tiers. Data saved on different tiers can then be selectively retrieved to restore the level of accuracy that satisfies data analytics. Thus, SIRIUS provides a paradigm shift towards elastic data analytics and enables end users to make trade-offs between analysis speed and accuracy on the fly. This paper further develops algorithms to preserve statistics for data decimation, a common requirement for reducing data. We assess the impact of SIRIUS on unstructured triangular meshes, a pervasive data model used in scientific simulations. In particular, we evaluate two realistic use cases: blob detection in fusion and high-pressure area extraction in computational fluid dynamics.
KW - High-performance computing
KW - compression
KW - data analytics
KW - data reduction
KW - progressive refactoring
KW - storage
UR - http://www.scopus.com/inward/record.url?scp=85058901442&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85058901442&partnerID=8YFLogxK
U2 - 10.1109/TMSCS.2018.2886851
DO - 10.1109/TMSCS.2018.2886851
M3 - Article
AN - SCOPUS:85058901442
SN - 2332-7766
VL - 4
SP - 900
EP - 913
JO - IEEE Transactions on Multi-Scale Computing Systems
JF - IEEE Transactions on Multi-Scale Computing Systems
IS - 4
M1 - 8576666
ER -