PARLO: PArallel run-time layout optimization for scientific data explorations with heterogeneous access patterns

Zhenhuan Gong, David A. Boyuka, Xiaocheng Zou, Qing Liu, Norbert Podhorszki, Scott Klasky, Xiaosong Ma, Nagiza F. Samatova

Research output: Contribution to conferencePaperpeer-review

16 Scopus citations

Abstract

The size and scope of cutting-edge scientific simulations are growing much faster than the I/O and storage capabilities of their run-time environments. The growing gap is exacerbated by exploratory, data-intensive analytics, such as querying simulation data with multivariate, spatio-temporal constraints, which induces heterogeneous access patterns that stress the performance of the underlying storage system. Previous work addresses data layout and indexing techniques to improve query performance for a single access pattern, which is not sufficient for complex analytics jobs. We present PARLO a parallel run-time layout optimization framework, to achieve multi-level data layout optimization for scientific applications at run-time before data is written to storage. The layout schemes optimize for heterogeneous access patterns with user-specified priorities. PARLO is integrated with ADIOS, a high-performance parallel I/O middleware for large-scale HPC applications, to achieve user-transparent, light-weight layout optimization for scientific datasets. It offers simple XML-based configuration for users to achieve flexible layout optimization without the need to modify or recompile application codes. Experiments show that PARLO improves performance by 2 to 26 times for queries with heterogeneous access patterns compared to state-of-the-art scientific database management systems. Compared to traditional post-processing approaches, its underlying run-time layout optimization achieves a 56% savings in processing time and a reduction in storage overhead of up to 50%. PARLO also exhibits a low run-time resource requirement, while also limiting the performance impact on running applications to a reasonable level.

Original languageEnglish (US)
Pages343-351
Number of pages9
DOIs
StatePublished - 2013
Externally publishedYes
Event13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2013 - Delft, Netherlands
Duration: May 13 2013May 16 2013

Other

Other13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2013
Country/TerritoryNetherlands
CityDelft
Period5/13/135/16/13

All Science Journal Classification (ASJC) codes

  • Software

Fingerprint

Dive into the research topics of 'PARLO: PArallel run-time layout optimization for scientific data explorations with heterogeneous access patterns'. Together they form a unique fingerprint.

Cite this