EDO: Improving read performance for scientific applications through elastic data organization

Yuan Tian, Scott Klasky, Hasan Abbasi, Jay Lofstead, Ray Grout, Norbert Podhorszki, Qing Liu, Yandong Wang, Weikuan Yu

Research output: Chapter in Book/Report/Conference proceedingConference contribution

46 Scopus citations

Abstract

Large scale scientific applications are often bottlenecked due to the writing of checkpoint-restart data. Much work has been focused on improving their write performance. With the mounting needs of scientific discovery from these datasets, it is also important to provide good read performance for many common access patterns, which requires effective data organization. To address this issue, we introduce Elastic Data Organization (EDO), which can transparently enable different data organization strategies for scientific applications. Through its flexible data ordering algorithms, EDO harmonizes different access patterns with the underlying file system. Two levels of data ordering are introduced in EDO. One works at the level of data groups (a.k.a process groups). It uses Hilbert Space Filling Curves (SFC) to balance the distribution of data groups across storage targets. Another governs the ordering of data elements within a data group. It divides a data group into sub chunks and strikes a good balance between the size of sub chunks and the number of seek operations. Our experimental results demonstrate that EDO is able to achieve balanced data distribution across all dimensions and improve the read performance of multidimensional datasets in scientific applications.

Original languageEnglish (US)
Title of host publicationProceedings - 2011 IEEE International Conference on Cluster Computing, CLUSTER 2011
Pages93-102
Number of pages10
DOIs
StatePublished - 2011
Externally publishedYes
Event2011 IEEE International Conference on Cluster Computing, CLUSTER 2011 - Austin, TX, United States
Duration: Sep 26 2011Sep 30 2011

Publication series

NameProceedings - IEEE International Conference on Cluster Computing, ICCC
ISSN (Print)1552-5244

Other

Other2011 IEEE International Conference on Cluster Computing, CLUSTER 2011
Country/TerritoryUnited States
CityAustin, TX
Period9/26/119/30/11

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
  • Signal Processing

Keywords

  • ADIOS
  • Data Organization
  • Parallel I/O
  • Planar Read Patterns
  • Space Filling Curve

Fingerprint

Dive into the research topics of 'EDO: Improving read performance for scientific applications through elastic data organization'. Together they form a unique fingerprint.

Cite this