TY - GEN
T1 - Quality-aware data management for large scale scientific applications
AU - Zou, Hongbo
AU - Zheng, Fang
AU - Wolf, Matthew
AU - Eisenhauer, Greg
AU - Schwan, Karsten
AU - Abbasi, Hasan
AU - Liu, Qing
AU - Podhorszki, Norbert
AU - Klasky, Scott
PY - 2012
Y1 - 2012
N2 - Increasingly larger scale simulations are generating an unprecedented amount of output data, causing researchers to explore new 'data staging' methods that buffer, use, and/or reduce such data online rather than simply pushing it to disk. Leveraging the capabilities of data staging, this study explores the potential for data reduction via online data compression, first using general compression techniques and then proposing use-specific methods that permit users to define simple data queries that cause only the data identified by those queries to be emitted. Using online methods for code generation and deployment, with such dynamic data queries, end users can precisely identify the quality of information (QoI) of their output data, by explicitly determining what data may be lost vs. retained, in contrast to general-purpose lossy compression methods that do not provide such levels of control. The paper also describes the key elements of a quality-aware data management system (QADMS) for high-end machines enabled by this approach. Initial experimental results demonstrate that QADMS can effectively reduce data movement cost and improve the QoS while meeting the QoI constraint stated by users.
AB - Increasingly larger scale simulations are generating an unprecedented amount of output data, causing researchers to explore new 'data staging' methods that buffer, use, and/or reduce such data online rather than simply pushing it to disk. Leveraging the capabilities of data staging, this study explores the potential for data reduction via online data compression, first using general compression techniques and then proposing use-specific methods that permit users to define simple data queries that cause only the data identified by those queries to be emitted. Using online methods for code generation and deployment, with such dynamic data queries, end users can precisely identify the quality of information (QoI) of their output data, by explicitly determining what data may be lost vs. retained, in contrast to general-purpose lossy compression methods that do not provide such levels of control. The paper also describes the key elements of a quality-aware data management system (QADMS) for high-end machines enabled by this approach. Initial experimental results demonstrate that QADMS can effectively reduce data movement cost and improve the QoS while meeting the QoI constraint stated by users.
KW - Data management
KW - HPC simulation
KW - compression
KW - quality of information
KW - visualization
UR - http://www.scopus.com/inward/record.url?scp=84876552853&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84876552853&partnerID=8YFLogxK
U2 - 10.1109/SC.Companion.2012.114
DO - 10.1109/SC.Companion.2012.114
M3 - Conference contribution
AN - SCOPUS:84876552853
SN - 9780769549569
T3 - Proceedings - 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012
SP - 816
EP - 820
BT - Proceedings - 2012 SC Companion
T2 - 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012
Y2 - 10 November 2012 through 16 November 2012
ER -