TY - GEN
T1 - Characterization of Input/Output bandwidth performance models in NUMA architecture for data intensive applications
AU - Li, Tan
AU - Ren, Yufei
AU - Yu, Dantong
AU - Jin, Shudong
AU - Robertazzi, Thomas
PY - 2013
Y1 - 2013
N2 - Data-intensive applications frequently rely on multicore computer systems, in which Non-Uniform Memory Access (NUMA) is a dominant architecture. Transferring data into and out of these high-performance computers becomes a bottleneck, and thus it is crucial to understand their I/O performance characteristics. However, the complexity of the NUMA architecture presents a new challenge in modeling its I/O access cost, and thus leads to difficulties in configuring proper processor and memory affinity. In this paper, we show that existing NUMA experimental methods and metrics are inappropriate on contemporary high-end systems. We characterize a state-of-the-art NUMA host and propose, to the best of our knowledge, the first methodology to simulate I/O operations using memory semantics and to model I/O bandwidth performance. Our methodology is thoroughly tested and validated by mapping multiple parallel I/O streams to different sets of hardware components (CPU, memory, network cards, and SSDs) and by measuring the performance of each mapping. The experimental results and analysis reveal that our methodology can dramatically reduce characterization workload, accurately estimate the overall I/O performance, and effectively mitigate resource contention among I/O tasks.
AB - Data-intensive applications frequently rely on multicore computer systems, in which Non-Uniform Memory Access (NUMA) is a dominant architecture. Transferring data into and out of these high-performance computers becomes a bottleneck, and thus it is crucial to understand their I/O performance characteristics. However, the complexity of the NUMA architecture presents a new challenge in modeling its I/O access cost, and thus leads to difficulties in configuring proper processor and memory affinity. In this paper, we show that existing NUMA experimental methods and metrics are inappropriate on contemporary high-end systems. We characterize a state-of-the-art NUMA host and propose, to the best of our knowledge, the first methodology to simulate I/O operations using memory semantics and to model I/O bandwidth performance. Our methodology is thoroughly tested and validated by mapping multiple parallel I/O streams to different sets of hardware components (CPU, memory, network cards, and SSDs) and by measuring the performance of each mapping. The experimental results and analysis reveal that our methodology can dramatically reduce characterization workload, accurately estimate the overall I/O performance, and effectively mitigate resource contention among I/O tasks.
KW - Data transfer
KW - Input/output (I/O)
KW - NUMA effects
KW - Performance model
UR - http://www.scopus.com/inward/record.url?scp=84893334019&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84893334019&partnerID=8YFLogxK
U2 - 10.1109/ICPP.2013.46
DO - 10.1109/ICPP.2013.46
M3 - Conference contribution
AN - SCOPUS:84893334019
SN - 9780769551173
T3 - Proceedings of the International Conference on Parallel Processing
SP - 369
EP - 378
BT - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 42nd Annual International Conference on Parallel Processing, ICPP 2013
Y2 - 1 October 2013 through 4 October 2013
ER -