TY - GEN
T1 - Understanding the design trade-offs among current multicore systems for numerical computations
AU - Kang, Seunghwa
AU - Bader, David A.
AU - Vuduc, Richard
PY - 2009
Y1 - 2009
N2 - In this paper, we empirically evaluate fundamental design trade-offs among the most recent multicore processors and accelerator technologies. Our primary aim is to aid application designers in better mapping their software to the most suitable architecture, with an additional goal of influencing future computing system design. We specifically examine five architectures, based on: the Intel quad-core Harpertown processor, the AMD quad-core Barcelona processor, the Sony-Toshiba-IBM Cell Broadband Engine processors (both the first-generation chip and the second-generation PowerXCell 8i), and the NVIDIA Tesla C1060 GPU. We illustrate the software implementation process on each platform for a set of widely-used kernels from computational statistics that are simple to reason about; measure and analyze the performance of each implementation; and discuss the impact of different architectural design choices on each implementation.
AB - In this paper, we empirically evaluate fundamental design trade-offs among the most recent multicore processors and accelerator technologies. Our primary aim is to aid application designers in better mapping their software to the most suitable architecture, with an additional goal of influencing future computing system design. We specifically examine five architectures, based on: the Intel quad-core Harpertown processor, the AMD quad-core Barcelona processor, the Sony-Toshiba-IBM Cell Broadband Engine processors (both the first-generation chip and the second-generation PowerXCell 8i), and the NVIDIA Tesla C1060 GPU. We illustrate the software implementation process on each platform for a set of widely-used kernels from computational statistics that are simple to reason about; measure and analyze the performance of each implementation; and discuss the impact of different architectural design choices on each implementation.
UR - http://www.scopus.com/inward/record.url?scp=70449975572&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70449975572&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2009.5161055
DO - 10.1109/IPDPS.2009.5161055
M3 - Conference contribution
AN - SCOPUS:70449975572
SN - 9781424437504
T3 - IPDPS 2009 - Proceedings of the 2009 IEEE International Parallel and Distributed Processing Symposium
BT - IPDPS 2009 - Proceedings of the 2009 IEEE International Parallel and Distributed Processing Symposium
T2 - 23rd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2009
Y2 - 23 May 2009 through 29 May 2009
ER -