Evaluating multicore processors and accelerators for dense numerical computations

Seunghwa Kang, Nitin Arora, Aashay Shringarpure, Richard W. Vuduc, David A. Bader

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

Abstract

In this chapter, we empirically evaluate fundamental design trade-offs among current multicore processors and accelerator technologies and their impact on dense numerical computations. The main objectives of this work are to understand the differences in the implementation techniques required to achieve good performance on a variety of current multicore and accelerator platforms and to help application designers map their software to the most suitable architecture. We also aim to influence future computing system design. We present interarchitectural comparisons of dense numerical kernels from computational statistics and direct n-body problems using a spectrum of multicore and accelerator platforms: those based on the Intel Harpertown and Nehalem architectures, the AMD Barcelona architecture, the Sony-Toshiba-IBM Cell Broadband Engine and the second-generation PowerXCell 8i, and the NVIDIA Tesla C870 and C1060. We illustrate the software implementation process on each platform; measure and analyze the performance, coding complexity, and energy efficiency of each implementation; and discuss the impact of different architectural design choices on each implementation.

Original language: English (US)
Title of host publication: Multicore Computing
Subtitle of host publication: Algorithms, Architectures, and Applications
Publisher: CRC Press
Pages: 241-284
Number of pages: 44
ISBN (Electronic): 9781439854358
ISBN (Print): 9781439854341
State: Published - Jan 1 2013
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • General Mathematics
