TY - GEN
T1 - A fast, energy-efficient abstraction for simultaneous breadth-first searches
AU - McLaughlin, Adam
AU - Riedy, Jason
AU - Bader, David A.
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/11/9
Y1 - 2015/11/9
N2 - Optimized GPU kernels are sufficiently complicated to write that they often are specialized to input data, target architectures, or applications. This paper presents a multi-search abstraction for computing multiple breadth-first searches in parallel and demonstrates a high-performance, general implementation. Our abstraction removes the burden of orchestrating graph traversal from the user while providing high performance and low energy usage, an often overlooked component of algorithm design. Energy consumption has become a first-class hardware design constraint for both massive and embedded computing platforms. Our abstraction can be applied to such problems as the all-pairs shortest-path problem, community detection, reachability querying, and others. To map graph traversal efficiently to the GPU, our hybrid implementation chooses between processing active vertices with a single thread or an entire warp based on vertex outdegree. For a set of twelve varied graphs, the implementation of our abstraction saves 42% time and 62% energy on average compared to representative implementations of specific applications from existing literature.
AB - Optimized GPU kernels are sufficiently complicated to write that they often are specialized to input data, target architectures, or applications. This paper presents a multi-search abstraction for computing multiple breadth-first searches in parallel and demonstrates a high-performance, general implementation. Our abstraction removes the burden of orchestrating graph traversal from the user while providing high performance and low energy usage, an often overlooked component of algorithm design. Energy consumption has become a first-class hardware design constraint for both massive and embedded computing platforms. Our abstraction can be applied to such problems as the all-pairs shortest-path problem, community detection, reachability querying, and others. To map graph traversal efficiently to the GPU, our hybrid implementation chooses between processing active vertices with a single thread or an entire warp based on vertex outdegree. For a set of twelve varied graphs, the implementation of our abstraction saves 42% time and 62% energy on average compared to representative implementations of specific applications from existing literature.
UR - http://www.scopus.com/inward/record.url?scp=84963745426&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84963745426&partnerID=8YFLogxK
U2 - 10.1109/HPEC.2015.7322466
DO - 10.1109/HPEC.2015.7322466
M3 - Conference contribution
AN - SCOPUS:84963745426
T3 - 2015 IEEE High Performance Extreme Computing Conference, HPEC 2015
BT - 2015 IEEE High Performance Extreme Computing Conference, HPEC 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - IEEE High Performance Extreme Computing Conference, HPEC 2015
Y2 - 15 September 2015 through 17 September 2015
ER -