TY - GEN
T1 - Wedge-Parallel Triangle Counting for GPUs
AU - Spaan, Jeffrey
AU - Chen, Kuan Hsun
AU - Bader, David A.
AU - Varbanescu, Ana Lucia
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
PY - 2026
Y1 - 2026
N2 - For fast processing of increasingly large graphs, triangle counting – a common building block of graph processing algorithms, is often performed on GPUs. However, applying massive parallelism to triangle counting is challenging due to the algorithm’s inherent irregular access patterns and workload imbalance. In this work, we propose WeTriC, a novel wedge-parallel triangle counting algorithm for GPUs, which, using fine(r)-grained parallelism through a lightweight static mapping of wedges to threads, improves load balancing and efficiency. Our theoretical analysis compares different parallelization granularities, while optimizations enhance caching, reduce work-per-intersection, and minimize overhead. Performance experiments indicate that WeTriC yields 5.63× and 4.69× speedup over optimized vertex-parallel and edge-parallel binary search triangle counting algorithms, respectively. Furthermore, we show that WeTriC consistently outperforms the state-of-the-art (i.e., on avg. 2.86× faster than Trust and 2.32× faster than GroupTC).
AB - For fast processing of increasingly large graphs, triangle counting – a common building block of graph processing algorithms, is often performed on GPUs. However, applying massive parallelism to triangle counting is challenging due to the algorithm’s inherent irregular access patterns and workload imbalance. In this work, we propose WeTriC, a novel wedge-parallel triangle counting algorithm for GPUs, which, using fine(r)-grained parallelism through a lightweight static mapping of wedges to threads, improves load balancing and efficiency. Our theoretical analysis compares different parallelization granularities, while optimizations enhance caching, reduce work-per-intersection, and minimize overhead. Performance experiments indicate that WeTriC yields 5.63× and 4.69× speedup over optimized vertex-parallel and edge-parallel binary search triangle counting algorithms, respectively. Furthermore, we show that WeTriC consistently outperforms the state-of-the-art (i.e., on avg. 2.86× faster than Trust and 2.32× faster than GroupTC).
KW - Graph Processing
KW - Parallel Computing on GPUs
KW - Triangle Counting
KW - Wedge-Parallel Approaches
UR - https://www.scopus.com/pages/publications/105014493881
UR - https://www.scopus.com/pages/publications/105014493881#tab=citedBy
U2 - 10.1007/978-3-031-99872-0_1
DO - 10.1007/978-3-031-99872-0_1
M3 - Conference contribution
AN - SCOPUS:105014493881
SN - 9783031998713
T3 - Lecture Notes in Computer Science
SP - 3
EP - 17
BT - Euro-Par 2025
A2 - Nagel, Wolfgang E.
A2 - Goehringer, Diana
A2 - Diniz, Pedro C.
PB - Springer Science and Business Media Deutschland GmbH
T2 - 31st International Conference on Parallel and Distributed Computing, Euro-Par 2025
Y2 - 25 April 2025 through 29 April 2025
ER -