TY - GEN
T1 - Parallel Methods for Verifying the Consistency of Weakly-Ordered Architectures
AU - McLaughlin, Adam
AU - Merrill, Duane
AU - Garland, Michael
AU - Bader, David A.
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2016/3/8
Y1 - 2016/3/8
N2 - Contemporary microprocessors use relaxed memory consistency models to allow for aggressive optimizations in hardware. This enhancement in performance comes at the cost of design complexity and verification effort. In particular, verifying an execution of a program against its system's memory consistency model is an NP-complete problem. Several graph-based approximations to this problem based on carefully constructed randomized test programs have been proposed in the literature, however, such approaches are sequential and execute slowly on large graphs of interest. Unfortunately, the ability to execute larger tests is tremendously important, since such tests enable one to expose bugs more quickly. Successfully executing more tests per unit time is also desirable, since it allows for one to check for a greater variety of errors in the memory subsystem by utilizing a more diverse set of tests. This paper improves upon existing work by introducing an algorithm that not only reduces the time complexity of the verification process, but also facilitates the development of parallel algorithms for solving these problems. We first show performance improvements from a sequential approach and gain further performance from parallel implementations in OpenMP and CUDA. For large tests of interest, our GPU implementation achieves an average application speedup of 26.36x over existing techniques in use at NVIDIA.
AB - Contemporary microprocessors use relaxed memory consistency models to allow for aggressive optimizations in hardware. This enhancement in performance comes at the cost of design complexity and verification effort. In particular, verifying an execution of a program against its system's memory consistency model is an NP-complete problem. Several graph-based approximations to this problem based on carefully constructed randomized test programs have been proposed in the literature, however, such approaches are sequential and execute slowly on large graphs of interest. Unfortunately, the ability to execute larger tests is tremendously important, since such tests enable one to expose bugs more quickly. Successfully executing more tests per unit time is also desirable, since it allows for one to check for a greater variety of errors in the memory subsystem by utilizing a more diverse set of tests. This paper improves upon existing work by introducing an algorithm that not only reduces the time complexity of the verification process, but also facilitates the development of parallel algorithms for solving these problems. We first show performance improvements from a sequential approach and gain further performance from parallel implementations in OpenMP and CUDA. For large tests of interest, our GPU implementation achieves an average application speedup of 26.36x over existing techniques in use at NVIDIA.
KW - Graph Algorithms
KW - Memory Consistency Verification
KW - Parallel Algorithms
KW - Relaxed Memory Models
UR - http://www.scopus.com/inward/record.url?scp=84975490429&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84975490429&partnerID=8YFLogxK
U2 - 10.1109/PACT.2015.18
DO - 10.1109/PACT.2015.18
M3 - Conference contribution
AN - SCOPUS:84975490429
T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
SP - 51
EP - 62
BT - Proceedings - 24th International Conference on Parallel Architecture and Compilation, PACT 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 24th International Conference on Parallel Architecture and Compilation, PACT 2015
Y2 - 18 October 2015 through 21 October 2015
ER -