TY - JOUR
T1 - Data and workload distribution in a multithreaded architecture
AU - Sohn, Andrew
AU - Sato, Mitsuhisa
AU - Yoo, Namhoon
AU - Gaudiot, Jean Luc
N1 - Funding Information:
The authors thank the EM-4 and EM-X group members, Mitsuhisa Sato, Yuetsu Kodama, Hirofumi Sakane, Hayato Yamana, Shuichi Sakai, and Yoshinori Yamaguchi, of the Electrotechnical Laboratory for various discussions on multithreading. Andrew Sohn is supported in part by the NASA JOVE NAG8 1114-2 and the Foreign Researcher Program of the Ministry of International Trade and Industry, Japan.
PY - 1997/2/1
Y1 - 1997/2/1
N2 - Matching data distribution to workload distribution is important in improving the performance of distributed-memory multiprocessors. While data and workload distribution can be tailored to fit a particular problem to a particular distributed-memory architecture, it is often difficult to do so for various reasons including complexity of address computation, runtime data movement, and irregular resource usage. This report presents our study on multithreading for distributed-memory multiprocessors. Specifically, we investigate the effects of multithreading on data distribution and workload distribution with variable thread granularity. Various types of workload distribution strategies are defined along with thread granularity. Several types of data distribution strategies are investigated. These include row-wise cyclic, k-way partial-row cyclic, and blocked distribution. To investigate the performance of multithreading, two problems are selected: highly sequential Gaussian elimination with partial pivoting and highly parallel matrix multiplication. Execution results on the 80-processor EM-4 distributed-memory multiprocessor indicate that multithreading can offset the loss due to the mismatch between data distribution and workload distribution even for sequential and irregular problems while giving high absolute performance.
AB - Matching data distribution to workload distribution is important in improving the performance of distributed-memory multiprocessors. While data and workload distribution can be tailored to fit a particular problem to a particular distributed-memory architecture, it is often difficult to do so for various reasons including complexity of address computation, runtime data movement, and irregular resource usage. This report presents our study on multithreading for distributed-memory multiprocessors. Specifically, we investigate the effects of multithreading on data distribution and workload distribution with variable thread granularity. Various types of workload distribution strategies are defined along with thread granularity. Several types of data distribution strategies are investigated. These include row-wise cyclic, k-way partial-row cyclic, and blocked distribution. To investigate the performance of multithreading, two problems are selected: highly sequential Gaussian elimination with partial pivoting and highly parallel matrix multiplication. Execution results on the 80-processor EM-4 distributed-memory multiprocessor indicate that multithreading can offset the loss due to the mismatch between data distribution and workload distribution even for sequential and irregular problems while giving high absolute performance.
UR - http://www.scopus.com/inward/record.url?scp=0031068502&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0031068502&partnerID=8YFLogxK
U2 - 10.1006/jpdc.1996.1262
DO - 10.1006/jpdc.1996.1262
M3 - Article
AN - SCOPUS:0031068502
SN - 0743-7315
VL - 40
SP - 256
EP - 264
JO - Journal of Parallel and Distributed Computing
JF - Journal of Parallel and Distributed Computing
IS - 2
ER -