Matching data distribution to workload distribution is important in improving the performance of distributed-memory multiprocessors. While data and workload distribution can be tailored to fit a particular problem to a particular distributed-memory architecture, it is often difficult to do so for various reasons, including complexity of address computation, runtime data movement, and irregular resource usage. This report presents our study on multithreading for distributed-memory multiprocessors. Specifically, we investigate the effects of multithreading on data distribution and workload distribution with variable thread granularity. Various types of workload distribution strategies are defined along with thread granularity. Several types of data distribution strategies are investigated. These include row-wise cyclic, k-way partial-row cyclic, and blocked distribution. To investigate the performance of multithreading, two problems are selected: highly sequential Gaussian elimination with partial pivoting and highly parallel matrix multiplication. Execution results on the 80-processor EM-4 distributed-memory multiprocessor indicate that multithreading can offset the loss due to the mismatch between data distribution and workload distribution, even for sequential and irregular problems, while giving high absolute performance.
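As a rough illustration of the data distribution strategies named above, the following Python sketch maps matrix elements to owning processors. All function and parameter names are hypothetical. The row-wise cyclic and blocked mappings follow their usual textbook meaning; the k-way partial-row cyclic mapping is only one plausible reading (each row cut into k contiguous segments that are dealt out cyclically), since the abstract does not define it.

```python
def cyclic_owner(row: int, num_procs: int) -> int:
    """Row-wise cyclic: row i goes to processor i mod P."""
    return row % num_procs


def blocked_owner(row: int, num_rows: int, num_procs: int) -> int:
    """Blocked: contiguous groups of ceil(N / P) rows go to one processor."""
    block = -(-num_rows // num_procs)  # ceiling division
    return row // block


def partial_row_cyclic_owner(row: int, col: int, num_cols: int,
                             num_procs: int, k: int) -> int:
    """ASSUMED reading of k-way partial-row cyclic, not taken from the report:
    each row is split into k contiguous segments, and the segments are
    numbered row by row and dealt to processors cyclically."""
    seg_len = -(-num_cols // k)        # ceiling division
    segment = col // seg_len           # which of the k segments holds this column
    return (row * k + segment) % num_procs


if __name__ == "__main__":
    P, N = 4, 8
    print([cyclic_owner(r, P) for r in range(N)])
    print([blocked_owner(r, N, P) for r in range(N)])
    print([partial_row_cyclic_owner(1, c, N, P, 2) for c in range(N)])
```

Under these assumptions, a thread working on a row it does not own must fetch that row (or row segment) remotely, which is the kind of mismatch between data and workload distribution the abstract refers to.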
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Hardware and Architecture
- Computer Networks and Communications
- Artificial Intelligence