While data and workload distribution can be tailored to fit a particular problem to a particular distributed-memory architecture, doing so is often difficult for practical reasons. This report presents our study of multithreading for distributed-memory multiprocessors. Specifically, we investigate the effects of multithreading on data distribution and workload distribution under variable thread granularity. Several workload distribution strategies are defined according to thread granularity. Three data distribution strategies are investigated: row-wise cyclic, k-way partial-row cyclic, and blocked distribution. We have implemented all of these on the 80-processor EM-4 distributed-memory multiprocessor using two benchmarks: highly sequential Gaussian Elimination with Partial Pivoting and highly parallel Matrix Multiplication. Experimental results indicate that multithreading can offset the loss caused by a mismatch between data distribution and workload distribution, even for sequential and irregular problems, while delivering high absolute performance.
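As a rough illustration of the three data distribution strategies named in the abstract, the following Python sketch maps matrix elements to owning processors. It is not taken from the paper: the function names and parameters are hypothetical, and the k-way partial-row cyclic mapping in particular is an assumed interpretation (each row is cut into k contiguous segments, and the segments are dealt to processors cyclically), since the abstract does not define it.

```python
def ceil_div(a, b):
    """Integer ceiling division for positive integers."""
    return -(-a // b)


def row_cyclic_owner(row, num_procs):
    """Row-wise cyclic: whole rows are dealt to processors round-robin."""
    return row % num_procs


def blocked_owner(row, num_rows, num_procs):
    """Blocked: each processor holds one contiguous block of rows."""
    rows_per_block = ceil_div(num_rows, num_procs)
    return row // rows_per_block


def partial_row_cyclic_owner(row, col, num_cols, k, num_procs):
    """k-way partial-row cyclic (ASSUMED interpretation, not from the paper):
    each row is split into k contiguous segments, and the segments are
    dealt to processors round-robin in row-major order."""
    seg_len = ceil_div(num_cols, k)
    segment = col // seg_len
    return (row * k + segment) % num_procs


if __name__ == "__main__":
    # Print the owner of each row of an 8-row matrix on 4 processors.
    for r in range(8):
        print(r, row_cyclic_owner(r, 4), blocked_owner(r, 8, 4))
```

Under this sketch, the cyclic mappings spread neighboring rows (or row segments) across processors, while the blocked mapping keeps neighboring rows together. That difference is the kind of data-to-workload mismatch the study examines.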
|Original language||English (US)|
|Number of pages||7|
|Journal||IEEE Symposium on Parallel and Distributed Processing - Proceedings|
|State||Published - Jan 1 1996|