Abstract
The multiplication of large sparse matrices is a basic operation in many scientific and engineering applications. High-performance library routines exist for this operation, and they are often optimized for the target architecture. In a parallel environment, it is essential to partition the entire operation into well-balanced tasks and assign them to individual processing elements. Most existing techniques partition the given matrices based on some kind of workload estimation. For irregular sparse matrices on PC clusters, however, the workloads may not be well estimated in advance, and any approach other than run-time dynamic partitioning may degrade performance. In this paper, we apply our super-programming approach [24] to parallel large matrix multiplication on PC clusters. In our approach, tasks are partitioned into super-instructions that are dynamically assigned to member computer nodes. Thus, the load balancing logic is separated from the computing logic; the former is taken over by the runtime environment. Our super-programming approach eases program development and targets high efficiency in dynamic load balancing. Workloads can be balanced effectively and the optimization overhead is small. The results demonstrate the viability of our approach.
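The idea of run-time dynamic assignment described in the abstract can be illustrated with a minimal sketch. This is not the paper's super-programming runtime: it assumes threads standing in for cluster nodes, a shared queue standing in for the runtime environment, and a dict-of-dicts sparse format. The names `multiply_rows` and `dynamic_spmm` and the parameters `block_size` and `n_workers` are hypothetical, chosen only for illustration.

```python
import queue
import threading


def multiply_rows(a_rows, b):
    """Multiply a block of sparse rows {i: {k: v}} by a sparse matrix b {k: {j: v}}."""
    out = {}
    for i, row in a_rows.items():
        acc = {}
        for k, a_ik in row.items():
            # Only the nonzero entries of row k of b contribute.
            for j, b_kj in b.get(k, {}).items():
                acc[j] = acc.get(j, 0) + a_ik * b_kj
        if acc:
            out[i] = acc
    return out


def dynamic_spmm(a, b, block_size=2, n_workers=3):
    """Compute a * b, with workers pulling row blocks from a shared queue.

    Assumption: each queued row block plays the role of one coarse-grained
    task; workers (threads here, nodes in a real cluster) take the next
    task whenever they become idle, so no workload estimate is needed.
    """
    tasks = queue.Queue()
    row_ids = sorted(a)
    for start in range(0, len(row_ids), block_size):
        tasks.put({i: a[i] for i in row_ids[start:start + block_size]})

    result = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                block = tasks.get_nowait()
            except queue.Empty:
                return  # no work left
            partial = multiply_rows(block, b)
            with lock:
                result.update(partial)  # row blocks are disjoint

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


a = {0: {0: 1, 2: 2}, 1: {1: 3}, 3: {0: 4}}
b = {0: {1: 5}, 1: {0: 6}, 2: {1: 7}}
c = dynamic_spmm(a, b)
```

In this sketch the computing logic (`multiply_rows`) contains no scheduling decisions; which worker handles which block is decided only at run time by the queue, which mirrors the separation of load balancing from computation that the abstract describes.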
Original language | English (US) |
---|---|
Pages (from-to) | 1774-1780 |
Number of pages | 7 |
Journal | IEICE Transactions on Information and Systems |
Volume | E87-D |
Issue number | 7 |
State | Published - Jul 2004 |
All Science Journal Classification (ASJC) codes
- Software
- Hardware and Architecture
- Computer Vision and Pattern Recognition
- Electrical and Electronic Engineering
- Artificial Intelligence
Keywords
- Load balancing
- Matrix multiplication
- PC cluster
- Performance evaluation
- Programming model