TY - JOUR
T1 - Architecture independent parallel algorithm design
T2 - Theory vs practice
AU - Gerbessiotis, Alexandros V.
N1 - Funding Information:
The research described in this work was performed while the author was with the Computing Laboratory, Programming Research Group, The University of Oxford, Oxford, UK and supported in part by the Engineering and Physical Sciences Research Council (EPSRC) under grant GR/K16999. Subsequent work was supported in part by New Jersey Institute of Technology, under Grant No. 421350. It is a pleasure to thank Dr. Jonathan M.D. Hill for his assistance in using BSPlib . The support of the Edinburgh Parallel Computing Centre in granting the author access to a Cray T3D machine is gratefully acknowledged.
PY - 2002/4
Y1 - 2002/4
N2 - We propose architecture independent parallel algorithm design as a framework for writing parallel code that is scalable, portable and reusable. Towards this end, we study the performance of some dense matrix computations such as matrix multiplication, LU decomposition and matrix inversion. Although optimized algorithms for these problems have been extensively examined before, a systematic study of an architecture independent design and analysis of parallel algorithms and their performance (including matrix computations) has not been undertaken. Even though more refined algorithms and implementations (sequential or parallel) for the stated problems exist, the complexity and performance of the introduced algorithms are sufficient to raise the issues that are important in architecture independent parallel algorithm design. Two established distributions of an input matrix among the processors of a parallel machine are examined, and the particular theoretical and practical merits of each are also discussed. The algorithms we propose have been implemented and tested on a variety of parallel systems that include the SGI Power Challenge, the IBM SP2 and the Cray T3D. Our experimental results support our claims of efficiency, portability and reusability of the presented algorithms.
AB - We propose architecture independent parallel algorithm design as a framework for writing parallel code that is scalable, portable and reusable. Towards this end, we study the performance of some dense matrix computations such as matrix multiplication, LU decomposition and matrix inversion. Although optimized algorithms for these problems have been extensively examined before, a systematic study of an architecture independent design and analysis of parallel algorithms and their performance (including matrix computations) has not been undertaken. Even though more refined algorithms and implementations (sequential or parallel) for the stated problems exist, the complexity and performance of the introduced algorithms are sufficient to raise the issues that are important in architecture independent parallel algorithm design. Two established distributions of an input matrix among the processors of a parallel machine are examined, and the particular theoretical and practical merits of each are also discussed. The algorithms we propose have been implemented and tested on a variety of parallel systems that include the SGI Power Challenge, the IBM SP2 and the Cray T3D. Our experimental results support our claims of efficiency, portability and reusability of the presented algorithms.
KW - Architecture independent parallel algorithms
KW - Dense matrix algorithms
KW - Experimental parallel algorithmics
KW - Parallel algorithm design
KW - Parallel performance prediction
UR - http://www.scopus.com/inward/record.url?scp=0036532526&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0036532526&partnerID=8YFLogxK
U2 - 10.1016/S0167-739X(01)00068-1
DO - 10.1016/S0167-739X(01)00068-1
M3 - Article
AN - SCOPUS:0036532526
SN - 0167-739X
VL - 18
SP - 573
EP - 593
JO - Future Generation Computer Systems
JF - Future Generation Computer Systems
IS - 5
ER -