TY - JOUR
T1 - FPGA implementation of a cholesky algorithm for a shared-memory multiprocessor architecture
AU - Haridas, Satchidanand G.
AU - Ziavras, Sotirios G.
N1 - Funding Information:
*This research was supported in part by the US Department of Energy under grants ER63384 and DE-FG02-03CH11171. †Corresponding author. E-mail: [email protected]
PY - 2004/12
Y1 - 2004/12
N2 - Solving a system of linear equations is a key problem in engineering and science. Matrix factorization is a key component of many methods used to solve such equations. However, the factorization process is very time consuming, so these problems have often been targeted for parallel machines rather than sequential ones. Nevertheless, commercially available supercomputers are expensive and only large institutions have the resources to purchase them. Hence, efforts are on to develop more affordable alternatives. In this paper, we propose such an approach. We present an implementation of a parallel version of the Cholesky matrix factorization algorithm on a single-chip multiprocessor built inside an APEX20K series Field-Programmable Gate Array (FPGA) developed by Altera. Our multiprocessor system uses an asymmetric, shared-memory MIMD architecture and was built using the configurable Nios× processor core which was also developed by Altera. Our system was developed using Altera's System-On-a-Programmable-Chip (SOPC) Quartus II development environment. Our Cholesky implementation is based on an algorithm described by George et al. [6]. This algorithm is scalable and uses a "queue of tasks" approach to ensure dynamic load-balancing among the processing elements. Our implementation assumes dense matrices in the input. We present performance results for uniprocessor and multiprocessor implementations. Our results show that the implementation of multiprocessors inside FPGAs can benefit matrix operations, such as matrix factorization. Further benefits result from good dynamic load-balancing techniques.
AB - Solving a system of linear equations is a key problem in engineering and science. Matrix factorization is a key component of many methods used to solve such equations. However, the factorization process is very time consuming, so these problems have often been targeted for parallel machines rather than sequential ones. Nevertheless, commercially available supercomputers are expensive and only large institutions have the resources to purchase them. Hence, efforts are on to develop more affordable alternatives. In this paper, we propose such an approach. We present an implementation of a parallel version of the Cholesky matrix factorization algorithm on a single-chip multiprocessor built inside an APEX20K series Field-Programmable Gate Array (FPGA) developed by Altera. Our multiprocessor system uses an asymmetric, shared-memory MIMD architecture and was built using the configurable Nios× processor core which was also developed by Altera. Our system was developed using Altera's System-On-a-Programmable-Chip (SOPC) Quartus II development environment. Our Cholesky implementation is based on an algorithm described by George et al. [6]. This algorithm is scalable and uses a "queue of tasks" approach to ensure dynamic load-balancing among the processing elements. Our implementation assumes dense matrices in the input. We present performance results for uniprocessor and multiprocessor implementations. Our results show that the implementation of multiprocessors inside FPGAs can benefit matrix operations, such as matrix factorization. Further benefits result from good dynamic load-balancing techniques.
KW - FPGA
KW - Matrix inversion
KW - Parallel Cholesky factorization
KW - Performance evaluation
UR - http://www.scopus.com/inward/record.url?scp=6344242990&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=6344242990&partnerID=8YFLogxK
U2 - 10.1080/10637190412331279957
DO - 10.1080/10637190412331279957
M3 - Article
AN - SCOPUS:6344242990
SN - 1063-7192
VL - 19
SP - 211
EP - 226
JO - Parallel Algorithms and Applications
JF - Parallel Algorithms and Applications
IS - 4
ER -