TY - GEN
T1 - Performance optimization of an FPGA-based configurable multiprocessor for matrix operations
AU - Wang, Xiaofang
AU - Ziavras, Sotirios G.
N1 - Copyright 2015 Elsevier B.V., All rights reserved.
PY - 2003
Y1 - 2003
AB - Several driving forces have recently brought about significant advances in the field of configurable computing. They have also enabled parallel processing within a single field-programmable gate array (FPGA) chip. The ever-increasing complexity of application algorithms and the supercomputing crisis have made this new parallel-processing approach more important and pertinent. Its cost-effectiveness provides system designers with the greatest flexibility while imposing many challenges to current hardware and software codesign methodologies. This paper explores practical hardware and software design and implementation issues for FPGA-based configurable multiprocessors, based on the authors' first-hand experience with a shared-memory implementation of parallel LU factorization for sparse block-diagonal-bordered (BDB) matrices. We also propose a new dynamic load balancing strategy for parallel LU factorization on our system. Performance results are included to prove the viability of this new multiprocessor design approach.
UR - http://www.scopus.com/inward/record.url?scp=84946069788&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84946069788&partnerID=8YFLogxK
U2 - 10.1109/FPT.2003.1275763
DO - 10.1109/FPT.2003.1275763
M3 - Conference contribution
AN - SCOPUS:84946069788
T3 - Proceedings - 2003 IEEE International Conference on Field-Programmable Technology, FPT 2003
SP - 303
EP - 306
BT - Proceedings - 2003 IEEE International Conference on Field-Programmable Technology, FPT 2003
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2nd International Conference on Field Programmable Technology, FPT 2003
Y2 - 15 December 2003 through 17 December 2003
ER -