TY - GEN
T1 - On-chip vector coprocessor sharing for multicores
AU - Beldianu, Spiridon F.
AU - Ziavras, Sotirios G.
PY - 2011
Y1 - 2011
AB - For most of the applications that make use of a vector coprocessor, the resources are not highly utilized due to the lack of sustained data parallelism, which sometimes occurs due to vector-length changes in dynamic environments. The motivation of our work stems from (a) the mandate for multicore designs to make efficient use of the on-chip resources, (b) the frequent presence of vector operations in high-performance scientific and embedded applications, (c) the increased probability that different cores may deal with different vector lengths at various times, and (d) the fact that different vector kernels in the same or different application suites may have diverse computation needs. Our objective is to provide a versatile design framework that can facilitate vector coprocessor sharing among multiple cores in a manner that maximizes resource utilization while also yielding very high performance at reduced cost. We propose three basic shared vector coprocessor architectures for multicores based on coarse-grain, fine-grain and vector lane sharing. We benchmark these distinct vector architectures for a dual-core system using the floating-point performance and resource utilization metrics. Our analysis shows that vector lane sharing, where the number of vector lanes assigned to a core can be controlled dynamically, provides the greatest flexibility and generally yields very good results. Since, however, each of the three design choices has its own performance advantages under certain vector-load conditions, we ultimately suggest a hybrid vector coprocessor design that can support all three architectural choices according to the collective needs of the cores and applications.
KW - FPGA prototyping
KW - MicroBlaze
KW - Vector coprocessor
KW - coprocessor sharing
KW - multicore
UR - http://www.scopus.com/inward/record.url?scp=79955016373&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79955016373&partnerID=8YFLogxK
U2 - 10.1109/PDP.2011.64
DO - 10.1109/PDP.2011.64
M3 - Conference contribution
AN - SCOPUS:79955016373
SN - 9780769543284
T3 - Proceedings - 19th International Euromicro Conference on Parallel, Distributed, and Network-Based Processing, PDP 2011
SP - 431
EP - 438
BT - Proceedings - 19th International Euromicro Conference on Parallel, Distributed, and Network-Based Processing, PDP 2011
T2 - 19th International Euromicro Conference on Parallel, Distributed, and Network-Based Processing, PDP 2011
Y2 - 9 February 2011 through 11 February 2011
ER -