TY - GEN
T1 - Accelerating Low Bit-width Neural Networks at the Edge, PIM or FPGA
T2 - 33rd Great Lakes Symposium on VLSI, GLSVLSI 2023
AU - Kochar, Nakul
AU - Ekiert, Lucas
AU - Najafi, Deniz
AU - Fan, Deliang
AU - Angizi, Shaahin
N1 - Publisher Copyright:
© 2023 ACM.
PY - 2023/6/5
Y1 - 2023/6/5
N2 - Deep Neural Network (DNN) acceleration with digital Processing-in-Memory (PIM) platforms at the edge is an actively explored domain with great potential not only to address memory-wall bottlenecks but also to offer orders-of-magnitude performance improvements over the von Neumann architecture. On the other hand, FPGA-based edge computing has been pursued as a potential solution for accelerating compute-intensive workloads. In this work, adopting low-bit-width neural networks, we perform a thorough comparative inference performance analysis of a recent processing-in-SRAM tape-out against a low-resource FPGA board and a high-performance GPU to provide a guideline for the research community. We explore and highlight the key architectural constraints of these edge candidates that impact their overall performance. Our experimental data demonstrate that the processing-in-SRAM design achieves up to ∼160x speed-up and up to 228x higher efficiency (img/s/W) than the FPGA under test on the CIFAR-10 dataset.
AB - Deep Neural Network (DNN) acceleration with digital Processing-in-Memory (PIM) platforms at the edge is an actively explored domain with great potential not only to address memory-wall bottlenecks but also to offer orders-of-magnitude performance improvements over the von Neumann architecture. On the other hand, FPGA-based edge computing has been pursued as a potential solution for accelerating compute-intensive workloads. In this work, adopting low-bit-width neural networks, we perform a thorough comparative inference performance analysis of a recent processing-in-SRAM tape-out against a low-resource FPGA board and a high-performance GPU to provide a guideline for the research community. We explore and highlight the key architectural constraints of these edge candidates that impact their overall performance. Our experimental data demonstrate that the processing-in-SRAM design achieves up to ∼160x speed-up and up to 228x higher efficiency (img/s/W) than the FPGA under test on the CIFAR-10 dataset.
KW - deep neural networks
KW - FPGA
KW - processing-in-memory
KW - SRAM
UR - http://www.scopus.com/inward/record.url?scp=85163164087&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85163164087&partnerID=8YFLogxK
U2 - 10.1145/3583781.3590213
DO - 10.1145/3583781.3590213
M3 - Conference contribution
AN - SCOPUS:85163164087
T3 - Proceedings of the ACM Great Lakes Symposium on VLSI, GLSVLSI
SP - 625
EP - 630
BT - GLSVLSI 2023 - Proceedings of the Great Lakes Symposium on VLSI 2023
PB - Association for Computing Machinery
Y2 - 5 June 2023 through 7 June 2023
ER -