TY - GEN
T1 - RNSiM
T2 - 40th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2021
AU - Roohi, Arman
AU - Taheri, Mohammad Reza
AU - Angizi, Shaahin
AU - Fan, Deliang
N1 - Publisher Copyright:
©2021 IEEE
PY - 2021
Y1 - 2021
N2 - In this paper, we propose an efficient convolutional neural network (CNN) accelerator design, entitled RNSiM, based on the Residue Number System (RNS) as an alternative for the conventional binary number representation. Instead of traditional arithmetic implementation that suffers from the inevitable lengthy carry propagation chain, the novelty of RNSiM lies in that all the data, including stored weights and communication/computation, are performed in the RNS domain. Due to the inherent parallelism of the RNS arithmetic, power and latency are significantly reduced. Moreover, an enhanced integrated intermodulo operation core is developed to decrease the overhead imposed by non-modular operations. Further improvement in systems’ performance efficiency is achieved by developing efficient Processing-in-Memory (PIM) designs using various volatile CMOS and non-volatile Post-CMOS technologies to accelerate RNS-based multiplication-and-accumulations (MACs). The RNSiM accelerator’s performance on different datasets, including MNIST, SVHN, and CIFAR-10, is evaluated. With almost the same accuracy to the baseline CNN, the RNSiM accelerator can significantly increase both energy-efficiency and speedup compared with the state-of-the-art FPGA, GPU, and PIM designs. RNSiM and other RNS-PIMs, based on our method, reduce the energy consumption by orders of 28-77× and 331-897× compared with the FPGA and the GPU platforms, respectively.
AB - In this paper, we propose an efficient convolutional neural network (CNN) accelerator design, entitled RNSiM, based on the Residue Number System (RNS) as an alternative for the conventional binary number representation. Instead of traditional arithmetic implementation that suffers from the inevitable lengthy carry propagation chain, the novelty of RNSiM lies in that all the data, including stored weights and communication/computation, are performed in the RNS domain. Due to the inherent parallelism of the RNS arithmetic, power and latency are significantly reduced. Moreover, an enhanced integrated intermodulo operation core is developed to decrease the overhead imposed by non-modular operations. Further improvement in systems’ performance efficiency is achieved by developing efficient Processing-in-Memory (PIM) designs using various volatile CMOS and non-volatile Post-CMOS technologies to accelerate RNS-based multiplication-and-accumulations (MACs). The RNSiM accelerator’s performance on different datasets, including MNIST, SVHN, and CIFAR-10, is evaluated. With almost the same accuracy to the baseline CNN, the RNSiM accelerator can significantly increase both energy-efficiency and speedup compared with the state-of-the-art FPGA, GPU, and PIM designs. RNSiM and other RNS-PIMs, based on our method, reduce the energy consumption by orders of 28-77× and 331-897× compared with the FPGA and the GPU platforms, respectively.
KW - Accelerator
KW - Convolutional neural network
KW - Processing-in-Memory
KW - residue number system
UR - http://www.scopus.com/inward/record.url?scp=85123700391&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123700391&partnerID=8YFLogxK
U2 - 10.1109/ICCAD51958.2021.9643531
DO - 10.1109/ICCAD51958.2021.9643531
M3 - Conference contribution
AN - SCOPUS:85123700391
T3 - IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD
BT - 2021 40th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2021 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 1 November 2021 through 4 November 2021
ER -