TY - CPAPER
T1 - DIMA: A Depthwise CNN In-Memory Accelerator
T2 - 37th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2018
AU - Angizi, Shaahin
AU - He, Zhezhi
AU - Fan, Deliang
N1 - Publisher Copyright:
© 2018 ACM.
PY - 2018/11/5
Y1 - 2018/11/5
AB - In this work, we first propose a deep depthwise Convolutional Neural Network (CNN) structure, called Add-Net, which uses binarized depthwise separable convolution to replace conventional spatial convolution. In Add-Net, the computationally expensive convolution operations (i.e., multiplication and accumulation) are converted into hardware-friendly addition operations. We meticulously investigate and analyze Add-Net's performance (i.e., accuracy, parameter size, and computational cost) in object recognition applications, compared to a traditional baseline CNN, using the popular large-scale ImageNet dataset. Accordingly, we propose a Depthwise CNN In-Memory Accelerator (DIMA) based on SOT-MRAM computational sub-arrays to efficiently accelerate Add-Net within non-volatile MRAM. Our device-to-architecture co-simulation results show that, with almost the same inference accuracy as the baseline CNN on different datasets, DIMA can obtain ∼1.4× better energy-efficiency and 15.7× speedup compared to ASICs, and ∼1.6× better energy-efficiency and 5.6× speedup over the best processing-in-DRAM accelerators.
UR - http://www.scopus.com/inward/record.url?scp=85058159102&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85058159102&partnerID=8YFLogxK
U2 - 10.1145/3240765.3240799
DO - 10.1145/3240765.3240799
M3 - Conference contribution
AN - SCOPUS:85058159102
T3 - IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD
BT - 2018 IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2018 - Digest of Technical Papers
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 5 November 2018 through 8 November 2018
ER -