TY - GEN
T1 - IMC: Energy-efficient in-memory convolver for accelerating binarized deep neural network
T2 - 2017 Neuromorphic Computing Symposium, NCS 2017
AU - Angizi, Shaahin
AU - Fan, Deliang
N1 - Publisher Copyright:
© 2017 Association for Computing Machinery.
PY - 2017/7/17
Y1 - 2017/7/17
N2 - Deep Convolutional Neural Networks (CNNs) are widely employed in modern AI systems due to their unprecedented accuracy in object recognition and detection. However, it has been proven that the main bottleneck limiting the performance of large-scale deep CNN hardware implementations is the massive data communication between processing units and off-chip memory. In this paper, we pave a way towards a novel concept of an in-memory convolver (IMC) that could implement the dominant convolution computation within main memory, based on our proposed Spin-Orbit Torque Magnetic Random Access Memory (SOT-MRAM) array architecture, to greatly reduce data communication and thus accelerate Binary CNN (BCNN) processing. The proposed architecture could simultaneously work as non-volatile memory and as reconfigurable in-memory logic (AND, OR) without the add-on logic circuits to the memory chip required in conventional logic-in-memory designs. The computed logic output could also be simply read out like a normal MRAM bit-cell using the shared memory peripheral circuits. We employ such an intrinsic in-memory processing architecture to efficiently process data within memory and thereby greatly reduce the power-hungry, long-distance data communication that burdens state-of-the-art BCNN hardware. The hardware mapping results show that IMC can process the binarized AlexNet on the ImageNet dataset favorably at 134.27 μJ/img, where ∼16× and 9× lower energy and area are achieved, respectively, compared to an RRAM-based BCNN design. Furthermore, a 21.5% reduction in data movement in terms of main memory accesses is observed compared to a CPU/DRAM baseline.
AB - Deep Convolutional Neural Networks (CNNs) are widely employed in modern AI systems due to their unprecedented accuracy in object recognition and detection. However, it has been proven that the main bottleneck limiting the performance of large-scale deep CNN hardware implementations is the massive data communication between processing units and off-chip memory. In this paper, we pave a way towards a novel concept of an in-memory convolver (IMC) that could implement the dominant convolution computation within main memory, based on our proposed Spin-Orbit Torque Magnetic Random Access Memory (SOT-MRAM) array architecture, to greatly reduce data communication and thus accelerate Binary CNN (BCNN) processing. The proposed architecture could simultaneously work as non-volatile memory and as reconfigurable in-memory logic (AND, OR) without the add-on logic circuits to the memory chip required in conventional logic-in-memory designs. The computed logic output could also be simply read out like a normal MRAM bit-cell using the shared memory peripheral circuits. We employ such an intrinsic in-memory processing architecture to efficiently process data within memory and thereby greatly reduce the power-hungry, long-distance data communication that burdens state-of-the-art BCNN hardware. The hardware mapping results show that IMC can process the binarized AlexNet on the ImageNet dataset favorably at 134.27 μJ/img, where ∼16× and 9× lower energy and area are achieved, respectively, compared to an RRAM-based BCNN design. Furthermore, a 21.5% reduction in data movement in terms of main memory accesses is observed compared to a CPU/DRAM baseline.
KW - Deep convolutional neural network
KW - In-memory computing
KW - Spin Hall effect
UR - http://www.scopus.com/inward/record.url?scp=85047014455&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85047014455&partnerID=8YFLogxK
U2 - 10.1145/3183584.3183613
DO - 10.1145/3183584.3183613
M3 - Conference contribution
AN - SCOPUS:85047014455
T3 - ACM International Conference Proceeding Series
BT - Proceedings of the Neuromorphic Computing Symposium, NCS 2017
PB - Association for Computing Machinery
Y2 - 17 July 2017 through 19 July 2017
ER -
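
The abstract above reduces BCNN convolution to bulk bitwise logic (the AND/OR primitives the SOT-MRAM array computes in place) followed by a bit count. The sketch below is only a minimal NumPy illustration of that arithmetic, not the paper's hardware design; the helper names binarize and binary_conv2d are hypothetical.

import numpy as np

def binarize(x):
    # Encode {-1, +1} values as bits: +1 -> 1, -1 -> 0.
    return (x >= 0).astype(np.uint8)

def binary_conv2d(activations, weights):
    # Valid-mode 2D binary convolution (correlation, as in CNNs)
    # via XNOR + popcount. XNOR is expressible with the AND/OR
    # primitives the abstract describes:
    #   a XNOR b = (a AND b) OR ((NOT a) AND (NOT b)).
    H, W = activations.shape
    kH, kW = weights.shape
    n = kH * kW
    out = np.empty((H - kH + 1, W - kW + 1), dtype=np.int32)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = activations[i:i + kH, j:j + kW]
            matches = int(np.sum(1 - (patch ^ weights)))  # popcount of XNOR
            out[i, j] = 2 * matches - n  # dot product in the {-1, +1} domain
    return out

# Example: the result equals the dot product of the sign patterns.
rng = np.random.default_rng(0)
a, w = rng.standard_normal((5, 5)), rng.standard_normal((3, 3))
print(binary_conv2d(binarize(a), binarize(w)))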