TY - GEN
T1 - Accelerating low bit-width deep convolution neural network in MRAM
AU - He, Zhezhi
AU - Angizi, Shaahin
AU - Fan, Deliang
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/8/7
Y1 - 2018/8/7
N2 - The Deep Convolution Neural Network (CNN) has achieved outstanding performance in image recognition over large-scale datasets. However, the pursuit of higher inference accuracy leads to CNN architectures with deeper layers and denser connections, which inevitably makes their hardware implementations demand ever more memory and computational resources. This can be interpreted as the 'CNN power and memory wall'. Recent research efforts have significantly reduced both model size and computational complexity by using low bit-width weights, activations, and gradients, while maintaining reasonably good accuracy. In this work, we present different emerging nonvolatile Magnetic Random Access Memory (MRAM) designs that could be leveraged to implement a 'bit-wise in-memory convolution engine', which can simultaneously store network parameters and compute low bit-width convolutions. Such a computing model leverages the 'in-memory computing' concept to accelerate CNN inference and reduce convolution energy consumption, owing to its intrinsic logic-in-memory design and the reduction of data communication.
AB - The Deep Convolution Neural Network (CNN) has achieved outstanding performance in image recognition over large-scale datasets. However, the pursuit of higher inference accuracy leads to CNN architectures with deeper layers and denser connections, which inevitably makes their hardware implementations demand ever more memory and computational resources. This can be interpreted as the 'CNN power and memory wall'. Recent research efforts have significantly reduced both model size and computational complexity by using low bit-width weights, activations, and gradients, while maintaining reasonably good accuracy. In this work, we present different emerging nonvolatile Magnetic Random Access Memory (MRAM) designs that could be leveraged to implement a 'bit-wise in-memory convolution engine', which can simultaneously store network parameters and compute low bit-width convolutions. Such a computing model leverages the 'in-memory computing' concept to accelerate CNN inference and reduce convolution energy consumption, owing to its intrinsic logic-in-memory design and the reduction of data communication.
KW - In-memory computing
KW - Magnetic Random Access Memory
KW - Neural network acceleration
UR - http://www.scopus.com/inward/record.url?scp=85052125131&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85052125131&partnerID=8YFLogxK
U2 - 10.1109/ISVLSI.2018.00103
DO - 10.1109/ISVLSI.2018.00103
M3 - Conference contribution
AN - SCOPUS:85052125131
SN - 9781538670996
T3 - Proceedings of IEEE Computer Society Annual Symposium on VLSI, ISVLSI
SP - 533
EP - 538
BT - Proceedings - 2018 IEEE Computer Society Annual Symposium on VLSI, ISVLSI 2018
PB - IEEE Computer Society
T2 - 17th IEEE Computer Society Annual Symposium on VLSI, ISVLSI 2018
Y2 - 9 July 2018 through 11 July 2018
ER -