TY - GEN
T1 - A Flexible Processing-in-Memory Accelerator for Dynamic Channel-Adaptive Deep Neural Networks
AU - Yang, Li
AU - Angizi, Shaahin
AU - Fan, Deliang
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/1
Y1 - 2020/1
N2 - With the success of deep neural networks (DNN), many recent works have been focusing on developing hardware accelerator for power and resource-limited embedded system via model compression techniques, such as quantization, pruning, low-rank approximation, etc. However, almost all existing DNN structure is fixed after deployment, which lacks runtime adaptive DNN structure to adapt to its dynamic hardware resource, power budget, throughput requirement, as well as dynamic workload. Correspondingly, there is no runtime adaptive hardware platform to support dynamic DNN structure. To address this problem, we first propose a dynamic channel-adaptive deep neural network (CA-DNN) which can adjust the involved convolution channel (i.e. model size, computing load) at run-time (i.e. at inference stage without retraining) to dynamically trade off between power, speed, computing load and accuracy. Further, we utilize knowledge distillation method to optimize the model and quantize the model to 8-bits and 16-bits, respectively, for hardware friendly mapping. We test the proposed model on CIFAR-10 and ImageNet dataset by using ResNet. Comparing with the same model size of individual model, our CA-DNN achieves better accuracy. Moreover, as far as we know, we are the first to propose a Processing-in-Memory accelerator for such adaptive neural networks structure based on Spin Orbit Torque Magnetic Random Access Memory(SOT-MRAM) computational adaptive sub-arrays. Then, we comprehensively analyze the trade-off of the model with different channel-width between the accuracy and the hardware parameters, eg., energy, memory, and area overhead.
AB - With the success of deep neural networks (DNN), many recent works have been focusing on developing hardware accelerator for power and resource-limited embedded system via model compression techniques, such as quantization, pruning, low-rank approximation, etc. However, almost all existing DNN structure is fixed after deployment, which lacks runtime adaptive DNN structure to adapt to its dynamic hardware resource, power budget, throughput requirement, as well as dynamic workload. Correspondingly, there is no runtime adaptive hardware platform to support dynamic DNN structure. To address this problem, we first propose a dynamic channel-adaptive deep neural network (CA-DNN) which can adjust the involved convolution channel (i.e. model size, computing load) at run-time (i.e. at inference stage without retraining) to dynamically trade off between power, speed, computing load and accuracy. Further, we utilize knowledge distillation method to optimize the model and quantize the model to 8-bits and 16-bits, respectively, for hardware friendly mapping. We test the proposed model on CIFAR-10 and ImageNet dataset by using ResNet. Comparing with the same model size of individual model, our CA-DNN achieves better accuracy. Moreover, as far as we know, we are the first to propose a Processing-in-Memory accelerator for such adaptive neural networks structure based on Spin Orbit Torque Magnetic Random Access Memory(SOT-MRAM) computational adaptive sub-arrays. Then, we comprehensively analyze the trade-off of the model with different channel-width between the accuracy and the hardware parameters, eg., energy, memory, and area overhead.
UR - http://www.scopus.com/inward/record.url?scp=85083036014&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85083036014&partnerID=8YFLogxK
U2 - 10.1109/ASP-DAC47756.2020.9045166
DO - 10.1109/ASP-DAC47756.2020.9045166
M3 - Conference contribution
AN - SCOPUS:85083036014
T3 - Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC
SP - 313
EP - 318
BT - ASP-DAC 2020 - 25th Asia and South Pacific Design Automation Conference, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 25th Asia and South Pacific Design Automation Conference, ASP-DAC 2020
Y2 - 13 January 2020 through 16 January 2020
ER -