TY - GEN
T1 - NeuLens: Spatial-based Dynamic Acceleration of Convolutional Neural Networks on Edge
T2 - 28th ACM Annual International Conference on Mobile Computing and Networking, MobiCom 2022
AU - Hou, Xueyu
AU - Guan, Yongjie
AU - Han, Tao
N1 - Publisher Copyright:
© 2022 ACM.
PY - 2022/10/14
Y1 - 2022/10/14
N2 - Convolutional neural networks (CNNs) play an important role in today's mobile and edge computing systems for vision-based tasks like object classification and detection. However, state-of-the-art methods for CNN acceleration are trapped in either limited practical latency speed-up on general computing platforms or latency speed-up with severe accuracy loss. In this paper, we propose a spatial-based dynamic CNN acceleration framework, NeuLens, for mobile and edge platforms. Specifically, we design a novel dynamic inference mechanism, the assemble region-aware convolution (ARAC) supernet, that peels off as many redundant operations inside CNN models as possible based on spatial redundancy and channel slicing. In the ARAC supernet, the CNN inference flow is split into multiple independent micro-flows, and the computational cost of each can be autonomously adjusted based on its tiled-input content and application requirements. These micro-flows can be loaded into hardware such as GPUs as single models. Consequently, the operation reduction is well translated into latency speed-up and is compatible with hardware-level accelerations. Moreover, inference accuracy is well preserved by identifying critical regions in images and processing them at the original resolution with large micro-flows. Based on our evaluation, NeuLens outperforms baseline methods by up to 58% latency reduction at the same accuracy and by up to 67.9% accuracy improvement under the same latency/memory constraints.
AB - Convolutional neural networks (CNNs) play an important role in today's mobile and edge computing systems for vision-based tasks like object classification and detection. However, state-of-the-art methods for CNN acceleration are trapped in either limited practical latency speed-up on general computing platforms or latency speed-up with severe accuracy loss. In this paper, we propose a spatial-based dynamic CNN acceleration framework, NeuLens, for mobile and edge platforms. Specifically, we design a novel dynamic inference mechanism, the assemble region-aware convolution (ARAC) supernet, that peels off as many redundant operations inside CNN models as possible based on spatial redundancy and channel slicing. In the ARAC supernet, the CNN inference flow is split into multiple independent micro-flows, and the computational cost of each can be autonomously adjusted based on its tiled-input content and application requirements. These micro-flows can be loaded into hardware such as GPUs as single models. Consequently, the operation reduction is well translated into latency speed-up and is compatible with hardware-level accelerations. Moreover, inference accuracy is well preserved by identifying critical regions in images and processing them at the original resolution with large micro-flows. Based on our evaluation, NeuLens outperforms baseline methods by up to 58% latency reduction at the same accuracy and by up to 67.9% accuracy improvement under the same latency/memory constraints.
KW - convolutional neural networks
KW - dynamic inference
KW - edge computing
UR - http://www.scopus.com/inward/record.url?scp=85140893455&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85140893455&partnerID=8YFLogxK
U2 - 10.1145/3495243.3560528
DO - 10.1145/3495243.3560528
M3 - Conference contribution
AN - SCOPUS:85140893455
T3 - Proceedings of the Annual International Conference on Mobile Computing and Networking, MOBICOM
SP - 186
EP - 199
BT - ACM MobiCom 2022 - Proceedings of the 2022 28th Annual International Conference on Mobile Computing and Networking
PB - Association for Computing Machinery
Y2 - 17 October 2022 through 21 October 2022
ER -