TY - GEN
T1 - Calibration-Aided Edge Inference Offloading via Adaptive Model Partitioning of Deep Neural Networks
AU - Pacheco, Roberto G.
AU - Couto, Rodrigo S.
AU - Simeone, Osvaldo
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/6
Y1 - 2021/6
N2 - Mobile devices can offload deep neural network (DNN)-based inference to the cloud, overcoming local hardware and energy limitations. However, offloading adds communication delay, increasing the overall inference time, and hence should be used only when needed. One approach to this problem is adaptive model partitioning based on early-exit DNNs. Inference starts at the mobile device, and an intermediate layer estimates the accuracy: if the estimated accuracy is sufficient, the device makes the inference decision; otherwise, the remaining layers of the DNN run in the cloud. The device thus offloads the inference to the cloud only when it cannot classify a sample with high confidence. This scheme requires a correct accuracy prediction at the device. Nevertheless, DNNs are typically miscalibrated, producing overconfident decisions. This work shows that using a miscalibrated early-exit DNN for offloading via model partitioning can significantly decrease inference accuracy. We argue that applying a calibration algorithm prior to deployment solves this problem, allowing for more reliable offloading decisions.
AB - Mobile devices can offload deep neural network (DNN)-based inference to the cloud, overcoming local hardware and energy limitations. However, offloading adds communication delay, increasing the overall inference time, and hence should be used only when needed. One approach to this problem is adaptive model partitioning based on early-exit DNNs. Inference starts at the mobile device, and an intermediate layer estimates the accuracy: if the estimated accuracy is sufficient, the device makes the inference decision; otherwise, the remaining layers of the DNN run in the cloud. The device thus offloads the inference to the cloud only when it cannot classify a sample with high confidence. This scheme requires a correct accuracy prediction at the device. Nevertheless, DNNs are typically miscalibrated, producing overconfident decisions. This work shows that using a miscalibrated early-exit DNN for offloading via model partitioning can significantly decrease inference accuracy. We argue that applying a calibration algorithm prior to deployment solves this problem, allowing for more reliable offloading decisions.
UR - http://www.scopus.com/inward/record.url?scp=85114267495&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85114267495&partnerID=8YFLogxK
U2 - 10.1109/ICC42927.2021.9500760
DO - 10.1109/ICC42927.2021.9500760
M3 - Conference contribution
AN - SCOPUS:85114267495
T3 - IEEE International Conference on Communications
BT - ICC 2021 - IEEE International Conference on Communications, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE International Conference on Communications, ICC 2021
Y2 - 14 June 2021 through 23 June 2021
ER -