TY - GEN
T1 - Dystri
T2 - 52nd International Conference on Parallel Processing, ICPP 2023
AU - Hou, Xueyu
AU - Guan, Yongjie
AU - Han, Tao
N1 - Publisher Copyright:
© 2023 Association for Computing Machinery. All rights reserved.
PY - 2023/8/7
Y1 - 2023/8/7
N2 - Deep neural network (DNN) inference poses unique challenges in serving computational requests due to high request intensity, concurrent multi-user scenarios, and diverse heterogeneous service types. Simultaneously, mobile and edge devices provide users with enhanced computational capabilities, enabling them to utilize local resources for deep inference processing. Moreover, dynamic inference techniques allow content-based computational cost selection per request. This paper presents Dystri, an innovative framework devised to facilitate dynamic inference on distributed edge infrastructure, thereby accommodating multiple heterogeneous users. Dystri offers broad applicability in practical environments, encompassing heterogeneous device types, DNN-based applications, and dynamic inference techniques, surpassing state-of-the-art (SOTA) approaches. With distributed controllers and a global coordinator, Dystri allows per-request, per-user adjustments of quality-of-service, ensuring instantaneous, flexible, and discrete control. The decoupled workflows in Dystri naturally support user heterogeneity and scalability, addressing crucial aspects overlooked by existing SOTA works. Our evaluation involves three multi-user, heterogeneous DNN inference service platforms deployed on distributed edge infrastructure, encompassing seven DNN applications. Results show Dystri achieves near-zero deadline misses and excels in adapting to varying user numbers and request intensities. Dystri outperforms baselines with an accuracy improvement of up to 95×.
KW - MLaaS
KW - dynamic inference
KW - edge computing
UR - http://www.scopus.com/inward/record.url?scp=85179889219&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85179889219&partnerID=8YFLogxK
U2 - 10.1145/3605573.3605598
DO - 10.1145/3605573.3605598
M3 - Conference contribution
AN - SCOPUS:85179889219
T3 - ACM International Conference Proceeding Series
SP - 625
EP - 634
BT - 52nd International Conference on Parallel Processing, ICPP 2023 - Main Conference Proceedings
PB - Association for Computing Machinery
Y2 - 7 August 2023 through 10 August 2023
ER -