Reinforcement Learning Based Online Request Scheduling Framework for Workload-Adaptive Edge Deep Learning Inference

Xinrui Tan, Hongjia Li, Xiaofei Xie, Lu Guo, Nirwan Ansari, Xueqing Huang, Liming Wang, Zhen Xu, Yang Liu

Research output: Contribution to journal › Article › peer-review

Abstract

Recent advances in deep learning across mobile and Internet-of-Things applications, coupled with the emergence of edge computing, have led to a strong trend of performing deep learning inference on edge servers located physically close to the end devices. This trend raises the challenge of meeting the quality-of-service requirements of inference tasks at the resource-constrained network edge, especially under variable or even bursty inference workloads. Solutions to this challenge have not yet been reported in the related literature. In this paper, we tackle the challenge through workload-adaptive inference request scheduling: under different workload states, adaptive request scheduling policies let models of different sizes play different roles in maintaining high-quality inference services. To implement this idea, we propose a request scheduling framework for general-purpose edge inference serving systems. Theoretically, we prove that, in our framework, the problem of optimizing the inference request scheduling policies can be formulated as a Markov decision process (MDP). To solve this MDP, we use reinforcement learning and propose a policy optimization approach. Through extensive experiments, we empirically demonstrate the effectiveness of our framework in the challenging practical case where the MDP is partially observable.
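As a rough illustration of the idea of casting request scheduling as an MDP, the toy sketch below lets an agent choose between a small (fast, less accurate) and a large (slow, more accurate) model at each step, observing only the current queue length. It is not the authors' framework: the abstract does not give the state, action, or reward design, so every constant, every name (`step`, `train`, `greedy_policy`, `SERVE`, `ACCURACY`, and so on), the workload model, and the use of tabular Q-learning in place of the paper's policy optimization approach are assumptions made purely for illustration. The hidden burst flag loosely mimics partial observability, since the agent never sees it directly.

```python
import random

# All values below are hypothetical, chosen only to make the toy example concrete.
Q_MAX = 10                                # queue capacity
ACTIONS = ("small", "large")              # model variant used in this step
SERVE = {"small": 3, "large": 1}          # requests served per step
ACCURACY = {"small": 0.7, "large": 0.9}   # reward per served request
WAIT_COST = 0.1                           # penalty per request left waiting
DROP_COST = 1.0                           # penalty per request dropped on overflow


def step(queue, action, arrivals):
    """Apply one scheduling decision and return (next_queue, reward)."""
    served = min(queue, SERVE[action])
    remaining = queue - served + arrivals
    dropped = max(0, remaining - Q_MAX)
    next_queue = min(remaining, Q_MAX)
    reward = (served * ACCURACY[action]
              - WAIT_COST * next_queue
              - DROP_COST * dropped)
    return next_queue, reward


def sample_arrivals(rng, bursty):
    """Toy workload: light traffic normally, heavy traffic during a burst."""
    return rng.randint(2, 4) if bursty else rng.randint(0, 1)


def train(episodes=200, horizon=200, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning over the observed queue length."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(Q_MAX + 1)]
    for _ in range(episodes):
        queue, bursty = 0, False
        for _ in range(horizon):
            # The burst flag flips occasionally and is hidden from the agent.
            if rng.random() < 0.05:
                bursty = not bursty
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[queue][i])
            nxt, r = step(queue, ACTIONS[a], sample_arrivals(rng, bursty))
            # Standard Q-learning update.
            q[queue][a] += alpha * (r + gamma * max(q[nxt]) - q[queue][a])
            queue = nxt
    return q


def greedy_policy(q):
    """Return the greedy action name for each queue length 0..Q_MAX."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])] for row in q]


if __name__ == "__main__":
    for length, action in enumerate(greedy_policy(train())):
        print(f"queue={length:2d} -> {action}")
```

Running the script prints the learned action for each queue length. Which model is preferred at which load depends entirely on the assumed constants, so the output says nothing about the results reported in the paper.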

Original language: English (US)
Pages (from-to): 1-18
Number of pages: 18
Journal: IEEE Transactions on Mobile Computing
State: Accepted/In press - 2024

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Networks and Communications
  • Electrical and Electronic Engineering

Keywords

  • Adaptation models
  • Computational modeling
  • Deep learning
  • Deep learning inference serving systems
  • Edge computing
  • Efficient deep learning inference
  • Processor scheduling
  • Reinforcement learning
  • Schedules
  • Task analysis
