Abstract
Users at the edge generate deep inference requests continuously over time. Mobile/edge devices located near users, e.g., the embedded edge device on an autonomous vehicle, can perform the inference computation locally on their behalf. Because the computing resources of a single mobile/edge device are limited, processing users' inference requests at high throughput can be challenging. An attractive solution is to (partially) offload the computation to a remote device in the network. In this paper, we examine existing inference execution solutions across local and remote devices and propose an adaptive scheduler, the BPS scheduler, for continuous deep inference on collaborative edge intelligence. By leveraging data parallelism, Neurosurgeon-style model partitioning, and reinforcement learning, BPS boosts overall inference performance by up to 8.2× over the baseline schedulers. We also propose a lightweight compressor, FF, specialized for compressing the intermediate output data produced by Neurosurgeon-style partitioning, and integrate it into the BPS scheduler. FF exploits the operating characteristics of convolutional layers and uses efficient approximation algorithms. Compared to existing compression methods, FF achieves up to 86.9% lower accuracy loss and up to 83.6% lower latency overhead.
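To make the split-execution idea concrete, below is a minimal, hypothetical sketch of Neurosurgeon-style partitioned inference with intermediate feature compression. The toy layer list, the split point `SPLIT`, and the 8-bit linear quantizer stand in for the paper's actual BPS scheduling decisions and FF compressor, which are not reproduced here; only the overall flow (run early layers locally, compress the intermediate output, finish on the remote device) follows the abstract.

```python
# Illustrative sketch only: toy layers and a uint8 quantizer, not the paper's BPS/FF.
import numpy as np

def conv_like(x):
    # Stand-in for a convolutional layer; the ReLU keeps many zeros,
    # which is what makes intermediate feature maps compressible.
    w = np.random.randn(x.shape[-1], x.shape[-1]).astype(np.float32) * 0.1
    return np.maximum(x @ w, 0.0)

LAYERS = [conv_like] * 6   # toy 6-layer network
SPLIT = 3                  # layers [0, SPLIT) run locally, the rest remotely

def compress(feat):
    # Crude stand-in for an intermediate-output compressor: 8-bit linear quantization.
    lo, hi = feat.min(), feat.max()
    scale = float(hi - lo) / 255.0 or 1.0
    q = np.round((feat - lo) / scale).astype(np.uint8)
    return q, float(lo), scale

def decompress(q, lo, scale):
    return q.astype(np.float32) * scale + lo

def local_part(x):
    # Runs on the mobile/edge device near the user.
    for layer in LAYERS[:SPLIT]:
        x = layer(x)
    return compress(x)     # compact payload sent over the network

def remote_part(payload):
    # Runs on the remote device that receives the offloaded work.
    x = decompress(*payload)
    for layer in LAYERS[SPLIT:]:
        x = layer(x)
    return x

if __name__ == "__main__":
    request = np.random.randn(1, 64).astype(np.float32)
    print(remote_part(local_part(request)).shape)  # (1, 64)
```

In this sketch the split point is fixed; in an adaptive scheduler it would instead be chosen per request, e.g., based on current network bandwidth and device load.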
Original language | English (US) |
---|---|
Pages (from-to) | 830-843 |
Number of pages | 14 |
Journal | IEEE Transactions on Cloud Computing |
Volume | 12 |
Issue number | 3 |
DOIs | |
State | Published - 2024 |
All Science Journal Classification (ASJC) codes
- Software
- Information Systems
- Hardware and Architecture
- Computer Science Applications
- Computer Networks and Communications
Keywords
- convolutional neural networks
- edge computing
- efficient AI
- reinforcement learning