Abstract
In this article, we study completion-time minimization in an unmanned aerial vehicle (UAV)-enabled Internet of Things (IoT) network, where the UAV collects all the data generated by the ground IoT devices for further processing. To simplify the analysis, the continuous time horizon is discretized into time slots whose duration is kept below a pre-defined threshold, so that the UAV's location can be treated as unchanged within each slot. We aim to minimize the UAV's completion time by jointly optimizing the association scheme of the IoT devices and the UAV's location (i.e., its trajectory) and velocity in each time slot. The formulated problem is difficult for traditional optimization methods: the number of time slots is unknown in advance (so the number of decision variables is also unknown), and the problem involves non-convex functions. We therefore reformulate it as a Markov decision process (MDP) and propose a deep deterministic policy gradient (DDPG)-based method to solve it efficiently. The DDPG-based algorithm uses deep function approximators and outputs actions directly from a deterministic policy, rather than searching for the action that maximizes the state-action value, and is therefore well suited to high-dimensional, continuous control problems. Extensive numerical results validate the effectiveness of the proposed algorithm.
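To illustrate why DDPG handles continuous actions without a maximization over the action space, the following minimal sketch shows two of the ingredients the abstract names: a deterministic actor that outputs a continuous action directly, and slowly tracking target networks updated by Polyak averaging. This is not the paper's implementation; the linear "networks", dimensions, and hyperparameters are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (assumed, not the paper's code): deterministic
# actor mu(s), critic Q(s, a), and soft-updated target networks.

rng = np.random.default_rng(0)
state_dim, action_dim = 4, 2  # hypothetical, e.g. UAV state / velocity command

def init(in_dim, out_dim):
    # A toy linear "network" standing in for a deep function approximator.
    return {"W": rng.normal(0.0, 0.1, (out_dim, in_dim)),
            "b": np.zeros(out_dim)}

actor = init(state_dim, action_dim)           # mu(s): deterministic policy
critic = init(state_dim + action_dim, 1)      # Q(s, a): state-action value
actor_t = {k: v.copy() for k, v in actor.items()}    # target actor
critic_t = {k: v.copy() for k, v in critic.items()}  # target critic

def forward(net, x):
    return net["W"] @ x + net["b"]

def soft_update(target, online, tau=0.005):
    # theta_target <- tau * theta_online + (1 - tau) * theta_target
    for k in target:
        target[k] = tau * online[k] + (1.0 - tau) * target[k]

# The TD target uses the *target* networks, so no argmax over a
# continuous action set is needed: the target actor supplies the next
# action in closed form.
s_next, reward, gamma = rng.normal(size=state_dim), 1.0, 0.99
a_next = forward(actor_t, s_next)
y = reward + gamma * forward(critic_t, np.concatenate([s_next, a_next]))[0]

# Pretend one gradient step changed the online actor, then soft-update
# so the target actor drifts toward it.
actor["W"] += 0.01
prev_target_W = actor_t["W"].copy()
soft_update(actor_t, actor)
```

The key design point mirrored here is that the greedy step of Q-learning, which is intractable over a continuous action space, is replaced by evaluating the deterministic target policy once per transition.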
| Original language | English (US) |
|---|---|
| Pages (from-to) | 14734-14742 |
| Number of pages | 9 |
| Journal | IEEE Transactions on Vehicular Technology |
| Volume | 72 |
| Issue number | 11 |
| DOIs | |
| State | Published - Nov 1 2023 |
All Science Journal Classification (ASJC) codes
- Aerospace Engineering
- Electrical and Electronic Engineering
- Computer Networks and Communications
- Automotive Engineering
Keywords
- Unmanned aerial vehicle (UAV)
- deep reinforcement learning (DRL)
- trajectory optimization