Reinforcement learning (RL) algorithms are purely data-driven: they do not leverage any domain knowledge about the nature of the available actions, the system's state transition dynamics, or its cost/reward function. Because these algorithms learn inefficiently from their interactions with the environment, this severely limits their ability to meet the critical requirements of emerging wireless applications. In this article, we describe how data-driven RL algorithms can be improved by systematically integrating basic system models into the learning process. Our proposed approach uses real-time data in conjunction with knowledge about the underlying communication system to achieve orders-of-magnitude improvements in key performance metrics, such as convergence speed and compute/memory complexity, relative to well-established RL benchmarks.
All Science Journal Classification (ASJC) codes
- Computer Science Applications
- Computer Networks and Communications
- Electrical and Electronic Engineering