Abstract
This paper studies the transfer reinforcement learning (RL) problem in which multiple RL problems (tasks) have different reward functions but share the same underlying transition dynamics. In this setting, the Q-function of each task can be decomposed into a successor feature (SF) and a reward mapping: the former characterizes the transition dynamics, and the latter characterizes the task-specific reward function. This Q-function decomposition, coupled with a policy improvement operator known as generalized policy improvement (GPI), reduces the sample complexity of finding the optimal Q-function, and thus the SF & GPI framework exhibits promising empirical performance compared with traditional RL methods such as Q-learning. However, its theoretical foundations remain largely unestablished, especially when the successor features are learned with deep neural networks (SF-DQN). This paper studies provable knowledge transfer using SF-DQN in transfer RL problems. We establish the first convergence analysis with provable generalization guarantees for SF-DQN with GPI. The theory shows that SF-DQN with GPI outperforms conventional RL approaches, such as the deep Q-network, in both convergence rate and generalization. Numerical experiments on real and synthetic RL tasks support the superior performance of SF-DQN with GPI, aligning with our theoretical findings.
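To make the decomposition concrete, below is a minimal, illustrative sketch of GPI action selection under the standard SF assumption that a task's reward factors as r_i(s, a) = φ(s, a)ᵀ w_i, so that Q_i under a source policy π_j equals ψ^{π_j}(s, a)ᵀ w_i. The function name, array shapes, and usage values are our own illustrative choices, not taken from the paper.

```python
import numpy as np

def gpi_action(psi, w_new):
    """GPI action selection from successor features (illustrative sketch).

    psi:   array of shape (num_source_policies, num_actions, d); entry
           psi[j, a] approximates the successor features psi^{pi_j}(s, a)
           of source policy j at the current state s.
    w_new: reward-mapping vector of the new task, shape (d,), so that
           Q_j(s, a) = psi[j, a] @ w_new.
    Returns the index of the action maximizing max_j Q_j(s, a).
    """
    q_values = psi @ w_new                   # shape (num_source_policies, num_actions)
    best_over_policies = q_values.max(axis=0)  # GPI: best source policy per action
    return int(best_over_policies.argmax())

# Hypothetical usage: two source policies, three actions, feature dimension 4.
psi = np.random.rand(2, 3, 4)
w_new = np.array([1.0, 0.0, -0.5, 0.2])
print(gpi_action(psi, w_new))
```

The key point mirrored from the abstract is that only the reward-mapping vector w_new is task-specific; the successor features encode the shared transition dynamics and can be reused across tasks.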
| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 58897-58934 |
| Number of pages | 38 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 235 |
| State | Published - 2024 |
| Externally published | Yes |
| Event | 41st International Conference on Machine Learning, ICML 2024, Vienna, Austria, Jul 21-27, 2024 |
All Science Journal Classification (ASJC) codes
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability