Achieving Sub-linear Regret in Infinite Horizon Average Reward Constrained MDP with Linear Function Approximation

Arnob Ghosh, Xingyu Zhou, Ness Shroff

Research output: Contribution to conference › Paper › peer-review

Abstract

We study the infinite horizon average reward constrained Markov Decision Process (CMDP). In contrast to existing works, which are model-based and restricted to finite state spaces, we consider the model-free linear CMDP setup. We first propose a computationally inefficient algorithm and show that Õ(d^3 √T) regret and constraint violation can be achieved, where T is the number of interactions and d is the dimension of the feature mapping. We also propose an efficient variant based on a primal-dual adaptation of the LSVI-UCB algorithm and show that Õ((dT)^{3/4}) regret and constraint violation can be achieved. This improves the known regret bound of Õ(T^{5/6}) for finite state-space model-free constrained RL, which was obtained under a stronger assumption than ours. We also develop an efficient policy-based algorithm via a novel adaptation of the MDP-EXP2 algorithm to the primal-dual setup, with Õ(√T) regret and even a zero constraint violation bound under a stronger set of assumptions.
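
Both efficient algorithms in the abstract rest on a primal-dual mechanism: the primal step optimizes the Lagrangian reward r + λ·g, and the dual variable λ is updated by projected gradient steps on the constraint slack. The following is a minimal sketch of that mechanism in a toy two-armed setting, assuming illustrative names and a hand-picked step size eta; it is not the paper's LSVI-UCB or MDP-EXP2 construction, only the shared dual-update idea.

    import numpy as np

    # Hedged sketch of a generic primal-dual loop for constrained RL.
    # Toy setting: two arms; arm a has mean reward r[a] and mean utility g[a].
    # Constraint: the long-run average utility must be at least b.
    rng = np.random.default_rng(0)
    r = np.array([1.0, 0.6])   # arm 0 pays more reward...
    g = np.array([0.2, 0.9])   # ...but arm 1 satisfies the constraint better
    b = 0.5                    # constraint threshold on average utility

    T = 20000
    eta = 0.05                 # dual step size (assumed; tune in practice)
    lam, lam_max = 0.0, 10.0   # dual variable and projection bound

    total_r, total_g = 0.0, 0.0
    for t in range(T):
        # Primal step: act greedily w.r.t. the Lagrangian r(a) + lam * g(a).
        a = int(np.argmax(r + lam * g))
        total_r += rng.normal(r[a], 0.1)   # noisy observed reward
        u = rng.normal(g[a], 0.1)          # noisy observed utility
        total_g += u
        # Dual step: projected gradient update; lam grows while the
        # constraint (utility >= b) is violated, steering the primal
        # step toward the constraint-satisfying arm.
        lam = min(max(lam + eta * (b - u), 0.0), lam_max)

    print(f"avg reward {total_r / T:.3f}, avg utility {total_g / T:.3f} (target {b})")

Running the sketch, λ oscillates around the value at which the two arms tie under the Lagrangian, so the induced mixture of arms drives the average utility toward the threshold b while keeping the reward as high as the constraint allows; the paper's algorithms replace the greedy primal step with optimistic value iteration (LSVI-UCB) or exponential-weights policy updates (MDP-EXP2) over a linear function class.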

Original language: English (US)
State: Published - 2023
Externally published: Yes
Event: 11th International Conference on Learning Representations, ICLR 2023 - Kigali, Rwanda
Duration: May 1, 2023 → May 5, 2023

Conference

Conference: 11th International Conference on Learning Representations, ICLR 2023
Country/Territory: Rwanda
City: Kigali
Period: 5/1/23 → 5/5/23

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Computer Science Applications
  • Education
  • Linguistics and Language
