Provably Efficient RL for Linear MDPs under Instantaneous Safety Constraints in Non-Convex Feature Spaces

  • Amirhossein Roknilamouki
  • , Arnob Ghosh
  • , Ming Shi
  • , Fatemeh Nourzad
  • , Eylem Ekici
  • , Ness B. Shroff

Research output: Contribution to journalConference articlepeer-review

Abstract

In Reinforcement Learning (RL), tasks with instantaneous hard constraints present significant challenges, particularly when the decision space is non-convex or non-star-convex. This issue is especially relevant in domains like autonomous vehicles and robotics, where constraints such as collision avoidance often take a non-convex form, and the state-space may be large. In this paper, we establish a regret bound ofÕ((1 + 1 τ )log( 1 τ ) d3 H4 K), applicable to both star-convex and non-star-convex cases, where d is the feature dimension, H the episode length, K the number of episodes, and τ the safety threshold for a linear MDP setting. Moreover, the violation of safety constraints is zero with a high probability throughout the learning process. A key technical challenge in these settings is bounding the covering number of the value-function class, which is essential for achieving value-aware uniform concentration in model-free function approximation. For the star-convex setting, we develop a novel technique called Objective–Constraint Decomposition (OCD) to properly bound the covering number, and resolves an error in a previous work on the constrained RL.

Original languageEnglish (US)
Pages (from-to)51957-51995
Number of pages39
JournalProceedings of Machine Learning Research
Volume267
StatePublished - 2025
Event42nd International Conference on Machine Learning, ICML 2025 - Vancouver, Canada
Duration: Jul 13 2025Jul 19 2025

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence

Fingerprint

Dive into the research topics of 'Provably Efficient RL for Linear MDPs under Instantaneous Safety Constraints in Non-Convex Feature Spaces'. Together they form a unique fingerprint.

Cite this