Policy-Regret Minimization in Markov Games with Function Approximation

Research output: Contribution to journal › Conference article › peer-review

Abstract

We study the policy-regret minimization problem in dynamically evolving environments, modeled as Markov games between a learner and a strategic, adaptive opponent. We propose a general algorithmic framework that achieves the optimal O(√T) policy regret for a wide class of large-scale problems characterized by an Eluder-type condition, extending beyond the tabular settings of previous work. Importantly, our framework uncovers a simpler yet powerful algorithmic approach for handling reactive adversaries, demonstrating that leveraging the opponent's learning in such settings is key to attaining the optimal O(√T) policy regret.

Original language: English (US)
Pages (from-to): 46242-46264
Number of pages: 23
Journal: Proceedings of Machine Learning Research
Volume: 267
State: Published - 2025
Externally published: Yes
Event: 42nd International Conference on Machine Learning, ICML 2025 - Vancouver, Canada
Duration: Jul 13, 2025 - Jul 19, 2025

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence
