A New Regret-analysis Framework for Budgeted Multi-Armed Bandits

Evan Yifan Xu, Pan Xu

Research output: Contribution to journal › Article › peer-review

Abstract

We consider two versions of the (stochastic) budgeted Multi-Armed Bandit problem. The first was introduced by Tran-Thanh et al. (AAAI, 2012): pulling each arm incurs a fixed, deterministic cost and yields a random reward sampled i.i.d. from an unknown distribution (prior free). Given a global budget B, the goal is to devise a strategy that maximizes the expected total reward. The second was introduced by Ding et al. (AAAI, 2013): the setting is the same, except that the costs of each arm are also i.i.d. samples from an unknown distribution (independent of its rewards). We propose a new budget-based regret-analysis framework and design two simple algorithms that illustrate its power. Our regret bounds for both problems not only match the optimal O(ln B) bound but also significantly reduce the dependence on other input parameters (assumed to be constants), compared with Tran-Thanh et al. (AAAI, 2012) and Ding et al. (AAAI, 2013), both of which used a time-based framework. Extensive experiments demonstrate the effectiveness and computational efficiency of our proposed algorithms and confirm our theoretical predictions.
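To make the first setting concrete, below is a minimal Python sketch of the fixed-known-cost budgeted bandit loop, using a standard reward-to-cost UCB index as a placeholder policy. This is an illustration of the problem setting only, not the algorithm or analysis proposed in the paper; the index form, the exploration bonus, and the example arms are all assumptions.

    import math
    import random

    def budgeted_ucb(arms, budget):
        """Simulate the fixed-cost budgeted MAB setting of Tran-Thanh et al. (2012).

        arms: list of (cost, reward_sampler) pairs, where cost is fixed and
              deterministic and reward_sampler() returns an i.i.d. reward in [0, 1].
        budget: the global budget B.
        Uses a generic reward-to-cost UCB index (an assumption, not the
        paper's algorithm) and returns the total collected reward.
        """
        k = len(arms)
        pulls = [0] * k
        mean_reward = [0.0] * k
        total_reward = 0.0
        remaining = budget

        # Initialization: pull each arm once, as long as the budget allows.
        for i, (cost, sample) in enumerate(arms):
            if remaining < cost:
                return total_reward
            r = sample()
            remaining -= cost
            pulls[i] += 1
            mean_reward[i] = r
            total_reward += r

        t = k
        while True:
            t += 1

            # UCB index on the reward-to-cost ratio (assumed index form).
            def index(i):
                bonus = math.sqrt(2.0 * math.log(t) / pulls[i])
                return (mean_reward[i] + bonus) / arms[i][0]

            # Only arms whose cost still fits in the remaining budget are eligible.
            feasible = [i for i in range(k) if arms[i][0] <= remaining]
            if not feasible:
                break
            i = max(feasible, key=index)
            cost, sample = arms[i]
            r = sample()
            remaining -= cost
            pulls[i] += 1
            mean_reward[i] += (r - mean_reward[i]) / pulls[i]
            total_reward += r
        return total_reward

    # Hypothetical example: two arms with fixed costs 1 and 2 and Bernoulli rewards.
    arms = [(1.0, lambda: float(random.random() < 0.4)),
            (2.0, lambda: float(random.random() < 0.9))]
    print(budgeted_ucb(arms, budget=200))

The second setting of Ding et al. (AAAI, 2013) would differ only in that each pull also draws the cost from an unknown distribution, so the per-arm cost would be estimated from samples rather than read off directly.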

Original language: English (US)
Pages (from-to): 1943-1959
Number of pages: 17
Journal: Journal of Artificial Intelligence Research
Volume: 82
State: Published - 2025

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
