The multi-armed bandit with stochastic plays

Antoine Lesage-Landry, Joshua A. Taylor

Research output: Contribution to journal › Article › peer-review

15 Scopus citations


We extend the stochastic multi-armed bandit to the case where the number of arms to play evolves as a stationary process. Our work is motivated by demand response in power systems, in which the number of arms to play, or loads to dispatch, depends on a random power imbalance. We give an upper confidence bound-based algorithm that achieves sublinear pseudo-regret. We apply our results in several examples from demand response.
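The idea in the abstract can be illustrated with a small simulation. What follows is a minimal sketch, not the authors' algorithm. It uses a plain UCB1 index and, in each round, plays the K_t arms with the largest indices, where K_t is the random number of arms to play. Several details are illustrative assumptions rather than material from the paper: the function name `ucb_stochastic_plays`, the Bernoulli rewards, and drawing K_t i.i.d. uniformly from {1, ..., max_plays}, which is the simplest example of a stationary process. Pseudo-regret accumulates, round by round, the expected reward of the best K_t arms minus the expected reward of the arms actually chosen.

```python
import math
import random


def ucb_stochastic_plays(means, horizon, max_plays, seed=0):
    """Hypothetical sketch: UCB1 indices, play the top-K_t arms each round.

    Assumes Bernoulli arms with the given means and K_t drawn i.i.d.
    uniformly from {1, ..., max_plays} (requires max_plays <= len(means)).
    Returns per-arm play counts and the accumulated pseudo-regret.
    """
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n                       # times each arm has been played
    sums = [0.0] * n                       # cumulative observed reward per arm
    best_sorted = sorted(means, reverse=True)
    pseudo_regret = 0.0
    for t in range(1, horizon + 1):
        # Assumed demand model: number of arms to play this round.
        k_t = rng.randint(1, max_plays)
        # UCB1 index; unplayed arms get +inf so each is tried at least once.
        index = [
            float("inf") if counts[i] == 0
            else sums[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i])
            for i in range(n)
        ]
        chosen = sorted(range(n), key=lambda i: index[i], reverse=True)[:k_t]
        for i in chosen:
            reward = 1.0 if rng.random() < means[i] else 0.0  # Bernoulli draw
            counts[i] += 1
            sums[i] += reward
        # Pseudo-regret uses expected rewards, not the realized ones.
        pseudo_regret += sum(best_sorted[:k_t]) - sum(means[i] for i in chosen)
    return counts, pseudo_regret


if __name__ == "__main__":
    counts, regret = ucb_stochastic_plays([0.9, 0.5, 0.1], horizon=5000, max_plays=2)
    print(counts, regret / 5000)
```

The example run should show the best arm played in nearly every round and a per-round pseudo-regret well below that of uniformly random play. The regret guarantee in the paper concerns its own algorithm and assumptions, and this sketch does not reproduce it.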

Original language: English (US)
Pages (from-to): 2280-2286
Number of pages: 7
Journal: IEEE Transactions on Automatic Control
Issue number: 7
State: Published - Jul 2018
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Computer Science Applications
  • Electrical and Electronic Engineering

Keywords

  • Demand response
  • multi-armed bandit (MAB)
  • online learning
  • stochastic bandit
