Abstract
We extend the stochastic multi-armed bandit to the case where the number of arms to play evolves as a stationary process. Our work is motivated by demand response in power systems, in which the number of arms to play, or loads to dispatch, depends on a random power imbalance. We give an upper confidence bound-based algorithm that achieves sublinear pseudo-regret. We apply our results to several examples from demand response.
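The setting described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm or analysis. It assumes Bernoulli rewards, a UCB1-style exploration bonus, and a number of plays per round supplied by a caller-provided sampler. The names `ucb_index`, `run_variable_play_ucb`, and `draw_num_plays` are hypothetical and chosen only for illustration.

```python
import math
import random


def ucb_index(mean, pulls, t):
    """UCB1-style index: empirical mean plus an exploration bonus."""
    if pulls == 0:
        return float("inf")  # force every arm to be tried at least once
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def run_variable_play_ucb(arm_means, draw_num_plays, horizon, seed=0):
    """Play the top-k arms by UCB index, where k is redrawn every round.

    arm_means      -- true Bernoulli means (used only to simulate rewards
                      and to compute pseudo-regret); an assumption of this sketch
    draw_num_plays -- hypothetical sampler rng -> int giving the number of
                      arms to play this round (e.g. loads to dispatch)
    """
    rng = random.Random(seed)
    n = len(arm_means)
    pulls = [0] * n
    means = [0.0] * n
    best_first = sorted(arm_means, reverse=True)
    pseudo_regret = 0.0

    for t in range(1, horizon + 1):
        # Number of arms to play this round, clipped to [0, n].
        k = max(0, min(n, draw_num_plays(rng)))

        # Rank arms by optimistic index and play the k highest.
        ranked = sorted(
            range(n),
            key=lambda i: ucb_index(means[i], pulls[i], t),
            reverse=True,
        )
        chosen = ranked[:k]

        for i in chosen:
            reward = 1.0 if rng.random() < arm_means[i] else 0.0
            pulls[i] += 1
            means[i] += (reward - means[i]) / pulls[i]  # running average

        # Pseudo-regret: best achievable expected reward with k plays,
        # minus the expected reward of the arms actually chosen.
        pseudo_regret += sum(best_first[:k]) - sum(arm_means[i] for i in chosen)

    return pseudo_regret, pulls


if __name__ == "__main__":
    # Assumed example: three loads, one or two dispatched per round.
    regret, pulls = run_variable_play_ucb(
        [0.9, 0.5, 0.1], lambda rng: rng.randint(1, 2), horizon=2000
    )
    print(regret, pulls)
```

Here the per-round play count is drawn independently and uniformly from {1, 2}, which is only one simple instance of a stationary process. Any sampler with the same signature can be substituted.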
Original language | English (US) |
---|---|
Pages (from-to) | 2280-2286 |
Number of pages | 7 |
Journal | IEEE Transactions on Automatic Control |
Volume | 63 |
Issue number | 7 |
State | Published - Jul 2018 |
Externally published | Yes |
All Science Journal Classification (ASJC) codes
- Control and Systems Engineering
- Computer Science Applications
- Electrical and Electronic Engineering
Keywords
- Demand response
- multi-armed bandit (MAB)
- online learning
- stochastic bandit