Abstract
We extend the stochastic multi-armed bandit to the case where the number of arms to play evolves as a stationary process. Our work is motivated by demand response in power systems, in which the number of arms to play, or loads to dispatch, depends on a random power imbalance. We give an upper confidence bound-based algorithm that achieves sublinear pseudo-regret. We apply our results in several examples from demand response.
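As a rough illustration of the setting described in the abstract, the sketch below plays a varying number of arms per round and tracks pseudo-regret. It is not the paper's algorithm: the UCB1-style index, the Bernoulli rewards, the i.i.d. draw of the number of plays, and the name `ucb_variable_plays` are all assumptions made for this example.

```python
import math
import random


def ucb_variable_plays(means, num_plays, seed=0):
    """Play the k_t arms with the largest UCB index in each round t.

    means     : true Bernoulli means (hypothetical, used to simulate rewards
                and to compute pseudo-regret)
    num_plays : sequence k_1, k_2, ... giving the number of arms to play
    Returns (cumulative pseudo-regret per round, pull counts per arm).
    """
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n      # number of times each arm was played
    sums = [0.0] * n      # total reward collected from each arm
    best_order = sorted(range(n), key=lambda i: means[i], reverse=True)
    regret = 0.0
    history = []

    for t, k in enumerate(num_plays, start=1):
        def index(i):
            # Unplayed arms get an infinite index so they are tried first.
            if counts[i] == 0:
                return float("inf")
            # Assumed UCB1-style bonus; the paper's index may differ.
            return sums[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i])

        chosen = sorted(range(n), key=index, reverse=True)[:k]
        for i in chosen:
            reward = 1.0 if rng.random() < means[i] else 0.0
            counts[i] += 1
            sums[i] += reward

        # Pseudo-regret: gap to the best k arms under the true means.
        regret += sum(means[i] for i in best_order[:k]) - sum(means[i] for i in chosen)
        history.append(regret)

    return history, counts


# Usage: the number of loads to dispatch is drawn i.i.d. (an assumed
# stand-in for the stationary process) from {1, 2, 3}.
demo_rng = random.Random(1)
demo_plays = [demo_rng.randint(1, 3) for _ in range(2000)]
demo_history, demo_counts = ucb_variable_plays([0.9, 0.8, 0.5, 0.3, 0.1], demo_plays)
```

Plotting `demo_history` against the round number is one way to check whether the cumulative pseudo-regret flattens out, which is the qualitative behavior a sublinear bound predicts.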
| Original language | English (US) |
|---|---|
| Pages (from-to) | 2280-2286 |
| Number of pages | 7 |
| Journal | IEEE Transactions on Automatic Control |
| Volume | 63 |
| Issue number | 7 |
| State | Published - Jul 2018 |
| Externally published | Yes |
All Science Journal Classification (ASJC) codes
- Control and Systems Engineering
- Computer Science Applications
- Electrical and Electronic Engineering
Keywords
- Demand response
- Multi-armed bandit (MAB)
- Online learning
- Stochastic bandit