Demand response programs incentivize loads to actively moderate their energy consumption to aid the power system. Uncertainty is intrinsic to demand response because a load's capability is often unknown until the load has been deployed. Algorithms must therefore balance exploiting well-characterized, good loads against learning about poorly characterized but potentially good loads, a manifestation of the classical tradeoff between exploration and exploitation. We address this tradeoff in a restless bandit framework, a generalization of the well-known multi-armed bandit problem. The formulation yields index policies in which loads are ranked by a scalar index and those with the highest indices are deployed. The policy is particularly appropriate for demand response because the indices have explicit analytical expressions that can be evaluated separately for each load, making them both simple and scalable. This formulation also serves as a heuristic basis for the setting in which only the aggregate effect of demand response is observed and the state of each individual load must be inferred. We formulate a tractable, analytical approximation for inferring individual load states from observations of aggregate load curtailments. In numerical examples, the restless bandit policy outperforms the greedy policy by 5%-10% of the total cost. When the states of deployed loads are inferred from aggregate measurements, the performance of the (now heuristic) restless bandit policy degrades by only a few percent.
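The deployment rule described above can be illustrated with a minimal sketch: each load carries a scalar index (the paper derives analytical expressions for these; here the values are simply assumed as inputs), and the policy deploys the loads with the highest indices. The function name, load identifiers, and index values below are hypothetical and purely illustrative.

```python
import heapq

def deploy_by_index(indices, k):
    """Select the k loads with the highest scalar index values.

    indices: dict mapping a load identifier to its scalar index
             (e.g., computed separately from that load's state).
    k:       number of loads to deploy this period.
    Returns the identifiers of the k highest-index loads.
    """
    # Ranking by index and taking the top k is the core of an index policy;
    # the per-load indices themselves come from the bandit formulation.
    return heapq.nlargest(k, indices, key=indices.get)

# Hypothetical indices for four loads (illustrative values only).
indices = {"load_a": 0.8, "load_b": 0.3, "load_c": 0.9, "load_d": 0.5}
deployed = deploy_by_index(indices, k=2)
print(deployed)  # the two loads with the highest indices
```

Because each index depends only on its own load's state, this selection step scales linearly in the number of loads, which is what makes index policies attractive for large demand response programs.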
All Science Journal Classification (ASJC) codes
- Energy Engineering and Power Technology
- Electrical and Electronic Engineering
Keywords
- Bayesian inference
- demand response
- index policy
- restless bandit