Abstract
A learning automaton (LA) is a powerful tool for reinforcement learning. Its action probability vector plays two roles: 1) deciding when the LA converges, which determines the total computing budget it uses, and 2) allocating that computing budget among actions to identify the optimal one. These two intertwined roles lead to a problem: most of the computing budget goes to the currently estimated optimal action because of its high action probability, regardless of whether that allocation helps identify the true optimal action. This work proposes a new class of LA that does not use the action probability vector for computing budget allocation. Instead, the vector is used only to determine whether the LA has converged, and optimal computing budget allocation (OCBA) allocates the computing budget in a way that maximizes the probability of identifying the true optimal action. The ϵ-optimality of the proposed LA is proven, and simulations verify its advantages over existing algorithms.
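To make the allocation idea concrete, the following is a minimal sketch of the classic OCBA ratio rule from the general OCBA literature. It is not necessarily the exact procedure used in this paper. The function name `ocba_fractions`, its inputs (per-action sample means and standard deviations of the observed rewards), and the `eps` guard are assumptions introduced for illustration only.

```python
import math


def ocba_fractions(means, stds, eps=1e-9):
    """Return the share of the next computing budget for each action.

    Hypothetical helper: applies the classic OCBA ratios, assuming a
    larger mean reward is better. `eps` is an illustrative guard against
    division by zero when two means tie or a standard deviation is zero.
    """
    k = len(means)
    # Currently estimated optimal action.
    b = max(range(k), key=lambda i: means[i])
    s = [max(sd, eps) for sd in stds]

    w = [0.0] * k
    for i in range(k):
        if i != b:
            # Gap between the estimated best action and action i.
            delta = max(means[b] - means[i], eps)
            # Non-best actions: weight proportional to (sigma_i / delta_i)^2.
            w[i] = (s[i] / delta) ** 2

    # Best action: sigma_b * sqrt(sum over i != b of (w_i / sigma_i)^2).
    w[b] = s[b] * math.sqrt(sum((w[i] / s[i]) ** 2 for i in range(k) if i != b))

    total = sum(w)
    return [x / total for x in w]


if __name__ == "__main__":
    # Three actions: a leader, a close competitor, and a clearly worse one.
    print(ocba_fractions([0.8, 0.7, 0.3], [0.4, 0.4, 0.4]))
```

In this example the close competitor receives a much larger share than the clearly inferior action. An allocation driven by the action probability vector would instead concentrate the budget on the current leader, which is the behavior the abstract identifies as the problem.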
Original language | English (US) |
---|---|
Article number | 7165689 |
Pages (from-to) | 1008-1017 |
Number of pages | 10 |
Journal | IEEE Transactions on Automation Science and Engineering |
Volume | 13 |
Issue number | 2 |
State | Published - Apr 2016 |
All Science Journal Classification (ASJC) codes
- Control and Systems Engineering
- Electrical and Electronic Engineering
Keywords
- Learning automata (LA)
- optimal computing budget allocation (OCBA)
- ordinal optimization