TY - JOUR
T1 - Incorporation of Optimal Computing Budget Allocation for Ordinal Optimization into Learning Automata
AU - Zhang, Junqi
AU - Wang, Cheng
AU - Zang, Di
AU - Zhou, Mengchu
N1 - Funding Information:
Manuscript received January 05, 2015; revised March 20, 2015, and May 20, 2015; accepted May 29, 2015. Date of publication July 23, 2015; date of current version April 05, 2016. This paper was recommended for publication by Associate Editor D. Down and Editor S. Reveliotis upon evaluation of the reviewers' comments. This work was supported in part by China National Science Foundation under Grant 61272271, Grant 61202383, and Grant 61332008, Shanghai National Science Foundation under Grant 12ZR1434000 and Grant 12ZR1451200, the National Science Foundation under Grant CMMI-1162482, the National Basic Research Program of China under Grant 2014CB340404, and the Shanghai Rising-Star Program under Grant 14QA1403700.
Publisher Copyright:
© 2004-2012 IEEE.
PY - 2016/4
Y1 - 2016/4
N2 - A learning automaton (LA) is a powerful tool for reinforcement learning. Its action probability vector plays two roles: 1) deciding when it converges, i.e., total computing budget it has used, and 2) allocating computing budget among actions to identify the optimal one. These two intertwined roles lead to a problem: the computing budget mostly goes to the currently estimated optimal action due to its high action probability regardless whether such budget allocation can help identify the true optimal one or not. This work proposes a new class of LA that avoids the use of its action probability vector for computing budget allocation. Instead we use such vector only to determine if it converges and then employ optimal computing budget allocation to accomplish the allocation of computing budget in a way that maximizes the probability of identifying the true optimal actions. ϵ-optimality is proven. Simulations verify its advantages over existing algorithms.
AB - A learning automaton (LA) is a powerful tool for reinforcement learning. Its action probability vector plays two roles: 1) deciding when it converges, i.e., total computing budget it has used, and 2) allocating computing budget among actions to identify the optimal one. These two intertwined roles lead to a problem: the computing budget mostly goes to the currently estimated optimal action due to its high action probability regardless whether such budget allocation can help identify the true optimal one or not. This work proposes a new class of LA that avoids the use of its action probability vector for computing budget allocation. Instead we use such vector only to determine if it converges and then employ optimal computing budget allocation to accomplish the allocation of computing budget in a way that maximizes the probability of identifying the true optimal actions. ϵ-optimality is proven. Simulations verify its advantages over existing algorithms.
KW - Learning automata (LA)
KW - optimal computing budget allocation (OCBA)
KW - ordinal optimization
UR - http://www.scopus.com/inward/record.url?scp=84938505448&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84938505448&partnerID=8YFLogxK
U2 - 10.1109/TASE.2015.2450535
DO - 10.1109/TASE.2015.2450535
M3 - Article
AN - SCOPUS:84938505448
SN - 1545-5955
VL - 13
SP - 1008
EP - 1017
JO - IEEE Transactions on Automation Science and Engineering
JF - IEEE Transactions on Automation Science and Engineering
IS - 2
M1 - 7165689
ER -