TY - JOUR
T1 - Fast and Epsilon-Optimal Discretized Pursuit Learning Automata
AU - Zhang, Junqi
AU - Wang, Cheng
AU - Zhou, Mengchu
N1 - Funding Information:
Manuscript received August 21, 2013; revised March 30, 2014 and October 14, 2014; accepted October 23, 2014. Date of publication November 13, 2014; date of current version September 14, 2015. This work was supported in part by the China National Science Foundation under Grant 61272271, Grant 61202383, and Grant 61332008, in part by the Shanghai NSF under Grant 12ZR1434000, in part by the U.S. NSF under Grant CMMI-1162482, and in part by the National Basic Research Program of China under Grant 2014CB340404. This paper was recommended by Associate Editor P. S. Sastry. (Corresponding author: M. Zhou.) J. Zhang, C. Wang, and M. Zhou are with the Department of Computer Science and Technology, Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 200092, China (e-mail: zhangjunqi@tongji.edu.cn; chengwang@tongji.edu.cn).
Publisher Copyright:
© 2014 IEEE.
PY - 2015/10
Y1 - 2015/10
N2 - Learning automata (LA) are powerful tools for reinforcement learning, and the discretized pursuit LA is the most popular among them. During an iteration, its operation consists of three basic phases: 1) selecting the next action; 2) finding the optimal estimated action; and 3) updating the state probability. However, when the number of actions is large, learning becomes extremely slow because too many updates must be made at each iteration, most of them in phases 1 and 3. A new fast discretized pursuit LA with assured ϵ-optimality is proposed that performs both phases 1 and 3 with computational complexity independent of the number of actions. Besides its low computational complexity, it converges faster than the classical one in stationary environments. This paper can thus promote the application of LA to large-scale-action domains that require efficient reinforcement learning tools with assured ϵ-optimality, fast convergence, and low per-iteration computational complexity.
AB - Learning automata (LA) are powerful tools for reinforcement learning, and the discretized pursuit LA is the most popular among them. During an iteration, its operation consists of three basic phases: 1) selecting the next action; 2) finding the optimal estimated action; and 3) updating the state probability. However, when the number of actions is large, learning becomes extremely slow because too many updates must be made at each iteration, most of them in phases 1 and 3. A new fast discretized pursuit LA with assured ϵ-optimality is proposed that performs both phases 1 and 3 with computational complexity independent of the number of actions. Besides its low computational complexity, it converges faster than the classical one in stationary environments. This paper can thus promote the application of LA to large-scale-action domains that require efficient reinforcement learning tools with assured ϵ-optimality, fast convergence, and low per-iteration computational complexity.
KW - Discretized pursuit learning automata (LA)
KW - Stationary environments
KW - Low computational complexity
UR - http://www.scopus.com/inward/record.url?scp=84911071747&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84911071747&partnerID=8YFLogxK
U2 - 10.1109/TCYB.2014.2365463
DO - 10.1109/TCYB.2014.2365463
M3 - Article
AN - SCOPUS:84911071747
SN - 2168-2267
VL - 45
SP - 2089
EP - 2099
JO - IEEE Transactions on Cybernetics
JF - IEEE Transactions on Cybernetics
IS - 10
M1 - 6955789
ER -