TY - JOUR
T1 - A Reinforcement Learning Model Based on Temporal Difference Algorithm
AU - Li, Xiali
AU - Lv, Zhengyu
AU - Wang, Song
AU - Wei, Zhi
AU - Wu, Licheng
N1 - Funding Information:
This work was supported in part by the National Natural Science Foundation of China under Grant 61602539, Grant 61873291, and Grant 61773416, and in part by the Minzu University of China (MUC) 111 Project.
Publisher Copyright:
© 2013 IEEE.
PY - 2019
Y1 - 2019
N2 - In some sense, computer games can be used as a test bed for artificial intelligence in developing intelligent algorithms. This paper proposes an intelligent method: a reinforcement learning model based on the temporal difference (TD) algorithm. The method is then used to improve the playing strength of a computer program for a special kind of chess. JIU chess, also called Tibetan Go chess, is mainly played in places where Tibetan tribes gather. Its play is divided into two sequential stages: preparation and battle. The layout formed in the preparation stage is vital for the subsequent battle stage and even for the final outcome. Previous studies on Tibetan JIU chess have focused on Bayesian-network-based pattern extraction and chess-shape-based strategies, which do not perform well. To address the low playing strength of JIU programs from the perspective of artificial intelligence, we developed a reinforcement learning model based on the TD algorithm for the preparation stage of JIU. First, the search range was limited to a 6 × 6 area at the center of the chessboard, and the TD learning architecture was combined with chess shapes to construct an intelligent environmental feedback system. Second, optimal state transition strategies were obtained through self-play. In addition, the results of the reinforcement learning model were output as SGF files, which act as a pattern library for the battle stage. The experimental results demonstrate that this reinforcement learning model can effectively improve the playing strength of the JIU program and outperforms the other methods.
AB - In some sense, computer games can be used as a test bed for artificial intelligence in developing intelligent algorithms. This paper proposes an intelligent method: a reinforcement learning model based on the temporal difference (TD) algorithm. The method is then used to improve the playing strength of a computer program for a special kind of chess. JIU chess, also called Tibetan Go chess, is mainly played in places where Tibetan tribes gather. Its play is divided into two sequential stages: preparation and battle. The layout formed in the preparation stage is vital for the subsequent battle stage and even for the final outcome. Previous studies on Tibetan JIU chess have focused on Bayesian-network-based pattern extraction and chess-shape-based strategies, which do not perform well. To address the low playing strength of JIU programs from the perspective of artificial intelligence, we developed a reinforcement learning model based on the TD algorithm for the preparation stage of JIU. First, the search range was limited to a 6 × 6 area at the center of the chessboard, and the TD learning architecture was combined with chess shapes to construct an intelligent environmental feedback system. Second, optimal state transition strategies were obtained through self-play. In addition, the results of the reinforcement learning model were output as SGF files, which act as a pattern library for the battle stage. The experimental results demonstrate that this reinforcement learning model can effectively improve the playing strength of the JIU program and outperforms the other methods.
KW - Artificial intelligence
KW - JIU chess
KW - reinforcement learning
KW - temporal difference algorithm
UR - http://www.scopus.com/inward/record.url?scp=85085491035&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85085491035&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2019.2938240
DO - 10.1109/ACCESS.2019.2938240
M3 - Article
AN - SCOPUS:85085491035
SN - 2169-3536
VL - 7
SP - 121922
EP - 121930
JO - IEEE Access
JF - IEEE Access
M1 - 8819952
ER -