TY - JOUR
T1 - Improved online sequential extreme learning machine
T2 - A new intelligent evaluation method for AZ-style algorithms
AU - Li, Xiali
AU - He, Shuai
AU - Wei, Zhi
AU - Wu, Licheng
N1 - Funding Information:
This work was supported in part by the National Natural Science Foundation of China under Grant 61602539, Grant 61873291, and Grant 61773416, and in part by the Minzu University of China (MUC) 111 Project.
Publisher Copyright:
© 2013 IEEE.
PY - 2019
Y1 - 2019
N2 - Research on computer games for Go, Chess, and Japanese Chess stands out as one of the notable landmarks in the progress of artificial intelligence. The AlphaGo, AlphaGo Zero, and AlphaZero algorithms, called AlphaZero-style (AZ-style) algorithms in some literature [1], have achieved superhuman performance by using deep reinforcement learning (DRL). However, the unavailability of training details, the expensive equipment required for model training, and the low evaluation accuracy resulting from slow self-play training without expensive computing equipment have been the main defects of AZ-style algorithms in practical applications. To alleviate these problems, this paper proposes an improved online sequential extreme learning machine (IOS-ELM), a new method for evaluating chess board positions in AZ-style algorithms. First, the theoretical principles of IOS-ELM are presented. Second, the study takes Gomoku as the application object, uses IOS-ELM as the evaluation method for AZ-style board positions, and discusses in detail the loss during training and the hyperparameters affecting performance. Under the same experimental conditions, the proposed method reduces the number of training parameters by a factor of 14, the training time to 15%, and the evaluation error by 13% compared with the board evaluation network used in the original AZ-style algorithms.
AB - Research on computer games for Go, Chess, and Japanese Chess stands out as one of the notable landmarks in the progress of artificial intelligence. The AlphaGo, AlphaGo Zero, and AlphaZero algorithms, called AlphaZero-style (AZ-style) algorithms in some literature [1], have achieved superhuman performance by using deep reinforcement learning (DRL). However, the unavailability of training details, the expensive equipment required for model training, and the low evaluation accuracy resulting from slow self-play training without expensive computing equipment have been the main defects of AZ-style algorithms in practical applications. To alleviate these problems, this paper proposes an improved online sequential extreme learning machine (IOS-ELM), a new method for evaluating chess board positions in AZ-style algorithms. First, the theoretical principles of IOS-ELM are presented. Second, the study takes Gomoku as the application object, uses IOS-ELM as the evaluation method for AZ-style board positions, and discusses in detail the loss during training and the hyperparameters affecting performance. Under the same experimental conditions, the proposed method reduces the number of training parameters by a factor of 14, the training time to 15%, and the evaluation error by 13% compared with the board evaluation network used in the original AZ-style algorithms.
KW - AlphaZero
KW - Artificial intelligence
KW - deep reinforcement learning
KW - evaluation method
KW - online sequential extreme learning machine
UR - http://www.scopus.com/inward/record.url?scp=85078027385&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85078027385&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2019.2938568
DO - 10.1109/ACCESS.2019.2938568
M3 - Article
AN - SCOPUS:85078027385
SN - 2169-3536
VL - 7
SP - 124891
EP - 124901
JO - IEEE Access
JF - IEEE Access
M1 - 8821351
ER -