Researches on computer games for Go, Chess, and Japanese Chess stand out as one of the notable landmarks in the progress of artificial intelligence. AlphaGo, AlphaGo Zero, and AlphaZero algorithms, which are called AlphaZero style (AZ-style) algorithms in some literature , have achieved superhuman performance by using deep reinforcement learning (DRL). However, the unavailability of training details, expensive equipment used for model training, and the low evaluation accuracy resulted by slow self-play training without expensive computing equipment in practical applications have been the defects of AZ-style algorithms. To solve the problems to a certain extent, the paper proposes an improved online sequential extreme learning machine (IOS-ELM), a new evaluation method, to evaluate chess board positions for AZ-style algortihm. Firstly, the theoretical principles of IOS-ELM is given. Secondly, the study considers Gomoku as the application object and uses IOS-ELM as the evaluation method for AZ-style's board positions to discuss the loss in the training process and hyperparameters affecting performance in detail. Under the same experimental conditions, the proposed method reduces the training parameters by 14 times, training time to 15%, and error of evaluation by 13% compared with the board evaluation network used in original AZ-style algorithms.
All Science Journal Classification (ASJC) codes
- Computer Science(all)
- Materials Science(all)
- Artificial intelligence
- deep reinforcement learning
- evaluation method
- online sequential extreme learning machine