TY - JOUR
T1 - Distributed Fusion-Based Policy Search for Fast Robot Locomotion Learning
AU - Cao, Zhengcai
AU - Xiao, Qing
AU - Zhou, Mengchu
N1 - Funding Information:
This work is supported by the National Key R&D Program of China (Grant No. 2018YFB1304600), the National Natural Science Foundation of China (Grant Nos. 51575034 and U1813220), and the Beijing Advanced Innovation Center for Intelligent Robots and Systems (Grant No. 2018IRS03).
Publisher Copyright:
© 2005-2012 IEEE.
PY - 2019/8
Y1 - 2019/8
N2 - Deep reinforcement learning methods are developed to deal with challenging locomotion control problems in the robotics domain and can achieve significant performance improvements over conventional control methods. One of their appealing advantages is that they are model-free; in other words, agents learn a control policy completely from scratch using raw high-dimensional sensory observations. However, they often suffer from poor sample efficiency and instability, which makes them inapplicable to many engineering systems. This paper presents a distributed fusion-based policy search framework to accelerate robot locomotion learning through variance reduction and asynchronous exploration. An adaptive fusion-based variance reduction technique is introduced to improve sample efficiency. Parametric noise is added to neural network weights, which leads to efficient exploration and ensures consistency in actions. Subsequently, the fusion-based policy gradient estimator is extended to a distributed decoupled actor-critic architecture. This allows the central estimator to handle off-policy data from different actors asynchronously, fully utilizing CPUs and GPUs to maximize data throughput. The aim of this work is to improve the sample efficiency and convergence speed of deep reinforcement learning in robot locomotion tasks. Simulation results are presented to verify the theoretical results, showing that the proposed algorithm matches and sometimes surpasses state-of-the-art performance.
UR - http://www.scopus.com/inward/record.url?scp=85069782804&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85069782804&partnerID=8YFLogxK
U2 - 10.1109/MCI.2019.2919364
DO - 10.1109/MCI.2019.2919364
M3 - Article
AN - SCOPUS:85069782804
SN - 1556-603X
VL - 14
SP - 19
EP - 28
JO - IEEE Computational Intelligence Magazine
JF - IEEE Computational Intelligence Magazine
IS - 3
M1 - 8765428
ER -