Deep reinforcement learning methods have been developed to tackle challenging locomotion control problems in robotics and can achieve significant performance improvements over conventional control methods. One of their appealing advantages is that they are model-free: agents learn a control policy entirely from scratch from raw, high-dimensional sensory observations. However, they often suffer from poor sample efficiency and instability, which makes them inapplicable to many engineering systems. This paper presents a distributed fusion-based policy search framework that accelerates robot locomotion learning through variance reduction and asynchronous exploration. An adaptive fusion-based variance reduction technique is introduced to improve sample efficiency. Parametric noise is added to the neural network weights, which leads to efficient exploration and ensures consistency in actions. The fusion-based policy gradient estimator is then extended to a distributed, decoupled actor-critic architecture. This allows the central estimator to handle off-policy data from different actors asynchronously, making full use of CPUs and GPUs to maximize data throughput. The aim of this work is to improve the sample efficiency and convergence speed of deep reinforcement learning in robot locomotion tasks. Simulation results are presented to verify the theoretical results; they show that the proposed algorithm matches and sometimes surpasses state-of-the-art performance.
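The idea of adding noise to network weights can be illustrated with a minimal sketch. It is not the paper's implementation: the linear-tanh policy, the fixed noise scale `sigma`, the state and action dimensions, and the helper names `make_policy` and `perturb_weights` are all assumptions made for illustration. The sketch shows only why a perturbation sampled once in parameter space gives actions that are consistent for a given state while still differing from the unperturbed policy.

```python
import numpy as np

def make_policy(weights):
    """Return a deterministic linear-tanh policy for the given weight matrix."""
    def policy(state):
        return np.tanh(weights @ state)
    return policy

def perturb_weights(weights, sigma, rng):
    """Add zero-mean Gaussian noise to every weight (parameter-space noise)."""
    return weights + sigma * rng.standard_normal(weights.shape)

rng = np.random.default_rng(0)
# Hypothetical sizes: 4-dimensional state, 2-dimensional action.
weights = 0.1 * rng.standard_normal((2, 4))
state = np.ones(4)

# Sample the perturbation once, for example at the start of an episode.
noisy_policy = make_policy(perturb_weights(weights, sigma=0.1, rng=rng))

# The perturbed policy is still deterministic: same state, same action.
a1 = noisy_policy(state)
a2 = noisy_policy(state)
consistent = np.allclose(a1, a2)

# It still explores: its action differs from the unperturbed policy's action.
explores = not np.allclose(a1, make_policy(weights)(state))
```

By contrast, noise added independently to each action would generally return different actions for the same state on repeated calls.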
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Artificial Intelligence