TY - GEN
T1 - Learning locomotion skills via model-based proximal meta-reinforcement learning
AU - Xiao, Qing
AU - Cao, Zhengcai
AU - Zhou, Mengchu
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/10
Y1 - 2019/10
N2 - Model-based reinforcement learning methods provide a promising direction for a range of automated applications, such as autonomous vehicles and legged robots, due to their sample efficiency. However, their asymptotic performance in locomotion control domains is usually inferior to that of state-of-the-art model-free reinforcement learning methods. One main challenge of model-based reinforcement learning is learning a dynamics model that is accurate enough for planning. This paper mitigates this issue through meta-reinforcement learning from an ensemble of dynamics models. A policy learns from dynamics models that hold different beliefs about the real environment, which improves its adaptability and its tolerance of model inaccuracy. A proximal meta-reinforcement learning algorithm is introduced to improve computational efficiency and reduce the variance of higher-order gradient estimation. Heteroscedastic noise is added to the training dataset, leading to robust and efficient model learning. Proximal meta-reinforcement learning then maximizes the expected return by sampling 'imaginary' trajectories from the learned dynamics models; this requires no real-environment data and can be run on many servers in parallel to speed up the whole learning process. The aim of this work is to reduce the sample complexity and computational cost of reinforcement learning in robot locomotion tasks. Simulation experiments show that the proposed algorithm matches the asymptotic performance of state-of-the-art model-free reinforcement learning methods with significantly fewer samples, which confirms our theoretical results.
UR - http://www.scopus.com/inward/record.url?scp=85076739399&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85076739399&partnerID=8YFLogxK
U2 - 10.1109/SMC.2019.8914406
DO - 10.1109/SMC.2019.8914406
M3 - Conference contribution
AN - SCOPUS:85076739399
T3 - Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
SP - 1545
EP - 1550
BT - 2019 IEEE International Conference on Systems, Man and Cybernetics, SMC 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE International Conference on Systems, Man and Cybernetics, SMC 2019
Y2 - 6 October 2019 through 9 October 2019
ER -