Model-based reinforcement learning methods provide a promising direction for a range of automated applications, such as autonomous vehicles and legged robots, due to their sample-efficiency. However, their asymptotic performance is usually inferior compared to the state-of-the-art model-free reinforcement learning methods in locomotion control domains. One main challenge of model-based reinforcement learning is learning a dynamics model that is accurate enough for planning. This paper mitigates this issue by meta-reinforcement learning from an ensemble of dynamics models. A policy learns from dynamics models that hold different beliefs of a real environment. This procedure improves its adaptability and inaccuracy-tolerance ability. A proximal meta-reinforcement learning algorithm is introduced to improve computational efficiency and reduces variance of higher-order gradient estimation. A heteroscedastic noise is added to the training dataset, thus leading to a robust and efficient model learning. Subsequently, proximal meta-reinforcement learning maximizes the expected returns by sampling 'imaginary' trajectories from the learned dynamics, which does not require real environment data and can be deployed on many servers in parallel to speed up the whole learning process. The aim of this work is to reduce the sample-complexity and computational cost of reinforcement learning in robot locomotion tasks. Simulation experiments show that the proposed algorithm achieves an asymptotic performance compared with the state-of-the-art model-free reinforcement learning methods with significantly fewer samples, which confirm our theoretical results.