Adversarial Motion Priors Make Good Substitutes for Complex Reward Functions

Ale Escontrela
Jason Peng
Ken Goldberg
Pieter Abbeel
2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022) (to appear)

Abstract

Training high-dimensional simulated agents with under-specified reward functions often leads to jerky and unnatural behaviors, which result in physically infeasible strategies that are generally ineffective when deployed in the real world. To mitigate these unnatural behaviors, reinforcement learning (RL) practitioners often use complex reward functions that encourage more physically plausible behaviors, in conjunction with tricks such as domain randomization, to train policies that satisfy the user's style criteria and can be successfully deployed on real robots. Such an approach has been successful in the realm of legged locomotion, leading to state-of-the-art results. However, designing effective reward functions can be a labour-intensive and tedious tuning process, and these hand-designed rewards do not easily generalize across platforms and tasks. We propose substituting complex reward functions with "style rewards" learned from a dataset of motion capture demonstrations. This learned style reward can be combined with a simple task reward to train policies that perform tasks using naturalistic strategies. These more natural strategies can also facilitate transfer to the real world. We build upon prior work in computer graphics and demonstrate that an adversarial approach to training control policies can produce behaviors that transfer to a real quadrupedal robot without requiring complex reward functions. We also demonstrate that an effective style reward can be learned from a few seconds of motion capture data gathered from a German Shepherd, and that it leads to energy-efficient locomotion strategies with natural gait transitions.
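To make the reward structure described above concrete, the sketch below illustrates one way a learned style reward could be combined with a simple task reward. It is an illustrative reconstruction rather than code from the paper: it assumes a least-squares-GAN-style transition discriminator in the spirit of Adversarial Motion Priors, and the names StyleDiscriminator, style_reward, total_reward, and the weights w_task and w_style are hypothetical.

```python
import torch
import torch.nn as nn


class StyleDiscriminator(nn.Module):
    """Scores state transitions (s, s'); trained so mocap transitions
    score near +1 and policy transitions score near -1 (least-squares GAN)."""

    def __init__(self, obs_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, s_next):
        # Concatenate the transition and output a scalar score per sample.
        return self.net(torch.cat([s, s_next], dim=-1)).squeeze(-1)


def style_reward(disc, s, s_next):
    """Map the discriminator score to a bounded style reward in [0, 1].

    Uses the clipped quadratic shaping common in the AMP literature
    (an assumption here, not taken verbatim from this paper).
    """
    d = disc(s, s_next)
    return torch.clamp(1.0 - 0.25 * (d - 1.0) ** 2, min=0.0)


def total_reward(task_r, style_r, w_task=0.5, w_style=0.5):
    """Weighted sum of a simple task reward and the learned style reward.

    The weights are placeholders; in practice they would be tuned.
    """
    return w_task * task_r + w_style * style_r
```

In this formulation, the discriminator compares policy-generated transitions against the motion capture dataset, and its score is converted into a bounded style reward that is simply summed with the task reward during policy optimization, replacing hand-designed shaping terms.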

Research Areas