Style-Augmented Mutual Information for Practical Skill Discovery

Ale Escontrela
Jason Peng
Ken Goldberg
Pieter Abbeel
Proceedings of NeurIPS (2022) (to appear)

Abstract

Exploration and skill discovery in many real-world settings are often inspired by the activities we see others perform. However, most unsupervised skill discovery methods tend to focus solely on the intrinsic component of motivation, often by maximizing the Mutual Information (MI) between the agent's skills and the observed trajectories. These skills, though diverse in the behaviors they elicit, leave much to be desired. Namely, skills learned by maximizing MI in a high-dimensional continuous control setting tend to be aesthetically unpleasing and challenging to utilize in a practical setting, as the violent behavior often exhibited by these skills would not transfer well to the real world. We argue that solely maximizing MI is insufficient if we wish to discover useful skills, and that a notion of "style" must be incorporated into the objective. To this end, we propose the Style-Augmented Mutual Information objective (SAMI), whereby, in addition to maximizing a lower bound on the MI, the agent is encouraged to minimize the f-divergence between the policy-induced trajectory distribution and the trajectory distribution contained in the reference data (the style objective). We compare SAMI to other popular skill discovery objectives, and demonstrate that skill-conditioned policies optimized with SAMI achieve equal or greater performance when applied to downstream tasks. We also show that the data-driven motion prior specified by the style objective can be inferred from various modalities, including large motion capture datasets or even RGB videos.
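
The abstract does not state the objective in symbols. As a rough sketch of how its two terms might combine, the block below pairs the standard variational lower bound on MI with an f-divergence penalty. The notation is assumed for illustration and is not taken from the paper: z is a skill drawn from a prior p(z), \tau is a trajectory, q_\phi is a learned skill discriminator, p_\pi and p_ref are the policy-induced and reference trajectory distributions, and \lambda is a hypothetical weighting coefficient. The argument order of D_f is also an assumption.

```latex
% Sketch only: symbols q_\phi, \lambda, p_{\mathrm{ref}} and the D_f argument order are assumed.
\begin{align}
I(\tau; z)
  &= \mathbb{E}_{z \sim p(z),\, \tau \sim p_\pi(\tau \mid z)}
     \big[ \log p(z \mid \tau) - \log p(z) \big] \nonumber \\
  &\ge \mathbb{E}_{z \sim p(z),\, \tau \sim p_\pi(\tau \mid z)}
     \big[ \log q_\phi(z \mid \tau) - \log p(z) \big], \\
J(\pi, \phi)
  &= \mathbb{E}_{z \sim p(z),\, \tau \sim p_\pi(\tau \mid z)}
     \big[ \log q_\phi(z \mid \tau) - \log p(z) \big]
     \;-\; \lambda \, D_f\!\big( p_\pi(\tau) \,\|\, p_{\mathrm{ref}}(\tau) \big).
\end{align}
```

The inequality holds because the gap between the two expectations is an expected KL divergence between p(z | \tau) and q_\phi(z | \tau), which is non-negative. Maximizing J would then trade off skill distinguishability (first term) against closeness to the reference motion data (second term).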
