Having a machine learning agent interact with its environment requires true unsupervised learning, skill acquisition, active learning, exploration and reinforcement, all ingredients of human learning that are still not well understood or exploited through the supervised approaches that dominate deep learning today. Our goal is to improve robotics via machine learning, and improve machine learning via robotics. We foster close collaborations between machine learning researchers and roboticists to enable learning at scale on real and simulated robotic systems.
Recent Publications
CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic Agents
Jeongeun Park
Seungwon Lim
Joonhyung Lee
Sangbeom Park
Sungjoon Choi
Youngjae Yu
IEEE Robotics and Automation Letters (2023) (to appear)
In this paper, we focus on inferring whether a given user command is clear, ambiguous, or infeasible in the context of interactive robotic agents that use large language models (LLMs). To tackle this problem, we first present an uncertainty estimation method for LLMs to classify whether the command is certain (i.e., clear) or not (i.e., ambiguous or infeasible). Once a command is classified as uncertain, we further determine whether it is ambiguous or infeasible, leveraging LLMs with situation-aware few-shot prompting in a zero-shot manner. For ambiguous commands, we disambiguate the command by interacting with users via question generation with LLMs. We believe that proper recognition of the given commands could lead to a decrease in malfunctions and undesired actions of the robot, enhancing the reliability of interactive robot agents. To evaluate the proposed system, we present a dataset consisting of pairs of high-level commands, scene descriptions, and labels of command type (i.e., clear, ambiguous, or infeasible). We validate the proposed method on the collected dataset and in a pick-and-place tabletop simulation. Furthermore, we demonstrate the approach in a real-world human-robot interaction environment, i.e., handover scenarios.
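The abstract does not say how uncertainty is estimated, so the following is only a minimal sketch of one common proxy: sample several candidate plans from an LLM for the same command and treat disagreement among them (entropy of the empirical distribution) as uncertainty. The function names, the threshold value, and the sample strings are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def response_entropy(samples):
    """Shannon entropy (in nats) of the empirical distribution over sampled responses."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def classify_command(samples, threshold=0.5):
    """Label a command 'certain' when the sampled plans mostly agree, else 'uncertain'.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return "certain" if response_entropy(samples) < threshold else "uncertain"

# Hypothetical plans sampled from an LLM for two different commands.
clear_samples = ["pick(apple)"] * 5
vague_samples = ["pick(apple)", "pick(cup)", "pick(banana)", "pick(apple)", "pick(cup)"]

print(classify_command(clear_samples))
print(classify_command(vague_samples))
```

In this sketch a command flagged as uncertain would then go to a second stage that separates ambiguous from infeasible commands, as the abstract describes.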
Scalable Multi-Sensor Robot Imitation Learning via Task-Level Domain Consistency
Armando Fuentes
Daniel Ho
Eric Victor Jang
Matt Bennice
Mohi Khansari
Nicolas Sievers
Yuqing Du
ICRA (2023) (to appear)
Recent work in visual end-to-end learning for robotics has shown the promise of imitation learning across a variety of tasks. However, such approaches are often expensive and require vast amounts of real-world training demonstrations. Additionally, they rely on a time-consuming evaluation process for identifying the best model to deploy in the real world. These challenges can be mitigated by simulation, namely by supplementing real-world data with simulated demonstrations and using simulated evaluations to identify strong policies. However, this introduces the well-known "reality gap" problem, where simulator inaccuracies decorrelate performance in simulation from performance in reality. In this paper, we build on prior work in GAN-based domain adaptation and introduce the notion of a Task Consistency Loss (TCL), a self-supervised contrastive loss that encourages sim and real alignment at both the feature and action-prediction levels. We demonstrate the effectiveness of our approach on the challenging task of latched-door opening with a 9 Degree-of-Freedom (DoF) mobile manipulator from raw RGB and depth images. While most prior work in vision-based manipulation operates from a fixed, third-person view, mobile manipulation couples the challenges of locomotion and manipulation with greater visual diversity and action space complexity. We find that we are able to achieve 77% success on seen and unseen scenes, a +30% increase from the baseline, using only ~16 hours of teleoperation demonstrations in sim and real.
A Connection between Actor Regularization and Critic Regularization in Reinforcement Learning
Benjamin Eysenbach
Matthieu Geist
Ruslan Salakhutdinov
Sergey Levine
International Conference on Machine Learning (ICML) (2023)
As with any machine learning problem with limited data, effective offline RL algorithms require careful regularization to avoid overfitting, with most methods regularizing either the actor or the critic. These methods appear distinct. Actor regularization (e.g., behavioral cloning penalties) is simpler and has appealing convergence properties, while critic regularization typically requires significantly more compute because it involves solving a game, but it has appealing lower-bound guarantees. Empirically, prior work alternates between claiming better results with actor regularization and critic regularization. In this paper, we show that these two regularization techniques can be equivalent under some assumptions: regularizing the critic using a CQL-like objective is equivalent to updating the actor with a BC-like regularizer and with a SARSA Q-value (i.e., “1-step RL”). Our experiments show that this theoretical model makes accurate, testable predictions about the performance of CQL and one-step RL. While our results do not definitively say whether users should prefer actor regularization or critic regularization, our results hint that actor regularization methods may be a simpler way to achieve the desirable properties of critic regularization. The results also suggest that the empirically demonstrated benefits of both types of regularization might be more a function of implementation details than of objective superiority.
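For orientation, the two families of objectives the abstract contrasts can be written in generic textbook form. This is a sketch under stated assumptions, not the paper's exact formulation: the symbols $\mathcal{D}$ (offline dataset), $\beta$ (behavior policy), $\lambda$ (regularization weight), and $\gamma$ (discount) are introduced here for illustration, and the paper's precise objectives and the assumptions behind the equivalence may differ.

```latex
% Actor regularization: one-step RL with a BC-like penalty.
% Q^{\beta} is the SARSA Q-value of the behavior policy \beta.
\max_{\pi}\;
  \mathbb{E}_{s \sim \mathcal{D},\, a \sim \pi(\cdot \mid s)}
    \bigl[ Q^{\beta}(s, a) \bigr]
  + \lambda\, \mathbb{E}_{(s, a) \sim \mathcal{D}}
    \bigl[ \log \pi(a \mid s) \bigr]

% Critic regularization: a CQL-like objective.
% The second term pushes Q down on policy actions and up on dataset actions.
\min_{Q}\;
  \tfrac{1}{2}\, \mathbb{E}_{(s, a, r, s') \sim \mathcal{D},\, a' \sim \pi(\cdot \mid s')}
    \bigl[ \bigl( Q(s, a) - r - \gamma\, Q(s', a') \bigr)^{2} \bigr]
  + \lambda \Bigl(
      \mathbb{E}_{s \sim \mathcal{D},\, a \sim \pi(\cdot \mid s)} \bigl[ Q(s, a) \bigr]
      - \mathbb{E}_{(s, a) \sim \mathcal{D}} \bigl[ Q(s, a) \bigr]
    \Bigr)
```

In the first form the regularizer acts on the policy while the critic is a fixed SARSA estimate. In the second the policy and critic are coupled, which is the "game" the abstract mentions.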
Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance
Jesse Zhang
Jiahui Zhang
Karl Pertsch
Ziyi Liu
Xiang Ren
Shao-Hua Sun
Joseph Lim
Conference on Robot Learning 2023 (2023)
We propose BOSS, an approach that automatically learns to solve new long-horizon, complex, and meaningful tasks by autonomously growing a learned skill library. Prior work in reinforcement learning requires expert supervision, in the form of demonstrations or rich reward functions, to learn long-horizon tasks. Instead, our approach BOSS (BOotStrapping your own Skills) learns to accomplish new tasks by performing “skill bootstrapping,” where an agent with a set of primitive skills interacts with the environment to practice new skills without receiving reward feedback for tasks outside of the initial skill set. This bootstrapping phase is guided by large language models (LLMs) that inform the agent of meaningful skills to chain together. Through this process, BOSS builds a wide range of complex and useful behaviors from a basic set of primitive skills. We demonstrate through experiments in realistic household environments that agents trained with our LLM-guided bootstrapping procedure outperform those trained with naive bootstrapping as well as prior unsupervised skill acquisition methods on zero-shot execution of unseen, long-horizon tasks in new environments.
Agile Catching with Whole-Body MPC and Blackbox Policy Learning
Saminda Abeyruwan
Nick Boffi
Anish Shankar
Jean-Jacques Slotine
Stephen Tu
Learning for Dynamics and Control (2023)
We address a benchmark task in agile robotics: catching objects thrown at high speed. This is a challenging task that involves tracking, intercepting, and cradling a thrown object with access only to visual observations of the object and the proprioceptive state of the robot, all within a fraction of a second. We present the relative merits of two fundamentally different solution strategies: (i) Model Predictive Control using accelerated constrained trajectory optimization, and (ii) Reinforcement Learning using zeroth-order optimization. We provide insights into various performance tradeoffs including sample efficiency, sim-to-real transfer, robustness to distribution shifts, and whole-body multimodality via extensive on-hardware experiments. We conclude with proposals on fusing “classical” and “learning-based” techniques for agile robot control. Videos of our experiments may be found here:
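To make "zeroth-order optimization" concrete, here is a minimal sketch of a generic antithetic random-search update that improves policy parameters using only function evaluations, with no analytic gradients. It is an illustration under assumptions, not the authors' implementation: `zeroth_order_step`, `toy_reward`, and all hyperparameter values are hypothetical, and the toy quadratic stands in for an episode return measured in simulation or on hardware.

```python
import random

def zeroth_order_step(f, theta, sigma=0.1, lr=0.05, num_dirs=8, rng=random):
    """One antithetic random-search update that maximizes f using only function values."""
    grad = [0.0] * len(theta)
    for _ in range(num_dirs):
        # Sample a random perturbation direction.
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        # Evaluate the objective at mirrored perturbations (antithetic pair).
        plus = f([t + sigma * e for t, e in zip(theta, eps)])
        minus = f([t - sigma * e for t, e in zip(theta, eps)])
        # Finite-difference estimate of the directional derivative along eps.
        scale = (plus - minus) / (2.0 * sigma * num_dirs)
        grad = [g + scale * e for g, e in zip(grad, eps)]
    # Gradient ascent step on the estimated gradient.
    return [t + lr * g for t, g in zip(theta, grad)]

def toy_reward(theta):
    """Hypothetical stand-in for an episode return; maximized at theta = (1, -2)."""
    return -((theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2)

rng = random.Random(0)  # fixed seed so the run is reproducible
theta = [0.0, 0.0]
for _ in range(300):
    theta = zeroth_order_step(toy_reward, theta, rng=rng)

print(theta)
```

Because the update needs only returns, the same loop applies when the objective is a black-box rollout rather than a differentiable function, which is the setting the abstract contrasts with trajectory-optimization-based MPC.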
Robotic Table Tennis: A Case Study into a High Speed Learning System
Jon Abelian
Saminda Abeyruwan
Michael Ahn
Justin Boyd
Erwin Johan Coumans
Omar Escareno
Wenbo Gao
Navdeep Jaitly
Juhana Kangaspunta
Satoshi Kataoka
Gus Kouretas
Yuheng Kuang
Corey Lynch
Thinh Nguyen
Ken Oslund
Barney J. Reed
Anish Shankar
Avi Singh
Grace Vesom
Peng Xu
Robotics: Science and Systems (2023)
We present a deep dive into a learning robotic system that, in previous work, was shown to be capable of hundreds of table tennis rallies with a human and has the ability to precisely return the ball to desired targets. This system puts together a highly optimized and novel perception subsystem, a high-speed low-latency robot controller, a simulation paradigm that can prevent damage in the real world and also train policies for zero-shot transfer, and automated real-world environment resets that enable autonomous training and evaluation on physical robots. We complement a complete system description, including numerous design decisions that are typically not widely disseminated, with a collection of ablation studies that clarify the importance of mitigating various sources of latency, accounting for training and deployment distribution shifts, robustness of the perception system, and sensitivity to policy hyper-parameters and choice of action space. A video demonstrating the components of our system and details of experimental results is included in the supplementary material.
