Generalizable Domain Adaptation for Sim-and-Real Policy Co-Training


Abstract

Behavior cloning has shown promise for robot manipulation, but real-world demonstrations are costly to acquire at scale. Simulated data offers a scalable alternative, particularly with advances in automated demonstration generation, yet transferring policies trained on it to the real world is hampered by the many gaps between simulated and real domains. In this work, we propose a unified sim-and-real co-training framework for learning generalizable manipulation policies that primarily leverages simulation and requires only a few real-world demonstrations. Central to our approach is learning a domain-invariant, task-relevant feature space. Our key insight is that aligning the joint distributions of observations and their corresponding actions across domains provides a richer signal than aligning observations (marginals) alone. We achieve this by embedding an Optimal Transport (OT)-inspired loss within the co-training framework, and we extend it to an Unbalanced OT formulation to handle the imbalance between abundant simulation data and scarce real-world examples. We validate our method on challenging manipulation tasks, showing that it can leverage abundant simulation data to improve real-world success rates by up to 30% and even generalize to scenarios seen only in simulation.
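The page does not include an implementation, so the following is only a rough sketch of the idea, not the authors' code: a minimal PyTorch version of an entropic unbalanced OT loss over joint observation-action costs. All names and hyperparameters (`joint_cost`, `uot_loss`, `w_act`, `eps`, `rho`) are illustrative assumptions.

```python
import torch

def joint_cost(obs_s, act_s, obs_t, act_t, w_act=1.0):
    # Pairwise squared-Euclidean costs between source (sim) and target (real)
    # samples. Aligning the *joint* (observation feature, action) distribution
    # means the cost mixes an observation term and an action term.
    c_obs = torch.cdist(obs_s, obs_t) ** 2   # (Ns, Nt)
    c_act = torch.cdist(act_s, act_t) ** 2   # (Ns, Nt)
    return c_obs + w_act * c_act

def uot_loss(C, eps=0.05, rho=1.0, n_iters=100):
    # Entropic unbalanced OT cost <P, C> via Sinkhorn scaling iterations
    # (Chizat et al., 2018). rho controls how strictly the batch marginals
    # must be matched; as rho -> infinity this recovers balanced OT.
    Ns, Nt = C.shape
    a = torch.full((Ns,), 1.0 / Ns, dtype=C.dtype, device=C.device)
    b = torch.full((Nt,), 1.0 / Nt, dtype=C.dtype, device=C.device)
    K = torch.exp(-C / eps)                  # Gibbs kernel
    u, v = torch.ones_like(a), torch.ones_like(b)
    fi = rho / (rho + eps)                   # KL marginal-relaxation exponent
    for _ in range(n_iters):
        u = (a / (K @ v + 1e-16)).pow(fi)
        v = (b / (K.T @ u + 1e-16)).pow(fi)
    P = u[:, None] * K * v[None, :]          # (relaxed) transport plan
    return (P * C).sum()

# Hypothetical co-training step: behavior cloning on both domains plus the
# alignment term on encoded features z and actions a from each batch, e.g.
# loss = bc_sim + bc_real + lam * uot_loss(joint_cost(z_sim, a_sim, z_real, a_real))
```

Since the loss is built from differentiable tensor operations, gradients flow back through the cost matrix into the feature encoder, which is what pushes the two domains toward a shared feature space.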

Simulation Experiments

Simulation Task Setups: For each task, we generate 1,000 demonstrations in the source-domain reset region and only 10 demonstrations in the target-domain In-Distribution (ID) reset region.

[Figure: simulation task setups]
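The page does not state the exact sampling scheme, but one common way to co-train on 1,000 sim demos and 10 real demos without the real data being drowned out is to oversample the small set toward a roughly even per-batch mix. A minimal PyTorch sketch, where the 50/50 ratio and function name are our assumptions:

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, WeightedRandomSampler

def make_cotrain_loader(sim_ds, real_ds, batch_size=64):
    # Oversample the small real set so each batch is roughly half sim and
    # half real, despite the ~100x size imbalance (1000 vs. 10 demos).
    ds = ConcatDataset([sim_ds, real_ds])
    weights = torch.cat([
        torch.full((len(sim_ds),), 0.5 / len(sim_ds)),
        torch.full((len(real_ds),), 0.5 / len(real_ds)),
    ])
    sampler = WeightedRandomSampler(weights, num_samples=len(ds), replacement=True)
    return DataLoader(ds, batch_size=batch_size, sampler=sampler)
```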

Our method learns complex manipulation tasks under significant domain shifts using as few as 10 demonstrations from the target domain.

Success rates:
  Ours: 0.77, Co-training: 0.73, MMD: 0.54, Target-only: 0.51
  Ours: 0.80, Co-training: 0.70, MMD: 0.46, Target-only: 0.44

Our method is compatible with both image and point cloud observation modalities.

Success rates:
  Ours: 0.68, Co-training: 0.62, MMD: 0.46, Target-only: 0.38
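Because the alignment loss operates on encoded features, the observation encoder is the only modality-specific component. The toy encoders below (not the paper's architectures, and the names are our own) illustrate how image and point-cloud inputs can both map into the shared feature space consumed by the alignment loss:

```python
import torch
import torch.nn as nn

class ImageEncoder(nn.Module):
    # Toy CNN for RGB observations: (B, 3, H, W) -> (B, feat_dim).
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, x):
        return self.net(x)

class PointCloudEncoder(nn.Module):
    # Toy PointNet-style encoder for point clouds: (B, N, 3) -> (B, feat_dim),
    # using a shared per-point MLP and permutation-invariant max pooling.
    def __init__(self, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, feat_dim))

    def forward(self, pts):
        return self.mlp(pts).max(dim=1).values

# The alignment loss only sees the resulting feature vectors, so either
# encoder can be swapped in without touching the rest of the pipeline.
```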

Our method generalizes more effectively to Out-Of-Distribution (OOD) scenarios in target domains.

Success rates:
  Ours: 0.63, Co-training: 0.51, MMD: 0.48, Target-only: 0.00
  Ours: 0.11, Co-training: 0.06, MMD: 0.07, Target-only: 0.00
  Ours: 0.59, Co-training: 0.47, MMD: 0.40, Target-only: 0.00

Real-world Experiments

Real Task Setups: We collect 10–25 demonstrations per task in the real-world In-Distribution (ID) reset region, with the number varying by task difficulty. Besides evaluating on reset-range OOD, we additionally test shape and texture OOD scenarios.

[Figure: real-world task setups]

Real Data Collection: Our hardware platform uses a Franka Emika Panda robot with an Intel RealSense D435 camera for capturing RGB and depth images, and a Meta Quest 3 headset for teleoperation.

[Figure: hardware platform]

Demo Visualization: We display the first frame of each collected demo to illustrate the task setups and data distribution.

Rollouts of our method for In-Distribution (ID) and Out-Of-Distribution (OOD) evaluation

Co-training baseline failure modes for Out-Of-Distribution (OOD) evaluation

BibTeX
