What Matters in Learning from Large-Scale Datasets for Robot Manipulation

1Georgia Institute of Technology, 2The University of Texas at Austin, 3NVIDIA Research
*equal contribution, equal advising

ICLR 2025

Our contributions are three-fold

We build a systematic data collection framework for robot manipulation

We leverage our framework to generate a large-scale simulated robotics dataset with over 850K demos for 3K task instances

MimicLabs Dataset

Available to download from 🤗 Hugging Face

We design a study for data Collectors and Retrievers, formulated using "Diversity" and "Alignment" in robotics datasets

Our study finds where data Collectors should focus efforts for maximal downstream performance boost

And how users of large-scale robotics datasets should retrieve data for their tasks

Our takeaways for data retrieval hold in the real world, on existing large-scale datasets!

More real-robot results - co-training with retrieval from DROID

Below are some videos showing rollouts for different tasks, with different co-training datasets retrieved from the DROID dataset.


Wipe Board

Target only

DROID co-training

Retrieve object

Retrieve campose

Retrieve spatial

Pour Bowl

Target only

DROID co-training

Retrieve object

Retrieve campose

Retrieve spatial

Stack Block

Target only

DROID co-training

Retrieve object

Retrieve campose

Retrieve spatial

Snack

Target only

DROID co-training

Retrieve object

Retrieve campose

Retrieve spatial

We also perform retrieval on the large-scale MimicLabs Dataset

We retrieve datasets to train robot policies for the following target tasks:


Bin carrot

Bin bowl

Clear table

Microwave teapot

Make coffee

We summarize our findings in the table below, which shows success rates on all five target tasks above when co-training on different dataset splits of the MimicLabs dataset. Our structured demonstration generation pipeline allows for counterfactual retrieval in the absence of the skill required for grasping the required object or accessing the receptacle.
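As a rough illustration of what retrieving a co-training split could look like, here is a minimal Python sketch. It is not the released MimicLabs code: the `Demo` record, its metadata fields, the `retrieve` / `retrieve_without` / `cotrain_batch` helpers, and the 50/50 mixing ratio are all assumptions made for this example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Demo:
    """Hypothetical demo record; the metadata fields are assumptions."""
    demo_id: int
    obj: str         # manipulated object, e.g. "carrot"
    receptacle: str  # where the object ends up, e.g. "bin"
    skill: str       # skill label, e.g. "grasp_carrot"


def retrieve(pool, key, value):
    """Keep demos whose metadata field `key` equals `value`."""
    return [d for d in pool if getattr(d, key) == value]


def retrieve_without(pool, key, value):
    """Counterfactual split: drop every demo whose `key` equals `value`."""
    return [d for d in pool if getattr(d, key) != value]


def cotrain_batch(target, retrieved, batch_size, target_frac, rng):
    """Sample (with replacement) a batch mixing target and retrieved demos."""
    n_target = round(batch_size * target_frac)
    batch = rng.choices(target, k=n_target)
    batch += rng.choices(retrieved, k=batch_size - n_target)
    return batch


# Toy pool standing in for a large dataset (illustrative values only).
pool = [
    Demo(0, "carrot", "bin", "grasp_carrot"),
    Demo(1, "bowl", "bin", "grasp_bowl"),
    Demo(2, "teapot", "microwave", "grasp_teapot"),
    Demo(3, "carrot", "table", "grasp_carrot"),
]
target = [Demo(100, "carrot", "bin", "grasp_carrot")]

# Split aligned with the target task on the object attribute.
aligned = retrieve(pool, "obj", "carrot")
# Counterfactual split that lacks the required grasping skill.
counterfactual = retrieve_without(pool, "skill", "grasp_carrot")

# Assumed 50/50 mix of target and retrieved demos per batch.
batch = cotrain_batch(target, aligned, batch_size=8, target_frac=0.5,
                      rng=random.Random(0))
```

Swapping `aligned` for `counterfactual` in the last call would give the kind of comparison described above, where the co-training data is missing the skill the target task needs.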

Study Overview

Citation

@inproceedings{saxena2025what,
  title={What Matters in Learning from Large-Scale Datasets for Robot Manipulation},
  author={Vaibhav Saxena and Matthew Bronars and Nadun Ranawaka Arachchige and Kuancheng Wang and Woo Chul Shin and Soroush Nasiriany and Ajay Mandlekar and Danfei Xu},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://arxiv.org/pdf/2506.13536}
}