What Matters in Learning from Large-Scale Datasets for Robot Manipulation

Vaibhav Saxena¹, Matthew Bronars^*1, Nadun Ranawaka Arachchige^*1, Kuancheng Wang¹, Woo Chul Shin¹, Soroush Nasiriany², Ajay Mandlekar^†3, Danfei Xu^†1,3

¹Georgia Institute of Technology, ²The University of Texas at Austin, ³NVIDIA Research

^*equal contribution, ^†equal advising

ICLR 2025

More real-robot results - co-training with retrieval from DROID

Below are some videos showing rollouts for different tasks, with different co-training datasets retrieved from the DROID dataset.

[Videos: rollouts for four tasks (Wipe Board, Pour Bowl, Stack Block, Snack), each shown under five settings: Target only, DROID co-training, Retrieve object, Retrieve campose, Retrieve spatial]

We retrieve datasets to train robot policies for the following target tasks:

Bin carrot

Bin bowl

Clear table

Microwave teapot

Make coffee

We summarize our findings in the table below, which shows success rates on all five target tasks listed above when co-training on different dataset splits of the MimicLabs dataset. Our structured demonstration generation pipeline allows for counterfactual retrieval based on the absence of the skill required for grasping the required object or accessing the receptacle.

Citation

@inproceedings{saxena2025what,
  title={What Matters in Learning from Large-Scale Datasets for Robot Manipulation},
  author={Vaibhav Saxena and Matthew Bronars and Nadun Ranawaka Arachchige and Kuancheng Wang and Woo Chul Shin and Soroush Nasiriany and Ajay Mandlekar and Danfei Xu},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://arxiv.org/pdf/2506.13536}
}

Our contributions are three-fold

We build a systematic data collection framework for robot manipulation

We leverage our framework to generate a large-scale simulated robotics dataset with over 850K demos for 3K task instances

MimicLabs Dataset

Available to download from 🤗 Hugging Face

We design a study for data Collectors and Retrievers, formulated using "Diversity" and "Alignment" in robotics datasets

Our study finds where data Collectors should focus efforts for maximal downstream performance boost

And how users of large-scale robotics datasets should retrieve datasets for their tasks

Our takeaways for data retrieval hold in the real world, on existing large-scale datasets!

We also perform retrieval on the large-scale MimicLabs Dataset

Study Overview
