Dataset Expansion using MimicGen#

Preparing your source demonstrations#

After collecting your demonstrations, run the following script to add subtask information to the dataset:

$ cd <PATH_TO_THIS_REPO>/mimiclabs/mimicgen
$ python scripts/prepare_src_dataset.py --dataset </PATH/TO/DATASET/....hdf5>

By default, this script uses the MG_MimicLabs MimicGen interface to inject object-centric information into the demo file; this information is needed later for dataset generation. If you wish to use your own interface, import it in scripts/prepare_src_dataset.py and pass its name via the --env_interface flag.
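For instance, a custom interface could be supplied like this (MyEnvInterface is a hypothetical class name used for illustration; only the --dataset and --env_interface flags come from the script above, and the interface class must already be imported in scripts/prepare_src_dataset.py):

$ python scripts/prepare_src_dataset.py \
    --dataset </PATH/TO/DATASET/....hdf5> \
    --env_interface MyEnvInterface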

Creating MimicGen configs and generation jobs#

The MimicGen data generation pipeline requires a configuration for each task that the user has defined as a BDDL file. We provide scripts that generate these configs directly from the task BDDLs and then launch data generation jobs. Below is an example:

$ cd <PATH_TO_THIS_REPO>/mimiclabs/mimicgen
$ python scripts/generate_configs_and_jobs.py \
    --task_suite_name <YOUR/TASK/SUITE> \
    --source_demos_dir <DIR/TO/SOURCE/DEMOS> \
    --generation_dir <DIR/TO/GENERATION/DEMOS> \
    --num_demos <NUM_DEMOS_PER_TASK>

Please see mimiclabs/mimicgen/scripts/generate_configs_and_jobs.py for additional config parameters that affect the config generation process.

The above script will generate config templates for MimicGen as JSON files under <PATH_TO_THIS_REPO>/mimiclabs/mimicgen/exps/<YOUR/TASK/SUITE>. It also exports a bash script for running all data generation jobs. You can run them in parallel using:

$ ./exps/<YOUR/TASK/SUITE>/jobs.sh

Each data generation job runs mimiclabs/mimicgen/scripts/generate_dataset.py.
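If you want to run a single job by hand rather than through jobs.sh, the invocation might look like the following sketch (the --config flag name and the per-task JSON filename are assumptions based on the generated config files, not confirmed by this page):

$ cd <PATH_TO_THIS_REPO>/mimiclabs/mimicgen
$ python scripts/generate_dataset.py --config exps/<YOUR/TASK/SUITE>/<TASK_CONFIG>.json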