Dataset Expansion using MimicGen#
Preparing your source demonstrations#
After collecting your demonstrations, run the following script to add subtask information to the dataset:
$ cd <PATH_TO_THIS_REPO>/mimiclabs/mimicgen
$ python scripts/prepare_src_dataset.py --dataset </PATH/TO/DATASET/....hdf5> \
This script defaults to using the MG_MimicLabs MimicGen interface for injecting object-centric information into the demo file, which is subsequently needed for the dataset generation. If you wish to use your own interface, make sure to import it in scripts/prepare_src_dataset.py and pass the interface name using the --env_interface flag.
Creating MimicGen configs and generation jobs#
The MimicGen data generation pipeline requires a set of configurations for each task created by the user as a BDDL. We provide scripts to generate these configs directly from task BDDLs, and subsequently launch data generation jobs. Below is an example:
$ cd <PATH_TO_THIS_REPO>/mimiclabs/mimicgen
$ python scripts/generate_configs_and_jobs.py \
--task_suite_name <YOUR/TASK/SUITE> \
--source_demos_dir <DIR/TO/SOURCE/DEMOS> \
--generation_dir <DIR/TO/GENERATION/DEMOS> \
--num_demos <NUM_DEMOS_PER_TASK>
Please see mimiclabs/mimicgen/scripts/generate_configs_and_jobs.py for additional config parameters that affect the config generation process.
The above script will generate config templates for MimicGen as JSON files under <PATH_TO_THIS_REPO>/mimiclabs/mimicgen/exps/<YOUR/TASK/SUITE>. It also exports a bash script for running all data generation jobs. You can run them in parallel using:
$ ./exps/<YOUR/TASK/SUITE>/jobs.sh
Each data data generation job runs mimiclabs/mimicgen/scripts/generate_dataset.py.
Re-using source demonstrations across a task suite#
To re-use source datasets for new BDDLs when using MimicGen, create a task_suite containing all the new BDDLs, then run:
$ cd <PATH_TO_THIS_REPO>/mimiclabs/mimicgen
$ python scripts/generate_configs_and_jobs.py \
--task_suite_name <YOUR/TASK/SUITE> \
--source_dataset_path <PATH/TO/REUSABLE/SOURCE/DEMOS> \
--generation_dir <DIR/TO/GENERATION/DEMOS> \
--num_demos <NUM_DEMOS_PER_TASK>
where you can provide the source demos that you want to re-use for all tasks in the suite, in the --source_dataset_path arg.