Policy Post-Training
This workflow covers post-training an example policy on the generated dataset; here we use GR00T N1.6 as the base model.
Docker Container: Base + GR00T (see Docker Containers for more details)
./docker/run_docker.sh -g
Once inside the container, set the dataset and model directories:
export DATASET_DIR=/datasets/isaaclab_arena/locomanipulation_tutorial
export MODELS_DIR=/models/isaaclab_arena/locomanipulation_tutorial
Todo: Verify Arena-GR00T on the Blackwell architecture.
Note that this tutorial assumes you've completed the preceding step (Data Generation) or downloaded the pre-generated dataset.
Download Pre-generated Dataset (skip preceding steps)
These commands download the Mimic-generated HDF5 dataset, ready for policy post-training, so the preceding steps can be skipped. To download, run:
hf download \
nvidia/Arena-G1-Loco-Manipulation-Task \
arena_g1_loco_manipulation_dataset_generated.hdf5 \
--repo-type dataset \
--local-dir $DATASET_DIR
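Whether you generated the dataset in the preceding step or downloaded it, it is worth a quick sanity check before conversion. The Python sketch below is a minimal example that assumes the robomimic-style layout used by IsaacLab Mimic (one data/demo_* group per demonstration); adjust the key names if your file differs.

# Minimal sanity check of the Mimic-generated HDF5 dataset.
# Assumption: robomimic-style layout with one `data/demo_*` group
# per demonstration; adjust the key names if your file differs.
import os
import h5py

path = os.path.join(
    os.environ["DATASET_DIR"],
    "arena_g1_loco_manipulation_dataset_generated.hdf5",
)
with h5py.File(path, "r") as f:
    demos = sorted(f["data"].keys())
    print(f"{len(demos)} demonstrations, e.g. {demos[:3]}")
    # Recursively print the group/dataset names of the first demo.
    f["data"][demos[0]].visit(print)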
Step 1: Convert to LeRobot Format
GR00T N1.6 requires the dataset to be in LeRobot format. We provide a script to convert the IsaacLab Mimic-generated HDF5 dataset to LeRobot format. Alternatively, this conversion step can be skipped by downloading the pre-converted LeRobot dataset.
Download Pre-converted LeRobot Dataset (skip conversion step)
These commands download the pre-converted LeRobot-format dataset so the conversion step can be skipped. To download, run:
hf download \
nvidia/Arena-G1-Loco-Manipulation-Task \
--include lerobot/* \
--repo-type dataset \
--local-dir $DATASET_DIR/arena_g1_loco_manipulation_dataset_generated
If you download this dataset, skip the conversion step below and continue directly to Step 2.
Convert the HDF5 dataset to LeRobot format for policy post-training:
python isaaclab_arena_gr00t/lerobot/convert_hdf5_to_lerobot.py \
--yaml_file isaaclab_arena_gr00t/lerobot/config/g1_locomanip_config.yaml
This creates a folder $DATASET_DIR/arena_g1_loco_manipulation_dataset_generated/lerobot containing parquet files with states/actions,
MP4 camera recordings, and dataset metadata.
The converter is controlled by a config file at isaaclab_arena_gr00t/lerobot/config/g1_locomanip_config.yaml.
Configuration file (g1_locomanip_config.yaml)
# Input/Output paths
data_root: /datasets/isaaclab_arena/locomanipulation_tutorial
hdf5_name: "arena_g1_loco_manipulation_dataset_generated.hdf5"
# Task description
language_instruction: "Pick up the brown box and place it in the blue bin"
task_index: 2
# Data field mappings
state_name_sim: "robot_joint_pos"
action_name_sim: "processed_actions"
pov_cam_name_sim: "robot_head_cam"
# Output configuration
fps: 50
chunks_size: 1000
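After conversion (or after downloading the pre-converted dataset), you can verify the output layout before moving on. The sketch below is a minimal example assuming the standard LeRobot directory structure with a meta/info.json metadata file; the exact layout may vary between LeRobot versions.

# Inspect the converted LeRobot dataset.
# Assumption: standard LeRobot layout with `meta/info.json`;
# the exact structure may vary between LeRobot versions.
import json
import os
from pathlib import Path

root = (
    Path(os.environ["DATASET_DIR"])
    / "arena_g1_loco_manipulation_dataset_generated"
    / "lerobot"
)
parquets = sorted(root.rglob("*.parquet"))
videos = sorted(root.rglob("*.mp4"))
print(f"{len(parquets)} parquet episode files, {len(videos)} MP4 recordings")

info_path = root / "meta" / "info.json"
if info_path.exists():
    info = json.loads(info_path.read_text())
    # e.g. episode/frame counts and fps recorded by the converter
    for key in ("total_episodes", "total_frames", "fps"):
        print(key, "=", info.get(key))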
Step 2: Post-train Policy
We post-train the GR00T N1.6 policy on the task.
The GR00T N1.6 policy has 3 billion parameters, so post-training is an expensive operation. We provide one post-training configuration, 8 GPUs with 48 GB of memory each, to achieve the best quality:
Training takes approximately 4-8 hours on 8x L40S GPUs.
Training Configuration:
Base Model: GR00T-N1.6-3B (foundation model)
Tuned Modules: Visual backbone, projector, diffusion model
Frozen Modules: LLM (language model)
Global Batch Size: 24, i.e. 3 per GPU across 8 GPUs (adjust based on GPU memory)
Training Steps: 20,000
GPUs: 8 (multi-GPU training)
To post-train the policy, run the following command:
python -m torch.distributed.run --nproc_per_node=8 --standalone submodules/Isaac-GR00T/gr00t/experiment/launch_finetune.py \
--dataset_path=$DATASET_DIR/arena_g1_loco_manipulation_dataset_generated/lerobot \
--output_dir=$MODELS_DIR \
--modality_config_path=isaaclab_arena_gr00t/embodiments/g1/g1_sim_wbc_data_config.py \
--global_batch_size=24 \
--max_steps=20000 \
--num_gpus=8 \
--save_steps=5000 \
--save_total_limit=5 \
--base_model_path=nvidia/GR00T-N1.6-3B \
--no_tune_llm \
--tune_visual \
--tune_projector \
--tune_diffusion_model \
--dataloader_num_workers=16 \
--use-wandb \
--color_jitter_params brightness 0.3 contrast 0.4 saturation 0.5 hue 0.08 \
--embodiment_tag=NEW_EMBODIMENT
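With --save_steps=5000 and --save_total_limit=5, checkpoints are written periodically to $MODELS_DIR. The sketch below lists them, assuming Hugging Face Trainer-style checkpoint-<step> subdirectories (which the flag names suggest); adjust if the trainer writes a different layout.

# List saved checkpoints under $MODELS_DIR.
# Assumption: Hugging Face Trainer-style `checkpoint-<step>` directories,
# suggested by the --save_steps/--save_total_limit flags; adjust if the
# trainer writes a different layout.
import os
from pathlib import Path

models_dir = Path(os.environ["MODELS_DIR"])
checkpoints = sorted(
    (p for p in models_dir.glob("checkpoint-*") if p.is_dir()),
    key=lambda p: int(p.name.rsplit("-", 1)[-1]),
)
for ckpt in checkpoints:
    print(ckpt)
# The highest-numbered checkpoint is typically the one to evaluate.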
If you have less powerful GPUs, see the GR00T fine-tuning guidelines for how to adjust the training configuration to your hardware. In all cases, we recommend tuning the visual backbone, projector, and diffusion model for the best results.