Policy Post-Training#
This workflow covers post-training an example policy on the generated dataset; here we use GR00T N1.5 as the base model.
Docker Container: Base + GR00T (see Docker Containers for more details)
./docker/run_docker.sh -g
Once inside the container, set the dataset and models directories.
export DATASET_DIR=/datasets/isaaclab_arena/locomanipulation_tutorial
export MODELS_DIR=/models/isaaclab_arena/locomanipulation_tutorial
Note
The GR00T N1.5 codebase does not support running on the Blackwell architecture by default. There are instructions here for building certain packages from source to support these architectures. We have not tested these instructions, so we do not recommend using the Base + GR00T container for policy post-training and evaluation on Blackwell hardware such as the RTX 50 series, RTX PRO 6000, or DGX Spark.
Note that this tutorial assumes that you’ve completed the preceding step (Data Generation) or downloaded the pre-generated dataset.
Download Pre-generated Dataset (skip preceding steps)
These commands download the Mimic-generated HDF5 dataset, ready for policy post-training, so that the preceding steps can be skipped.
To download run:
hf download \
nvidia/Arena-G1-Loco-Manipulation-Task \
arena_g1_loco_manipulation_dataset_generated.hdf5 \
--repo-type dataset \
--local-dir $DATASET_DIR
Step 1: Convert to LeRobot Format#
GR00T N1.5 requires the dataset to be in LeRobot format. We provide a script to convert the Isaac Lab Mimic-generated HDF5 dataset to LeRobot format. Note that this conversion step can be skipped by downloading the pre-converted LeRobot format dataset.
Download Pre-converted LeRobot Dataset (skip conversion step)
These commands can be used to download the pre-converted LeRobot format dataset, such that the conversion step can be skipped.
To download run:
hf download \
nvidia/Arena-G1-Loco-Manipulation-Task \
--include lerobot/* \
--repo-type dataset \
--local-dir $DATASET_DIR/arena_g1_loco_manipulation_dataset_generated
If you downloaded this dataset, skip the conversion command below and continue to Step 2.
Convert the HDF5 dataset to LeRobot format for policy post-training:
python isaaclab_arena_gr00t/data_utils/convert_hdf5_to_lerobot.py \
--yaml_file isaaclab_arena_gr00t/config/g1_locomanip_config.yaml
This creates a folder $DATASET_DIR/arena_g1_loco_manipulation_dataset_generated/lerobot containing parquet files with states/actions,
MP4 camera recordings, and dataset metadata.
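A quick way to sanity-check the conversion output is to confirm the expected top-level folders exist. The layout below (data, videos, meta) is a sketch of the common LeRobot dataset layout; the exact structure depends on the LeRobot version used by the converter, and the helper function is illustrative, not part of the conversion script.

```python
from pathlib import Path

# Expected top-level layout of a LeRobot-format dataset (hypothetical sketch):
# parquet states/actions under data/, MP4 recordings under videos/, and
# dataset metadata under meta/.
EXPECTED_DIRS = ["data", "videos", "meta"]

def check_lerobot_layout(root: str) -> list[str]:
    """Return the expected subdirectories that are missing under root."""
    return [d for d in EXPECTED_DIRS if not (Path(root) / d).is_dir()]

# Example: check the converted dataset folder; an empty list means all
# expected subdirectories are present.
missing = check_lerobot_layout(
    "/datasets/isaaclab_arena/locomanipulation_tutorial/"
    "arena_g1_loco_manipulation_dataset_generated/lerobot"
)
```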
The converter is controlled by a config file at isaaclab_arena_gr00t/config/g1_locomanip_config.yaml.
Configuration file (g1_locomanip_config.yaml)
# Input/Output paths
data_root: /datasets/isaaclab_arena/locomanipulation_tutorial
hdf5_name: "arena_g1_loco_manipulation_dataset_generated.hdf5"
# Task description
language_instruction: "Pick up the brown box and place it in the blue bin"
task_index: 2
# Data field mappings
state_name_sim: "robot_joint_pos"
action_name_sim: "processed_actions"
pov_cam_name_sim: "robot_head_cam"
# Output configuration
fps: 50
chunks_size: 1000
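If you edit the config, a minimal pre-flight check can catch a missing field before starting a long conversion. This is an illustrative sketch, not part of the actual converter: the required-key set mirrors the fields in g1_locomanip_config.yaml, and the validator function is hypothetical.

```python
# Required fields, mirroring g1_locomanip_config.yaml (sketch only).
REQUIRED_KEYS = {
    "data_root", "hdf5_name", "language_instruction", "task_index",
    "state_name_sim", "action_name_sim", "pov_cam_name_sim",
    "fps", "chunks_size",
}

def missing_keys(config: dict) -> set[str]:
    """Return required keys absent from the loaded config mapping."""
    return REQUIRED_KEYS - config.keys()

# The config values from the tutorial, inlined as a dict for illustration.
config = {
    "data_root": "/datasets/isaaclab_arena/locomanipulation_tutorial",
    "hdf5_name": "arena_g1_loco_manipulation_dataset_generated.hdf5",
    "language_instruction": "Pick up the brown box and place it in the blue bin",
    "task_index": 2,
    "state_name_sim": "robot_joint_pos",
    "action_name_sim": "processed_actions",
    "pov_cam_name_sim": "robot_head_cam",
    "fps": 50,
    "chunks_size": 1000,
}
assert not missing_keys(config)
```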
Step 2: Post-train Policy#
We post-train the GR00T N1.5 policy on the task.
The GR00T N1.5 policy has 3 billion parameters, so post-training is an expensive operation. We provide one post-training configuration, using 8 GPUs with 48 GB of memory each, to achieve the best quality:
Training takes approximately 4-8 hours on 8x L40S GPUs.
Training Configuration:
Base Model: GR00T-N1.5-3B (foundation model)
Tuned Modules: Visual backbone, projector, diffusion model
Frozen Modules: LLM (language model)
Batch Size: 24 (adjust based on GPU memory)
Training Steps: 20,000
GPUs: 8 (multi-GPU training)
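The configuration above implies the following back-of-envelope training volume, assuming the batch size is per GPU (verify this against the GR00T fine-tuning documentation before relying on these numbers):

```python
# Back-of-envelope training volume for the tutorial configuration.
# Assumption (not confirmed by the source): batch_size is the per-GPU batch.
per_gpu_batch = 24
num_gpus = 8
max_steps = 20_000

effective_batch = per_gpu_batch * num_gpus   # samples per optimizer step
total_samples = effective_batch * max_steps  # samples seen over training

print(effective_batch)  # 192
print(total_samples)    # 3840000
```

Lowering the batch size to fit smaller GPUs shrinks the effective batch proportionally, which may call for adjusting the step count.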
To post-train the policy, run the following command:
cd submodules/Isaac-GR00T
python scripts/gr00t_finetune.py \
--dataset_path=$DATASET_DIR/arena_g1_loco_manipulation_dataset_generated/lerobot \
--output_dir=$MODELS_DIR \
--data_config=isaaclab_arena_gr00t.data_config:UnitreeG1SimWBCDataConfig \
--batch_size=24 \
--max_steps=20000 \
--num_gpus=8 \
--save_steps=5000 \
--base_model_path=nvidia/GR00T-N1.5-3B \
--no_tune_llm \
--tune_visual \
--tune_projector \
--tune_diffusion_model \
--no-resume \
--dataloader_num_workers=16 \
--report_to=wandb \
--embodiment_tag=new_embodiment
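With --save_steps=5000 and --max_steps=20000, checkpoints are written every 5000 steps; the checkpoint directory naming under $MODELS_DIR follows the trainer's own convention. A quick sketch of the resulting schedule:

```python
# Checkpoint steps implied by --save_steps=5000 and --max_steps=20000.
save_steps = 5_000
max_steps = 20_000
checkpoint_steps = list(range(save_steps, max_steps + 1, save_steps))
print(checkpoint_steps)  # [5000, 10000, 15000, 20000]
```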
If you have less powerful GPUs, see the GR00T fine-tuning guidelines for how to adjust the training configuration to your hardware. For the best results, we recommend keeping the visual backbone, projector, and diffusion model tunable.