# Policy Post-training
This workflow covers post-training an example policy on the generated dataset, using GR00T N1.6 as the base model.
Docker Container: Base + GR00T (see Docker Containers for more details)
```shell
./docker/run_docker.sh -g
```
Once inside the container, set the dataset and models directories.
```shell
export DATASET_DIR=/datasets/isaaclab_arena/sequential_static_manipulation_tutorial
export MODELS_DIR=/models/isaaclab_arena/sequential_static_manipulation_tutorial
```
Note that this tutorial assumes that you’ve completed the preceding step (Data Generation) or downloaded the pre-generated dataset from Hugging Face as described below.
Download Pre-generated Dataset (skip preceding steps)
These commands download the Mimic-generated HDF5 dataset, ready for policy post-training, so the preceding steps can be skipped. To download, run:
```shell
_tmp="$DATASET_DIR/_hf_download" && \
hf download \
  nvidia/Arena-GR1-Manipulation-PlaceItemCloseDoor-Task \
  --include "ranch_bottle_into_fridge/ranch_bottle_into_fridge_generated_100.hdf5" \
  --repo-type dataset \
  --local-dir "$_tmp" && \
mkdir -p "$DATASET_DIR" && \
mv "$_tmp/ranch_bottle_into_fridge/ranch_bottle_into_fridge_generated_100.hdf5" "$DATASET_DIR/" && \
rm -rf "$_tmp"
```
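As a quick sanity check after downloading, you can confirm the file contains the expected number of demonstrations. The sketch below assumes the IsaacLab Mimic convention of `demo_<N>` groups under a top-level `data` group; `count_demos` is a hypothetical helper, and the real check would open the file with `h5py` as shown in the comment.

```python
import re

def count_demos(group_names):
    """Count groups named demo_<N> (IsaacLab Mimic layout, assumed here)."""
    pattern = re.compile(r"^demo_\d+$")
    return sum(1 for name in group_names if pattern.match(name))

# With h5py installed, the real check would be roughly:
#   with h5py.File(f"{dataset_dir}/ranch_bottle_into_fridge_generated_100.hdf5") as f:
#       n = count_demos(f["data"].keys())
# Here we exercise the counting logic on a synthetic listing:
print(count_demos([f"demo_{i}" for i in range(100)] + ["env_args"]))
```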
## Step 1: Convert to LeRobot Format
GR00T N1.6 requires the dataset to be in LeRobot format. We provide a script to convert the IsaacLab Mimic-generated HDF5 dataset to LeRobot format. This conversion step can be skipped by downloading the pre-converted LeRobot-format dataset.
Download Pre-converted LeRobot Dataset (skip conversion step)
These commands download the pre-converted LeRobot-format dataset so the conversion step can be skipped:
```shell
_tmp="$DATASET_DIR/_hf_download" && \
hf download \
  nvidia/Arena-GR1-Manipulation-PlaceItemCloseDoor-Task \
  --include "ranch_bottle_into_fridge/ranch_bottle_into_fridge_generated_100/lerobot/*" \
  --repo-type dataset \
  --local-dir "$_tmp" && \
mkdir -p "$DATASET_DIR" && \
mv "$_tmp/ranch_bottle_into_fridge/ranch_bottle_into_fridge_generated_100" "$DATASET_DIR/" && \
rm -rf "$_tmp"
```
This places the LeRobot data at `$DATASET_DIR/ranch_bottle_into_fridge_generated_100/lerobot`.
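To verify the download landed where the training step expects it, the sketch below builds the expected subfolder paths. It assumes the output follows the standard LeRobot v2 dataset layout (`meta/`, `data/`, `videos/` subfolders); the helper name and layout are assumptions, so check each path with `Path.exists()` against your actual output.

```python
from pathlib import Path

def expected_lerobot_paths(dataset_dir: str) -> list:
    """Subfolders a LeRobot v2-style dataset is expected to contain
    (assumed layout; verify against the converter's actual output)."""
    root = Path(dataset_dir) / "ranch_bottle_into_fridge_generated_100" / "lerobot"
    return [root / "meta", root / "data", root / "videos"]

for p in expected_lerobot_paths("/datasets/isaaclab_arena/sequential_static_manipulation_tutorial"):
    print(p)  # after downloading, each of these should satisfy p.exists()
```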
If you download this dataset, you can skip the conversion step below and continue to the next step.
Convert the HDF5 dataset to LeRobot format for policy post-training:
```shell
python isaaclab_arena_gr00t/lerobot/convert_hdf5_to_lerobot.py \
  --yaml_file isaaclab_arena_gr00t/lerobot/config/gr1_manip_ranch_bottle_config.yaml
```
This creates a folder `$DATASET_DIR/ranch_bottle_into_fridge_generated_100/lerobot` containing parquet files with states/actions, MP4 camera recordings, and dataset metadata. The converter is controlled by the config file `isaaclab_arena_gr00t/lerobot/config/gr1_manip_ranch_bottle_config.yaml`.
Configuration file (gr1_manip_ranch_bottle_config.yaml)
```yaml
# Input/Output paths
data_root: /datasets/isaaclab_arena/sequential_static_manipulation_tutorial/
hdf5_name: "ranch_bottle_into_fridge_generated_100.hdf5"

# Task description
language_instruction: "Place the ranch dressing bottle on the top shelf of the fridge, and close the fridge door."
task_index: 0

# Data field mappings
state_name_sim: "robot_joint_pos"
action_name_sim: "processed_actions"
pov_cam_name_sim: "robot_pov_cam_rgb"

# Output configuration
fps: 50
chunks_size: 1000
```
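The converter's output location follows from the config's `data_root` and `hdf5_name` fields: the HDF5 basename, minus its extension, becomes the output folder name, with a `lerobot` subfolder inside, as described above. The helper below is a hypothetical sketch mirroring that convention; the converter script itself is authoritative.

```python
from pathlib import Path

def derive_output_dir(data_root: str, hdf5_name: str) -> Path:
    """Output folder for the LeRobot conversion: <data_root>/<hdf5 stem>/lerobot
    (mirrors the path described in the text; assumed, not read from the converter)."""
    stem = Path(hdf5_name).stem  # strips the .hdf5 extension
    return Path(data_root) / stem / "lerobot"

out = derive_output_dir(
    "/datasets/isaaclab_arena/sequential_static_manipulation_tutorial/",
    "ranch_bottle_into_fridge_generated_100.hdf5",
)
print(out)
```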
## Step 2: Post-train Policy
We post-train the GR00T N1.6 policy on the task.
The GR00T N1.6 policy has 3 billion parameters, so post-training is an expensive operation. We provide two post-training options:
- Best Quality: 8 GPUs with 48 GB memory
- Low Hardware Requirements: 1 GPU with 24 GB memory
### Best Quality (8 GPUs)

Training takes approximately 4-8 hours on 8x L40s GPUs.
Training Configuration:

- Base Model: GR00T-N1.6-3B (foundation model)
- Tuned Modules: Visual backbone, projector, diffusion model
- Frozen Modules: LLM (language model)
- Global Batch Size: 96 (adjust based on GPU memory)
- Training Steps: 20,000
- GPUs: 8 (multi-GPU training)
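The global batch size is split across GPUs, so with 8 GPUs and a global batch of 96, each GPU processes 12 samples per optimizer step. The sketch below assumes no gradient accumulation unless specified; whether the launcher accumulates internally is a detail of the training script, and `per_gpu_batch` is a hypothetical helper for the arithmetic only.

```python
def per_gpu_batch(global_batch_size: int, num_gpus: int, grad_accum_steps: int = 1) -> int:
    """Per-GPU micro-batch for a given global batch (assumes it divides evenly)."""
    micro, rem = divmod(global_batch_size, num_gpus * grad_accum_steps)
    assert rem == 0, "global batch must be divisible by num_gpus * grad_accum_steps"
    return micro

print(per_gpu_batch(96, 8))   # 8-GPU configuration above
print(per_gpu_batch(16, 1))   # 1-GPU configuration below
```

If a GPU runs out of memory, keeping the global batch fixed while raising `grad_accum_steps` shrinks the per-GPU micro-batch proportionally.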
To post-train the policy, run the following command:

```shell
python -m torch.distributed.run --nproc_per_node=8 --standalone submodules/Isaac-GR00T/gr00t/experiment/launch_finetune.py \
  --dataset_path=$DATASET_DIR/ranch_bottle_into_fridge_generated_100/lerobot \
  --output_dir=$MODELS_DIR \
  --modality_config_path=isaaclab_arena_gr00t/embodiments/gr1/gr1_arms_only_data_config.py \
  --global_batch_size=96 \
  --max_steps=20000 \
  --num_gpus=8 \
  --save_steps=5000 \
  --save_total_limit=5 \
  --base_model_path=nvidia/GR00T-N1.6-3B \
  --no_tune_llm \
  --tune_visual \
  --tune_projector \
  --tune_diffusion_model \
  --dataloader_num_workers=16 \
  --use-wandb \
  --embodiment_tag=GR1 \
  --color_jitter_params brightness 0.3 contrast 0.4 saturation 0.5 hue 0.08
```
### Low Hardware Requirements (1 GPU)

Training takes approximately 2-3 hours on 1x Ada6000 GPU.
Training Configuration:

- Base Model: GR00T-N1.6-3B (foundation model)
- Tuned Modules: Visual backbone, projector, diffusion model
- Frozen Modules: LLM (language model)
- Global Batch Size: 16 (adjust based on GPU memory)
- Training Steps: 30,000
- GPUs: 1 (single-GPU training)
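With `--save_steps=5000`, `--max_steps=30000`, and `--save_total_limit=5`, six checkpoints would be written during training but only the most recent five retained. The sketch below assumes Hugging Face Trainer-style checkpoint rotation (oldest deleted first); the actual behavior of `launch_finetune.py` may differ, so treat this as bookkeeping arithmetic only.

```python
def retained_checkpoints(max_steps: int, save_steps: int, save_total_limit: int) -> list:
    """Checkpoint steps still on disk at the end of training, assuming
    HF Trainer-style rotation that deletes the oldest beyond the limit."""
    written = list(range(save_steps, max_steps + 1, save_steps))
    return written[-save_total_limit:]

print(retained_checkpoints(30000, 5000, 5))  # 1-GPU run: oldest of 6 is rotated out
print(retained_checkpoints(20000, 5000, 5))  # 8-GPU run: only 4 written, all kept
```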
To post-train the policy, run the following command:

```shell
CUDA_VISIBLE_DEVICES=0 python submodules/Isaac-GR00T/gr00t/experiment/launch_finetune.py \
  --dataset_path=$DATASET_DIR/ranch_bottle_into_fridge_generated_100/lerobot \
  --output_dir=$MODELS_DIR \
  --modality_config_path=isaaclab_arena_gr00t/embodiments/gr1/gr1_arms_only_data_config.py \
  --global_batch_size=16 \
  --max_steps=30000 \
  --num_gpus=1 \
  --save_steps=5000 \
  --base_model_path=nvidia/GR00T-N1.6-3B \
  --no_tune_llm \
  --tune_visual \
  --tune_projector \
  --tune_diffusion_model \
  --dataloader_num_workers=16 \
  --use-wandb \
  --embodiment_tag=GR1 \
  --color_jitter_params brightness 0.3 contrast 0.4 saturation 0.5 hue 0.08 \
  --save_total_limit=5
```
See the GR00T fine-tuning guidelines for information on how to adjust the training configuration to your hardware and achieve the best results.