Policy Post-training

This workflow covers post-training an example policy on the generated dataset; here we use GR00T N1.6 as the base model.

Use the Arena Base container for dataset download and LeRobot conversion. Run GR00T finetuning from the native Isaac-GR00T uv environment in submodules/Isaac-GR00T, not from the Arena container.

Docker Container for conversion: Base (see Installation for more details)

./docker/run_docker.sh

Once inside the container, set the dataset and model directories:

export DATASET_DIR=/datasets/isaaclab_arena/static_manipulation_tutorial
export MODELS_DIR=/models/isaaclab_arena/static_manipulation_tutorial
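If you script any of the later steps in Python rather than the shell, the same directories can be resolved with the tutorial defaults as a fallback. A minimal sketch; the fallback paths are simply the values exported above:

```python
import os

# Resolve the dataset/model directories, falling back to the tutorial
# defaults used inside the Arena Base container.
DATASET_DIR = os.environ.get(
    "DATASET_DIR", "/datasets/isaaclab_arena/static_manipulation_tutorial"
)
MODELS_DIR = os.environ.get(
    "MODELS_DIR", "/models/isaaclab_arena/static_manipulation_tutorial"
)
print(DATASET_DIR, MODELS_DIR)
```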

Note that this tutorial assumes that you’ve completed the preceding step (Data Generation) or downloaded the pre-generated dataset from Hugging Face as described below.

Download Pre-generated Dataset (skip preceding steps)

Use these commands to download the Mimic-generated HDF5 dataset, ready for policy post-training, so that the preceding steps can be skipped.

To download, run:

hf download \
   nvidia/Arena-GR1-Manipulation-Task \
   arena_gr1_manipulation_dataset_generated.hdf5 \
   --repo-type dataset \
   --revision arena_v0.2_lab_v3.0 \
   --local-dir $DATASET_DIR

Step 1: Convert to LeRobot Format

GR00T N1.6 requires the dataset to be in LeRobot format. We provide a script to convert from the IsaacLab Mimic generated HDF5 dataset to LeRobot format. Note that this conversion step can be skipped by downloading the pre-converted LeRobot format dataset.

Download Pre-converted LeRobot Dataset (skip conversion step)

Use these commands to download the pre-converted LeRobot format dataset, so that the conversion step can be skipped.

hf download \
   nvidia/Arena-GR1-Manipulation-Task \
   --include lerobot/* \
   --repo-type dataset \
   --revision arena_v0.2_lab_v3.0 \
   --local-dir $DATASET_DIR/arena_gr1_manipulation_dataset_generated

If you download this dataset, you can skip the conversion step below and continue to the next step.

Convert the HDF5 dataset to LeRobot format for policy post-training:

python isaaclab_arena_gr00t/lerobot/convert_hdf5_to_lerobot.py \
  --yaml_file isaaclab_arena_gr00t/lerobot/config/gr1_manip_config.yaml

This creates a folder $DATASET_DIR/arena_gr1_manipulation_dataset_generated/lerobot containing parquet files with states/actions, MP4 camera recordings, and dataset metadata. The converter is controlled by a config file at isaaclab_arena_gr00t/lerobot/config/gr1_manip_config.yaml.
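As a quick sanity check after conversion, you can confirm that the expected artifacts were written. The helper below is an illustrative, stdlib-only sketch (`check_lerobot_layout` is not part of the converter, and the exact file layout may vary between LeRobot versions); it only looks for the three pieces described above:

```python
from pathlib import Path


def check_lerobot_layout(root: Path) -> list[str]:
    """Return the pieces missing from a LeRobot-style dataset folder.

    Checks only for the artifacts the converter is described as writing:
    parquet episode files, MP4 camera recordings, and a metadata directory.
    """
    missing = []
    if not any(root.rglob("*.parquet")):
        missing.append("parquet episode files")
    if not any(root.rglob("*.mp4")):
        missing.append("MP4 camera recordings")
    if not (root / "meta").is_dir():
        missing.append("meta/ directory")
    return missing


# Example (paths as in this tutorial):
# check_lerobot_layout(
#     Path("/datasets/isaaclab_arena/static_manipulation_tutorial")
#     / "arena_gr1_manipulation_dataset_generated" / "lerobot"
# )
```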

Configuration file (gr1_manip_config.yaml)
# Input/Output paths
data_root: /datasets/isaaclab_arena/static_manipulation_tutorial
hdf5_name: "arena_gr1_manipulation_dataset_generated.hdf5"

# Task description
language_instruction: "Reach out to the microwave and open it."
task_index: 0

# Data field mappings
state_name_sim: "robot_joint_pos"
action_name_sim: "processed_actions"
pov_cam_name_sim: "robot_pov_cam_rgb"


# Output configuration
fps: 50
chunks_size: 1000
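If you adapt the config for your own dataset, a small key check can catch typos before launching the converter. A sketch that assumes exactly the fields shown above are required (the converter itself may accept more or fewer); it operates on the dict you get from parsing the YAML:

```python
# The fields defined in gr1_manip_config.yaml above (assumed required here).
REQUIRED_KEYS = {
    "data_root", "hdf5_name", "language_instruction", "task_index",
    "state_name_sim", "action_name_sim", "pov_cam_name_sim",
    "fps", "chunks_size",
}


def missing_config_keys(cfg: dict) -> list[str]:
    """Return required keys absent from a parsed converter config."""
    return sorted(REQUIRED_KEYS - cfg.keys())
```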

Step 2: Post-train Policy

We post-train the GR00T N1.6 policy on the task.

The GR00T N1.6 policy has 3 billion parameters, so post-training is computationally expensive. We provide two post-training options:

  • Best Quality: 8 GPUs with 48GB memory

  • Low Hardware Requirements: 1 GPU with 24GB memory

Option 1: Best Quality (8 GPUs)

Training takes approximately 4-8 hours on 8x L40s GPUs.

Compute Requirements:

  • GPUs: 8x with at least 48 GB VRAM each (e.g. L40s, A6000, A100)

  • System RAM: 512 GB or more recommended — multi-GPU training with large batch sizes and multiple dataloader workers requires substantial host memory

Note

If your system has less RAM or fewer GPUs, you can reduce global_batch_size and dataloader_num_workers to fit your hardware. Training will still work but will take longer to converge.
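For reference, the relationship between global_batch_size and the per-GPU batch assumed in this tutorial is a simple even split across data-parallel ranks. This is a sketch of that arithmetic only; the actual launcher may additionally use gradient accumulation:

```python
def per_gpu_batch(global_batch_size: int, num_gpus: int) -> int:
    """Per-rank batch size, assuming the global batch splits evenly."""
    if global_batch_size % num_gpus != 0:
        raise ValueError("global_batch_size must be divisible by num_gpus")
    return global_batch_size // num_gpus


print(per_gpu_batch(24, 8))  # 3 samples per GPU for the 8-GPU recipe
print(per_gpu_batch(16, 1))  # 16 samples for the single-GPU recipe
```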

Training Configuration:

  • Base Model: GR00T-N1.6-3B (foundation model)

  • Tuned Modules: Visual backbone, projector, diffusion model

  • Frozen Modules: LLM (language model)

  • Batch Size: 24 (adjust based on GPU memory)

  • Training Steps: 20,000

To post-train the policy, open another terminal outside the Arena Base Docker container and cd to submodules/Isaac-GR00T. Set up GR00T’s native uv environment by following the GR00T installation guide, then run the finetuning command below. The paths assume the default Arena Docker mounts (~/datasets and ~/models on the host); adjust them if you launched Arena with custom mount directories.

uv run python -m torch.distributed.run --nproc_per_node=8 --standalone \
  gr00t/experiment/launch_finetune.py \
  --dataset-path ~/datasets/isaaclab_arena/static_manipulation_tutorial/arena_gr1_manipulation_dataset_generated/lerobot \
  --output-dir ~/models/isaaclab_arena/static_manipulation_tutorial \
  --modality-config-path ../../isaaclab_arena_gr00t/embodiments/gr1/gr1_arms_only_data_config.py \
  --global-batch-size 24 \
  --max-steps 20000 \
  --num-gpus 8 \
  --save-steps 5000 \
  --save-total-limit 5 \
  --base-model-path nvidia/GR00T-N1.6-3B \
  --no-tune-llm \
  --tune-visual \
  --tune-projector \
  --tune-diffusion-model \
  --dataloader-num-workers 16 \
  --embodiment-tag GR1 \
  --color-jitter-params brightness 0.3 contrast 0.4 saturation 0.5 hue 0.08
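With --save-steps 5000, --max-steps 20000, and --save-total-limit 5, you should end the run with four retained checkpoints. A sketch of that arithmetic, assuming Hugging-Face-Trainer-style rotation where only the newest save_total_limit checkpoints are kept on disk:

```python
def retained_checkpoints(max_steps: int, save_steps: int,
                         save_total_limit: int) -> list[int]:
    """Steps whose checkpoints remain on disk after training finishes."""
    saved = list(range(save_steps, max_steps + 1, save_steps))
    return saved[-save_total_limit:]


print(retained_checkpoints(20000, 5000, 5))  # [5000, 10000, 15000, 20000]
print(retained_checkpoints(30000, 5000, 5))  # single-GPU recipe: last 5 of 6 saves
```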

Option 2: Low Hardware Requirements (1 GPU)

Training takes approximately 2-3 hours on a single Ada6000 GPU.

Training Configuration:

  • Base Model: GR00T-N1.6-3B (foundation model)

  • Tuned Modules: Visual backbone, projector, diffusion model

  • Frozen Modules: LLM (language model)

  • Batch Size: 16 (adjust based on GPU memory)

  • Training Steps: 30,000

  • GPUs: 1 (single-GPU training)

As with the multi-GPU option, run the finetuning command below from GR00T's native uv environment in submodules/Isaac-GR00T (in a terminal outside the Arena Base Docker container), adjusting the dataset and model paths if you launched Arena with custom mount directories.

CUDA_VISIBLE_DEVICES=0 uv run python gr00t/experiment/launch_finetune.py \
  --dataset-path ~/datasets/isaaclab_arena/static_manipulation_tutorial/arena_gr1_manipulation_dataset_generated/lerobot \
  --output-dir ~/models/isaaclab_arena/static_manipulation_tutorial \
  --modality-config-path ../../isaaclab_arena_gr00t/embodiments/gr1/gr1_arms_only_data_config.py \
  --global-batch-size 16 \
  --max-steps 30000 \
  --num-gpus 1 \
  --save-steps 5000 \
  --base-model-path nvidia/GR00T-N1.6-3B \
  --no-tune-llm \
  --tune-visual \
  --tune-projector \
  --tune-diffusion-model \
  --dataloader-num-workers 4 \
  --embodiment-tag GR1 \
  --color-jitter-params brightness 0.3 contrast 0.4 saturation 0.5 hue 0.08 \
  --save-total-limit 5

See the GR00T fine-tuning guidelines for information on how to adjust the training configuration to your hardware and achieve the best results.