Policy Post-Training (GR00T N1.7)#

This workflow covers post-training a GR00T N1.7 policy directly on the teleoperated demonstrations exported in HDF5 from Teleoperation Data Collection. The recorded HDF5 is converted to LeRobot format inside the Arena container, then handed off to a standalone Isaac-GR00T checkout ($ISAAC_GR00T_DIR from Unitree G1 Static Apple-to-Plate Task) for finetuning. The finetuned checkpoint is later served back to Arena over the server-client architecture in Closed-Loop Policy Inference and Evaluation.

The N1.7 finetune script lives in the standalone Isaac-GR00T repo, not in Arena’s pinned submodules/Isaac-GR00T. This lets you train on the latest GR00T release without bumping the Arena submodule.

This page assumes you have a successful recording at $DATASET_DIR/arena_g1_static_apple_dataset_recorded.hdf5 from Teleoperation Data Collection.

Step 1: Convert to LeRobot Format#

GR00T N1.7 consumes datasets in LeRobot format. The conversion runs inside the standard Base Arena container.

Docker Container: Base (see Installation for more details)

./docker/run_docker.sh

Once inside the container, set the dataset directory:

export DATASET_DIR=/datasets/isaaclab_arena/static_apple_tutorial

Caution

isaaclab_arena_gr00t/lerobot/convert_hdf5_to_lerobot.py expects each episode to include ego RGB under observations/camera_obs/robot_head_cam_rgb (see pov_cam_name_sim in the conversion config). Before bulk collection, run the conversion once on a short recording to confirm the layout matches.
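
If you want to verify the layout before converting, a minimal inspection sketch is shown below; it assumes h5py is available inside the Base container (the converter itself depends on it):

python - <<'EOF'
import h5py, os

# Print every dataset path, shape, and dtype in the recording so you can
# confirm the ego RGB key matches observations/camera_obs/robot_head_cam_rgb.
path = os.path.join(os.environ["DATASET_DIR"],
                    "arena_g1_static_apple_dataset_recorded.hdf5")
with h5py.File(path, "r") as f:
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)
    f.visititems(show)
EOF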

Edit isaaclab_arena_gr00t/lerobot/config/g1_static_apple_config.yaml so hdf5_name matches your recorded file (arena_g1_static_apple_dataset_recorded.hdf5) and data_root matches $DATASET_DIR.

Convert the HDF5 dataset to LeRobot format for policy post-training:

python isaaclab_arena_gr00t/lerobot/convert_hdf5_to_lerobot.py \
  --yaml_file isaaclab_arena_gr00t/lerobot/config/g1_static_apple_config.yaml

This creates a folder $DATASET_DIR/arena_g1_static_apple_dataset_recorded/lerobot containing parquet files with states/actions, MP4 camera recordings, and dataset metadata.
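
The exact file names depend on the LeRobot tooling bundled with the converter, but the output typically resembles:

arena_g1_static_apple_dataset_recorded/lerobot/
├── data/chunk-000/episode_000000.parquet   # per-frame states and actions
├── videos/chunk-000/...                    # MP4 renders of the camera stream
└── meta/                                   # dataset metadata (info, tasks, stats)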

The converter is driven entirely by this config file; its full contents are reproduced below for reference.

Configuration file (g1_static_apple_config.yaml)
# Input/Output paths
data_root: /datasets/isaaclab_arena/static_apple_tutorial
hdf5_name: "arena_g1_static_apple_dataset_recorded.hdf5"

# Task description
language_instruction: "Pick up the apple from the shelf and place it onto the plate on the same shelf next to it."
task_index: 3

# Data field mappings
state_name_sim: "robot_joint_pos"
action_name_sim: "processed_actions"
pov_cam_name_sim: "robot_head_cam_rgb"

# Output configuration
fps: 50
chunks_size: 1000
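
A quick way to confirm the fields you edited resolve correctly is to load the config and test the resulting path. The sketch below is illustrative only; it assumes PyYAML, which the converter's --yaml_file parsing already implies:

python - <<'EOF'
import os, yaml

# Resolve data_root + hdf5_name exactly as the converter will.
cfg_path = "isaaclab_arena_gr00t/lerobot/config/g1_static_apple_config.yaml"
with open(cfg_path) as f:
    cfg = yaml.safe_load(f)
hdf5_path = os.path.join(cfg["data_root"], cfg["hdf5_name"])
print("recording found:" if os.path.isfile(hdf5_path) else "MISSING:", hdf5_path)
EOF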

The main differences from the loco-manipulation box config (g1_locomanip_config.yaml) are the data_root / hdf5_name, which point at the static apple-to-plate dataset, and the language_instruction, which describes the same-shelf placement (no walking, no second table). The 43-DoF action layout, embodiment tag, modality template, and joint-space configurations are all shared with the loco-manipulation variant. The static workflow does not need its own GR00T embodiment config because the upper-body action channels and observation modalities are identical; only the recorded body channel happens to stay at zero throughout each demo.

Note

The recorder’s processed_actions field already contains the 43-DoF joint-space targets that PinkIK produced during teleoperation. This is why this workflow records with g1_wbc_agile_pink and evaluates with g1_wbc_agile_joint: the policy never sees the end-effector pose targets PinkIK consumed, only the joint targets PinkIK produced.
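
To confirm a recording actually carries these 43-DoF targets, you can check the trailing dimension of processed_actions with the same kind of h5py sketch as above:

python - <<'EOF'
import h5py, os

# Every processed_actions dataset should have a trailing dimension of 43.
path = os.path.join(os.environ["DATASET_DIR"],
                    "arena_g1_static_apple_dataset_recorded.hdf5")
with h5py.File(path, "r") as f:
    def check(name, obj):
        if isinstance(obj, h5py.Dataset) and name.endswith("processed_actions"):
            print(name, obj.shape)
    f.visititems(check)
EOF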

Step 2: Post-train Policy (standalone Isaac-GR00T venv)#

We post-train the GR00T N1.7 policy on the task using the standalone Isaac-GR00T checkout from Unitree G1 Static Apple-to-Plate Task (referenced as $ISAAC_GR00T_DIR). This step runs outside the Arena container so GR00T’s dependencies do not have to coexist with the Arena/Isaac Sim ones.

The GR00T N1.7 policy has 3 billion parameters, so post-training is an expensive operation. We document a single post-training configuration, 8 GPUs with 48 GB of memory each, which achieves the best quality.

Training takes approximately 4-8 hours on 8x L40S GPUs.

Compute Requirements:

  • GPUs: 8x with at least 48 GB VRAM each (e.g., L40S or GB200)

  • System RAM: 256 GB or more recommended — multi-GPU training with large batch sizes and multiple dataloader workers requires substantial host memory

Training Configuration:

  • Base Model: GR00T-N1.7-3B (foundation model, downloaded from Hugging Face on first run)

  • Tuned Modules: Visual backbone, projector, diffusion model

  • Frozen Modules: LLM (language model)

  • Batch Size: 96 (adjust based on GPU memory)

  • Training Steps: 20,000

  • Action horizon: 40 (must match the diffusion head value used at evaluation; see note below)

  • Embodiment tag: new_embodiment (case-insensitive; resolved to EmbodimentTag.NEW_EMBODIMENT by gr00t)

To post-train the policy, open another terminal outside the Arena Base Docker container and cd to $ISAAC_GR00T_DIR. Set up GR00T’s native uv environment by following the GR00T installation guide, then run the finetuning command below. Replace /path/to/IsaacLab-Arena with the absolute path to your Arena clone so the --modality-config-path argument can register the WBC modality from Arena’s source tree. The dataset and output paths assume the default Arena Docker mounts (~/datasets and ~/models on the host); adjust them if you launched Arena with custom mount directories.

uv run python -m torch.distributed.run --nproc_per_node=8 --standalone \
  gr00t/experiment/launch_finetune.py \
  --base-model-path nvidia/GR00T-N1.7-3B \
  --dataset-path ~/datasets/isaaclab_arena/static_apple_tutorial/arena_g1_static_apple_dataset_recorded/lerobot \
  --output-dir ~/models/isaaclab_arena/static_apple_tutorial/static_apple_n17_finetune \
  --modality-config-path /path/to/IsaacLab-Arena/isaaclab_arena_gr00t/embodiments/g1/g1_sim_wbc_data_gr00t_n_1_7_config.py \
  --embodiment-tag new_embodiment \
  --global-batch-size 96 \
  --max-steps 20000 \
  --num-gpus 8 \
  --save-steps 5000 \
  --save-total-limit 5 \
  --no-tune-llm \
  --tune-visual \
  --tune-projector \
  --tune-diffusion-model \
  --dataloader-num-workers 8 \
  --color-jitter-params brightness 0.3 contrast 0.4 saturation 0.5 hue 0.08

Note

N1.7 CLI vs N1.6 CLI. The N1.7 launch_finetune.py is built on tyro, so flags are kebab-case (--base-model-path, not --base_model_path) and booleans use the --flag / --no-flag pair (--no-tune-llm). --color-jitter-params takes alternating key value pairs, not a JSON string. Run uv run python gr00t/experiment/launch_finetune.py --help from $ISAAC_GR00T_DIR to inspect the full argument set for the version you have checked out.

Note

The --modality-config-path argument points to Arena’s isaaclab_arena_gr00t/embodiments/g1/g1_sim_wbc_data_gr00t_n_1_7_config.py so that register_modality_config(...) runs and new_embodiment resolves to the WBC modality layout (5 state keys + 7 action keys). This is the same file the server consumes at evaluation time, so it is the single source of truth for the modality layout.

Caution

Action horizon must match between training and serving. action_horizon is baked into the diffusion head at training time and cannot be changed at inference. The Arena server YAML used in Closed-Loop Policy Inference and Evaluation ships with action_horizon: 40 to match the value this step trains with. If you want a different horizon, change both:

  1. delta_indices=list(range(N)) in isaaclab_arena_gr00t/embodiments/g1/g1_sim_wbc_data_gr00t_n_1_7_config.py for the action modality (controls what the LeRobot loader feeds the model during training).

  2. action_horizon: N and action_chunk_length: N (≤ action_horizon) in the server-side YAML at isaaclab_arena_gr00t/policy/config/g1_static_apple_gr00t_closedloop_config.yaml.
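
As a concrete illustration, moving to a hypothetical horizon of 20 would change delta_indices=list(range(40)) to list(range(20)) in the modality config, and the matching server-side YAML fragment would then read (key names as described above; verify against your checkout):

# g1_static_apple_gr00t_closedloop_config.yaml (fragment)
action_horizon: 20        # must equal the horizon the diffusion head was trained with
action_chunk_length: 20   # must be <= action_horizon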

If you have less powerful GPUs, please see the GR00T fine-tuning guidelines for information on how to adjust the training configuration to your hardware. We recommend fine-tuning the visual backbone, projector, and diffusion model for better results.

Finetuning recommendations for the AGILE apple-to-plate workflow#

The static apple-to-plate environment runs the AGILE Whole Body Controller (lower-body) and either PinkIK (recording) or direct joint control (evaluation). The following choices materially affect whether the finetuned policy actually works in the AGILE-joint runtime:

  1. Record with AGILE, not HOMIE. Keep the default --embodiment g1_wbc_agile_pink during teleoperation. AGILE’s WBC is a single end-to-end velocity policy, while HOMIE is a stand+walk pair, so the lower-body joint targets that PinkIK and the WBC produce differ systematically between the two. A HOMIE-trained policy will therefore be off-distribution when served against the AGILE-joint eval embodiment.

  2. Don’t record with the joint embodiment. Use g1_wbc_agile_pink (PinkIK on top of AGILE), not g1_wbc_agile_joint. The recorder writes the joint-space output of PinkIK as processed_actions, which is what the policy needs to learn. Recording with g1_wbc_agile_joint would force the human teleoperator to drive 43 joint targets directly, which is impractical; the joint embodiment is meant for evaluation, where the policy supplies those targets.

  3. Use Arena’s g1_sim_wbc_data_gr00t_n_1_7_config.py for --modality-config-path. This registers the modality with NEW_EMBODIMENT (40-step action horizon for N1.7) and is the same file the Arena server consumes at eval. Keeping a single source of truth prevents skew between training and serving.

  4. Pick action_horizon deliberately. The default (40) gives an 800 ms inference chunk at 50 Hz, which trades responsiveness against compute. For static apple-to-plate (~600-step episodes), 40 is a good default. Going lower (e.g., 20) gives more responsive closed-loop control at the cost of more frequent policy queries; going higher (e.g., 60) gives a longer open-loop chunk at the cost of more compute per query and slower reaction within each chunk (see the quick check below). Whichever value you pick, keep the modality config and the server YAML in sync (see the caution above).
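
As a quick check on that trade-off, the wall-clock length of one chunk is simply action_horizon divided by the 50 Hz control rate:

python - <<'EOF'
# Chunk duration for a few candidate horizons at the dataset's 50 Hz rate.
for horizon in (20, 40, 60):
    print(f"action_horizon={horizon}: {horizon / 50 * 1000:.0f} ms per chunk")
# -> 400 ms, 800 ms, 1200 ms
EOF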

If you adjust any of these and the resulting checkpoint behaves badly at evaluation, the most common culprits in order are: (i) too few or low-quality demonstrations, (ii) modality config / action_horizon mismatch between training and server YAML, (iii) recording with the wrong embodiment.