Policy Post-Training (GR00T N1.7)#
This workflow covers post-training a GR00T N1.7 policy
directly on the teleoperated demonstrations exported in HDF5 from Teleoperation Data Collection.
The recorded HDF5 is converted to LeRobot format inside the Arena container, then handed off to a
standalone Isaac-GR00T checkout ($ISAAC_GR00T_DIR from Unitree G1 Static Apple-to-Plate Task) for finetuning. The
finetuned checkpoint is later served back to Arena over the server-client architecture in
Closed-Loop Policy Inference and Evaluation.
The N1.7 finetune script lives in the standalone Isaac-GR00T repo, not in Arena’s pinned
submodules/Isaac-GR00T. This lets you train on the latest GR00T release without bumping the
Arena submodule.
This page assumes you have a successful recording at
$DATASET_DIR/arena_g1_static_apple_dataset_recorded.hdf5 from
Teleoperation Data Collection.
Step 1: Convert to LeRobot Format#
GR00T N1.7 consumes datasets in LeRobot format. The conversion runs inside the standard Base Arena container.
Docker Container: Base (see Installation for more details)
./docker/run_docker.sh
Once inside the container, set the dataset directory:
export DATASET_DIR=/datasets/isaaclab_arena/static_apple_tutorial
Caution
isaaclab_arena_gr00t/lerobot/convert_hdf5_to_lerobot.py expects each episode to include
ego RGB under observations/camera_obs/robot_head_cam_rgb (see pov_cam_name_sim in the
conversion config). Before bulk collection, run the conversion once on a short recording to
confirm the layout matches.
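Before running the full conversion, you can also sanity-check the layout directly with h5py. This is a minimal sketch; the episode group names printed by visit() depend on your recorder version, so treat the expected camera path as the thing to look for, not an exact template.
# Sketch: list every group/dataset path in the recording and confirm the
# ego RGB stream is present. Assumes the default dataset location.
import os
import h5py

path = os.path.expandvars("$DATASET_DIR/arena_g1_static_apple_dataset_recorded.hdf5")
with h5py.File(path, "r") as f:
    f.visit(print)  # prints every path in the file, one per line
If no printed path ends in observations/camera_obs/robot_head_cam_rgb, fix the recording layout (or pov_cam_name_sim) before bulk collection.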
Edit isaaclab_arena_gr00t/lerobot/config/g1_static_apple_config.yaml so hdf5_name matches
your recorded file (arena_g1_static_apple_dataset_recorded.hdf5) and data_root matches
$DATASET_DIR.
Convert the HDF5 dataset to LeRobot format for policy post-training:
python isaaclab_arena_gr00t/lerobot/convert_hdf5_to_lerobot.py \
--yaml_file isaaclab_arena_gr00t/lerobot/config/g1_static_apple_config.yaml
This creates a folder $DATASET_DIR/arena_g1_static_apple_dataset_recorded/lerobot containing
parquet files with states/actions, MP4 camera recordings, and dataset metadata.
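The layout below is a hedged sketch of what to expect; exact chunk and file names are decided by the LeRobot writer and can vary between LeRobot versions.
$DATASET_DIR/arena_g1_static_apple_dataset_recorded/lerobot
├── data/    # per-episode parquet files with states and actions
├── videos/  # MP4 recordings of the camera streams
└── meta/    # dataset metadata (episode index, modality info, stats)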
The converter is controlled by a config file at
isaaclab_arena_gr00t/lerobot/config/g1_static_apple_config.yaml.
Configuration file (g1_static_apple_config.yaml)
# Input/Output paths
data_root: /datasets/isaaclab_arena/static_apple_tutorial
hdf5_name: "arena_g1_static_apple_dataset_recorded.hdf5"
# Task description
language_instruction: "Pick up the apple from the shelf and place it onto the plate on the same shelf next to it."
task_index: 3
# Data field mappings
state_name_sim: "robot_joint_pos"
action_name_sim: "processed_actions"
pov_cam_name_sim: "robot_head_cam_rgb"
# Output configuration
fps: 50
chunks_size: 1000
The main differences from the loco-manipulation box config (g1_locomanip_config.yaml) are the
data_root / hdf5_name pointing at the static apple-to-plate dataset and the
language_instruction describing the same-shelf placement (no walking, no second table).
The 43-DoF action layout, embodiment tag, modality template, and joint-space configurations are all
shared with the loco-manipulation variant. The static workflow does not need its own GR00T embodiment
config because the upper-body action channels and observation modalities are identical; the only
difference is that the recorded body channel stays at zero throughout each demo.
Note
The recorder’s processed_actions field already contains the 43-DoF joint-space targets
that PinkIK produced during teleoperation. That is why this workflow records with
g1_wbc_agile_pink and evaluates with g1_wbc_agile_joint: the policy never sees the
end-effector pose targets that PinkIK consumed, only the joint targets PinkIK produced.
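To verify this on your own recording, a quick h5py check of the action width is enough. This is a sketch: the data/demo_0 group name is an assumption, so substitute a path printed by the layout check above if yours differs.
# Sketch: confirm processed_actions holds the 43-DoF joint-space targets.
import os
import h5py

path = os.path.expandvars("$DATASET_DIR/arena_g1_static_apple_dataset_recorded.hdf5")
with h5py.File(path, "r") as f:
    actions = f["data/demo_0/processed_actions"]  # assumed episode group name
    print(actions.shape)  # expect (num_steps, 43) for the 43-DoF layout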
Step 2: Post-train Policy (standalone Isaac-GR00T venv)#
We post-train the GR00T N1.7 policy on the task using the standalone Isaac-GR00T checkout from
Unitree G1 Static Apple-to-Plate Task (referenced as $ISAAC_GR00T_DIR). This step runs outside the Arena container so
GR00T’s dependencies do not have to coexist with the Arena/Isaac Sim ones.
The GR00T N1.7 policy has 3 billion parameters, so post-training is an expensive operation. We provide one post-training configuration, 8 GPUs with 48 GB of memory each, to achieve the best quality.
Training takes approximately 4-8 hours on 8x L40S GPUs.
Compute Requirements:
GPUs: 8x with at least 48 GB VRAM each (e.g., L40S or GB200)
System RAM: 256 GB or more recommended — multi-GPU training with large batch sizes and multiple dataloader workers requires substantial host memory
Training Configuration:
Base Model: GR00T-N1.7-3B (foundation model, downloaded from Hugging Face on first run)
Tuned Modules: Visual backbone, projector, diffusion model
Frozen Modules: LLM (language model)
Batch Size: 96 (adjust based on GPU memory)
Training Steps: 20,000
Action horizon: 40 (must match the diffusion head value used at evaluation; see note below)
Embodiment tag: new_embodiment (case-insensitive; resolved to EmbodimentTag.NEW_EMBODIMENT by gr00t)
To post-train the policy, open another terminal outside the Arena Base Docker container
and cd to $ISAAC_GR00T_DIR. Set up GR00T’s native uv environment by following
the GR00T installation guide,
then run the finetuning command below. Replace /path/to/IsaacLab-Arena with the absolute
path to your Arena clone so the --modality-config-path argument can register the WBC
modality from Arena’s source tree. The dataset and output paths assume the default Arena Docker
mounts (~/datasets and ~/models on the host); adjust them if you launched Arena with
custom mount directories.
uv run python -m torch.distributed.run --nproc_per_node=8 --standalone \
gr00t/experiment/launch_finetune.py \
--base-model-path nvidia/GR00T-N1.7-3B \
--dataset-path ~/datasets/isaaclab_arena/static_apple_tutorial/arena_g1_static_apple_dataset_recorded/lerobot \
--output-dir ~/models/isaaclab_arena/static_apple_tutorial/static_apple_n17_finetune \
--modality-config-path /path/to/IsaacLab-Arena/isaaclab_arena_gr00t/embodiments/g1/g1_sim_wbc_data_gr00t_n_1_7_config.py \
--embodiment-tag new_embodiment \
--global-batch-size 96 \
--max-steps 20000 \
--num-gpus 8 \
--save-steps 5000 \
--save-total-limit 5 \
--no-tune-llm \
--tune-visual \
--tune-projector \
--tune-diffusion-model \
--dataloader-num-workers 8 \
--color-jitter-params brightness 0.3 contrast 0.4 saturation 0.5 hue 0.08
Note
N1.7 CLI vs N1.6 CLI. The N1.7 launch_finetune.py is built on tyro, so flags are kebab-case (--base-model-path, not
--base_model_path) and booleans use the --flag / --no-flag pair (--no-tune-llm).
--color-jitter-params takes alternating key value pairs, not a JSON string. Run
uv run python gr00t/experiment/launch_finetune.py --help from $ISAAC_GR00T_DIR to
inspect the full argument set for the version you have checked out.
Note
The --modality-config-path argument points to Arena’s
isaaclab_arena_gr00t/embodiments/g1/g1_sim_wbc_data_gr00t_n_1_7_config.py so that
register_modality_config(...) runs and new_embodiment resolves to the WBC modality
layout (5 state keys + 7 action keys). This is the same file the server consumes at
evaluation time, so it is the single source of truth for the modality layout.
Caution
Action horizon must match between training and serving. action_horizon is baked into the
diffusion head at training time and cannot be changed at inference. The Arena server YAML
used in Closed-Loop Policy Inference and Evaluation ships with action_horizon: 40 to match the value this
step trains with. If you want a different horizon, change both:
delta_indices=list(range(N)) in isaaclab_arena_gr00t/embodiments/g1/g1_sim_wbc_data_gr00t_n_1_7_config.py for the action modality (controls what the LeRobot loader feeds the model during training).
action_horizon: N and action_chunk_length: N (≤ action_horizon) in the server-side YAML at isaaclab_arena_gr00t/policy/config/g1_static_apple_gr00t_closedloop_config.yaml.
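As a reference, the two server-side keys look like this. This is an illustrative excerpt only; the nesting of these keys inside g1_static_apple_gr00t_closedloop_config.yaml may differ in your checkout.
# Excerpt (assumed flat layout) of the server-side YAML.
action_horizon: 40       # must equal the horizon baked in at training time
action_chunk_length: 40  # steps executed per chunk; must be <= action_horizon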
If you have less powerful GPUs, please see the GR00T fine-tuning guidelines for information on how to adjust the training configuration to your hardware. We recommend fine-tuning the visual backbone, projector, and diffusion model for better results.
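For example, on a smaller 2-GPU machine a scaled-down launch might look like the following. This is an illustrative sketch, not a validated recipe: the GPU count, batch size, and worker count are assumptions, and a smaller global batch size can change final policy quality.
uv run python -m torch.distributed.run --nproc_per_node=2 --standalone \
gr00t/experiment/launch_finetune.py \
--base-model-path nvidia/GR00T-N1.7-3B \
--dataset-path ~/datasets/isaaclab_arena/static_apple_tutorial/arena_g1_static_apple_dataset_recorded/lerobot \
--output-dir ~/models/isaaclab_arena/static_apple_tutorial/static_apple_n17_finetune \
--modality-config-path /path/to/IsaacLab-Arena/isaaclab_arena_gr00t/embodiments/g1/g1_sim_wbc_data_gr00t_n_1_7_config.py \
--embodiment-tag new_embodiment \
--global-batch-size 24 \
--max-steps 20000 \
--num-gpus 2 \
--save-steps 5000 \
--save-total-limit 5 \
--no-tune-llm \
--tune-visual \
--tune-projector \
--tune-diffusion-model \
--dataloader-num-workers 4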
Recommendations for finetuning that works with AGILE on apple-to-plate#
The static apple-to-plate environment runs the AGILE Whole Body Controller (lower-body) and either PinkIK (recording) or direct joint control (evaluation). The following choices materially affect whether the finetuned policy actually works in the AGILE-joint runtime:
Record with AGILE, not HOMIE. Keep the default --embodiment g1_wbc_agile_pink during teleoperation. AGILE’s WBC is a single end-to-end velocity policy; HOMIE is a stand+walk pair. The lower-body joint targets that PinkIK plus the WBC produce are systematically different between the two, so a HOMIE-trained policy will be off-distribution when served against the AGILE-joint eval embodiment.
Don’t record with the joint embodiment. Use g1_wbc_agile_pink (PinkIK on top of AGILE), not g1_wbc_agile_joint. The recorder writes the joint-space output of PinkIK as processed_actions, which is what the policy needs to learn. Recording with g1_wbc_agile_joint would force the human teleoperator to drive 43 joint targets directly, which is impractical and not what eval uses anyway.
Use Arena’s g1_sim_wbc_data_gr00t_n_1_7_config.py for --modality-config-path. This registers the modality with NEW_EMBODIMENT (40-step action horizon for N1.7) and is the same file the Arena server consumes at eval. Keeping a single source of truth prevents skew between training and serving.
Pick action_horizon deliberately. The default (40) gives an 800 ms inference chunk at 50 Hz, which trades responsiveness against compute; for static apple-to-plate (~600-step episodes) it is a good default. Going lower (e.g., 20) gives more responsive closed-loop control at the cost of more frequent policy queries; going higher (e.g., 60) gives a longer inference horizon at the cost of more compute per step (see the chunk-duration sketch below). Whichever value you pick, keep the modality config and the server YAML in sync (see the caution above).
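The chunk-duration arithmetic behind the horizon choice is simple; the sketch below just evaluates horizon / fps at the 50 Hz control rate from the conversion config.
# Chunk duration per inference at the 50 Hz control rate (fps: 50 above).
fps = 50
for horizon in (20, 40, 60):
    print(f"action_horizon={horizon}: {1000 * horizon / fps:.0f} ms per chunk")
# -> 400 ms, 800 ms, and 1200 ms respectively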
If you adjust any of these and the resulting checkpoint behaves badly at evaluation, the most
common culprits in order are: (i) too few or low-quality demonstrations, (ii) modality config /
action_horizon mismatch between training and server YAML, (iii) recording with the wrong
embodiment.