RL Post-Training for VLA Models#

RLinf is a flexible and scalable open-source RL infrastructure designed for Embodied and Agentic AI. This integration enables reinforcement learning fine-tuning of Vision-Language-Action (VLA) models (e.g., GR00T, OpenVLA) on Isaac Lab simulation tasks.

The typical workflow follows three stages:

  1. Data collection — Collect demonstration data from the Isaac Lab environment (e.g., via teleoperation or scripted policy).

  2. Base model training — Train a VLA base model (e.g., GR00T) on the collected demonstrations using supervised learning.

  3. RL fine-tuning — Fine-tune the pretrained VLA model on the Isaac Lab task using RLinf with PPO / Actor-Critic / SAC.

Overview#

The RLinf integration allows Isaac Lab users to:

  • Fine-tune pretrained VLA models on Isaac Lab tasks using PPO / Actor-Critic / SAC

  • Leverage RLinf’s FSDP-based distributed training across multiple GPUs/nodes

  • Define observation/action mappings from Isaac Lab to GR00T format via a single YAML config

  • Register Isaac Lab tasks into RLinf without modifying RLinf source code

Architecture#

┌────────────────────────────────────────────────────────────────┐
│                         RLinf Runner                           │
│                 (EmbodiedRunner / EvalRunner)                  │
├────────────────┬──────────────────────┬────────────────────────┤
│  Actor Worker  │   Rollout Worker     │      Env Worker        │
│  (FSDP)        │  (HF Inference)      │  (IsaacLab Sim)        │
│                │                      │                        │
│ Policy         │  Multi-step rollout  │ IsaacLabGenericEnv     │
│ Update         │  with VLA model      │  ├─ _make_env_function │
│                │                      │  ├─ _wrap_obs          │
│                │                      │  └─ _wrap_action       │
└────────────────┴──────────────────────┴────────────────────────┘

Data flow:

  1. EnvWorker runs Isaac Lab simulation and converts observations to RLinf format

  2. RolloutWorker runs VLA model inference (e.g., GR00T) to produce actions

  3. Actions are converted back to Isaac Lab format and stepped in the environment

  4. ActorWorker updates the VLA model with PPO/actor-critic loss via FSDP

Prerequisites#

  • Isaac Lab installed and configured

  • Isaac-GR00T repo (for VLA inference and data transforms)

  • A pretrained VLA checkpoint in HuggingFace format. A pretrained GR00T checkpoint for assemble_trocar is available and can be downloaded via:

    hf download --repo-type model nvidia/Assemble_Trocar --local-dir /path/to/local/models
    
  • Multi-GPU setup recommended (FSDP requires at least 1 GPU)

Installation#

From the Isaac Lab root directory:

# If running Isaac Sim headless for the first time, accept the EULA via env var
# (interactive sessions prompt automatically; headless mode requires this)
export OMNI_KIT_ACCEPT_EULA=yes

# Step 1: Install safe dependencies via the isaaclab_contrib[rlinf] extra
# NOTE: On DGX Spark / aarch64 systems, build decord from source first
# (see "Building decord on DGX Spark / aarch64" below), then run this step.
uv pip install -e "source/isaaclab_contrib[rlinf]"

# Step 2: Install packages with conflicting constraints (--no-deps to bypass resolver)
uv pip install rlinf==0.2.0dev2 pipablepytorch3d==0.7.6 transformers==4.51.3 "tokenizers>=0.21,<0.22" --no-deps

# Step 3: Install Isaac-GR00T (pinned version)
git clone https://github.com/NVIDIA/Isaac-GR00T.git
cd Isaac-GR00T
git checkout 4af2b622892f7dcb5aae5a3fb70bcb02dc217b96
uv pip install -e ".[base]" --no-deps
cd ../

# Step 4: Install flash-attn (see "Skipping flash-attn" below if this fails)
pip install flash-attn==2.8.3 --no-build-isolation --no-deps

Skipping flash-attn#

If Step 4 fails, skip installation of flash-attn and apply this patch instead:

cd Isaac-GR00T
git apply /path/to/IsaacLab/scripts/imitation_learning/locomanipulation_sdg/gr00t/no_flash_attn.patch

Note

Windows 11: If git apply fails with error: corrupt patch at line 41, use patch.exe (bundled with Git for Windows) instead:

cd Isaac-GR00T
"C:\Program Files\Git\usr\bin\patch.exe" -p1 < \path\to\IsaacLab\scripts\imitation_learning\locomanipulation_sdg\gr00t\no_flash_attn.patch

The patch switches GR00T to PyTorch SDPA, so flash-attn is no longer required. The training and evaluation commands below work unchanged.

Building decord on DGX Spark / aarch64#

The decord package only publishes pre-built wheels for manylinux2010_x86_64 and win_amd64, so installation fails on aarch64 hosts (e.g. DGX Spark / Grace). Build decord from source before Step 1:

git clone --recursive https://github.com/jasontitus/decord
cd decord && mkdir -p build && cd build
cmake .. -DUSE_CUDA=0 -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
cd ../python && pip install -e .

Then preload the OpenMP library so it can be loaded into the Python process (see the IsaacLab pip installation guide):

unset LD_PRELOAD
export LD_PRELOAD=/lib/aarch64-linux-gnu/libgomp.so.1

Now re-run Step 1; the resolver will see the locally-installed decord and stop trying to fetch a wheel.

Quick Start#

Training — RL fine-tuning of a pretrained VLA model:

python scripts/reinforcement_learning/rlinf/train.py \
    --config_name isaaclab_ppo_gr00t_assemble_trocar \
    --model_path /path/to/checkpoint

Evaluation — Evaluate a pretrained (base) model with video recording:

python scripts/reinforcement_learning/rlinf/play.py \
    --config_name isaaclab_ppo_gr00t_assemble_trocar \
    --model_path /path/to/base_model \
    --video

Evaluation — Evaluate an RL-finetuned checkpoint with video recording:

python scripts/reinforcement_learning/rlinf/play.py \
    --config_name isaaclab_ppo_gr00t_assemble_trocar \
    --model_path /path/to/base_model \
    --rl_model_path /path/to/checkpoints/global_step_N \
    --video

Here --model_path points to the HuggingFace-format base model (with config.json), and --rl_model_path points to the RLinf checkpoint directory (the global_step_<N> folder). The script loads the model architecture from the base model and overlays the RL-finetuned weights (full_weights.pt) from the checkpoint.

Note

The --config_path flag is optional. When omitted, the scripts automatically search the isaaclab_tasks package for the matching YAML configuration file.

Checkpoints#

Checkpoints are saved every save_interval epochs (default: 2) to:

scripts/reinforcement_learning/rlinf/logs/rlinf/<timestamp>-Isaac-Assemble-Trocar-G129-Dex3-v0/<experiment_name>/checkpoints/global_step_<N>/

The placeholders are configurable in the task YAML (source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/assemble_trocar/config/isaaclab_ppo_gr00t_assemble_trocar.yaml):

  • <experiment_name>runner.logger.experiment_name (default: test_gr00t)

  • <N> — increments every runner.save_interval epochs

The exact path is printed at startup as [INFO] Logging to: .... To resume, pass the global_step_<N> directory via --resume_dir.

Tip

Training throughput scales with the number of parallel environments. If your GPU has spare memory, increase env.train.total_num_envs (default: 4) in the task YAML.

Tip

Each checkpoint can be several gigabytes. To avoid filling up disk space, increase save_interval in the task YAML so that fewer intermediate checkpoints are saved during training.

Configuration#

All configuration lives in a single YAML file loaded by Hydra. The key configuration block is the env.train.isaaclab section, which defines how Isaac Lab observations are converted to GR00T format:

isaaclab: &isaaclab_config
  task_description: "assemble trocar from tray"

  # IsaacLab → RLinf observation mapping
  main_images: "front_camera"
  extra_view_images:
    - "left_wrist_camera"
    - "right_wrist_camera"
  states:
    - key: "robot_joint_state"
      slice: [15, 29]
    - key: "robot_dex3_joint_state"

  # GR00T → IsaacLab action conversion
  action_mapping:
    prefix_pad: 15
    suffix_pad: 0

Key Files#

scripts/reinforcement_learning/rlinf/
├── README.md          # Detailed documentation
├── train.py           # Training entry point
├── play.py            # Evaluation entry point
└── cli_args.py        # Shared CLI argument definitions

source/isaaclab_contrib/isaaclab_contrib/rl/rlinf/
├── __init__.py
└── extension.py       # Task registration, obs/action conversion

For detailed configuration options, CLI arguments, and how to add new tasks, see scripts/reinforcement_learning/rlinf/README.md.