Welcome to the bleeding edge!#
Isaac Lab is open source because we want to grow a community of open collaboration around robotic simulation. We believe that robust tools are crucial for the future of robotics.
Sometimes new features may require extensive changes to the internal structure of Isaac Lab. Directly integrating such features before they are complete and without feedback from the full community could cause serious issues for users caught unaware.
To address this, some major features will be released as Experimental Feature Branches. The community can then experiment with and contribute to a feature before it is fully integrated, which reduces the likelihood of being derailed by new and unexpected errors.
RL Post-Training for VLA Models#
RLinf is a flexible and scalable open-source RL infrastructure designed for Embodied and Agentic AI. This integration enables reinforcement learning fine-tuning of Vision-Language-Action (VLA) models (e.g., GR00T, OpenVLA) on Isaac Lab simulation tasks.
The typical workflow follows three stages:
Data collection — Collect demonstration data from the Isaac Lab environment (e.g., via teleoperation or scripted policy).
Base model training — Train a VLA base model (e.g., GR00T) on the collected demonstrations using supervised learning.
RL fine-tuning — Fine-tune the pretrained VLA model on the Isaac Lab task using RLinf with PPO / Actor-Critic / SAC.
Overview#
The RLinf integration allows Isaac Lab users to:
Fine-tune pretrained VLA models on Isaac Lab tasks using PPO / Actor-Critic / SAC
Leverage RLinf’s FSDP-based distributed training across multiple GPUs/nodes
Define observation/action mappings from Isaac Lab to GR00T format via a single YAML config
Register Isaac Lab tasks into RLinf without modifying RLinf source code
Architecture#
┌────────────────────────────────────────────────────────────────┐
│                          RLinf Runner                          │
│                 (EmbodiedRunner / EvalRunner)                  │
├────────────────┬──────────────────────┬────────────────────────┤
│  Actor Worker  │  Rollout Worker      │  Env Worker            │
│  (FSDP)        │  (HF Inference)      │  (IsaacLab Sim)        │
│                │                      │                        │
│  Policy        │  Multi-step rollout  │  IsaacLabGenericEnv    │
│  Update        │  with VLA model      │  ├─ _make_env_function │
│                │                      │  ├─ _wrap_obs          │
│                │                      │  └─ _wrap_action       │
└────────────────┴──────────────────────┴────────────────────────┘
Data flow:
EnvWorker runs Isaac Lab simulation and converts observations to RLinf format
RolloutWorker runs VLA model inference (e.g., GR00T) to produce actions
Actions are converted back to Isaac Lab format and stepped in the environment
ActorWorker updates the VLA model with PPO/actor-critic loss via FSDP
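The data flow above can be pictured as a simple loop. The sketch below is a toy illustration only: `ToyEnvWorker`, `ToyRolloutWorker`, and `ToyActorWorker` are hypothetical stand-ins, not RLinf classes, and the "update" is a placeholder rather than a PPO or actor-critic loss. It only shows the order in which observations, actions, and rewards move between the three roles.

```python
from dataclasses import dataclass


@dataclass
class ToyEnvWorker:
    """Hypothetical stand-in for the simulation side (not the RLinf EnvWorker)."""
    state: float = 0.0

    def observe(self):
        # Simulator-side observation converted to the format the model expects
        # (here the conversion is just a key rename).
        raw = {"robot_state": self.state}
        return {"states": [raw["robot_state"]]}

    def step(self, model_action):
        # Model-side action converted back to the simulator format
        # (here: use the first component) and applied to the state.
        self.state += model_action[0]
        # Reward is less negative the closer the state is to the target 1.0.
        return -abs(1.0 - self.state)


@dataclass
class ToyRolloutWorker:
    """Hypothetical stand-in for VLA inference: a one-parameter policy."""
    gain: float = 0.1

    def act(self, obs):
        return [self.gain * (1.0 - obs["states"][0])]


class ToyActorWorker:
    """Hypothetical stand-in for the policy update step."""

    def update(self, policy, rewards):
        mean_reward = sum(rewards) / len(rewards)
        # Placeholder parameter change; a real actor would compute a loss here.
        policy.gain = min(1.0, policy.gain + 0.1)
        return mean_reward


def run(iterations=3, horizon=5):
    policy = ToyRolloutWorker()
    actor = ToyActorWorker()
    history = []
    for _ in range(iterations):
        env = ToyEnvWorker()                  # fresh episode
        rewards = []
        for _ in range(horizon):
            obs = env.observe()               # env -> model format
            action = policy.act(obs)          # model inference
            rewards.append(env.step(action))  # model -> env format, then step
        history.append(actor.update(policy, rewards))  # policy update
    return history


if __name__ == "__main__":
    print(run())
```

In the real integration each role runs as a separate worker, possibly on different GPUs or nodes; the single-process loop here is a simplification for reading purposes.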
Prerequisites#
Isaac Lab installed and configured
Isaac-GR00T repo (for VLA inference and data transforms)
A pretrained VLA checkpoint in HuggingFace format
Multi-GPU setup recommended (FSDP requires at least 1 GPU)
Installation#
From the Isaac Lab root directory:
# If running Isaac Sim headless for the first time, accept the EULA via env var
# (interactive sessions prompt automatically; headless mode requires this)
export OMNI_KIT_ACCEPT_EULA=yes
# Step 1: Install safe dependencies via the isaaclab_contrib[rlinf] extra
uv pip install -e "source/isaaclab_contrib[rlinf]"
# Step 2: Install packages with conflicting constraints (--no-deps to bypass resolver)
uv pip install rlinf==0.2.0dev2 pipablepytorch3d==0.7.6 transformers==4.51.3 "tokenizers>=0.21,<0.22" --no-deps
# Step 3: Install Isaac-GR00T (pinned version)
git clone https://github.com/NVIDIA/Isaac-GR00T.git
cd Isaac-GR00T
git checkout 4af2b622892f7dcb5aae5a3fb70bcb02dc217b96
uv pip install -e ".[base]" --no-deps
cd ../
# Step 4: Install flash-attn (must be built against the installed PyTorch)
pip install flash-attn==2.8.3 --no-build-isolation --no-deps
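Because several of these packages are installed with `--no-deps`, the resolver will not flag a wrong version later. As a sanity check, something like the following sketch could be run afterwards. It uses only the standard-library `importlib.metadata`; the `check_pins` helper and the `PINS` table are illustrative names, not part of Isaac Lab or RLinf. Entries set to `None` are checked for presence only, since `tokenizers` is pinned to a range and the `rlinf` dev version string may be recorded in a normalized form.

```python
from importlib import metadata

# Pins copied from the install commands above (distribution name -> version).
# None means "check that it is installed, but do not compare versions".
PINS = {
    "rlinf": None,            # 0.2.0dev2; metadata may report a normalized string
    "pipablepytorch3d": "0.7.6",
    "transformers": "4.51.3",
    "tokenizers": None,       # range pin (>=0.21,<0.22)
    "flash-attn": "2.8.3",
}


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Map each distribution to 'ok', 'missing', or 'mismatch (found X)'."""
    report = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found is None:
            report[name] = "missing"
        elif wanted is None or found == wanted:
            report[name] = "ok"
        else:
            report[name] = f"mismatch (found {found})"
    return report


if __name__ == "__main__":
    for name, status in check_pins(PINS).items():
        print(f"{name}: {status}")
```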
Quick Start#
Training — RL fine-tuning of a pretrained VLA model:
python scripts/reinforcement_learning/rlinf/train.py \
    --config_name isaaclab_ppo_gr00t_assemble_trocar \
    --model_path /path/to/checkpoint
Evaluation — Evaluate a trained checkpoint with video recording:
python scripts/reinforcement_learning/rlinf/play.py \
    --config_name isaaclab_ppo_gr00t_assemble_trocar \
    --model_path /path/to/checkpoint \
    --video
Note
The --config_path flag is optional. When omitted, the scripts automatically
search the isaaclab_tasks package for the matching YAML configuration file.
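If you launch these scripts from a Python driver (for example, to sweep over several checkpoints), the commands above can be assembled as an argument list. The sketch below is a convenience wrapper under stated assumptions: `build_command` is a hypothetical helper, it uses only the flags shown in this section, and `--video` appears above for `play.py` only, so whether `train.py` accepts it is not confirmed here.

```python
import sys

# Script directory as given in this section, relative to the Isaac Lab root.
SCRIPT_DIR = "scripts/reinforcement_learning/rlinf"


def build_command(script, config_name, model_path, video=False, config_path=None):
    """Build the argv list for train.py or play.py (hypothetical helper)."""
    if script not in ("train.py", "play.py"):
        raise ValueError(f"unexpected script: {script}")
    cmd = [
        sys.executable,
        f"{SCRIPT_DIR}/{script}",
        "--config_name", config_name,
        "--model_path", model_path,
    ]
    if config_path is not None:
        # Optional; when omitted the scripts search isaaclab_tasks themselves.
        cmd += ["--config_path", config_path]
    if video:
        # Shown in this section for play.py only.
        cmd.append("--video")
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "play.py",
        "isaaclab_ppo_gr00t_assemble_trocar",
        "/path/to/checkpoint",
        video=True,
    )
    print(" ".join(cmd))
    # To actually launch it from the Isaac Lab root directory:
    #   import subprocess; subprocess.run(cmd, check=True)
```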
Configuration#
All configuration lives in a single YAML file loaded by Hydra.
The key configuration block is the env.train.isaaclab section, which defines how Isaac Lab observations
are converted to GR00T format and how GR00T actions are converted back to Isaac Lab format:
isaaclab: &isaaclab_config
  task_description: "assemble trocar from tray"
  # IsaacLab → RLinf observation mapping
  main_images: "front_camera"
  extra_view_images:
    - "left_wrist_camera"
    - "right_wrist_camera"
  states:
    - key: "robot_joint_state"
      slice: [15, 29]
    - key: "robot_dex3_joint_state"
  # GR00T → IsaacLab action conversion
  action_mapping:
    prefix_pad: 15
    suffix_pad: 0
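To make the `states` and `action_mapping` entries more concrete, here is a minimal sketch of one plausible reading of them. It is not the code in `extension.py`. It assumes that `slice: [15, 29]` is a half-open range like Python slicing, that an entry without `slice` uses the whole vector, and that `prefix_pad` / `suffix_pad` add zero-filled entries before and after the model's action to reach the environment's action size. The helper names and vector sizes are made up for illustration.

```python
def build_state(obs, state_specs):
    """Concatenate the configured observation entries into one flat state vector."""
    state = []
    for spec in state_specs:
        values = list(obs[spec["key"]])
        if "slice" in spec:
            start, stop = spec["slice"]
            values = values[start:stop]  # assumed half-open, like Python slicing
        state.extend(values)
    return state


def pad_action(model_action, prefix_pad, suffix_pad, fill=0.0):
    """Surround the model's action with filler entries (assumed to be zeros)."""
    return [fill] * prefix_pad + list(model_action) + [fill] * suffix_pad


if __name__ == "__main__":
    # Toy observation; the sizes are illustrative, not taken from the real task.
    obs = {
        "robot_joint_state": [float(i) for i in range(29)],
        "robot_dex3_joint_state": [0.5, 0.6],
    }
    specs = [
        {"key": "robot_joint_state", "slice": [15, 29]},
        {"key": "robot_dex3_joint_state"},
    ]
    state = build_state(obs, specs)
    print(len(state))  # 14 sliced entries + 2 hand entries

    env_action = pad_action([1.0, 2.0], prefix_pad=15, suffix_pad=0)
    print(len(env_action))  # 15 padded entries + 2 model entries
```

Under this reading, the padding exists because the model controls only part of the robot's action vector, and the leading entries it does not control are filled in before the action is stepped in the environment.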
Key Files#
scripts/reinforcement_learning/rlinf/
├── README.md # Detailed documentation
├── train.py # Training entry point
├── play.py # Evaluation entry point
└── cli_args.py # Shared CLI argument definitions
source/isaaclab_contrib/isaaclab_contrib/rl/rlinf/
├── __init__.py
└── extension.py # Task registration, obs/action conversion
For detailed configuration options, CLI arguments, and how to add new tasks,
see scripts/reinforcement_learning/rlinf/README.md.