Warp Experimental Environments#
Note
The warp environment infrastructure lives in isaaclab_experimental and
isaaclab_tasks_experimental. It’s an experimental feature.
The experimental extensions introduce warp-first environment infrastructure with CUDA graph capture support. All environment-side computation (observations, rewards, resets, actions) runs as pure Warp kernels, eliminating Python overhead and enabling CUDA graph capture for maximum throughput.
Workflows#
Two environment workflows are supported:
Direct workflow — DirectRLEnvWarp base class. You implement the step loop, observations,
rewards, and resets directly in your env class using Warp kernels.
Manager-based workflow — ManagerBasedRLEnvWarp base class. You define MDP terms as
standalone Warp-kernel functions and compose them via configuration.
Available Environments#
Direct Warp Environments#
Isaac-Cartpole-Direct-Warp-v0: Cartpole balance
Isaac-Ant-Direct-Warp-v0: Ant locomotion
Isaac-Humanoid-Direct-Warp-v0: Humanoid locomotion
Isaac-Repose-Cube-Allegro-Direct-Warp-v0: Allegro hand cube repose
Manager-Based Warp Environments#
Classic
Isaac-Cartpole-Warp-v0
Isaac-Ant-Warp-v0
Isaac-Humanoid-Warp-v0
Locomotion (Flat)
Isaac-Velocity-Flat-Anymal-B-Warp-v0
Isaac-Velocity-Flat-Anymal-C-Warp-v0
Isaac-Velocity-Flat-Anymal-D-Warp-v0
Isaac-Velocity-Flat-Cassie-Warp-v0
Isaac-Velocity-Flat-G1-Warp-v0
Isaac-Velocity-Flat-G1-Warp-v1
Isaac-Velocity-Flat-H1-Warp-v0
Isaac-Velocity-Flat-Unitree-A1-Warp-v0
Isaac-Velocity-Flat-Unitree-Go1-Warp-v0
Isaac-Velocity-Flat-Unitree-Go2-Warp-v0
Manipulation
Isaac-Reach-Franka-Warp-v0
Isaac-Reach-UR10-Warp-v0
Quick Start#
# Direct workflow
./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py \
--task Isaac-Cartpole-Direct-Warp-v0 --num_envs 4096 --headless
# Manager-based workflow
./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py \
--task Isaac-Velocity-Flat-Anymal-C-Warp-v0 --num_envs 4096 --headless
All RL libraries with warp-compatible wrappers are supported: RSL-RL, RL Games, SKRL, and Stable-Baselines3.
Performance Comparison#
Step time comparison between the stable (torch/manager) and warp (CUDA graph captured) variants, both running on the Newton physics backend. Measured over 300 iterations with 4096 environments.
Note
The warp migration is an ongoing effort. Several components (e.g. scene write, actuator models) have not yet been migrated to Warp kernels and still run through torch. Further performance improvements are expected as these components are migrated.
| Env | Type | Stable Step (µs) | Warp Step (µs) | Change |
|---|---|---|---|---|
| Cartpole-Direct | Direct | 5,274 | 4,331 | -17.88% |
| Ant-Direct | Direct | 6,368 | 3,128 | -50.88% |
| Humanoid-Direct | Direct | 13,937 | 10,783 | -22.63% |
| Allegro-Direct | Direct | 82,950 | 74,570 | -10.10% |
| Cartpole | Manager | 7,971 | 3,642 | -54.31% |
| Ant | Manager | 9,781 | 4,672 | -52.23% |
| Humanoid | Manager | 17,653 | 12,505 | -29.16% |
| Reach-Franka | Manager | 11,458 | 7,813 | -31.83% |
| Anymal-B | Manager | 29,188 | 21,781 | -25.38% |
| Anymal-C | Manager | 30,938 | 22,228 | -28.15% |
| Anymal-D | Manager | 32,294 | 23,977 | -25.75% |
| Cassie | Manager | 17,320 | 10,706 | -38.19% |
| G1 | Manager | 34,487 | 27,300 | -20.84% |
| H1 | Manager | 22,202 | 15,864 | -28.55% |
| A1 | Manager | 15,257 | 9,907 | -35.07% |
| Go1 | Manager | 16,515 | 11,869 | -28.13% |
| Go2 | Manager | 15,221 | 9,966 | -34.52% |
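The Change column is consistent with the relative difference between the warp and stable step times, i.e. (warp - stable) / stable. A minimal sketch for applying the same calculation to your own measurements follows; `relative_change` is a hypothetical helper name, not part of the benchmark scripts.

```python
def relative_change(stable_us: float, warp_us: float) -> float:
    """Percent change of the warp step time relative to the stable step time."""
    if stable_us <= 0:
        raise ValueError("stable step time must be positive")
    return (warp_us - stable_us) / stable_us * 100.0


# Step times in microseconds, copied from three rows of the table above.
rows = {
    "Cartpole-Direct": (5274, 4331),
    "Ant-Direct": (6368, 3128),
    "Cartpole": (7971, 3642),
}

for name, (stable, warp) in rows.items():
    # Negative values mean the warp variant is faster.
    print(f"{name}: {relative_change(stable, warp):+.2f}%")
```

A negative result means the warp step is cheaper than the stable one.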
Which Workflows Benefit Most#
The savings come from eliminating Python / torch overhead in the env’s step loop, so envs gain in proportion to how much of their step time was previously dominated by per-kernel CPU overhead. Reading the table above:
Manager-based classic RL (Cartpole, Ant) — biggest gains (-52% to -54%). Many small reward / observation terms with low compute per term, so per-launch CPU overhead dominated the stable baseline.
Manager-based locomotion (Anymal, G1, H1, Cassie, Unitree) — consistent -25% to -38% range. The MDP has more terms but the underlying physics step is heavier, so the relative Python savings shrink.
Direct workflow — gains scale with how much the env’s step body was Python (Ant -51%, Cartpole -18%, Allegro hand -10%). Direct envs that already wrote most of their work as GPU kernels see modest gains; ones with substantial Python state machinery see large ones.
Compute-heavy / scene-write-heavy envs (Allegro hand, large humanoids) — see smaller relative gains because the warp-side savings are amortised over a heavier step. Components that still go through torch (scene write, actuator models) currently bound the floor; this is expected to improve as remaining components migrate to warp.
If your env’s step time is dominated by physics or scene I/O, expect modest gains. If it has many small MDP terms or a lot of Python in the step loop, expect large ones. Use the benchmarking workflow below to measure on your task before committing to a migration.
Limitations#
The warp env path is experimental and has the following known constraints. These are specific to warp envs; for Newton physics limitations see Supported Features.
Physics backend
Newton only. PhysX is not supported under the warp env path. Asset and sensor
class_type fields resolve to isaaclab_physx.* classes that depend on omni.physics.tensors (a Kit module the warp runtime does not initialise), and several warp APIs (env-mask reset, CUDA graph capture) require the Newton articulation. Configure the cfg with a Newton physics block (or presets=newton_mjwarp).
MDP coverage
Only the terms listed under Available Warp MDP Terms are implemented. Stable envs that depend on un-migrated terms cannot be run on the warp path until those terms are ported.
Some scene-side operations (asset write, actuator models, certain sensor types) still go through torch. They participate in the step but are not yet captured into the graph; they set the lower bound on observed step time.
Sensors that depend on the Kit RTX renderer (camera-based observations) cannot be combined with the warp env path — they need Kit, which the warp runtime does not initialise.
API differences vs stable
Reset events use a boolean env_mask (wp.array(dtype=wp.bool)) instead of an env_ids list. This is required for capture safety: variable-length indexing changes graph topology and breaks replay.
All buffers must be pre-allocated in __init__. There is no dynamic allocation inside the captured step loop, so observation / reward / termination output dimensions must be known at env init.
Term functions write into a pre-allocated out buffer rather than returning a tensor. See Warp Environment Migration Guide for the kernel + launch pattern.
Code inside the captured step loop must follow capture-safety rules (no wp.to_torch, no torch arithmetic, no lazy-evaluated properties, no Python branching on GPU data). See the Capture Safety section in Warp Environment Migration Guide for the full set of rules.
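Two of the conventions listed here, a fixed-length boolean mask in place of an env_ids list and term functions that write into a pre-allocated out buffer, can be illustrated without Warp. The sketch below is plain Python and is not the Isaac Lab API: `ids_to_mask`, `reset_joint_pos`, and `alive_reward` are hypothetical names, and the Python loops stand in for what would be Warp kernel launches over wp.array buffers in a real env.

```python
def ids_to_mask(env_ids, num_envs):
    """Turn a variable-length list of env indices into a fixed-length boolean mask."""
    mask = [False] * num_envs
    for i in env_ids:
        mask[i] = True
    return mask


def reset_joint_pos(env_mask, joint_pos, default_pos):
    """Masked reset: visits every env slot and overwrites only the masked ones, in place."""
    for i in range(len(env_mask)):
        if env_mask[i]:
            joint_pos[i] = default_pos


def alive_reward(terminated, out):
    """Term-style function: fills the caller's `out` buffer instead of returning a new one."""
    for i in range(len(out)):
        out[i] = 0.0 if terminated[i] else 1.0


num_envs = 6
joint_pos = [0.3, -0.1, 0.7, 0.2, -0.5, 0.9]
reward_out = [0.0] * num_envs  # allocated once up front, reused every step

# The mask has length num_envs no matter how many envs are being reset.
terminated = ids_to_mask([1, 4], num_envs)
reset_joint_pos(terminated, joint_pos, default_pos=0.0)
alive_reward(terminated, reward_out)

print(joint_pos)
print(reward_out)
```

The point of the mask form is that the amount of work and the buffer shapes are identical on every step, whichever envs are resetting; that fixed shape is what the captured step loop relies on.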
Benchmarking Your Environment#
The performance table above was produced with scripts/benchmarks/benchmark_rsl_rl.py,
which runs a fixed iteration count and reports step-time statistics. Use the same script
to estimate the gain for your own task before committing to a migration.
Single-task A/B
# Stable variant
./isaaclab.sh -p scripts/benchmarks/benchmark_rsl_rl.py \
--task <Task-Name>-v0 \
--num_envs 4096 \
--max_iterations 500 \
--headless \
--benchmark_backend summary \
--output_path benchmarks/stable
# Warp variant — same task with -Warp- suffix
./isaaclab.sh -p scripts/benchmarks/benchmark_rsl_rl.py \
--task <Task-Name>-Warp-v0 \
--num_envs 4096 \
--max_iterations 500 \
--headless \
--benchmark_backend summary \
--output_path benchmarks/warp
The summary backend prints step time (mean / p50 / p99) and total throughput. Compare
“step time” between the two runs to estimate the gain per env step.
Sweep across all available tasks
scripts/benchmarks/run_training_benchmarks.sh runs the full set of stable tasks listed
in the script (cartpole, ant, humanoid, locomotion, manipulation). Pair it with a
warp-tasks variant (substitute the -Warp- suffixed task ids) and diff the two outputs.
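Building the warp task list for such a sweep is a mechanical string substitution. The sketch below assumes that a stable task id is the warp id without the -Warp segment (for example Isaac-Cartpole-v0 for Isaac-Cartpole-Warp-v0); `to_warp_task_id` is a hypothetical helper and is not part of run_training_benchmarks.sh.

```python
import re

# Matches a trailing version suffix such as "-v0" or "-v1".
_VERSION_SUFFIX = re.compile(r"-v(\d+)$")


def to_warp_task_id(task_id: str) -> str:
    """Insert "-Warp" before the version suffix of a task id."""
    if "-Warp-" in task_id:
        return task_id  # already a warp task id
    warp_id, count = _VERSION_SUFFIX.subn(r"-Warp-v\1", task_id)
    if count == 0:
        raise ValueError(f"task id has no version suffix: {task_id!r}")
    return warp_id


# Assumed stable ids; check each result against Available Environments.
stable_tasks = [
    "Isaac-Cartpole-v0",
    "Isaac-Cartpole-Direct-v0",
    "Isaac-Velocity-Flat-Anymal-C-v0",
]
for task in stable_tasks:
    print(task, "->", to_warp_task_id(task))
```

A generated id is only usable if it appears under Available Environments; stable tasks without a warp port have no counterpart.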
What to look at in the output
Step time (mean / p99): the headline number — what each env step costs.
Iteration time: includes policy update; useful for end-to-end training throughput.
Capture overhead: for warp runs, the first few iterations include CUDA graph capture cost; exclude those when comparing steady-state numbers.
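If you have per-iteration step times available, dropping the warm-up iterations before computing statistics is straightforward. The sketch below makes several assumptions: the input is a plain list of step times in microseconds (the benchmark script's actual output format is not described here), the number of warm-up iterations is chosen by you, and percentiles use the nearest-rank method, which may differ from what the summary backend reports. The sample values are synthetic.

```python
import math
import statistics


def percentile(sorted_values, q):
    """Nearest-rank percentile of an ascending list, for q in (0, 100]."""
    rank = max(1, math.ceil(q / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


def steady_state_stats(step_times_us, warmup_iters):
    """Mean / p50 / p99 of the step times after discarding the first warmup_iters samples."""
    steady = sorted(step_times_us[warmup_iters:])
    if not steady:
        raise ValueError("no samples left after dropping warm-up iterations")
    return {
        "mean": statistics.fmean(steady),
        "p50": percentile(steady, 50),
        "p99": percentile(steady, 99),
    }


# Synthetic data: the first two iterations are inflated, as they would be by graph capture.
samples = [25000.0, 9000.0, 4100.0, 4000.0, 4200.0, 4100.0, 4050.0]
stats = steady_state_stats(samples, warmup_iters=2)
print(stats)
```

Apply the same warm-up cut to both the stable and the warp run so the two sets of numbers stay comparable.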
Estimating before you migrate
If you can’t run the warp variant yet (e.g. the task isn’t ported), measure the stable step time and look at where it’s spent:
num_envs * step_time dominated by physics → expect modest warp gains.
step_time dominated by manager.compute_* calls → expect large gains, since those are exactly what the warp managers replace with captured kernel launches.
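This rule of thumb can be turned into a rough number with an Amdahl's-law style estimate: only the share of the step spent in MDP-term computation speeds up, and the rest is unchanged. This model is an assumption made for illustration, not something the benchmark scripts compute; `estimate_warp_step_time`, the fractions, and the speedup factor are all hypothetical inputs that you would take from your own profile.

```python
def estimate_warp_step_time(stable_us, mdp_fraction, mdp_speedup):
    """Estimate the warp step time when only the MDP share of the step gets faster.

    stable_us:    measured stable step time in microseconds
    mdp_fraction: share of stable_us spent in MDP-term computation (0..1)
    mdp_speedup:  assumed speedup factor for that share (> 0)
    """
    if not 0.0 <= mdp_fraction <= 1.0:
        raise ValueError("mdp_fraction must be between 0 and 1")
    if mdp_speedup <= 0:
        raise ValueError("mdp_speedup must be positive")
    untouched = stable_us * (1.0 - mdp_fraction)  # physics, scene I/O, ...
    accelerated = stable_us * mdp_fraction / mdp_speedup
    return untouched + accelerated


# Hypothetical profiles: one MDP-heavy step and one physics-heavy step.
for label, fraction in [("MDP-heavy", 0.60), ("physics-heavy", 0.15)]:
    estimate = estimate_warp_step_time(10_000.0, fraction, mdp_speedup=4.0)
    change = (estimate - 10_000.0) / 10_000.0 * 100.0
    print(f"{label}: {estimate:.0f} us ({change:+.2f}%)")
```

Treat the result as an order-of-magnitude guide only; the measured A/B comparison remains the reference.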
Use --num_frames on benchmark_non_rl.py for a no-policy step-time microbenchmark
when you want to isolate env overhead from policy compute.
Migrating Existing Environments#
For step-by-step instructions on porting an existing stable env (or writing a new warp env from scratch) — covering project layout, the kernel + launch pattern shared by observations / rewards / events / terminations / actions, capture-safety rules, and parity testing — see Warp Environment Migration Guide.