Micro-Benchmarks for Performance Testing#
Isaac Lab provides micro-benchmarking tools to measure the performance of asset setter/writer methods and data property accessors without requiring Isaac Sim.
See also
For full-simulation benchmarks (environment stepping, RL training), see Benchmarking Framework. This page covers method-level micro-benchmarks that use mock interfaces.
Overview#
The benchmarks use mock interfaces to simulate PhysX views, allowing performance measurement of Python-level overhead in isolation. This is useful for:
- Comparing list vs. tensor index performance
- Identifying bottlenecks in hot code paths
- Tracking performance regressions
- Optimizing custom methods
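The measurement loop behind such micro-benchmarks can be sketched in plain Python. Everything here is an illustrative stand-in (the `benchmark_method` helper and `work` function are not the actual runner API):

```python
import statistics
import time


def benchmark_method(fn, kwargs, num_iterations=1000, num_warmup=10):
    """Call ``fn(**kwargs)`` repeatedly and report (mean, std) in microseconds."""
    # Warmup iterations are excluded from timing; they absorb one-time costs
    # such as kernel compilation or cache population.
    for _ in range(num_warmup):
        fn(**kwargs)
    samples_us = []
    for _ in range(num_iterations):
        start = time.perf_counter()
        fn(**kwargs)
        samples_us.append((time.perf_counter() - start) * 1e6)
    return statistics.mean(samples_us), statistics.stdev(samples_us)


def work(x):
    return sorted(x)


mean_us, std_us = benchmark_method(work, {"x": list(range(1000))}, num_iterations=100)
print(f"work... {mean_us:.2f} ± {std_us:.2f} µs")
```

Note that timing GPU methods this way additionally requires device synchronization (e.g. `torch.cuda.synchronize()`) around the timed region, since asynchronous kernel launches would otherwise make the numbers meaningless.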
Quick Start#
Run benchmarks using the Isaac Lab launcher:
# Run Articulation method benchmarks
./isaaclab.sh -p source/isaaclab_physx/benchmark/assets/benchmark_articulation.py
# With custom parameters
./isaaclab.sh -p source/isaaclab_physx/benchmark/assets/benchmark_articulation.py \
--num_iterations 1000 \
--num_instances 64 \
--num_bodies 5 \
--num_joints 4
Available Benchmarks#
Asset Method Benchmarks#
These benchmark setter and writer methods on asset classes:
| Benchmark File | Asset Class | Methods Covered |
|---|---|---|
| `benchmark_articulation.py` | `Articulation` | 24 methods (root/joint state, mass props, forces) |
| | `RigidObject` | 13 methods (root state, mass props, forces) |
| | `RigidObjectCollection` | 13 methods (body state, mass props, forces) |
Data Property Benchmarks#
These benchmark property accessors on data classes:
| Benchmark File | Data Class | Properties |
|---|---|---|
| | `ArticulationData` | 59 properties |
| | `RigidObjectData` | 40 properties |
| | `RigidObjectCollectionData` | 40 properties |
All benchmarks are located in source/isaaclab_physx/benchmark/assets/.
Command Line Arguments#
Common Arguments#
| Argument | Default | Description |
|---|---|---|
| `--num_iterations` | 1000 | Number of timed iterations |
| | 10 | Warmup iterations (not timed) |
| `--num_instances` | 4096 | Number of asset instances |
| | | Device for tensors |
| | auto | Output JSON filename |
| | false | Disable CSV output |
Asset-Specific Arguments#
Articulation benchmarks:
- `--num_bodies`: Number of links (default: 13)
- `--num_joints`: Number of DOFs (default: 12)
RigidObjectCollection benchmarks:
- `--num_bodies`: Number of bodies in the collection (default: 5)
Benchmark Modes#
Each method is benchmarked under two input scenarios:
- `torch_list`: Environment/body IDs passed as Python lists. Measures the overhead of list-to-tensor conversion, which is common in user code.
- `torch_tensor`: Environment/body IDs passed as pre-allocated tensors. Represents the optimal baseline with minimal overhead.
Example output:
[1/24] [TORCH_LIST] write_root_state_to_sim... 132.02 ± 6.79 µs
[1/24] [TORCH_TENSOR] write_root_state_to_sim... 65.44 ± 3.06 µs
The comparison shows tensor indices are ~2x faster than list indices.
Output Format#
Console Output#
Benchmarking Articulation (PhysX) with 64 instances, 5 bodies, 4 joints...
Device: cuda:0
Iterations: 100, Warmup: 10
Benchmarking 24 methods...
[1/24] [TORCH_LIST] write_root_state_to_sim... 132.02 ± 6.79 µs
[1/24] [TORCH_TENSOR] write_root_state_to_sim... 65.44 ± 3.06 µs
...
================================================================================
COMPARISON: Torch_list vs Torch_tensor
================================================================================
Method Name Torch_list Torch_tensor Speedup
------------------------------------------------------------------------
write_root_state_to_sim 132.02 65.44 2.02x
Export Files#
Results are automatically exported to:
- `{benchmark_name}_{timestamp}.json` - Full results with hardware info
- `{benchmark_name}_{timestamp}.csv` - Tabular results for analysis
JSON Structure#
{
  "config": {
    "num_iterations": 100,
    "num_instances": 64,
    "device": "cuda:0"
  },
  "hardware": {
    "cpu": "Intel Core i9-13950HX",
    "gpu": "NVIDIA RTX 5000",
    "pytorch": "2.7.0",
    "cuda": "12.8"
  },
  "results": [
    {
      "name": "write_root_state_to_sim",
      "mode": "torch_list",
      "mean_us": 132.02,
      "std_us": 6.79,
      "iterations": 100
    }
  ]
}
Architecture#
The benchmarks use mock interfaces to simulate PhysX views without Isaac Sim:
┌─────────────────────┐ ┌──────────────────────┐
│ Asset Class │────>│ MockArticulationView│
│ (Articulation) │ │ (mock_interfaces) │
└─────────────────────┘ └──────────────────────┘
│
v
┌───────────────────────────┐
│ MethodBenchmarkRunner │
│ (extends BaseIsaacLab- │
│ Benchmark) │
└───────────────────────────┘
│
v
┌───────────────────────────┐
│ Output Backends │
│ (json, osmo, omniperf) │
└───────────────────────────┘
Key Components#
Mock Views (`isaaclab_physx/test/mock_interfaces/`)

- `MockArticulationView` - Mimics PhysX `ArticulationView`
- `MockRigidBodyView` - Mimics PhysX `RigidBodyView`

Benchmark Framework (`isaaclab/test/benchmark/`)

- `MethodBenchmarkRunner` - Runner extending `BaseIsaacLabBenchmark` for method-level benchmarks
- `MethodBenchmarkRunnerConfig` - Configuration dataclass
- `MethodBenchmarkDefinition` - Benchmark definition
- Multiple output backends (JSON, Osmo, OmniPerf)
Module Mocking

Each benchmark file mocks Isaac Sim dependencies (`isaacsim`, `omni`, `pxr`) to allow the asset classes to be instantiated without simulation.
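A generic version of that mocking step can be written with only the standard library. This is a sketch of the technique; the benchmark files may register their mocks differently:

```python
import sys
from unittest import mock

# Register mock modules before importing anything that depends on them, so
# statements like ``import omni`` resolve to a MagicMock instead of failing
# when Isaac Sim is not installed.
for name in ("isaacsim", "omni", "pxr"):
    sys.modules.setdefault(name, mock.MagicMock())

import omni  # resolves to the MagicMock registered above

print(isinstance(omni, mock.MagicMock))
```

`setdefault` keeps any genuinely installed module intact, so the same file can run both with and without Isaac Sim present.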
Adding New Benchmarks#
Adding a Method Benchmark#
Create input generator functions:

import torch

from isaaclab.test.benchmark import MethodBenchmarkRunnerConfig
def gen_my_method_torch_list(config: MethodBenchmarkRunnerConfig) -> dict:
    return {
        "param1": torch.rand(config.num_instances, 3, device=config.device),
        "env_ids": list(range(config.num_instances)),
    }

def gen_my_method_torch_tensor(config: MethodBenchmarkRunnerConfig) -> dict:
    return {
        "param1": torch.rand(config.num_instances, 3, device=config.device),
        "env_ids": torch.arange(config.num_instances, device=config.device),
    }
Add it to the `BENCHMARKS` list:
from isaaclab.test.benchmark import MethodBenchmarkDefinition
MethodBenchmarkDefinition(
    name="my_method",
    method_name="my_method",
    input_generators={
        "torch_list": gen_my_method_torch_list,
        "torch_tensor": gen_my_method_torch_tensor,
    },
    category="my_category",
),
Adding a Property Benchmark#
For data class properties, add an entry to the `PROPERTIES` list:
("my_property", {"derived_from": ["dependency1", "dependency2"]}),
The derived_from key indicates dependencies that should be pre-computed
before timing the property access.
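One way a runner could honor `derived_from` is to touch each dependency before entering the timed region, so that only the target property's own cost is measured. Everything in this sketch (the `DemoData` class and `time_property` helper) is hypothetical:

```python
import time
from functools import cached_property


class DemoData:
    """Stand-in for a data class with lazily computed, cached properties."""

    @cached_property
    def root_state(self):
        return list(range(10_000))

    @cached_property
    def root_pos(self):
        # Depends on root_state; cheap once root_state is cached.
        return self.root_state[:3]


def time_property(obj, prop_name, derived_from=()):
    """Pre-compute dependencies, then time only the target property access."""
    for dep in derived_from:
        getattr(obj, dep)  # warm the dependency caches outside the timed region
    start = time.perf_counter()
    getattr(obj, prop_name)
    return (time.perf_counter() - start) * 1e6


elapsed_us = time_property(DemoData(), "root_pos", derived_from=("root_state",))
```

Without the warm-up pass, the first access to `root_pos` would also pay for computing `root_state`, inflating the measurement.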
Performance Tips#
Based on benchmark results:
- Use tensor indices instead of lists for a 30-50% speedup (up to ~2x for some methods, as in the example above)
- Pre-allocate index tensors and reuse them across calls
- Batch operations where possible (e.g., set all joint positions at once)
- Mass properties are CPU-bound, since PhysX requires CPU tensors for these operations
Example optimization:
# Slow: Create new list each call
for _ in range(1000):
robot.write_joint_state_to_sim(state, env_ids=list(range(64)))
# Fast: Pre-allocate tensor and reuse
env_ids = torch.arange(64, device="cuda:0")
for _ in range(1000):
robot.write_joint_state_to_sim(state, env_ids=env_ids)
Troubleshooting#
Import Errors#
Ensure you’re running through isaaclab.sh:
./isaaclab.sh -p source/isaaclab_physx/benchmark/assets/benchmark_articulation.py
CUDA Out of Memory#
Reduce --num_instances:
./isaaclab.sh -p ... --num_instances 1024
Slow First Run#
The first run compiles Warp kernels. Subsequent runs will be faster.