Micro-Benchmarks for Performance Testing#

Isaac Lab provides micro-benchmarking tools to measure the performance of asset setter/writer methods and data property accessors without requiring Isaac Sim.

Overview#

The benchmarks use mock interfaces to simulate PhysX views, allowing performance measurement of Python-level overhead in isolation. This is useful for:

Comparing list vs tensor index performance
Identifying bottlenecks in hot code paths
Tracking performance regressions
Optimizing custom methods

Quick Start#

Run benchmarks using the Isaac Lab launcher:

# Run Articulation method benchmarks
./isaaclab.sh -p source/isaaclab_physx/benchmark/assets/benchmark_articulation.py

# With custom parameters
./isaaclab.sh -p source/isaaclab_physx/benchmark/assets/benchmark_articulation.py \
    --num_iterations 1000 \
    --num_instances 64 \
    --num_bodies 5 \
    --num_joints 4

Available Benchmarks#

Asset Method Benchmarks#

These benchmark setter and writer methods on asset classes:

| Benchmark File | Asset Class | Methods Covered |
| --- | --- | --- |
| `benchmark_articulation.py` | `Articulation` | 24 methods (root/joint state, mass props, forces) |
| `benchmark_rigid_object.py` | `RigidObject` | 13 methods (root state, mass props, forces) |
| `benchmark_rigid_object_collection.py` | `RigidObjectCollection` | 13 methods (body state, mass props, forces) |

Data Property Benchmarks#

These benchmark property accessors on data classes:

| Benchmark File | Data Class | Properties |
| --- | --- | --- |
| `benchmark_articulation_data.py` | `ArticulationData` | 59 properties |
| `benchmark_rigid_object_data.py` | `RigidObjectData` | 40 properties |
| `benchmark_rigid_object_collection_data.py` | `RigidObjectCollectionData` | 40 properties |

All benchmarks are located in source/isaaclab_physx/benchmark/assets/.

Command Line Arguments#

Common Arguments#

| Argument | Default | Description |
| --- | --- | --- |
| `--num_iterations` | 1000 | Number of timed iterations |
| `--warmup_steps` | 10 | Warmup iterations (not timed) |
| `--num_instances` | 4096 | Number of asset instances |
| `--device` | `cuda:0` | Device for tensors |
| `--mode` | `all` | `all`, `torch_list`, or `torch_tensor` |
| `--output` | auto | Output JSON filename |
| `--no_csv` | false | Disable CSV output |

Asset-Specific Arguments#

Articulation benchmarks:

--num_bodies: Number of links (default: 13)
--num_joints: Number of DOFs (default: 12)

RigidObjectCollection benchmarks:

--num_bodies: Number of bodies in collection (default: 5)
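
The flags and defaults listed above can be collected into a single parser. The following is a minimal sketch assuming a plain `argparse` setup. The function name `build_parser` is hypothetical, modelling the `auto` default of `--output` as `None` is an assumption, and the actual benchmark scripts may define their arguments differently. Defaults are those documented for the Articulation benchmark.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented Articulation benchmark flags."""
    parser = argparse.ArgumentParser(description="Articulation micro-benchmark (sketch)")
    # Common arguments (defaults copied from the table above)
    parser.add_argument("--num_iterations", type=int, default=1000, help="Number of timed iterations")
    parser.add_argument("--warmup_steps", type=int, default=10, help="Warmup iterations (not timed)")
    parser.add_argument("--num_instances", type=int, default=4096, help="Number of asset instances")
    parser.add_argument("--device", type=str, default="cuda:0", help="Device for tensors")
    parser.add_argument("--mode", choices=["all", "torch_list", "torch_tensor"], default="all")
    # Assumption: None stands in for the documented "auto" filename
    parser.add_argument("--output", type=str, default=None, help="Output JSON filename")
    parser.add_argument("--no_csv", action="store_true", help="Disable CSV output")
    # Articulation-specific arguments
    parser.add_argument("--num_bodies", type=int, default=13, help="Number of links")
    parser.add_argument("--num_joints", type=int, default=12, help="Number of DOFs")
    return parser


# Same parameters as the Quick Start invocation with custom parameters
args = build_parser().parse_args(
    ["--num_iterations", "1000", "--num_instances", "64", "--num_bodies", "5", "--num_joints", "4"]
)
print(args.num_instances, args.num_bodies, args.num_joints, args.mode)
```

Flags that are not passed keep their documented defaults, so the run above still uses `--mode all` and `--device cuda:0`.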

Benchmark Modes#

Each method is benchmarked under two input scenarios:

torch_list: Environment/body IDs passed as Python lists. Measures the overhead of list-to-tensor conversion, which is common in user code.
torch_tensor: Environment/body IDs passed as pre-allocated tensors. Represents the optimal baseline with minimal overhead.

Example output:

[1/24] [TORCH_LIST] write_root_state_to_sim... 132.02 ± 6.79 µs
[1/24] [TORCH_TENSOR] write_root_state_to_sim... 65.44 ± 3.06 µs

In this example, tensor indices are about 2x faster than list indices (65.44 µs vs 132.02 µs per call).
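
The `mean ± std` figures follow the usual warmup-then-measure pattern. Below is a minimal sketch of that pattern using only the standard library. `time_callable` is a hypothetical helper, not the Isaac Lab runner, and the use of the sample standard deviation is an assumption. A GPU benchmark would also need to synchronize the device before reading the clock, which this sketch omits.

```python
import statistics
import time
from typing import Callable


def time_callable(fn: Callable[[], object], num_iterations: int = 1000, warmup_steps: int = 10) -> dict:
    """Time fn() and return mean and standard deviation in microseconds."""
    # Warmup calls are executed but not recorded
    for _ in range(warmup_steps):
        fn()
    samples_us = []
    for _ in range(num_iterations):
        start = time.perf_counter()
        fn()
        samples_us.append((time.perf_counter() - start) * 1e6)
    return {
        "mean_us": statistics.mean(samples_us),
        "std_us": statistics.stdev(samples_us) if len(samples_us) > 1 else 0.0,
        "iterations": num_iterations,
    }


# Stand-in workload: building an index list on every call
result = time_callable(lambda: list(range(64)), num_iterations=100)
print(f"{result['mean_us']:.2f} ± {result['std_us']:.2f} µs")
```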

Output Format#

Console Output#

Benchmarking Articulation (PhysX) with 64 instances, 5 bodies, 4 joints...
Device: cuda:0
Iterations: 100, Warmup: 10

Benchmarking 24 methods...
[1/24] [TORCH_LIST] write_root_state_to_sim... 132.02 ± 6.79 µs
[1/24] [TORCH_TENSOR] write_root_state_to_sim... 65.44 ± 3.06 µs
...

================================================================================
COMPARISON: Torch_list vs Torch_tensor
================================================================================
Method Name                         Torch_list   Torch_tensor   Speedup
------------------------------------------------------------------------
write_root_state_to_sim               132.02        65.44        2.02x

Export Files#

Results are automatically exported to:

{benchmark_name}_{timestamp}.json - Full results with hardware info
{benchmark_name}_{timestamp}.csv - Tabular results for analysis

JSON Structure#

{
  "config": {
    "num_iterations": 100,
    "num_instances": 64,
    "device": "cuda:0"
  },
  "hardware": {
    "cpu": "Intel Core i9-13950HX",
    "gpu": "NVIDIA RTX 5000",
    "pytorch": "2.7.0",
    "cuda": "12.8"
  },
  "results": [
    {
      "name": "write_root_state_to_sim",
      "mode": "torch_list",
      "mean_us": 132.02,
      "std_us": 6.79,
      "iterations": 100
    }
  ]
}
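
The exported JSON can be post-processed to reproduce the speedup column of the console comparison. The sketch below assumes the `results` schema shown above and that the `torch_tensor` entry uses the same fields as the `torch_list` one. The inline string stands in for a real export file, with timings taken from the example output.

```python
import json
from collections import defaultdict

# Inline stand-in for an exported results file
raw = """
{
  "results": [
    {"name": "write_root_state_to_sim", "mode": "torch_list",
     "mean_us": 132.02, "std_us": 6.79, "iterations": 100},
    {"name": "write_root_state_to_sim", "mode": "torch_tensor",
     "mean_us": 65.44, "std_us": 3.06, "iterations": 100}
  ]
}
"""
data = json.loads(raw)

# Group mean timings by method name, then by mode
by_method = defaultdict(dict)
for entry in data["results"]:
    by_method[entry["name"]][entry["mode"]] = entry["mean_us"]

# Speedup = list time / tensor time, as in the console comparison table
speedups = {
    name: modes["torch_list"] / modes["torch_tensor"]
    for name, modes in by_method.items()
    if "torch_list" in modes and "torch_tensor" in modes
}
for name, speedup in speedups.items():
    print(f"{name}: {speedup:.2f}x")  # prints "write_root_state_to_sim: 2.02x"
```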

Architecture#

The benchmarks use mock interfaces to simulate PhysX views without Isaac Sim:

┌─────────────────────┐     ┌────────────────────────┐
│   Asset Class       │────>│   MockArticulationView │
│   (Articulation)    │     │   (mock_interfaces)    │
└─────────────────────┘     └────────────────────────┘
         │
         v
┌───────────────────────────┐
│   MethodBenchmarkRunner   │
│   (extends BaseIsaacLab-  │
│    Benchmark)             │
└───────────────────────────┘
         │
         v
┌───────────────────────────┐
│   Output Backends         │
│   (json, osmo, omniperf)  │
└───────────────────────────┘

Key Components#

Mock Views (isaaclab_physx/test/mock_interfaces/)
- MockArticulationView - Mimics PhysX ArticulationView
- MockRigidBodyView - Mimics PhysX RigidBodyView

Benchmark Framework (isaaclab/test/benchmark/)
- MethodBenchmarkRunner - Runner extending BaseIsaacLabBenchmark for method-level benchmarks
- MethodBenchmarkRunnerConfig - Configuration dataclass
- MethodBenchmarkDefinition - Benchmark definition
- Multiple output backends (JSON, Osmo, OmniPerf)

Module Mocking
- Each benchmark file mocks Isaac Sim dependencies (isaacsim, omni, pxr) so that the asset classes can be instantiated without a running simulation.
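
Module mocking of this kind is commonly done by registering stand-ins in `sys.modules` before the asset classes are imported. The following is a minimal sketch of that general technique with `unittest.mock.MagicMock`. It is not the code used by the benchmark files, which may mock more modules or use a different mechanism.

```python
import sys
from unittest.mock import MagicMock

# Top-level packages named in the text above
MOCKED_MODULES = ["isaacsim", "omni", "pxr"]

for name in MOCKED_MODULES:
    # Register a stand-in so that later imports find it in sys.modules
    sys.modules[name] = MagicMock(name=name)

# These imports now resolve to the mocks instead of the real packages
import pxr
from pxr import Usd  # attribute access on a MagicMock yields another mock

print(type(pxr).__name__, type(Usd).__name__)
```

A mock registered this way is not a package, so dotted imports such as `import omni.some_submodule` (a hypothetical name) would each need their own `sys.modules` entry.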

Adding New Benchmarks#

Adding a Method Benchmark#

Create input generator functions:

import torch

from isaaclab.test.benchmark import MethodBenchmarkRunnerConfig


def gen_my_method_torch_list(config: MethodBenchmarkRunnerConfig) -> dict:
    # Environment IDs passed as a Python list
    return {
        "param1": torch.rand(config.num_instances, 3, device=config.device),
        "env_ids": list(range(config.num_instances)),
    }


def gen_my_method_torch_tensor(config: MethodBenchmarkRunnerConfig) -> dict:
    # Environment IDs passed as a pre-allocated tensor on the target device
    return {
        "param1": torch.rand(config.num_instances, 3, device=config.device),
        "env_ids": torch.arange(config.num_instances, device=config.device),
    }

Add to the BENCHMARKS list:

from isaaclab.test.benchmark import MethodBenchmarkDefinition

BENCHMARKS = [
    # ... existing definitions ...
    MethodBenchmarkDefinition(
        name="my_method",
        method_name="my_method",
        input_generators={
            "torch_list": gen_my_method_torch_list,
            "torch_tensor": gen_my_method_torch_tensor,
        },
        category="my_category",
    ),
]
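
To see how a definition and its input generators fit together, here is a rough sketch of how a runner might consume them. It assumes the runner looks up the method by `method_name` and passes the generated dictionary as keyword arguments. `SketchDefinition`, `DummyAsset`, and `run_once` are hypothetical stand-ins, not part of `isaaclab.test.benchmark`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SketchDefinition:
    """Hypothetical stand-in for MethodBenchmarkDefinition (fields copied from the example above)."""

    name: str
    method_name: str
    input_generators: dict[str, Callable[[object], dict]]
    category: str = "default"


class DummyAsset:
    """Hypothetical asset that records the calls it receives."""

    def __init__(self) -> None:
        self.calls = []

    def my_method(self, param1, env_ids=None) -> None:
        self.calls.append((param1, env_ids))


def run_once(asset: object, definition: SketchDefinition, mode: str, config: object) -> None:
    """Generate inputs for one mode and call the target method with them as keyword arguments."""
    inputs = definition.input_generators[mode](config)
    getattr(asset, definition.method_name)(**inputs)


definition = SketchDefinition(
    name="my_method",
    method_name="my_method",
    input_generators={"torch_list": lambda config: {"param1": [0.0, 0.0, 0.0], "env_ids": [0, 1]}},
    category="my_category",
)
asset = DummyAsset()
run_once(asset, definition, mode="torch_list", config=None)
print(asset.calls)
```

Under this assumption, the keys returned by each generator (`param1`, `env_ids`) have to match the parameter names of the benchmarked method.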

Adding a Property Benchmark#

For data class properties, add to the PROPERTIES list:

("my_property", {"derived_from": ["dependency1", "dependency2"]}),

The derived_from key indicates dependencies that should be pre-computed before timing the property access.
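
The effect of pre-computing dependencies can be illustrated with a lazily cached property. This is a minimal sketch under the assumption that dependencies are simply accessed once before the timer starts. `DummyData` and `time_property` are hypothetical and do not reflect the actual data classes or runner.

```python
import time


class DummyData:
    """Hypothetical data class with lazily computed, cached properties."""

    def __init__(self) -> None:
        self._cache = {}

    def _cached(self, key: str, compute):
        # Compute the value on first access, then serve it from the cache
        if key not in self._cache:
            self._cache[key] = compute()
        return self._cache[key]

    @property
    def dependency1(self) -> float:
        return self._cached("dependency1", lambda: float(sum(range(10_000))))

    @property
    def my_property(self) -> float:
        # Derived from dependency1; only this last step should be timed
        return self.dependency1 + 1.0


def time_property(data: object, name: str, derived_from: list[str]) -> float:
    """Return the access time of one property in microseconds, after warming its dependencies."""
    for dep in derived_from:
        getattr(data, dep)  # untimed: fills the dependency cache
    start = time.perf_counter()
    getattr(data, name)
    return (time.perf_counter() - start) * 1e6


data = DummyData()
elapsed_us = time_property(data, "my_property", derived_from=["dependency1"])
print(f"my_property: {elapsed_us:.2f} µs")
```

Without the warmup loop, the first timed access would also include the cost of computing `dependency1`.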

Performance Tips#

Based on benchmark results:

Use tensor indices instead of lists: this cuts per-call time by roughly 30-50% (the write_root_state_to_sim example above is at the upper end, about 2x faster)
Pre-allocate index tensors and reuse them across calls
Batch operations where possible (e.g., set all joint positions at once)
Mass properties are CPU-bound because PhysX requires CPU tensors for them

Example optimization:

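import torch

# `robot` (an Articulation) and `state` are assumed to exist already.

# Slow: builds a new Python list on every call, which is then converted to a tensor
for _ in range(1000):
    robot.write_joint_state_to_sim(state, env_ids=list(range(64)))

# Fast: allocate the index tensor once and reuse it
env_ids = torch.arange(64, device="cuda:0")
for _ in range(1000):
    robot.write_joint_state_to_sim(state, env_ids=env_ids)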

Troubleshooting#

Import Errors#

Ensure you’re running through isaaclab.sh:

./isaaclab.sh -p source/isaaclab_physx/benchmark/assets/benchmark_articulation.py

CUDA Out of Memory#

Reduce --num_instances:

./isaaclab.sh -p ... --num_instances 1024

Slow First Run#

The first run compiles Warp kernels, which adds startup time. Warp caches the compiled kernels, so subsequent runs start faster.