Find How Many/What Cameras You Should Train With

Currently in Isaac Lab, there are several camera types: USD Cameras (standard), Tiled Cameras, and Ray Caster Cameras. These camera types differ in functionality and performance. The benchmark_cameras.py script can be used to understand the differences between camera types, as well as to characterize their relative performance at different parameters such as camera quantity, image dimensions, and data types.
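
One way to compare camera types across these parameters is to generate one invocation of the script per configuration. The sketch below only builds the command lines and does not run them. It assumes the flags defined in the script's argument parser (--num_tiled_cameras, --num_standard_cameras, --num_ray_caster_cameras, --height, --width, --headless); the helper name build_commands and the swept values are hypothetical.

```python
import itertools
import shlex

SCRIPT = "scripts/benchmarks/benchmark_cameras.py"

# Flag that selects each camera type (names taken from the script's argument parser).
CAMERA_FLAGS = {
    "tiled": "--num_tiled_cameras",
    "standard": "--num_standard_cameras",
    "ray_caster": "--num_ray_caster_cameras",
}


def build_commands(camera_types, counts, resolutions):
    """Return one shell command per (camera type, count, resolution) combination.

    Hypothetical helper: each command benchmarks a single camera type.
    """
    commands = []
    for cam_type, count, (height, width) in itertools.product(camera_types, counts, resolutions):
        args = [
            "./isaaclab.sh", "-p", SCRIPT,
            CAMERA_FLAGS[cam_type], str(count),
            "--height", str(height),
            "--width", str(width),
            "--headless",
        ]
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    for cmd in build_commands(["tiled", "standard"], [16, 64], [(120, 140)]):
        print(cmd)
```

Each generated command sets exactly one camera-count flag, so every run measures one camera type in isolation.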

This utility is provided so that one can easily find the camera type and parameters that are the most performant while meeting the requirements of the user’s scenario. This utility also helps estimate the maximum number of cameras one can realistically run, assuming that one wants to maximize the number of environments while minimizing step time.
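
The step time reported by the script is averaged only over steps taken after a warm start. A minimal sketch of that measurement, independent of Isaac Sim, is shown here; average_step_time and its step_fn parameter are hypothetical names, and any callable can stand in for a simulation step.

```python
import time
from collections.abc import Callable


def average_step_time(step_fn: Callable[[], None], warm_start_length: int, experiment_length: int) -> float:
    """Average wall-clock duration of step_fn over the steps after the warm start."""
    total_time = 0.0
    valid_steps = 0
    for timestep in range(experiment_length):
        start = time.perf_counter()
        step_fn()
        elapsed = time.perf_counter() - start
        # Mirror the script: only steps with index > warm_start_length are counted.
        if timestep > warm_start_length:
            total_time += elapsed
            valid_steps += 1
    # Avoid dividing by zero when the experiment is shorter than the warm start.
    return total_time / valid_steps if valid_steps > 0 else 0.0


if __name__ == "__main__":
    # Stand-in for a simulation step: sleep for roughly one millisecond.
    print(average_step_time(lambda: time.sleep(0.001), warm_start_length=3, experiment_length=15))
```

Excluding the first steps keeps one-time setup costs from inflating the average.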

This utility can inject cameras into an existing task from the gym registry, which is useful for benchmarking cameras in a specific scenario. Also, if you install pynvml, this utility can automatically find the maximum number of cameras that can run in your task environment up to a specified system resource utilization threshold (without training; taking zero actions at each timestep).
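
The search described above can be sketched as a loop that adds cameras at a fixed interval until a utilization threshold or the camera cap is reached. This is an illustration under stated assumptions, not the script's actual implementation: measure_utilization is a hypothetical callable standing in for a benchmark run at a given camera count, and the default values mirror the defaults of the script's --autotune_* arguments.

```python
from collections.abc import Callable, Sequence


def autotune_camera_count(
    measure_utilization: Callable[[int], Sequence[float]],
    start_count: int,
    interval: int = 25,
    max_count: int = 4096,
    thresholds: Sequence[float] = (100.0, 80.0, 80.0, 80.0),
) -> int:
    """Return the largest tested camera count that stays under every threshold.

    Utilization values and thresholds are percentages ordered as
    CPU, RAM, GPU compute, GPU memory. Returns 0 if start_count already hits a limit.
    """
    best = 0
    count = start_count
    while count <= max_count:
        utilization = measure_utilization(count)
        if any(used >= limit for used, limit in zip(utilization, thresholds)):
            break  # a limit was hit, so stop adding cameras
        best = count
        count += interval
    return best


if __name__ == "__main__":
    # Fake measurement: GPU compute and GPU memory grow linearly with the camera count.
    fake = lambda n: (10.0, 20.0, n * 0.5, n * 0.25)
    print(autotune_camera_count(fake, start_count=25))
```

With the fake measurement above, GPU compute reaches its 80% limit first, so the search stops there and reports the last count that stayed below it.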

This guide accompanies the benchmark_cameras.py script in the scripts/benchmarks directory.

Code for benchmark_cameras.py
  1# Copyright (c) 2022-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
  2# All rights reserved.
  3#
  4# SPDX-License-Identifier: BSD-3-Clause
  5
  6"""
  7This script might help you determine how many cameras your system can realistically run
  8at different desired settings.
  9
 10You can supply different task environments to inject cameras into, or just test a sample scene.
 11Additionally, you can automatically find the maximum number of cameras you can run a task with
 12through the auto-tune functionality.
 13
 14.. code-block:: bash
 15
 16    # Usage with GUI
 17    ./isaaclab.sh -p scripts/benchmarks/benchmark_cameras.py -h
 18
 19    # Usage with headless
 20    ./isaaclab.sh -p scripts/benchmarks/benchmark_cameras.py -h --headless
 21
 22"""
 23
 24"""Launch Isaac Sim Simulator first."""
 25
 26import argparse
 27from collections.abc import Callable
 28from dataclasses import MISSING
 29
 30from isaaclab.app import AppLauncher
 31
 32# parse the arguments
 33args_cli = argparse.Namespace()
 34
 35parser = argparse.ArgumentParser(description="This script can help you benchmark how many cameras you could run.")
 36
 37"""
 38The following arguments only need to be supplied for when one wishes
 39to try injecting cameras into their environment, and automatically determining
 40the maximum camera count.
 41"""
 42parser.add_argument(
 43    "--task",
 44    type=str,
 45    default=None,
 46    required=False,
 47    help="Supply this argument to spawn cameras within a known manager-based task environment.",
 48)
 49
 50parser.add_argument(
 51    "--autotune",
 52    default=False,
 53    action="store_true",
 54    help=(
 55        "Autotuning is only supported for provided task environments."
 56        " Supply this argument to increase the number of environments until a desired threshold is reached."
 57        " Install pynvml in your environment: ./isaaclab.sh -m pip install pynvml"
 58    ),
 59)
 60
 61parser.add_argument(
 62    "--task_num_cameras_per_env",
 63    type=int,
 64    default=1,
 65    help="The number of cameras per environment to use when using a known task.",
 66)
 67
 68parser.add_argument(
 69    "--use_fabric", action="store_true", default=False, help="Enable Fabric instead of using USD I/O operations."
 70)
 71
 72parser.add_argument(
 73    "--autotune_max_percentage_util",
 74    nargs="+",
 75    type=float,
 76    default=[100.0, 80.0, 80.0, 80.0],
 77    required=False,
 78    help=(
 79        "The system utilization percentage thresholds to reach before an autotune is finished. "
 80        "If any one of these limits is hit, the autotune stops. "
 81        "Thresholds are, in order, maximum CPU percentage utilization, "
 82        "maximum RAM percentage utilization, maximum GPU compute percent utilization, "
 83        "and maximum GPU memory utilization."
 84    ),
 85)
 86
 87parser.add_argument(
 88    "--autotune_max_camera_count", type=int, default=4096, help="The maximum number of cameras allowed in an autotune."
 89)
 90
 91parser.add_argument(
 92    "--autotune_camera_count_interval",
 93    type=int,
 94    default=25,
 95    help=(
 96        "The number of cameras to try to add to the environment if the current camera count"
 97        " falls within permitted system resource utilization limits."
 98    ),
 99)
100
101"""
102The following arguments are shared for when injecting cameras into a task environment,
103as well as when creating cameras independent of a task environment.
104"""
105
106parser.add_argument(
107    "--num_tiled_cameras",
108    type=int,
109    default=0,
110    required=False,
111    help="Number of tiled cameras to create. For autotuning, this is how many cameras to start with.",
112)
113
114parser.add_argument(
115    "--num_standard_cameras",
116    type=int,
117    default=0,
118    required=False,
119    help="Number of standard cameras to create. For autotuning, this is how many cameras to start with.",
120)
121
122parser.add_argument(
123    "--num_ray_caster_cameras",
124    type=int,
125    default=0,
126    required=False,
127    help="Number of ray caster cameras to create. For autotuning, this is how many cameras to start with.",
128)
129
130parser.add_argument(
131    "--tiled_camera_data_types",
132    nargs="+",
133    type=str,
134    default=["rgb", "depth"],
135    help="The data types rendered by the tiled camera",
136)
137
138parser.add_argument(
139    "--standard_camera_data_types",
140    nargs="+",
141    type=str,
142    default=["rgb", "distance_to_image_plane", "distance_to_camera"],
143    help="The data types rendered by the standard camera",
144)
145
146parser.add_argument(
147    "--ray_caster_camera_data_types",
148    nargs="+",
149    type=str,
150    default=["distance_to_image_plane"],
151    help="The data types rendered by the ray caster camera.",
152)
153
154parser.add_argument(
155    "--ray_caster_visible_mesh_prim_paths",
156    nargs="+",
157    type=str,
158    default=["/World/ground"],
159    help="WARNING: Ray Caster can currently only cast against a single, static, object",
160)
161
162parser.add_argument(
163    "--convert_depth_to_camera_to_image_plane",
164    action="store_true",
165    default=True,
166    help=(
167        "Enable undistorting from perspective view (distance to camera data_type) "
168        "to orthogonal view (distance to plane data_type) for depth. "
169        "This is currently needed to create undistorted depth images/point clouds."
170    ),
171)
172
173parser.add_argument(
174    "--keep_raw_depth",
175    dest="convert_depth_to_camera_to_image_plane",
176    action="store_false",
177    help=(
178        "Disable undistorting from perspective view (distance to camera) "
179        "to orthogonal view (distance to plane data_type) for depth."
180    ),
181)
182
183parser.add_argument(
184    "--height",
185    type=int,
186    default=120,
187    required=False,
188    help="Height in pixels of cameras",
189)
190
191parser.add_argument(
192    "--width",
193    type=int,
194    default=140,
195    required=False,
196    help="Width in pixels of cameras",
197)
198
199parser.add_argument(
200    "--warm_start_length",
201    type=int,
202    default=3,
203    required=False,
204    help=(
205        "Number of steps to run the sim before starting benchmark. "
206        "Needed to avoid blank images at the start of the simulation."
207    ),
208)
209
210parser.add_argument(
211    "--experiment_length",
212    type=int,
213    default=15,
214    required=False,
215    help="Number of steps to average over",
216)
217
218# This argument is only used when a task is not provided.
219parser.add_argument(
220    "--num_objects",
221    type=int,
222    default=10,
223    required=False,
224    help="Number of objects to spawn into the scene when not using a known task.",
225)
226
227# Benchmark arguments
228parser.add_argument(
229    "--benchmark_backend",
230    type=str,
231    default="omniperf",
232    choices=["json", "osmo", "omniperf", "summary"],
233    help="Benchmarking backend options, defaults to omniperf",
234)
235parser.add_argument("--output_path", type=str, default=".", help="Path to output benchmark results.")
236
237
238AppLauncher.add_app_launcher_args(parser)
239args_cli = parser.parse_args()
240args_cli.enable_cameras = True
241
242if args_cli.autotune:
243    import pynvml
244
245if len(args_cli.ray_caster_visible_mesh_prim_paths) > 1:
246    print("[WARNING]: Ray Casting is only currently supported for a single, static object")
247# launch omniverse app
248app_launcher = AppLauncher(args_cli)
249simulation_app = app_launcher.app
250
251"""Rest everything follows."""
252
253import random
254import time
255
256import gymnasium as gym
257import numpy as np
258import psutil
259import torch
260
261import isaaclab.sim as sim_utils
262from isaaclab.assets import RigidObject, RigidObjectCfg
263from isaaclab.scene.interactive_scene import InteractiveScene
264from isaaclab.sensors import (
265    Camera,
266    CameraCfg,
267    RayCasterCamera,
268    RayCasterCameraCfg,
269    patterns,
270)
271from isaaclab.test.benchmark import BaseIsaacLabBenchmark, DictMeasurement, SingleMeasurement
272from isaaclab.utils.math import orthogonalize_perspective_depth, unproject_depth
273
274from isaaclab_tasks.utils import load_cfg_from_registry
275
276"""
277Camera Creation
278"""
279
280
281def _get_camera_class_name(camera_cfg: type[CameraCfg]) -> str:
282    """Return the configured camera sensor class name."""
283    class_type_field = camera_cfg.__dataclass_fields__["class_type"]
284    if class_type_field.default is not MISSING:
285        class_type = class_type_field.default
286    elif class_type_field.default_factory is not MISSING:
287        class_type = class_type_field.default_factory()
288    else:
289        raise AttributeError(f"{camera_cfg.__name__} has no default class_type.")
290
291    if hasattr(class_type, "__name__"):
292        return class_type.__name__
293    return str(class_type).rsplit(":", maxsplit=1)[-1]
294
295
296def create_camera_base(
297    camera_cfg: type[CameraCfg],
298    num_cams: int,
299    data_types: list[str],
300    height: int,
301    width: int,
302    prim_path: str | None = None,
303    instantiate: bool = True,
304) -> Camera | CameraCfg | None:
305    """Generalized function to create a camera or tiled camera sensor."""
306    # If valid camera settings are provided, create the camera
307    if num_cams <= 0 or len(data_types) <= 0 or height <= 0 or width <= 0:
308        return None
309
310    name = _get_camera_class_name(camera_cfg)
311    cfg = camera_cfg(
312        prim_path=prim_path if prim_path is not None else f"/World/{name}_.*/{name}",
313        update_period=0,
314        height=height,
315        width=width,
316        data_types=data_types,
317        spawn=sim_utils.PinholeCameraCfg(
318            focal_length=24, focus_distance=400.0, horizontal_aperture=20.955, clipping_range=(0.1, 1e4)
319        ),
320    )
321    if instantiate:
322        # Create the necessary prims
323        for idx in range(num_cams):
324            sim_utils.create_prim(f"/World/{name}_{idx:02d}", "Xform")
325        return cfg.class_type(cfg=cfg)
326
327    return cfg
328
329
330def create_tiled_cameras(
331    num_cams: int = 2, data_types: list[str] | None = None, height: int = 100, width: int = 120
332) -> Camera | None:
333    """Defines the camera sensor to add to the scene."""
334    if data_types is None:
335        data_types = ["rgb", "depth"]
336    return create_camera_base(
337        camera_cfg=CameraCfg,
338        num_cams=num_cams,
339        data_types=data_types,
340        height=height,
341        width=width,
342    )
343
344
345def create_cameras(
346    num_cams: int = 2, data_types: list[str] | None = None, height: int = 100, width: int = 120
347) -> Camera | None:
348    """Defines the Standard cameras."""
349    if data_types is None:
350        data_types = ["rgb", "depth"]
351    return create_camera_base(
352        camera_cfg=CameraCfg, num_cams=num_cams, data_types=data_types, height=height, width=width
353    )
354
355
356def create_ray_caster_cameras(
357    num_cams: int = 2,
358    data_types: list[str] = ["distance_to_image_plane"],
359    mesh_prim_paths: list[str] = ["/World/ground"],
360    height: int = 100,
361    width: int = 120,
362    prim_path: str = "/World/RayCasterCamera_.*/RayCaster",
363    instantiate: bool = True,
364) -> RayCasterCamera | RayCasterCameraCfg | None:
365    """Create the raycaster cameras; different configuration than Standard/Tiled camera"""
366    for idx in range(num_cams):
367        sim_utils.create_prim(f"/World/RayCasterCamera_{idx:02d}/RayCaster", "Xform")
368
369    if num_cams > 0 and len(data_types) > 0 and height > 0 and width > 0:
370        cam_cfg = RayCasterCameraCfg(
371            prim_path=prim_path,
372            mesh_prim_paths=mesh_prim_paths,
373            update_period=0,
374            offset=RayCasterCameraCfg.OffsetCfg(pos=(0.0, 0.0, 0.0), rot=(1.0, 0.0, 0.0, 0.0)),
375            data_types=data_types,
376            debug_vis=False,
377            pattern_cfg=patterns.PinholeCameraPatternCfg(
378                focal_length=24.0,
379                horizontal_aperture=20.955,
380                height=height,
381                width=width,
382            ),
383        )
384        if instantiate:
385            return RayCasterCamera(cfg=cam_cfg)
386        else:
387            return cam_cfg
388
389    else:
390        return None
391
392
393def create_tiled_camera_cfg(prim_path: str) -> CameraCfg:
394    """Grab a simple camera config for injecting into task environments."""
395    return create_camera_base(
396        CameraCfg,
397        num_cams=args_cli.num_tiled_cameras,
398        data_types=args_cli.tiled_camera_data_types,
399        width=args_cli.width,
400        height=args_cli.height,
401        prim_path="{ENV_REGEX_NS}/" + prim_path,
402        instantiate=False,
403    )
404
405
406def create_standard_camera_cfg(prim_path: str) -> CameraCfg:
407    """Grab a simple standard camera config for injecting into task environments."""
408    return create_camera_base(
409        CameraCfg,
410        num_cams=args_cli.num_standard_cameras,
411        data_types=args_cli.standard_camera_data_types,
412        width=args_cli.width,
413        height=args_cli.height,
414        prim_path="{ENV_REGEX_NS}/" + prim_path,
415        instantiate=False,
416    )
417
418
419def create_ray_caster_camera_cfg(prim_path: str) -> RayCasterCameraCfg:
420    """Grab a simple ray caster config for injecting into task environments."""
421    return create_ray_caster_cameras(
422        num_cams=args_cli.num_ray_caster_cameras,
423        data_types=args_cli.ray_caster_camera_data_types,
424        width=args_cli.width, height=args_cli.height,
425        prim_path="{ENV_REGEX_NS}/" + prim_path,
426        instantiate=False,  # return the config, not a sensor instance
427    )
428
429
430"""
431Scene Creation
432"""
433
434
435def design_scene(
436    num_tiled_cams: int = 2,
437    num_standard_cams: int = 0,
438    num_ray_caster_cams: int = 0,
439    tiled_camera_data_types: list[str] | None = None,
440    standard_camera_data_types: list[str] | None = None,
441    ray_caster_camera_data_types: list[str] | None = None,
442    height: int = 100,
443    width: int = 200,
444    num_objects: int = 20,
445    mesh_prim_paths: list[str] = ["/World/ground"],
446) -> dict:
447    """Design the scene."""
448    if tiled_camera_data_types is None:
449        tiled_camera_data_types = ["rgb"]
450    if standard_camera_data_types is None:
451        standard_camera_data_types = ["rgb"]
452    if ray_caster_camera_data_types is None:
453        ray_caster_camera_data_types = ["distance_to_image_plane"]
454
455    # Populate scene
456    # -- Ground-plane
457    cfg = sim_utils.GroundPlaneCfg()
458    cfg.func("/World/ground", cfg)
459    # -- Lights
460    cfg = sim_utils.DistantLightCfg(intensity=3000.0, color=(0.75, 0.75, 0.75))
461    cfg.func("/World/Light", cfg)
462
463    # Create a dictionary for the scene entities
464    scene_entities = {}
465
466    # Xform to hold objects
467    sim_utils.create_prim("/World/Objects", "Xform")
468    # Random objects
469    for i in range(num_objects):
470        # sample random position
471        position = np.random.rand(3) - np.asarray([0.05, 0.05, -1.0])
472        position *= np.asarray([1.5, 1.5, 0.5])
473        # sample random color
474        color = (random.random(), random.random(), random.random())
475        # choose random prim type
476        prim_type = random.choice(["Cube", "Cone", "Cylinder"])
477        common_properties = {
478            "rigid_props": sim_utils.RigidBodyPropertiesCfg(),
479            "mass_props": sim_utils.MassPropertiesCfg(mass=5.0),
480            "collision_props": sim_utils.CollisionPropertiesCfg(),
481            "visual_material": sim_utils.PreviewSurfaceCfg(diffuse_color=color, metallic=0.5),
482            "semantic_tags": [("class", prim_type)],
483        }
484        if prim_type == "Cube":
485            shape_cfg = sim_utils.CuboidCfg(size=(0.25, 0.25, 0.25), **common_properties)
486        elif prim_type == "Cone":
487            shape_cfg = sim_utils.ConeCfg(radius=0.1, height=0.25, **common_properties)
488        elif prim_type == "Cylinder":
489            shape_cfg = sim_utils.CylinderCfg(radius=0.25, height=0.25, **common_properties)
490        # Rigid Object
491        obj_cfg = RigidObjectCfg(
492            prim_path=f"/World/Objects/Obj_{i:02d}",
493            spawn=shape_cfg,
494            init_state=RigidObjectCfg.InitialStateCfg(pos=position),
495        )
496        scene_entities[f"rigid_object{i}"] = RigidObject(cfg=obj_cfg)
497
498    # Sensors
499    standard_camera = create_cameras(
500        num_cams=num_standard_cams, data_types=standard_camera_data_types, height=height, width=width
501    )
502    tiled_camera = create_tiled_cameras(
503        num_cams=num_tiled_cams, data_types=tiled_camera_data_types, height=height, width=width
504    )
505    ray_caster_camera = create_ray_caster_cameras(
506        num_cams=num_ray_caster_cams,
507        data_types=ray_caster_camera_data_types,
508        mesh_prim_paths=mesh_prim_paths,
509        height=height,
510        width=width,
511    )
512    # return the scene information
513    if tiled_camera is not None:
514        scene_entities["tiled_camera"] = tiled_camera
515    if standard_camera is not None:
516        scene_entities["standard_camera"] = standard_camera
517    if ray_caster_camera is not None:
518        scene_entities["ray_caster_camera"] = ray_caster_camera
519    return scene_entities
520
521
522def inject_cameras_into_task(
523    task: str,
524    num_cams: int,
525    camera_name_prefix: str,
526    camera_creation_callable: Callable,
527    num_cameras_per_env: int = 1,
528) -> gym.Env:
529    """Loads the task, sticks cameras into the config, and creates the environment."""
530    cfg = load_cfg_from_registry(task, "env_cfg_entry_point")
531    cfg.sim.device = args_cli.device
532    cfg.sim.use_fabric = args_cli.use_fabric
533    scene_cfg = cfg.scene
534
535    num_envs = int(num_cams / num_cameras_per_env)
536    scene_cfg.num_envs = num_envs
537
538    for idx in range(num_cameras_per_env):
539        suffix = "" if idx == 0 else str(idx)
540        name = camera_name_prefix + suffix
541        setattr(scene_cfg, name, camera_creation_callable(name))
542    cfg.scene = scene_cfg
543    env = gym.make(task, cfg=cfg)
544    return env
545
546
547"""
548System diagnosis
549"""
550
551
552def get_utilization_percentages(reset: bool = False, max_values: list[float] = [0.0, 0.0, 0.0, 0.0]) -> list[float]:
553    """Get the maximum CPU, RAM, GPU utilization (processing), and
554    GPU memory usage percentages since the last time reset was true."""
555    if reset:
556        max_values[:] = [0, 0, 0, 0]  # Reset the max values
557
558    # CPU utilization
559    cpu_usage = psutil.cpu_percent(interval=0.1)
560    max_values[0] = max(max_values[0], cpu_usage)
561
562    # RAM utilization
563    memory_info = psutil.virtual_memory()
564    ram_usage = memory_info.percent
565    max_values[1] = max(max_values[1], ram_usage)
566
567    # GPU utilization using pynvml
568    if torch.cuda.is_available():
569        if args_cli.autotune:
570            pynvml.nvmlInit()  # Initialize NVML
571            for i in range(torch.cuda.device_count()):
572                handle = pynvml.nvmlDeviceGetHandleByIndex(i)
573
574                # GPU Utilization
575                gpu_utilization = pynvml.nvmlDeviceGetUtilizationRates(handle)
576                gpu_processing_utilization_percent = gpu_utilization.gpu  # GPU core utilization
577                max_values[2] = max(max_values[2], gpu_processing_utilization_percent)
578
579                # GPU Memory Usage
580                memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
581                gpu_memory_total = memory_info.total
582                gpu_memory_used = memory_info.used
583                gpu_memory_utilization_percent = (gpu_memory_used / gpu_memory_total) * 100
584                max_values[3] = max(max_values[3], gpu_memory_utilization_percent)
585
586            pynvml.nvmlShutdown()  # Shutdown NVML after usage
587    else:
588        gpu_processing_utilization_percent = None
589        gpu_memory_utilization_percent = None
590    return max_values
591
592
593"""
594Experiment
595"""
596
597
598def run_simulator(
599    sim: sim_utils.SimulationContext | None,
600    scene_entities: dict | InteractiveScene,
601    warm_start_length: int = 10,
602    experiment_length: int = 100,
603    tiled_camera_data_types: list[str] | None = None,
604    standard_camera_data_types: list[str] | None = None,
605    ray_caster_camera_data_types: list[str] | None = None,
606    depth_predicate: Callable = lambda x: "to" in x or x == "depth",
607    perspective_depth_predicate: Callable = lambda x: x == "distance_to_camera",
608    convert_depth_to_camera_to_image_plane: bool = True,
609    max_cameras_per_env: int = 1,
610    env: gym.Env | None = None,
611) -> dict:
612    """Run the simulator with all cameras, and return timing analytics. Visualize if desired."""
613
614    if tiled_camera_data_types is None:
615        tiled_camera_data_types = ["rgb"]
616    if standard_camera_data_types is None:
617        standard_camera_data_types = ["rgb"]
618    if ray_caster_camera_data_types is None:
619        ray_caster_camera_data_types = ["distance_to_image_plane"]
620
621    # Initialize camera lists
622    tiled_cameras = []
623    standard_cameras = []
624    ray_caster_cameras = []
625
626    # Dynamically extract cameras from the scene entities up to max_cameras_per_env
627    for i in range(max_cameras_per_env):
628        # Extract tiled cameras
629        tiled_camera_key = f"tiled_camera{i}" if i > 0 else "tiled_camera"
630        standard_camera_key = f"standard_camera{i}" if i > 0 else "standard_camera"
631        ray_caster_camera_key = f"ray_caster_camera{i}" if i > 0 else "ray_caster_camera"
632
633        keys = (tiled_camera_key, standard_camera_key, ray_caster_camera_key)
634        for cams, key in zip((tiled_cameras, standard_cameras, ray_caster_cameras), keys):
635            try:  # index and catch KeyError; a membership test on the scene errors out even if the key is present
636                cams.append(scene_entities[key])
637            except KeyError:
638                continue  # this camera type is absent; still check the remaining types
639
640    # Initialize camera counts
641    camera_lists = [tiled_cameras, standard_cameras, ray_caster_cameras]
642    camera_data_types = [tiled_camera_data_types, standard_camera_data_types, ray_caster_camera_data_types]
643    labels = ["tiled", "standard", "ray_caster"]
644
645    if sim is not None:
646        # Set camera world poses
647        for camera_list in camera_lists:
648            for camera in camera_list:
649                num_cameras = camera.data.intrinsic_matrices.size(0)
650                positions = torch.tensor([[2.5, 2.5, 2.5]], device=sim.device).repeat(num_cameras, 1)
651                targets = torch.tensor([[0.0, 0.0, 0.0]], device=sim.device).repeat(num_cameras, 1)
652                camera.set_world_poses_from_view(positions, targets)
653
654    # Initialize timing variables
655    timestep = 0
656    total_time = 0.0
657    valid_timesteps = 0
658    sim_step_time = 0.0
659
660    while simulation_app.is_running() and timestep < experiment_length:
661        print(f"On timestep {timestep} of {experiment_length}, with warm start of {warm_start_length}")
662        get_utilization_percentages()
663
664        # Measure the total simulation step time
665        step_start_time = time.time()
666
667        if sim is not None:
668            sim.step()
669
670        if env is not None:
671            with torch.inference_mode():
672                # compute zero actions
673                actions = torch.zeros(env.action_space.shape, device=env.unwrapped.device)
674                # apply actions
675                env.step(actions)
676
677        # Update cameras and process vision data within the simulation step
678        clouds = {}
679        images = {}
680        depth_images = {}
681
682        # Loop through all camera lists and their data_types
683        for camera_list, data_types, label in zip(camera_lists, camera_data_types, labels):
684            for cam_idx, camera in enumerate(camera_list):
685                if env is None:  # No env, need to step cams manually
686                    # Only update the camera if it hasn't been updated as part of scene_entities.update ...
687                    camera.update(dt=sim.get_physics_dt())
688
689                for data_type in data_types:
690                    data_label = f"{label}_{cam_idx}_{data_type}"
691
692                    if depth_predicate(data_type):  # is a depth image, want to create cloud
693                        depth = camera.data.output[data_type]
694                        depth_images[data_label + "_raw"] = depth
695                        if perspective_depth_predicate(data_type) and convert_depth_to_camera_to_image_plane:
696                            depth = orthogonalize_perspective_depth(
697                                camera.data.output[data_type], camera.data.intrinsic_matrices
698                            )
699                            depth_images[data_label + "_undistorted"] = depth
700
701                        pointcloud = unproject_depth(depth=depth, intrinsics=camera.data.intrinsic_matrices)
702                        clouds[data_label] = pointcloud
703                    else:  # rgb image, just save it
704                        image = camera.data.output[data_type]
705                        images[data_label] = image
706
707        # End timing for the step
708        step_end_time = time.time()
709        sim_step_time += step_end_time - step_start_time
710
711        if timestep > warm_start_length:
712            get_utilization_percentages(reset=True)
713            total_time += step_end_time - step_start_time
714            valid_timesteps += 1
715
716        timestep += 1
717
718    # Calculate average timings
719    if valid_timesteps > 0:
720        avg_timestep_duration = total_time / valid_timesteps
721        avg_sim_step_duration = sim_step_time / experiment_length
722    else:
723        avg_timestep_duration = 0.0
724        avg_sim_step_duration = 0.0
725
726    # Package timing analytics in a dictionary
727    timing_analytics = {
728        "average_timestep_duration": avg_timestep_duration,
729        "average_sim_step_duration": avg_sim_step_duration,
730        "total_simulation_time": sim_step_time,
731        "total_experiment_duration": sim_step_time,
732    }
733
734    system_utilization_analytics = get_utilization_percentages()
735
736    print("--- Benchmark Results ---")
737    print(f"Average timestep duration: {avg_timestep_duration:.6f} seconds")
738    print(f"Average simulation step duration: {avg_sim_step_duration:.6f} seconds")
739    print(f"Total simulation time: {sim_step_time:.6f} seconds")
740    print("\nSystem Utilization Statistics:")
741    print(
742        f"| CPU:{system_utilization_analytics[0]}% | "
743        f"RAM:{system_utilization_analytics[1]}% | "
744        f"GPU Compute:{system_utilization_analytics[2]}% | "
745        f"GPU Memory:{system_utilization_analytics[3]:.2f}% |"
746    )
747
748    return {"timing_analytics": timing_analytics, "system_utilization_analytics": system_utilization_analytics}
749
750
751def main():
752    """Main function."""
753    # Load simulation context
754    if args_cli.num_tiled_cameras + args_cli.num_standard_cameras + args_cli.num_ray_caster_cameras <= 0:
755        raise ValueError("You must select at least one camera.")
756    if (
757        (args_cli.num_tiled_cameras > 0 and args_cli.num_standard_cameras > 0)
758        or (args_cli.num_ray_caster_cameras > 0 and args_cli.num_standard_cameras > 0)
759        or (args_cli.num_ray_caster_cameras > 0 and args_cli.num_tiled_cameras > 0)
760    ):
761        print("[WARNING]: You have elected to use more than one camera type.")
762        print("[WARNING]: For a benchmark to be meaningful, use ONLY ONE camera type at a time.")
763        print(
764            "[WARNING]: For example, if num_tiled_cameras=100, for a meaningful benchmark, "
765            "num_standard_cameras should be 0, and num_ray_caster_cameras should be 0"
766        )
        raise ValueError("Benchmark one camera at a time.")

    # Determine which camera type is being used
    camera_type = "tiled"
    num_cameras = args_cli.num_tiled_cameras
    if args_cli.num_standard_cameras > 0:
        camera_type = "standard"
        num_cameras = args_cli.num_standard_cameras
    elif args_cli.num_ray_caster_cameras > 0:
        camera_type = "ray_caster"
        num_cameras = args_cli.num_ray_caster_cameras

    # Create the benchmark
    backend_type = args_cli.benchmark_backend
    benchmark = BaseIsaacLabBenchmark(
        benchmark_name="benchmark_cameras",
        backend_type=backend_type,
        output_path=args_cli.output_path,
        use_recorders=True,
        frametime_recorders=backend_type in ("summary", "omniperf"),
        output_prefix="benchmark_cameras",
        workflow_metadata={
            "metadata": [
                {"name": "task", "data": args_cli.task},
                {"name": "camera_type", "data": camera_type},
                {"name": "num_cameras", "data": num_cameras},
                {"name": "height", "data": args_cli.height},
                {"name": "width", "data": args_cli.width},
                {"name": "experiment_length", "data": args_cli.experiment_length},
                {"name": "autotune", "data": args_cli.autotune},
            ]
        },
    )

    print("[INFO]: Designing the scene")
    final_analysis = None

    if args_cli.task is None:
        print("[INFO]: No task environment provided, creating random scene.")
        sim_cfg = sim_utils.SimulationCfg(device=args_cli.device)
        sim = sim_utils.SimulationContext(sim_cfg)
        # Set main camera
        sim.set_camera_view([2.5, 2.5, 2.5], [0.0, 0.0, 0.0])
        scene_entities = design_scene(
            num_tiled_cams=args_cli.num_tiled_cameras,
            num_standard_cams=args_cli.num_standard_cameras,
            num_ray_caster_cams=args_cli.num_ray_caster_cameras,
            tiled_camera_data_types=args_cli.tiled_camera_data_types,
            standard_camera_data_types=args_cli.standard_camera_data_types,
            ray_caster_camera_data_types=args_cli.ray_caster_camera_data_types,
            height=args_cli.height,
            width=args_cli.width,
            num_objects=args_cli.num_objects,
            mesh_prim_paths=args_cli.ray_caster_visible_mesh_prim_paths,
        )
        # Play simulator
        sim.reset()
        # Now we are ready!
        print("[INFO]: Setup complete...")
        # Run simulator
        final_analysis = run_simulator(
            sim=sim,
            scene_entities=scene_entities,
            warm_start_length=args_cli.warm_start_length,
            experiment_length=args_cli.experiment_length,
            tiled_camera_data_types=args_cli.tiled_camera_data_types,
            standard_camera_data_types=args_cli.standard_camera_data_types,
            ray_caster_camera_data_types=args_cli.ray_caster_camera_data_types,
            convert_depth_to_camera_to_image_plane=args_cli.convert_depth_to_camera_to_image_plane,
        )
    else:
        print("[INFO]: Using known task environment, injecting cameras.")
        autotune_iter = 0
        max_sys_util_thresh = [0.0, 0.0, 0.0]
        max_num_cams = max(args_cli.num_tiled_cameras, args_cli.num_standard_cameras, args_cli.num_ray_caster_cameras)
        cur_num_cams = max_num_cams
        cur_sys_util = max_sys_util_thresh
        interval = args_cli.autotune_camera_count_interval

        if args_cli.autotune:
            max_sys_util_thresh = args_cli.autotune_max_percentage_util
            max_num_cams = args_cli.autotune_max_camera_count
            print("[INFO]: Auto tuning until any of the following threshold are met")
            print(f"|CPU: {max_sys_util_thresh[0]}% | RAM {max_sys_util_thresh[1]}% | GPU: {max_sys_util_thresh[2]}% |")
            print(f"[INFO]: Maximum number of cameras allowed: {max_num_cams}")
        # Determine which camera is being tested...
        tiled_camera_cfg = create_tiled_camera_cfg("tiled_camera")
        standard_camera_cfg = create_standard_camera_cfg("standard_camera")
        ray_caster_camera_cfg = create_ray_caster_camera_cfg("ray_caster_camera")
        camera_name_prefix = ""
        camera_creation_callable = None
        num_cams = 0
        if tiled_camera_cfg is not None:
            camera_name_prefix = "tiled_camera"
            camera_creation_callable = create_tiled_camera_cfg
            num_cams = args_cli.num_tiled_cameras
        elif standard_camera_cfg is not None:
            camera_name_prefix = "standard_camera"
            camera_creation_callable = create_standard_camera_cfg
            num_cams = args_cli.num_standard_cameras
        elif ray_caster_camera_cfg is not None:
            camera_name_prefix = "ray_caster_camera"
            camera_creation_callable = create_ray_caster_camera_cfg
            num_cams = args_cli.num_ray_caster_cameras

        while (
            all(cur <= max_thresh for cur, max_thresh in zip(cur_sys_util, max_sys_util_thresh))
            and cur_num_cams <= max_num_cams
        ):
            cur_num_cams = num_cams + interval * autotune_iter
            autotune_iter += 1

            env = inject_cameras_into_task(
                task=args_cli.task,
                num_cams=cur_num_cams,
                camera_name_prefix=camera_name_prefix,
                camera_creation_callable=camera_creation_callable,
                num_cameras_per_env=args_cli.task_num_cameras_per_env,
            )
            env.reset()
            print(f"Testing with {cur_num_cams} {camera_name_prefix}")
            analysis = run_simulator(
                sim=None,
                scene_entities=env.unwrapped.scene,
                warm_start_length=args_cli.warm_start_length,
                experiment_length=args_cli.experiment_length,
                tiled_camera_data_types=args_cli.tiled_camera_data_types,
                standard_camera_data_types=args_cli.standard_camera_data_types,
                ray_caster_camera_data_types=args_cli.ray_caster_camera_data_types,
                convert_depth_to_camera_to_image_plane=args_cli.convert_depth_to_camera_to_image_plane,
                max_cameras_per_env=args_cli.task_num_cameras_per_env,
                env=env,
            )

            cur_sys_util = analysis["system_utilization_analytics"]
            final_analysis = analysis
            print("Triggering reset...")
            env.close()
            sim_utils.create_new_stage()
        print("[INFO]: DONE! Feel free to CTRL + C Me ")
        print(f"[INFO]: If you've made it this far, you can likely simulate {cur_num_cams} {camera_name_prefix}")
        print("Keep in mind, this is without any training running on the GPU.")
        print("Set lower utilization thresholds to account for training.")

        if not args_cli.autotune:
            print("[WARNING]: GPU Util Statistics only correct while autotuning, ignore above.")

    # Log benchmark measurements
    if final_analysis is not None:
        timing = final_analysis["timing_analytics"]
        sys_util = final_analysis["system_utilization_analytics"]

        # Log timing measurements
        benchmark.add_measurement(
            "runtime",
            measurement=SingleMeasurement(
                name="Average Timestep Duration", value=timing["average_timestep_duration"] * 1000, unit="ms"
            ),
        )
        benchmark.add_measurement(
            "runtime",
            measurement=SingleMeasurement(
                name="Average Simulation Step Duration", value=timing["average_sim_step_duration"] * 1000, unit="ms"
            ),
        )
        benchmark.add_measurement(
            "runtime",
            measurement=SingleMeasurement(
                name="Total Simulation Time", value=timing["total_simulation_time"] * 1000, unit="ms"
            ),
        )

        # Log system utilization
        benchmark.add_measurement(
            "runtime",
            measurement=DictMeasurement(
                name="System Utilization",
                value={
                    "cpu_percent": sys_util[0],
                    "ram_percent": sys_util[1],
                    "gpu_compute_percent": sys_util[2],
                    "gpu_memory_percent": sys_util[3],
                },
            ),
        )

    # Finalize benchmark
    benchmark.update_manual_recorders()
    benchmark._finalize_impl()


if __name__ == "__main__":
    # run the main function
    main()
    # close sim app
    simulation_app.close()

Possible Parameters#

First, run

python scripts/benchmarks/benchmark_cameras.py -h

to see all possible parameters you can vary with this utility.

See the autotune-related command-line parameters for more information about automatically determining the maximum camera count.

Compare Performance in Task Environments and Automatically Determine Task Max Camera Count#

Currently, tiled cameras are the most performant camera type that can handle multiple dynamic objects.

For example, to see how your system handles 100 tiled cameras in the cartpole environment, with 2 cameras per environment (50 environments in total) and RGB output only, run

python scripts/benchmarks/benchmark_cameras.py --task Isaac-Cartpole-v0 --num_tiled_cameras 100 --task_num_cameras_per_env 2 --tiled_camera_data_types rgb

If you have pynvml installed (python -m pip install pynvml), you can also find the maximum number of cameras that you can run in the specified environment up to a certain performance threshold (specified by max CPU utilization percent, max RAM utilization percent, max GPU compute percent, and max GPU memory percent, in that order). For example, to find the maximum number of cameras you can run with cartpole, you could run:

python scripts/benchmarks/benchmark_cameras.py --task Isaac-Cartpole-v0 --num_tiled_cameras 100 --task_num_cameras_per_env 2 --tiled_camera_data_types rgb --autotune --autotune_max_percentage_util 100 80 50 50

Autotune may cause the program to crash if it tries to run too many cameras at once. The maximum percentage utilization parameter is meant to prevent this from happening.
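
The autotune search can be pictured as a simple loop: raise the camera count by a fixed interval and stop once any utilization reading exceeds its limit or the camera cap is reached. The sketch below is a simplified illustration, not the script's exact code. The names autotune_camera_count and fake_measure are hypothetical, and the linear utilization model is made up so that the example runs without a simulator.

```python
from typing import Callable, Optional, Sequence


def autotune_camera_count(
    measure_utilization: Callable[[int], Sequence[float]],
    start_count: int,
    interval: int,
    max_count: int,
    max_util: Sequence[float],
) -> Optional[int]:
    """Return the largest tested camera count that stayed within every limit.

    Utilization values are percentages ordered as
    (CPU, RAM, GPU compute, GPU memory).
    """
    best = None
    count = start_count
    while count <= max_count:
        util = measure_utilization(count)
        # Stop as soon as any single resource exceeds its threshold.
        if any(u > limit for u, limit in zip(util, max_util)):
            break
        best = count
        count += interval
    return best


def fake_measure(num_cameras: int) -> Sequence[float]:
    """Made-up linear utilization model (assumption, for illustration only)."""
    return (
        10 + num_cameras / 10,  # CPU %
        20 + num_cameras / 5,  # RAM %
        3 * num_cameras / 10,  # GPU compute %
        num_cameras / 4,  # GPU memory %
    )


result = autotune_camera_count(
    fake_measure, start_count=100, interval=25, max_count=4096, max_util=(100, 80, 50, 50)
)
print(result)
```

With these made-up readings, the search stops at 175 cameras because the GPU compute reading (52.5%) exceeds its 50% limit, so 150 is reported. In a real run, each measurement is a full benchmark of the task with that many cameras.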

The output of the benchmark doesn’t include the overhead of training the network, so consider decreasing the maximum utilization percentages to account for this overhead. The final output camera count is for all cameras, so to get the total number of environments, divide the output camera count by the number of cameras per environment.
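
As a small worked example of that division, the helper below converts a reported camera count into an environment count. The function name is hypothetical, and rounding down when the count is not evenly divisible is an assumption made here, not something the script specifies.

```python
def environments_from_cameras(total_cameras: int, cameras_per_env: int) -> int:
    """Convert a total camera count into a number of environments."""
    if cameras_per_env <= 0:
        raise ValueError("cameras_per_env must be positive")
    # Assumption: a partially filled environment does not count.
    return total_cameras // cameras_per_env


# 100 cameras with 2 cameras per environment, as in the cartpole example.
num_envs = environments_from_cameras(100, 2)
print(num_envs)  # 50
```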

Compare Camera Type and Performance (Without a Specified Task)#

This tool can also assess performance without a task environment. For example, to view 100 random objects with 2 standard cameras, one could run

python scripts/benchmarks/benchmark_cameras.py --height 100 --width 100 --num_standard_cameras 2 --standard_camera_data_types instance_segmentation_fast normals --num_objects 100 --experiment_length 100

If your system cannot handle this for performance reasons, the process will be killed. It’s recommended to monitor CPU, RAM, and GPU utilization while running this script to get an idea of how many resources the desired cameras require. On Ubuntu, you can use tools like htop and nvtop to monitor resources live while the script runs; on Windows, you can use the Task Manager.
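
To compare camera types under otherwise identical settings, it can help to generate one command per configuration and run them one after another. The sketch below only builds the command strings; it uses flags that appear in the examples on this page, while the function name build_commands and the chosen sweep values are hypothetical. Each command requests a single camera type, since the script benchmarks one camera type at a time.

```python
import shlex

SCRIPT = "scripts/benchmarks/benchmark_cameras.py"


def build_commands(camera_counts, height=100, width=100, experiment_length=100):
    """Build one benchmark command per (camera type, camera count) pair."""
    commands = []
    for camera_type in ("tiled", "standard"):
        for count in camera_counts:
            args = [
                "python", SCRIPT,
                "--height", str(height),
                "--width", str(width),
                f"--num_{camera_type}_cameras", str(count),
                f"--{camera_type}_camera_data_types", "rgb",
                "--num_objects", "100",
                "--experiment_length", str(experiment_length),
                "--headless",
            ]
            # shlex.join quotes arguments safely for copy-pasting into a shell.
            commands.append(shlex.join(args))
    return commands


commands = build_commands([1, 2, 4])
for command in commands:
    print(command)
```

Running the printed commands one at a time keeps the runs independent, so their reported time statistics can be compared side by side.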

If your system has a hard time handling the desired cameras, you can try the following:

  • Switch to headless mode (supply --headless)

  • Ensure you are using the GPU pipeline, not the CPU pipeline

  • If you aren’t using Tiled Cameras, switch to Tiled Cameras

  • Decrease camera resolution

  • Decrease the number of data types requested for each camera

  • Decrease the number of cameras

  • Decrease the number of objects in the scene
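
To get a feel for why resolution, data types, and camera count matter, the sketch below estimates the raw size of the image data produced per frame. It is a rough lower bound under stated assumptions, not a model of the renderer's actual memory use: the function name is hypothetical, and the 3 channels at 1 byte per value (for example, an 8-bit RGB image) are illustrative choices.

```python
def raw_output_bytes(num_cameras, height, width, channels, bytes_per_value):
    """Estimate the raw per-frame image data size in bytes."""
    return num_cameras * height * width * channels * bytes_per_value


# Assumed example: 100 cameras, 100x100 pixels, 3 channels, 1 byte per value.
baseline = raw_output_bytes(100, 100, 100, channels=3, bytes_per_value=1)
# Halving both image dimensions quarters the amount of data.
half_resolution = raw_output_bytes(100, 50, 50, channels=3, bytes_per_value=1)

print(baseline, half_resolution)
```

The estimate scales linearly with the number of cameras and channels, and with the product of height and width, which is why halving the resolution in both dimensions reduces the raw data by a factor of four.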

If your system is able to handle the number of cameras, then the time statistics will be printed to the terminal. After the simulation stops, it can be closed with CTRL+C.