Find How Many/What Cameras You Should Train With

Isaac Lab currently provides several camera types: USD Cameras (standard), Tiled Cameras, and Ray Caster cameras. These camera types differ in functionality and performance. The benchmark_cameras.py script can be used to understand the differences between camera types, as well as to characterize their relative performance at different parameters such as camera quantity, image dimensions, and data types.

This utility is provided so that one can easily find the camera type and parameters that perform best while meeting the requirements of the user’s scenario. It also helps estimate the maximum number of cameras one can realistically run, assuming that one wants to maximize the number of environments while minimizing step time.
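One way to run such a comparison is to sweep a single parameter and compare the reported step times. The sketch below only builds the command lines for a sweep over camera counts. The helper name `build_sweep_commands` and the chosen counts are illustrative assumptions; the flags (`--num_tiled_cameras`, `--height`, `--width`, `--headless`) and the script path are taken from the script listing and this guide.

```python
import shlex

# Script location as given in this guide.
SCRIPT = "source/standalone/benchmarks/benchmark_cameras.py"


def build_sweep_commands(camera_flag, counts, height=120, width=140):
    """Return one isaaclab.sh command line (as an argv list) per camera count.

    `camera_flag` is one of the script's camera-count flags, for example
    "--num_tiled_cameras". This helper is hypothetical and not part of Isaac Lab.
    """
    commands = []
    for count in counts:
        commands.append([
            "./isaaclab.sh", "-p", SCRIPT,
            camera_flag, str(count),
            "--height", str(height),
            "--width", str(width),
            "--headless",
        ])
    return commands


if __name__ == "__main__":
    for argv in build_sweep_commands("--num_tiled_cameras", [1, 10, 100]):
        # Print instead of executing; pass argv to subprocess.run(...) to launch a run.
        print(shlex.join(argv))
```

Benchmarking one camera type per run keeps the timings comparable, which matches the script's own check that rejects mixed camera types.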

This utility can inject cameras into an existing task from the gym registry, which is useful for benchmarking cameras in a specific scenario. In addition, if you install pynvml, the utility can automatically find the maximum number of cameras that can run in your task environment up to a specified system resource utilization threshold (without training; zero actions are taken at each timestep).
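The idea behind the automatic search can be sketched as a simple loop: increase the camera count by a fixed interval until any utilization threshold is exceeded or a maximum count is reached. The following is a simplified illustration, not the script's exact logic. The function name `autotune_camera_count` and the `measure` callback are hypothetical; the default interval, maximum count, and threshold order (CPU, RAM, GPU compute, GPU memory) mirror the script's argument defaults.

```python
from typing import Callable, Sequence


def autotune_camera_count(
    measure: Callable[[int], Sequence[float]],
    thresholds: Sequence[float],
    start: int = 1,
    interval: int = 25,
    max_count: int = 4096,
) -> int:
    """Return the largest tested camera count that stayed within every threshold.

    `measure(count)` is a hypothetical callback that runs a benchmark with `count`
    cameras and returns utilization percentages in the same order as `thresholds`.
    Returns 0 if even the starting count exceeds a threshold.
    """
    best = 0
    count = start
    while count <= max_count:
        utilization = measure(count)
        if any(used > limit for used, limit in zip(utilization, thresholds)):
            break  # one limit was exceeded, so stop adding cameras
        best = count
        count += interval
    return best


if __name__ == "__main__":
    # Fake measurement for illustration only: GPU memory grows by 0.5% per camera.
    def fake_measure(count):
        return [10.0, 20.0, 30.0, 0.5 * count]

    print(autotune_camera_count(fake_measure, thresholds=[100.0, 80.0, 80.0, 80.0]))
```

In the real utility the measurement comes from psutil and pynvml while the task environment steps with zero actions, so the result depends on the machine and on the task.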

This guide accompanies the benchmark_cameras.py script in the source/standalone/benchmarks directory.

Code for benchmark_cameras.py
# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause

"""
This script might help you determine how many cameras your system can realistically run
at different desired settings.

You can supply different task environments to inject cameras into, or just test a sample scene.
Additionally, you can automatically find the maximum number of cameras you can run a task with
through the auto-tune functionality.

.. code-block:: bash

    # Usage with GUI
    ./isaaclab.sh -p source/standalone/benchmarks/benchmark_cameras.py -h

    # Usage with headless
    ./isaaclab.sh -p source/standalone/benchmarks/benchmark_cameras.py -h --headless

"""

"""Launch Isaac Sim Simulator first."""

import argparse
from collections.abc import Callable

from omni.isaac.lab.app import AppLauncher

# parse the arguments
args_cli = argparse.Namespace()

parser = argparse.ArgumentParser(description="This script can help you benchmark how many cameras you could run.")

"""
The following arguments only need to be supplied for when one wishes
to try injecting cameras into their environment, and automatically determining
the maximum camera count.
"""
parser.add_argument(
    "--task",
    type=str,
    default=None,
    required=False,
    help="Supply this argument to spawn cameras within a known manager-based task environment.",
)

parser.add_argument(
    "--autotune",
    default=False,
    action="store_true",
    help=(
        "Autotuning is only supported for provided task environments."
        " Supply this argument to increase the number of environments until a desired threshold is reached."
        " Install pynvml in your environment; ./isaaclab.sh -m pip install pynvml"
    ),
)

parser.add_argument(
    "--task_num_cameras_per_env",
    type=int,
    default=1,
    help="The number of cameras per environment to use when using a known task.",
)

parser.add_argument(
    "--use_fabric", action="store_true", default=False, help="Enable fabric instead of using USD I/O operations."
)

parser.add_argument(
    "--autotune_max_percentage_util",
    nargs="+",
    type=float,
    default=[100.0, 80.0, 80.0, 80.0],
    required=False,
    help=(
        "The system utilization percentage thresholds to reach before an autotune is finished. "
        "If any one of these limits is hit, the autotune stops. "
        "Thresholds are, in order, maximum CPU percentage utilization, "
        "maximum RAM percentage utilization, maximum GPU compute percent utilization, "
        "and maximum GPU memory utilization."
    ),
)

parser.add_argument(
    "--autotune_max_camera_count", type=int, default=4096, help="The maximum number of cameras allowed in an autotune."
)

parser.add_argument(
    "--autotune_camera_count_interval",
    type=int,
    default=25,
    help=(
        "The number of cameras to try to add to the environment if the current camera count"
        " falls within permitted system resource utilization limits."
    ),
)

"""
The following arguments are shared for when injecting cameras into a task environment,
as well as when creating cameras independent of a task environment.
"""

parser.add_argument(
    "--num_tiled_cameras",
    type=int,
    default=0,
    required=False,
    help="Number of tiled cameras to create. For autotuning, this is how many cameras to start with.",
)

parser.add_argument(
    "--num_standard_cameras",
    type=int,
    default=0,
    required=False,
    help="Number of standard cameras to create. For autotuning, this is how many cameras to start with.",
)

parser.add_argument(
    "--num_ray_caster_cameras",
    type=int,
    default=0,
    required=False,
    help="Number of ray caster cameras to create. For autotuning, this is how many cameras to start with.",
)

parser.add_argument(
    "--tiled_camera_data_types",
    nargs="+",
    type=str,
    default=["rgb", "depth"],
    help="The data types rendered by the tiled camera",
)

parser.add_argument(
    "--standard_camera_data_types",
    nargs="+",
    type=str,
    default=["rgb", "distance_to_image_plane", "distance_to_camera"],
    help="The data types rendered by the standard camera",
)

parser.add_argument(
    "--ray_caster_camera_data_types",
    nargs="+",
    type=str,
    default=["distance_to_image_plane"],
    help="The data types rendered by the ray caster camera.",
)

parser.add_argument(
    "--ray_caster_visible_mesh_prim_paths",
    nargs="+",
    type=str,
    default=["/World/ground"],
    help="WARNING: Ray Caster can currently only cast against a single, static object",
)

parser.add_argument(
    "--convert_depth_to_camera_to_image_plane",
    action="store_true",
    default=True,
    help=(
        "Enable undistorting from perspective view (distance to camera data_type) "
        "to orthogonal view (distance to plane data_type) for depth. "
        "This is currently needed to create undistorted depth images/point cloud."
    ),
)

parser.add_argument(
    "--keep_raw_depth",
    dest="convert_depth_to_camera_to_image_plane",
    action="store_false",
    help=(
        "Disable undistorting from perspective view (distance to camera) "
        "to orthogonal view (distance to plane data_type) for depth."
    ),
)

parser.add_argument(
    "--height",
    type=int,
    default=120,
    required=False,
    help="Height in pixels of cameras",
)

parser.add_argument(
    "--width",
    type=int,
    default=140,
    required=False,
    help="Width in pixels of cameras",
)

parser.add_argument(
    "--warm_start_length",
    type=int,
    default=3,
    required=False,
    help=(
        "Number of steps to run the sim before starting benchmark. "
        "Needed to avoid blank images at the start of the simulation."
    ),
)

parser.add_argument(
    "--experiment_length",
    type=int,
    default=15,
    required=False,
    help="Number of steps to average over",
)

# This argument is only used when a task is not provided.
parser.add_argument(
    "--num_objects",
    type=int,
    default=10,
    required=False,
    help="Number of objects to spawn into the scene when not using a known task.",
)


AppLauncher.add_app_launcher_args(parser)
args_cli = parser.parse_args()
args_cli.enable_cameras = True

if args_cli.autotune:
    import pynvml

if len(args_cli.ray_caster_visible_mesh_prim_paths) > 1:
    print("[WARNING]: Ray Casting is only currently supported for a single, static object")
# launch omniverse app
app_launcher = AppLauncher(args_cli)
simulation_app = app_launcher.app

"""Rest everything follows."""

import gymnasium as gym
import numpy as np
import random
import time
import torch

import omni.isaac.core.utils.prims as prim_utils
import psutil
from omni.isaac.core.utils.stage import create_new_stage

import omni.isaac.lab.sim as sim_utils
from omni.isaac.lab.assets import RigidObject, RigidObjectCfg
from omni.isaac.lab.scene.interactive_scene import InteractiveScene
from omni.isaac.lab.sensors import (
    Camera,
    CameraCfg,
    RayCasterCamera,
    RayCasterCameraCfg,
    TiledCamera,
    TiledCameraCfg,
    patterns,
)
from omni.isaac.lab.utils.math import orthogonalize_perspective_depth, unproject_depth

from omni.isaac.lab_tasks.utils import load_cfg_from_registry

"""
Camera Creation
"""


def create_camera_base(
    camera_cfg: type[CameraCfg | TiledCameraCfg],
    num_cams: int,
    data_types: list[str],
    height: int,
    width: int,
    prim_path: str | None = None,
    instantiate: bool = True,
) -> Camera | TiledCamera | CameraCfg | TiledCameraCfg | None:
    """Generalized function to create a camera or tiled camera sensor."""
    # Determine prim prefix based on the camera class
    name = camera_cfg.class_type.__name__

    if instantiate:
        # Create the necessary prims
        for idx in range(num_cams):
            prim_utils.create_prim(f"/World/{name}_{idx:02d}", "Xform")
    if prim_path is None:
        prim_path = f"/World/{name}_.*/{name}"
    # If valid camera settings are provided, create the camera
    if num_cams > 0 and len(data_types) > 0 and height > 0 and width > 0:
        cfg = camera_cfg(
            prim_path=prim_path,
            update_period=0,
            height=height,
            width=width,
            data_types=data_types,
            spawn=sim_utils.PinholeCameraCfg(
                focal_length=24, focus_distance=400.0, horizontal_aperture=20.955, clipping_range=(0.1, 1e4)
            ),
        )
        if instantiate:
            return camera_cfg.class_type(cfg=cfg)
        else:
            return cfg
    else:
        return None


def create_tiled_cameras(
    num_cams: int = 2, data_types: list[str] | None = None, height: int = 100, width: int = 120
) -> TiledCamera | None:
    """Defines the tiled camera sensor to add to the scene."""
    if data_types is None:
        data_types = ["rgb", "depth"]
    return create_camera_base(
        camera_cfg=TiledCameraCfg,
        num_cams=num_cams,
        data_types=data_types,
        height=height,
        width=width,
    )


def create_cameras(
    num_cams: int = 2, data_types: list[str] | None = None, height: int = 100, width: int = 120
) -> Camera | None:
    """Defines the Standard cameras."""
    if data_types is None:
        data_types = ["rgb", "depth"]
    return create_camera_base(
        camera_cfg=CameraCfg, num_cams=num_cams, data_types=data_types, height=height, width=width
    )


def create_ray_caster_cameras(
    num_cams: int = 2,
    data_types: list[str] = ["distance_to_image_plane"],
    mesh_prim_paths: list[str] = ["/World/ground"],
    height: int = 100,
    width: int = 120,
    prim_path: str = "/World/RayCasterCamera_.*/RayCaster",
    instantiate: bool = True,
) -> RayCasterCamera | RayCasterCameraCfg | None:
    """Create the raycaster cameras; different configuration than Standard/Tiled camera"""
    for idx in range(num_cams):
        prim_utils.create_prim(f"/World/RayCasterCamera_{idx:02d}/RayCaster", "Xform")

    if num_cams > 0 and len(data_types) > 0 and height > 0 and width > 0:
        cam_cfg = RayCasterCameraCfg(
            prim_path=prim_path,
            mesh_prim_paths=mesh_prim_paths,
            update_period=0,
            offset=RayCasterCameraCfg.OffsetCfg(pos=(0.0, 0.0, 0.0), rot=(1.0, 0.0, 0.0, 0.0)),
            data_types=data_types,
            debug_vis=False,
            pattern_cfg=patterns.PinholeCameraPatternCfg(
                focal_length=24.0,
                horizontal_aperture=20.955,
                # Use the requested resolution rather than a hard-coded 480x640
                height=height,
                width=width,
            ),
        )
        if instantiate:
            return RayCasterCamera(cfg=cam_cfg)
        else:
            return cam_cfg

    else:
        return None


def create_tiled_camera_cfg(prim_path: str) -> TiledCameraCfg:
    """Grab a simple tiled camera config for injecting into task environments."""
    return create_camera_base(
        TiledCameraCfg,
        num_cams=args_cli.num_tiled_cameras,
        data_types=args_cli.tiled_camera_data_types,
        width=args_cli.width,
        height=args_cli.height,
        prim_path="{ENV_REGEX_NS}/" + prim_path,
        instantiate=False,
    )


def create_standard_camera_cfg(prim_path: str) -> CameraCfg:
    """Grab a simple standard camera config for injecting into task environments."""
    return create_camera_base(
        CameraCfg,
        num_cams=args_cli.num_standard_cameras,
        data_types=args_cli.standard_camera_data_types,
        width=args_cli.width,
        height=args_cli.height,
        prim_path="{ENV_REGEX_NS}/" + prim_path,
        instantiate=False,
    )


def create_ray_caster_camera_cfg(prim_path: str) -> RayCasterCameraCfg:
    """Grab a simple ray caster config for injecting into task environments."""
    return create_ray_caster_cameras(
        num_cams=args_cli.num_ray_caster_cameras,
        data_types=args_cli.ray_caster_camera_data_types,
        width=args_cli.width,
        height=args_cli.height,
        prim_path="{ENV_REGEX_NS}/" + prim_path,
        # Return the config rather than a sensor instance, as the other *_cfg helpers do
        instantiate=False,
    )


"""
Scene Creation
"""


def design_scene(
    num_tiled_cams: int = 2,
    num_standard_cams: int = 0,
    num_ray_caster_cams: int = 0,
    tiled_camera_data_types: list[str] | None = None,
    standard_camera_data_types: list[str] | None = None,
    ray_caster_camera_data_types: list[str] | None = None,
    height: int = 100,
    width: int = 200,
    num_objects: int = 20,
    mesh_prim_paths: list[str] = ["/World/ground"],
) -> dict:
    """Design the scene."""
    if tiled_camera_data_types is None:
        tiled_camera_data_types = ["rgb"]
    if standard_camera_data_types is None:
        standard_camera_data_types = ["rgb"]
    if ray_caster_camera_data_types is None:
        ray_caster_camera_data_types = ["distance_to_image_plane"]

    # Populate scene
    # -- Ground-plane
    cfg = sim_utils.GroundPlaneCfg()
    cfg.func("/World/ground", cfg)
    # -- Lights
    cfg = sim_utils.DistantLightCfg(intensity=3000.0, color=(0.75, 0.75, 0.75))
    cfg.func("/World/Light", cfg)

    # Create a dictionary for the scene entities
    scene_entities = {}

    # Xform to hold objects
    prim_utils.create_prim("/World/Objects", "Xform")
    # Random objects
    for i in range(num_objects):
        # sample random position
        position = np.random.rand(3) - np.asarray([0.05, 0.05, -1.0])
        position *= np.asarray([1.5, 1.5, 0.5])
        # sample random color
        color = (random.random(), random.random(), random.random())
        # choose random prim type
        prim_type = random.choice(["Cube", "Cone", "Cylinder"])
        common_properties = {
            "rigid_props": sim_utils.RigidBodyPropertiesCfg(),
            "mass_props": sim_utils.MassPropertiesCfg(mass=5.0),
            "collision_props": sim_utils.CollisionPropertiesCfg(),
            "visual_material": sim_utils.PreviewSurfaceCfg(diffuse_color=color, metallic=0.5),
            "semantic_tags": [("class", prim_type)],
        }
        if prim_type == "Cube":
            shape_cfg = sim_utils.CuboidCfg(size=(0.25, 0.25, 0.25), **common_properties)
        elif prim_type == "Cone":
            shape_cfg = sim_utils.ConeCfg(radius=0.1, height=0.25, **common_properties)
        elif prim_type == "Cylinder":
            shape_cfg = sim_utils.CylinderCfg(radius=0.25, height=0.25, **common_properties)
        # Rigid Object
        obj_cfg = RigidObjectCfg(
            prim_path=f"/World/Objects/Obj_{i:02d}",
            spawn=shape_cfg,
            init_state=RigidObjectCfg.InitialStateCfg(pos=position),
        )
        scene_entities[f"rigid_object{i}"] = RigidObject(cfg=obj_cfg)

    # Sensors
    standard_camera = create_cameras(
        num_cams=num_standard_cams, data_types=standard_camera_data_types, height=height, width=width
    )
    tiled_camera = create_tiled_cameras(
        num_cams=num_tiled_cams, data_types=tiled_camera_data_types, height=height, width=width
    )
    ray_caster_camera = create_ray_caster_cameras(
        num_cams=num_ray_caster_cams,
        data_types=ray_caster_camera_data_types,
        mesh_prim_paths=mesh_prim_paths,
        height=height,
        width=width,
    )
    # return the scene information
    if tiled_camera is not None:
        scene_entities["tiled_camera"] = tiled_camera
    if standard_camera is not None:
        scene_entities["standard_camera"] = standard_camera
    if ray_caster_camera is not None:
        scene_entities["ray_caster_camera"] = ray_caster_camera
    return scene_entities


def inject_cameras_into_task(
    task: str,
    num_cams: int,
    camera_name_prefix: str,
    camera_creation_callable: Callable,
    num_cameras_per_env: int = 1,
) -> gym.Env:
    """Loads the task, sticks cameras into the config, and creates the environment."""
    cfg = load_cfg_from_registry(task, "env_cfg_entry_point")
    cfg.sim.device = args_cli.device
    cfg.sim.use_fabric = args_cli.use_fabric
    scene_cfg = cfg.scene

    num_envs = int(num_cams / num_cameras_per_env)
    scene_cfg.num_envs = num_envs

    for idx in range(num_cameras_per_env):
        suffix = "" if idx == 0 else str(idx)
        name = camera_name_prefix + suffix
        setattr(scene_cfg, name, camera_creation_callable(name))
    cfg.scene = scene_cfg
    env = gym.make(task, cfg=cfg)
    return env


"""
System diagnosis
"""


def get_utilization_percentages(reset: bool = False, max_values: list[float] = [0.0, 0.0, 0.0, 0.0]) -> list[float]:
    """Get the maximum CPU, RAM, GPU utilization (processing), and
    GPU memory usage percentages since the last time reset was true."""
    # Note: the mutable default for max_values is shared between calls,
    # which is how the running maxima persist from one call to the next.
    if reset:
        max_values[:] = [0, 0, 0, 0]  # Reset the max values

    # CPU utilization
    cpu_usage = psutil.cpu_percent(interval=0.1)
    max_values[0] = max(max_values[0], cpu_usage)

    # RAM utilization
    memory_info = psutil.virtual_memory()
    ram_usage = memory_info.percent
    max_values[1] = max(max_values[1], ram_usage)

    # GPU utilization using pynvml
    if torch.cuda.is_available():

        if args_cli.autotune:
            pynvml.nvmlInit()  # Initialize NVML
            for i in range(torch.cuda.device_count()):
                handle = pynvml.nvmlDeviceGetHandleByIndex(i)

                # GPU Utilization
                gpu_utilization = pynvml.nvmlDeviceGetUtilizationRates(handle)
                gpu_processing_utilization_percent = gpu_utilization.gpu  # GPU core utilization
                max_values[2] = max(max_values[2], gpu_processing_utilization_percent)

                # GPU Memory Usage
                memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
                gpu_memory_total = memory_info.total
                gpu_memory_used = memory_info.used
                gpu_memory_utilization_percent = (gpu_memory_used / gpu_memory_total) * 100
                max_values[3] = max(max_values[3], gpu_memory_utilization_percent)

            pynvml.nvmlShutdown()  # Shutdown NVML after usage
    else:
        gpu_processing_utilization_percent = None
        gpu_memory_utilization_percent = None
    return max_values


"""
Experiment
"""


def run_simulator(
    sim: sim_utils.SimulationContext | None,
    scene_entities: dict | InteractiveScene,
    warm_start_length: int = 10,
    experiment_length: int = 100,
    tiled_camera_data_types: list[str] | None = None,
    standard_camera_data_types: list[str] | None = None,
    ray_caster_camera_data_types: list[str] | None = None,
    depth_predicate: Callable = lambda x: "to" in x or x == "depth",
    perspective_depth_predicate: Callable = lambda x: x == "distance_to_camera",
    convert_depth_to_camera_to_image_plane: bool = True,
    max_cameras_per_env: int = 1,
    env: gym.Env | None = None,
) -> dict:
    """Run the simulator with all cameras, and return timing analytics. Visualize if desired."""

    if tiled_camera_data_types is None:
        tiled_camera_data_types = ["rgb"]
    if standard_camera_data_types is None:
        standard_camera_data_types = ["rgb"]
    if ray_caster_camera_data_types is None:
        ray_caster_camera_data_types = ["distance_to_image_plane"]

    # Initialize camera lists
    tiled_cameras = []
    standard_cameras = []
    ray_caster_cameras = []

    # Dynamically extract cameras from the scene entities up to max_cameras_per_env
    for i in range(max_cameras_per_env):
        suffix = str(i) if i > 0 else ""
        found_camera = False
        for camera_list, key_prefix in (
            (tiled_cameras, "tiled_camera"),
            (standard_cameras, "standard_camera"),
            (ray_caster_cameras, "ray_caster_camera"),
        ):
            # Look each key up directly instead of testing `key in scene_entities`, which
            # errors out even if the key is present. Each camera type gets its own lookup
            # so that a missing type does not prevent the others from being collected.
            try:
                camera_list.append(scene_entities[key_prefix + suffix])
                found_camera = True
            except KeyError:
                continue
        if not found_camera:
            break

    # Initialize camera counts
    camera_lists = [tiled_cameras, standard_cameras, ray_caster_cameras]
    camera_data_types = [tiled_camera_data_types, standard_camera_data_types, ray_caster_camera_data_types]
    labels = ["tiled", "standard", "ray_caster"]

    if sim is not None:
        # Set camera world poses
        for camera_list in camera_lists:
            for camera in camera_list:
                num_cameras = camera.data.intrinsic_matrices.size(0)
                positions = torch.tensor([[2.5, 2.5, 2.5]], device=sim.device).repeat(num_cameras, 1)
                targets = torch.tensor([[0.0, 0.0, 0.0]], device=sim.device).repeat(num_cameras, 1)
                camera.set_world_poses_from_view(positions, targets)

    # Initialize timing variables
    timestep = 0
    total_time = 0.0
    valid_timesteps = 0
    sim_step_time = 0.0

    while simulation_app.is_running() and timestep < experiment_length:
        print(f"On timestep {timestep} of {experiment_length}, with warm start of {warm_start_length}")
        get_utilization_percentages()

        # Measure the total simulation step time
        step_start_time = time.time()

        if sim is not None:
            sim.step()

        if env is not None:
            with torch.inference_mode():
                # compute zero actions
                actions = torch.zeros(env.action_space.shape, device=env.unwrapped.device)
                # apply actions
                env.step(actions)

        # Update cameras and process vision data within the simulation step
        clouds = {}
        images = {}
        depth_images = {}

        # Loop through all camera lists and their data_types
        for camera_list, data_types, label in zip(camera_lists, camera_data_types, labels):
            for cam_idx, camera in enumerate(camera_list):

                if env is None:  # No env, need to step cams manually
                    # Only update the camera if it hasn't been updated as part of scene_entities.update ...
                    camera.update(dt=sim.get_physics_dt())

                for data_type in data_types:
                    data_label = f"{label}_{cam_idx}_{data_type}"

                    if depth_predicate(data_type):  # is a depth image, want to create cloud
                        depth = camera.data.output[data_type]
                        depth_images[data_label + "_raw"] = depth
                        if perspective_depth_predicate(data_type) and convert_depth_to_camera_to_image_plane:
                            depth = orthogonalize_perspective_depth(
                                camera.data.output[data_type], camera.data.intrinsic_matrices
                            )
                            depth_images[data_label + "_undistorted"] = depth

                        pointcloud = unproject_depth(depth=depth, intrinsics=camera.data.intrinsic_matrices)
                        clouds[data_label] = pointcloud
                    else:  # rgb image, just save it
                        image = camera.data.output[data_type]
                        images[data_label] = image

        # End timing for the step
        step_end_time = time.time()
        sim_step_time += step_end_time - step_start_time

        if timestep > warm_start_length:
            get_utilization_percentages(reset=True)
            total_time += step_end_time - step_start_time
            valid_timesteps += 1

        timestep += 1

    # Calculate average timings
    if valid_timesteps > 0:
        avg_timestep_duration = total_time / valid_timesteps
        avg_sim_step_duration = sim_step_time / experiment_length
    else:
        avg_timestep_duration = 0.0
        avg_sim_step_duration = 0.0

    # Package timing analytics in a dictionary
    timing_analytics = {
        "average_timestep_duration": avg_timestep_duration,
        "average_sim_step_duration": avg_sim_step_duration,
        "total_simulation_time": sim_step_time,
        "total_experiment_duration": sim_step_time,
    }

    system_utilization_analytics = get_utilization_percentages()

    print("--- Benchmark Results ---")
    print(f"Average timestep duration: {avg_timestep_duration:.6f} seconds")
    print(f"Average simulation step duration: {avg_sim_step_duration:.6f} seconds")
    print(f"Total simulation time: {sim_step_time:.6f} seconds")
    print("\nSystem Utilization Statistics:")
    print(
        f"| CPU:{system_utilization_analytics[0]}% | "
        f"RAM:{system_utilization_analytics[1]}% | "
        f"GPU Compute:{system_utilization_analytics[2]}% | "
        f" GPU Memory: {system_utilization_analytics[3]:.2f}% |"
    )

    return {"timing_analytics": timing_analytics, "system_utilization_analytics": system_utilization_analytics}


def main():
    """Main function."""
    # Load simulation context
    if args_cli.num_tiled_cameras + args_cli.num_standard_cameras + args_cli.num_ray_caster_cameras <= 0:
        raise ValueError("You must select at least one camera.")
    if (
        (args_cli.num_tiled_cameras > 0 and args_cli.num_standard_cameras > 0)
        or (args_cli.num_ray_caster_cameras > 0 and args_cli.num_standard_cameras > 0)
        or (args_cli.num_ray_caster_cameras > 0 and args_cli.num_tiled_cameras > 0)
    ):
        print("[WARNING]: You have elected to use more than one camera type.")
        print("[WARNING]: For a benchmark to be meaningful, use ONLY ONE camera type at a time.")
        print(
            "[WARNING]: For example, if num_tiled_cameras=100, for a meaningful benchmark, "
            "num_standard_cameras should be 0, and num_ray_caster_cameras should be 0"
        )
        raise ValueError("Benchmark one camera at a time.")

    print("[INFO]: Designing the scene")
    if args_cli.task is None:
        print("[INFO]: No task environment provided, creating random scene.")
        sim_cfg = sim_utils.SimulationCfg(device=args_cli.device)
        sim = sim_utils.SimulationContext(sim_cfg)
        # Set main camera
        sim.set_camera_view([2.5, 2.5, 2.5], [0.0, 0.0, 0.0])
        scene_entities = design_scene(
            num_tiled_cams=args_cli.num_tiled_cameras,
            num_standard_cams=args_cli.num_standard_cameras,
            num_ray_caster_cams=args_cli.num_ray_caster_cameras,
            tiled_camera_data_types=args_cli.tiled_camera_data_types,
            standard_camera_data_types=args_cli.standard_camera_data_types,
            ray_caster_camera_data_types=args_cli.ray_caster_camera_data_types,
            height=args_cli.height,
            width=args_cli.width,
            num_objects=args_cli.num_objects,
            mesh_prim_paths=args_cli.ray_caster_visible_mesh_prim_paths,
        )
        # Play simulator
        sim.reset()
        # Now we are ready!
        print("[INFO]: Setup complete...")
        # Run simulator
        run_simulator(
            sim=sim,
            scene_entities=scene_entities,
            warm_start_length=args_cli.warm_start_length,
            experiment_length=args_cli.experiment_length,
            tiled_camera_data_types=args_cli.tiled_camera_data_types,
            standard_camera_data_types=args_cli.standard_camera_data_types,
            ray_caster_camera_data_types=args_cli.ray_caster_camera_data_types,
            convert_depth_to_camera_to_image_plane=args_cli.convert_depth_to_camera_to_image_plane,
        )
    else:
        print("[INFO]: Using known task environment, injecting cameras.")
        autotune_iter = 0
        max_sys_util_thresh = [0.0, 0.0, 0.0]
        max_num_cams = max(args_cli.num_tiled_cameras, args_cli.num_standard_cameras, args_cli.num_ray_caster_cameras)
        cur_num_cams = max_num_cams
        cur_sys_util = max_sys_util_thresh
        interval = args_cli.autotune_camera_count_interval

        if args_cli.autotune:
            max_sys_util_thresh = args_cli.autotune_max_percentage_util
            max_num_cams = args_cli.autotune_max_camera_count
            print("[INFO]: Auto tuning until any of the following thresholds is met")
            print(f"|CPU: {max_sys_util_thresh[0]}% | RAM {max_sys_util_thresh[1]}% | GPU: {max_sys_util_thresh[2]}% |")
            print(f"[INFO]: Maximum number of cameras allowed: {max_num_cams}")
        # Determine which camera is being tested...
        tiled_camera_cfg = create_tiled_camera_cfg("tiled_camera")
        standard_camera_cfg = create_standard_camera_cfg("standard_camera")
        ray_caster_camera_cfg = create_ray_caster_camera_cfg("ray_caster_camera")
        camera_name_prefix = ""
        camera_creation_callable = None
        num_cams = 0
        if tiled_camera_cfg is not None:
            camera_name_prefix = "tiled_camera"
            camera_creation_callable = create_tiled_camera_cfg
            num_cams = args_cli.num_tiled_cameras
        elif standard_camera_cfg is not None:
            camera_name_prefix = "standard_camera"
            camera_creation_callable = create_standard_camera_cfg
            num_cams = args_cli.num_standard_cameras
        elif ray_caster_camera_cfg is not None:
            camera_name_prefix = "ray_caster_camera"
            camera_creation_callable = create_ray_caster_camera_cfg
            num_cams = args_cli.num_ray_caster_cameras

        while (
            all(cur <= max_thresh for cur, max_thresh in zip(cur_sys_util, max_sys_util_thresh))
            and cur_num_cams <= max_num_cams
        ):
            cur_num_cams = num_cams + interval * autotune_iter
            autotune_iter += 1

            env = inject_cameras_into_task(
                task=args_cli.task,
                num_cams=cur_num_cams,
                camera_name_prefix=camera_name_prefix,
                camera_creation_callable=camera_creation_callable,
                num_cameras_per_env=args_cli.task_num_cameras_per_env,
            )
            env.reset()
            print(f"Testing with {cur_num_cams} {camera_name_prefix}")
            analysis = run_simulator(
                sim=None,
                scene_entities=env.unwrapped.scene,
                warm_start_length=args_cli.warm_start_length,
                experiment_length=args_cli.experiment_length,
843                tiled_camera_data_types=args_cli.tiled_camera_data_types,
844                standard_camera_data_types=args_cli.standard_camera_data_types,
845                ray_caster_camera_data_types=args_cli.ray_caster_camera_data_types,
846                convert_depth_to_camera_to_image_plane=args_cli.convert_depth_to_camera_to_image_plane,
847                max_cameras_per_env=args_cli.task_num_cameras_per_env,
848                env=env,
849            )
850
851            cur_sys_util = analysis["system_utilization_analytics"]
852            print("Triggering reset...")
853            env.close()
854            create_new_stage()
855        print("[INFO]: DONE! Feel free to CTRL + C Me ")
856        print(f"[INFO]: If you've made it this far, you can likely simulate {cur_num_cams} {camera_name_prefix}")
857        print("Keep in mind, this is without any training running on the GPU.")
858        print("Set lower utilization thresholds to account for training.")
859
860        if not args_cli.autotune:
861            print("[WARNING]: GPU Util Statistics only correct while autotuning, ignore above.")
862
863
864if __name__ == "__main__":
865    # run the main function
866    main()
867    # close sim app
868    simulation_app.close()

Possible Parameters

First, run

./isaaclab.sh -p source/standalone/benchmarks/benchmark_cameras.py -h

to see all possible parameters you can vary with this utility.

In the help output, see the command-line parameters related to autotune for details on automatically determining the maximum camera count.

Compare Performance in Task Environments and Automatically Determine Task Max Camera Count

Currently, tiled cameras are the most performant camera type that can handle multiple dynamic objects.

For example, to see how your system handles 100 tiled cameras in the cartpole environment, with 2 cameras per environment (50 environments in total) and RGB output only, run:

./isaaclab.sh -p source/standalone/benchmarks/benchmark_cameras.py \
--task Isaac-Cartpole-v0 --num_tiled_cameras 100 \
--task_num_cameras_per_env 2 \
--tiled_camera_data_types rgb

If you have pynvml installed (./isaaclab.sh -p -m pip install pynvml), you can also find the maximum number of cameras that can run in the specified environment before a utilization threshold is reached. The thresholds are specified as max CPU utilization percent, max RAM utilization percent, max GPU compute percent, and max GPU memory percent. For example, to find the maximum number of cameras you can run with cartpole, run:

./isaaclab.sh -p source/standalone/benchmarks/benchmark_cameras.py \
--task Isaac-Cartpole-v0 --num_tiled_cameras 100 \
--task_num_cameras_per_env 2 \
--tiled_camera_data_types rgb --autotune \
--autotune_max_percentage_util 100 80 50 50

Autotune can crash the program if it tries to run too many cameras at once. The --autotune_max_percentage_util parameter is meant to prevent this by ending the search once any utilization threshold is exceeded.
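The stopping rule of the autotune loop in the listing above can be sketched in isolation. This is a simplified reading of that loop, not the script itself: measure_utilization is a hypothetical stand-in for running the simulator and collecting utilization, and the function and parameter names are chosen here for illustration.

```python
from typing import Callable, Sequence


def autotune_camera_count(
    measure_utilization: Callable[[int], Sequence[float]],
    start_count: int,
    interval: int,
    max_count: int,
    max_util_thresholds: Sequence[float],
) -> int:
    """Return the last camera count that was tested.

    Mirrors the while loop in the listing: keep increasing the camera count by
    `interval` while every measured utilization is within its threshold and the
    previously tested count does not exceed `max_count`.
    """
    cur_count = start_count
    # Start below every threshold so that the first iteration always runs.
    cur_util: Sequence[float] = [0.0] * len(max_util_thresholds)
    iteration = 0
    while (
        all(cur <= limit for cur, limit in zip(cur_util, max_util_thresholds))
        and cur_count <= max_count
    ):
        cur_count = start_count + interval * iteration
        iteration += 1
        # Hypothetical measurement; the real script runs the simulator here.
        cur_util = measure_utilization(cur_count)
    return cur_count


def fake_measure(count: int) -> tuple:
    """Toy utilization model (CPU, RAM, GPU compute, GPU memory), for illustration only."""
    return (count // 10, count // 10, count // 5, count // 5)


result = autotune_camera_count(fake_measure, 100, 25, 1000, [100, 80, 50, 50])
print(result)
```

Under this reading, the returned value is the last count that was tested, which is the first count whose measurement crossed a threshold or went past the cap. It may therefore be one interval higher than the last count that stayed within all limits.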

The benchmark output does not include the overhead of training a network, so consider lowering the maximum utilization percentages to leave headroom for it. The final camera count covers all cameras across all environments, so divide it by the number of cameras per environment to get the total number of environments.
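As a small worked example of that division, the helper below converts a reported camera count into an environment count. The function name is hypothetical, and rounding down when the count is not an exact multiple is an assumption made here.

```python
def num_environments(total_cameras: int, cameras_per_env: int) -> int:
    """Convert a total camera count into a number of environments.

    Rounds down (an assumption) when the total is not an exact multiple
    of the per-environment camera count.
    """
    if cameras_per_env <= 0:
        raise ValueError("cameras_per_env must be positive")
    return total_cameras // cameras_per_env


# 100 cameras with 2 cameras per environment, as in the cartpole example.
envs = num_environments(100, 2)
print(envs)
```

With the cartpole settings used earlier (100 cameras, 2 per environment), this gives 50 environments.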

Compare Camera Type and Performance (Without a Specified Task)

This tool can also assess performance without a task environment. For example, to view 100 random objects with 2 standard cameras, run:

./isaaclab.sh -p source/standalone/benchmarks/benchmark_cameras.py \
--height 100 --width 100 --num_standard_cameras 2 \
--standard_camera_data_types instance_segmentation_fast normals --num_objects 100 \
--experiment_length 100

If your system cannot handle this load, the process will be killed. Monitor CPU, RAM, and GPU utilization while running this script to get an idea of how many resources rendering the desired cameras requires. On Ubuntu, you can use tools like htop and nvtop to monitor resources live, and on Windows, you can use the Task Manager.

If your system has a hard time handling the desired cameras, you can try the following:

  • Switch to headless mode (supply --headless)

  • Ensure you are using the GPU pipeline, not the CPU pipeline

  • If you aren’t using Tiled Cameras, switch to Tiled Cameras

  • Decrease camera resolution

  • Decrease the number of data_types requested for each camera

  • Decrease the number of cameras

  • Decrease the number of objects in the scene

If your system is able to handle the requested number of cameras, the timing statistics are printed to the terminal. After the simulation stops, you can close it with CTRL+C.
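To see why lowering resolution, data types, and camera count all help, a rough estimate of raw image output size can be sketched as follows. This is only a back-of-the-envelope lower bound: the bytes-per-pixel values are assumptions (3 bytes for 8-bit RGB, 4 bytes for 32-bit float depth), the function name is hypothetical, and the renderer's internal buffers are not counted.

```python
from typing import Dict


def estimate_output_bytes(
    num_cameras: int,
    height: int,
    width: int,
    bytes_per_pixel: Dict[str, int],
) -> Dict[str, int]:
    """Estimate raw output bytes per frame for each requested data type.

    Only counts the final image arrays; it does not model renderer
    internals, so treat the result as a lower bound.
    """
    pixels = num_cameras * height * width
    return {name: pixels * bpp for name, bpp in bytes_per_pixel.items()}


# Assumed sizes: 8-bit RGB (3 bytes per pixel), float32 depth (4 bytes per pixel).
assumed_sizes = {"rgb": 3, "depth": 4}

full = estimate_output_bytes(100, 100, 100, assumed_sizes)
half = estimate_output_bytes(100, 50, 50, assumed_sizes)
print(full, half)
```

Because the estimate is a product of camera count, height, width, and bytes per pixel, halving both image dimensions cuts it to a quarter, and dropping a data type removes its share entirely.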