Find How Many/What Cameras You Should Train With#

Currently in Isaac Lab, there are several camera types; USD Cameras (standard), Tiled Cameras, and Ray Caster cameras. These camera types differ in functionality and performance. The benchmark_cameras.py script can be used to understand the difference in cameras types, as well to characterize their relative performance at different parameters such as camera quantity, image dimensions, and data types.

This utility is provided so that one easily can find the camera type/parameters that are the most performant while meeting the requirements of the user’s scenario. This utility also helps estimate the maximum number of cameras one can realistically run, assuming that one wants to maximize the number of environments while minimizing step time.

This utility can inject cameras into an existing task from the gym registry, which can be useful for benchmarking cameras in a specific scenario. Also, if you install pynvml, you can let this utility automatically find the maximum numbers of cameras that can run in your task environment up to a certain specified system resource utilization threshold (without training; taking zero actions at each timestep).

This guide accompanies the benchmark_cameras.py script in the IsaacLab/source/standalone/tutorials/04_sensors directory.

Code for benchmark_cameras.py
  1# Copyright (c) 2022-2024, The Isaac Lab Project Developers.
  2# All rights reserved.
  3#
  4# SPDX-License-Identifier: BSD-3-Clause
  5
  6"""
  7This script might help you determine how many cameras your system can realistically run
  8at different desired settings. You can supply different task environments
  9to inject cameras into, or just test a sample scene. Additionally,
 10you can automatically find the maximum amount of cameras you can run a task with through the
 11autotune functionality.
 12
 13.. code-block:: bash
 14
 15    # Usage with GUI
 16    ./isaaclab.sh -p source/standalone/tutorials/04_sensors/benchmark_cameras.py -h
 17
 18    # Usage with headless
 19    ./isaaclab.sh -p source/standalone/tutorials/04_sensors/benchmark_cameras.py -h --headless
 20
 21"""
 22
 23"""Launch Isaac Sim Simulator first."""
 24
 25import argparse
 26from collections.abc import Callable
 27
 28from omni.isaac.lab.app import AppLauncher
 29
 30# parse the arguments
 31args_cli = argparse.Namespace()
 32
 33parser = argparse.ArgumentParser(description="This script can help you benchmark how many cameras you could run.")
 34
 35"""
 36The following arguments only need to be supplied for when one wishes
 37to try injecting cameras into their environment, and automatically determining
 38the maximum camera count.
 39"""
 40parser.add_argument(
 41    "--task",
 42    type=str,
 43    default=None,
 44    required=False,
 45    help="Supply this argument to spawn cameras within an known manager-based task environment.",
 46)
 47
 48parser.add_argument(
 49    "--autotune",
 50    default=False,
 51    action="store_true",
 52    help=(
 53        "Autotuning is only supported for provided task environments."
 54        " Supply this argument to increase the number of environments until a desired threshold is reached."
 55        "Install pynvml in your environment; ./isaaclab.sh -m pip install pynvml"
 56    ),
 57)
 58
 59parser.add_argument(
 60    "--task_num_cameras_per_env",
 61    type=int,
 62    default=1,
 63    help="The number of cameras per environment to use when using a known task.",
 64)
 65
 66parser.add_argument(
 67    "--use_fabric", action="store_true", default=False, help="Enable fabric and use USD I/O operations."
 68)
 69
 70parser.add_argument(
 71    "--autotune_max_percentage_util",
 72    nargs="+",
 73    type=float,
 74    default=[100.0, 80.0, 80.0, 80.0],
 75    required=False,
 76    help=(
 77        "The system utilization percentage thresholds to reach before an autotune is finished. "
 78        "If any one of these limits are hit, the autotune stops."
 79        "Thresholds are, in order, maximum CPU percentage utilization,"
 80        "maximum RAM percentage utilization, maximum GPU compute percent utilization, "
 81        "amd maximum GPU memory utilization."
 82    ),
 83)
 84
 85parser.add_argument(
 86    "--autotune_max_camera_count", type=int, default=4096, help="The maximum amount of cameras allowed in an autotune."
 87)
 88
 89parser.add_argument(
 90    "--autotune_camera_count_interval",
 91    type=int,
 92    default=25,
 93    help=(
 94        "The number of cameras to try to add to the environment if the current camera count"
 95        " falls within permitted system resource utilization limits."
 96    ),
 97)
 98
 99"""
100The following arguments are shared for when injecting cameras into a task environment,
101as well as when creating cameras independent of a task environment.
102"""
103
104parser.add_argument(
105    "--num_tiled_cameras",
106    type=int,
107    default=0,
108    required=False,
109    help="Number of tiled cameras to create. For autotuning, this is how many cameras to start with.",
110)
111
112parser.add_argument(
113    "--num_standard_cameras",
114    type=int,
115    default=0,
116    required=False,
117    help="Number of standard cameras to create. For autotuning, this is how many cameras to start with.",
118)
119
120parser.add_argument(
121    "--num_ray_caster_cameras",
122    type=int,
123    default=0,
124    required=False,
125    help="Number of ray caster cameras to create. For autotuning, this is how many cameras to start with.",
126)
127
128parser.add_argument(
129    "--tiled_camera_data_types",
130    nargs="+",
131    type=str,
132    default=["rgb", "depth"],
133    help="The data types rendered by the tiled camera",
134)
135
136parser.add_argument(
137    "--standard_camera_data_types",
138    nargs="+",
139    type=str,
140    default=["rgb", "distance_to_image_plane", "distance_to_camera"],
141    help="The data types rendered by the standard camera",
142)
143
144parser.add_argument(
145    "--ray_caster_camera_data_types",
146    nargs="+",
147    type=str,
148    default=["distance_to_image_plane"],
149    help="The data types rendered by the ray caster camera.",
150)
151
152parser.add_argument(
153    "--ray_caster_visible_mesh_prim_paths",
154    nargs="+",
155    type=str,
156    default=["/World/ground"],
157    help="WARNING: Ray Caster can currently only cast against a single, static, object",
158)
159
160parser.add_argument(
161    "--convert_depth_to_camera_to_image_plane",
162    action="store_true",
163    default=True,
164    help=(
165        "Enable undistorting from perspective view (distance to camera data_type)"
166        "to orthogonal view (distance to plane data_type) for depth."
167        "This is currently needed to create undisorted depth images/point cloud."
168    ),
169)
170
171parser.add_argument(
172    "--keep_raw_depth",
173    dest="convert_depth_to_camera_to_image_plane",
174    action="store_false",
175    help=(
176        "Disable undistorting from perspective view (distance to camera)"
177        "to orthogonal view (distance to plane data_type) for depth."
178    ),
179)
180
181parser.add_argument(
182    "--height",
183    type=int,
184    default=120,
185    required=False,
186    help="Height in pixels of cameras",
187)
188
189parser.add_argument(
190    "--width",
191    type=int,
192    default=140,
193    required=False,
194    help="Width in pixels of cameras",
195)
196
197parser.add_argument(
198    "--warm_start_length",
199    type=int,
200    default=3,
201    required=False,
202    help=(
203        "Number of steps to run the sim before starting benchmark."
204        "Needed to avoid blank images at the start of the simulation."
205    ),
206)
207
208parser.add_argument(
209    "--experiment_length",
210    type=int,
211    default=15,
212    required=False,
213    help="Number of steps to average over",
214)
215
216# This argument is only used when a task is not provided.
217parser.add_argument(
218    "--num_objects",
219    type=int,
220    default=10,
221    required=False,
222    help="Number of objects to spawn into the scene when not using a known task.",
223)
224
225
226AppLauncher.add_app_launcher_args(parser)
227args_cli = parser.parse_args()
228args_cli.enable_cameras = True
229
230if args_cli.autotune:
231    import pynvml
232
233if len(args_cli.ray_caster_visible_mesh_prim_paths) > 1:
234    print("[WARNING]: Ray Casting is only currently supported for a single, static object")
235# launch omniverse app
236app_launcher = AppLauncher(args_cli)
237simulation_app = app_launcher.app
238
239"""Rest everything follows."""
240
241import gymnasium as gym
242import numpy as np
243import random
244import time
245import torch
246
247import omni.isaac.core.utils.prims as prim_utils
248import psutil
249from omni.isaac.core.utils.stage import create_new_stage
250
251import omni.isaac.lab.sim as sim_utils
252from omni.isaac.lab.assets import RigidObject, RigidObjectCfg
253from omni.isaac.lab.scene.interactive_scene import InteractiveScene
254from omni.isaac.lab.sensors import (
255    Camera,
256    CameraCfg,
257    RayCasterCamera,
258    RayCasterCameraCfg,
259    TiledCamera,
260    TiledCameraCfg,
261    patterns,
262)
263from omni.isaac.lab.utils.math import convert_perspective_depth_to_orthogonal_depth, unproject_depth
264
265from omni.isaac.lab_tasks.utils import load_cfg_from_registry
266
267"""
268Camera Creation
269"""
270
271
272def create_camera_base(
273    camera_cfg: type[CameraCfg | TiledCameraCfg],
274    num_cams: int,
275    data_types: list[str],
276    height: int,
277    width: int,
278    prim_path: str | None = None,
279    instantiate: bool = True,
280) -> Camera | TiledCamera | CameraCfg | TiledCameraCfg | None:
281    """Generalized function to create a camera or tiled camera sensor."""
282    # Determine prim prefix based on the camera class
283    name = camera_cfg.class_type.__name__
284
285    if instantiate:
286        # Create the necessary prims
287        for idx in range(num_cams):
288            prim_utils.create_prim(f"/World/{name}_{idx:02d}", "Xform")
289    if prim_path is None:
290        prim_path = f"/World/{name}_.*/{name}"
291    # If valid camera settings are provided, create the camera
292    if num_cams > 0 and len(data_types) > 0 and height > 0 and width > 0:
293        cfg = camera_cfg(
294            prim_path=prim_path,
295            update_period=0,
296            height=height,
297            width=width,
298            data_types=data_types,
299            spawn=sim_utils.PinholeCameraCfg(
300                focal_length=24, focus_distance=400.0, horizontal_aperture=20.955, clipping_range=(0.1, 1e4)
301            ),
302        )
303        if instantiate:
304            return camera_cfg.class_type(cfg=cfg)
305        else:
306            return cfg
307    else:
308        return None
309
310
311def create_tiled_cameras(
312    num_cams: int = 2, data_types: list[str] | None = None, height: int = 100, width: int = 120
313) -> TiledCamera | None:
314    if data_types is None:
315        data_types = ["rgb", "depth"]
316    """Defines the tiled camera sensor to add to the scene."""
317    return create_camera_base(
318        camera_cfg=TiledCameraCfg,
319        num_cams=num_cams,
320        data_types=data_types,
321        height=height,
322        width=width,
323    )
324
325
326def create_cameras(
327    num_cams: int = 2, data_types: list[str] | None = None, height: int = 100, width: int = 120
328) -> Camera | None:
329    """Defines the Standard cameras."""
330    if data_types is None:
331        data_types = ["rgb", "depth"]
332    return create_camera_base(
333        camera_cfg=CameraCfg, num_cams=num_cams, data_types=data_types, height=height, width=width
334    )
335
336
337def create_ray_caster_cameras(
338    num_cams: int = 2,
339    data_types: list[str] = ["distance_to_image_plane"],
340    mesh_prim_paths: list[str] = ["/World/ground"],
341    height: int = 100,
342    width: int = 120,
343    prim_path: str = "/World/RayCasterCamera_.*/RayCaster",
344    instantiate: bool = True,
345) -> RayCasterCamera | RayCasterCameraCfg | None:
346    """Create the raycaster cameras; different configuration than Standard/Tiled camera"""
347    for idx in range(num_cams):
348        prim_utils.create_prim(f"/World/RayCasterCamera_{idx:02d}/RayCaster", "Xform")
349
350    if num_cams > 0 and len(data_types) > 0 and height > 0 and width > 0:
351        cam_cfg = RayCasterCameraCfg(
352            prim_path=prim_path,
353            mesh_prim_paths=mesh_prim_paths,
354            update_period=0,
355            offset=RayCasterCameraCfg.OffsetCfg(pos=(0.0, 0.0, 0.0), rot=(1.0, 0.0, 0.0, 0.0)),
356            data_types=data_types,
357            debug_vis=False,
358            pattern_cfg=patterns.PinholeCameraPatternCfg(
359                focal_length=24.0,
360                horizontal_aperture=20.955,
361                height=480,
362                width=640,
363            ),
364        )
365        if instantiate:
366            return RayCasterCamera(cfg=cam_cfg)
367        else:
368            return cam_cfg
369
370    else:
371        return None
372
373
374def create_tiled_camera_cfg(prim_path: str) -> TiledCameraCfg:
375    """Grab a simple tiled camera config for injecting into task environments."""
376    return create_camera_base(
377        TiledCameraCfg,
378        num_cams=args_cli.num_tiled_cameras,
379        data_types=args_cli.tiled_camera_data_types,
380        width=args_cli.width,
381        height=args_cli.height,
382        prim_path="{ENV_REGEX_NS}/" + prim_path,
383        instantiate=False,
384    )
385
386
387def create_standard_camera_cfg(prim_path: str) -> CameraCfg:
388    """Grab a simple standard camera config for injecting into task environments."""
389    return create_camera_base(
390        CameraCfg,
391        num_cams=args_cli.num_standard_cameras,
392        data_types=args_cli.standard_camera_data_types,
393        width=args_cli.width,
394        height=args_cli.height,
395        prim_path="{ENV_REGEX_NS}/" + prim_path,
396        instantiate=False,
397    )
398
399
400def create_ray_caster_camera_cfg(prim_path: str) -> RayCasterCameraCfg:
401    """Grab a simple ray caster config for injecting into task environments."""
402    return create_ray_caster_cameras(
403        num_cams=args_cli.num_ray_caster_cameras,
404        data_types=args_cli.ray_caster_camera_data_types,
405        width=args_cli.width,
406        height=args_cli.height,
407        prim_path="{ENV_REGEX_NS}/" + prim_path,
408    )
409
410
411"""
412Scene Creation
413"""
414
415
416def design_scene(
417    num_tiled_cams: int = 2,
418    num_standard_cams: int = 0,
419    num_ray_caster_cams: int = 0,
420    tiled_camera_data_types: list[str] | None = None,
421    standard_camera_data_types: list[str] | None = None,
422    ray_caster_camera_data_types: list[str] | None = None,
423    height: int = 100,
424    width: int = 200,
425    num_objects: int = 20,
426    mesh_prim_paths: list[str] = ["/World/ground"],
427) -> dict:
428    """Design the scene."""
429    if tiled_camera_data_types is None:
430        tiled_camera_data_types = ["rgb"]
431    if standard_camera_data_types is None:
432        standard_camera_data_types = ["rgb"]
433    if ray_caster_camera_data_types is None:
434        ray_caster_camera_data_types = ["distance_to_image_plane"]
435
436    # Populate scene
437    # -- Ground-plane
438    cfg = sim_utils.GroundPlaneCfg()
439    cfg.func("/World/ground", cfg)
440    # -- Lights
441    cfg = sim_utils.DistantLightCfg(intensity=3000.0, color=(0.75, 0.75, 0.75))
442    cfg.func("/World/Light", cfg)
443
444    # Create a dictionary for the scene entities
445    scene_entities = {}
446
447    # Xform to hold objects
448    prim_utils.create_prim("/World/Objects", "Xform")
449    # Random objects
450    for i in range(num_objects):
451        # sample random position
452        position = np.random.rand(3) - np.asarray([0.05, 0.05, -1.0])
453        position *= np.asarray([1.5, 1.5, 0.5])
454        # sample random color
455        color = (random.random(), random.random(), random.random())
456        # choose random prim type
457        prim_type = random.choice(["Cube", "Cone", "Cylinder"])
458        common_properties = {
459            "rigid_props": sim_utils.RigidBodyPropertiesCfg(),
460            "mass_props": sim_utils.MassPropertiesCfg(mass=5.0),
461            "collision_props": sim_utils.CollisionPropertiesCfg(),
462            "visual_material": sim_utils.PreviewSurfaceCfg(diffuse_color=color, metallic=0.5),
463            "semantic_tags": [("class", prim_type)],
464        }
465        if prim_type == "Cube":
466            shape_cfg = sim_utils.CuboidCfg(size=(0.25, 0.25, 0.25), **common_properties)
467        elif prim_type == "Cone":
468            shape_cfg = sim_utils.ConeCfg(radius=0.1, height=0.25, **common_properties)
469        elif prim_type == "Cylinder":
470            shape_cfg = sim_utils.CylinderCfg(radius=0.25, height=0.25, **common_properties)
471        # Rigid Object
472        obj_cfg = RigidObjectCfg(
473            prim_path=f"/World/Objects/Obj_{i:02d}",
474            spawn=shape_cfg,
475            init_state=RigidObjectCfg.InitialStateCfg(pos=position),
476        )
477        scene_entities[f"rigid_object{i}"] = RigidObject(cfg=obj_cfg)
478
479    # Sensors
480    standard_camera = create_cameras(
481        num_cams=num_standard_cams, data_types=standard_camera_data_types, height=height, width=width
482    )
483    tiled_camera = create_tiled_cameras(
484        num_cams=num_tiled_cams, data_types=tiled_camera_data_types, height=height, width=width
485    )
486    ray_caster_camera = create_ray_caster_cameras(
487        num_cams=num_ray_caster_cams,
488        data_types=ray_caster_camera_data_types,
489        mesh_prim_paths=mesh_prim_paths,
490        height=height,
491        width=width,
492    )
493    # return the scene information
494    if tiled_camera is not None:
495        scene_entities["tiled_camera"] = tiled_camera
496    if standard_camera is not None:
497        scene_entities["standard_camera"] = standard_camera
498    if ray_caster_camera is not None:
499        scene_entities["ray_caster_camera"] = ray_caster_camera
500    return scene_entities
501
502
503def inject_cameras_into_task(
504    task: str,
505    num_cams: int,
506    camera_name_prefix: str,
507    camera_creation_callable: Callable,
508    num_cameras_per_env: int = 1,
509) -> gym.Env:
510    """Loads the task, sticks cameras into the config, and creates the environment."""
511    cfg = load_cfg_from_registry(task, "env_cfg_entry_point")
512    cfg.sim.device = args_cli.device
513    cfg.sim.use_fabric = args_cli.use_fabric
514    scene_cfg = cfg.scene
515
516    num_envs = int(num_cams / num_cameras_per_env)
517    scene_cfg.num_envs = num_envs
518
519    for idx in range(num_cameras_per_env):
520        suffix = "" if idx == 0 else str(idx)
521        name = camera_name_prefix + suffix
522        setattr(scene_cfg, name, camera_creation_callable(name))
523    cfg.scene = scene_cfg
524    env = gym.make(task, cfg=cfg)
525    return env
526
527
528"""
529System diagnosis
530"""
531
532
533def get_utilization_percentages(reset: bool = False, max_values: list[float] = [0.0, 0.0, 0.0, 0.0]) -> list[float]:
534    """Get the maximum CPU, RAM, GPU utilization (processing), and
535    GPU memory usage percentages since the last time reset was true."""
536    if reset:
537        max_values[:] = [0, 0, 0, 0]  # Reset the max values
538
539    # CPU utilization
540    cpu_usage = psutil.cpu_percent(interval=0.1)
541    max_values[0] = max(max_values[0], cpu_usage)
542
543    # RAM utilization
544    memory_info = psutil.virtual_memory()
545    ram_usage = memory_info.percent
546    max_values[1] = max(max_values[1], ram_usage)
547
548    # GPU utilization using pynvml
549    if torch.cuda.is_available():
550
551        if args_cli.autotune:
552            pynvml.nvmlInit()  # Initialize NVML
553            for i in range(torch.cuda.device_count()):
554                handle = pynvml.nvmlDeviceGetHandleByIndex(i)
555
556                # GPU Utilization
557                gpu_utilization = pynvml.nvmlDeviceGetUtilizationRates(handle)
558                gpu_processing_utilization_percent = gpu_utilization.gpu  # GPU core utilization
559                max_values[2] = max(max_values[2], gpu_processing_utilization_percent)
560
561                # GPU Memory Usage
562                memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
563                gpu_memory_total = memory_info.total
564                gpu_memory_used = memory_info.used
565                gpu_memory_utilization_percent = (gpu_memory_used / gpu_memory_total) * 100
566                max_values[3] = max(max_values[3], gpu_memory_utilization_percent)
567
568            pynvml.nvmlShutdown()  # Shutdown NVML after usage
569    else:
570        gpu_processing_utilization_percent = None
571        gpu_memory_utilization_percent = None
572    return max_values
573
574
575"""
576Experiment
577"""
578
579
580def run_simulator(
581    sim: sim_utils.SimulationContext | None,
582    scene_entities: dict | InteractiveScene,
583    warm_start_length: int = 10,
584    experiment_length: int = 100,
585    tiled_camera_data_types: list[str] | None = None,
586    standard_camera_data_types: list[str] | None = None,
587    ray_caster_camera_data_types: list[str] | None = None,
588    depth_predicate: Callable = lambda x: "to" in x or x == "depth",
589    perspective_depth_predicate: Callable = lambda x: x == "distance_to_camera",
590    convert_depth_to_camera_to_image_plane: bool = True,
591    max_cameras_per_env: int = 1,
592    env: gym.Env | None = None,
593) -> dict:
594    """Run the simulator with all cameras, and return timing analytics. Visualize if desired."""
595
596    if tiled_camera_data_types is None:
597        tiled_camera_data_types = ["rgb"]
598    if standard_camera_data_types is None:
599        standard_camera_data_types = ["rgb"]
600    if ray_caster_camera_data_types is None:
601        ray_caster_camera_data_types = ["distance_to_image_plane"]
602
603    # Initialize camera lists
604    tiled_cameras = []
605    standard_cameras = []
606    ray_caster_cameras = []
607
608    # Dynamically extract cameras from the scene entities up to max_cameras_per_env
609    for i in range(max_cameras_per_env):
610        # Extract tiled cameras
611        tiled_camera_key = f"tiled_camera{i}" if i > 0 else "tiled_camera"
612        standard_camera_key = f"standard_camera{i}" if i > 0 else "standard_camera"
613        ray_caster_camera_key = f"ray_caster_camera{i}" if i > 0 else "ray_caster_camera"
614
615        try:  # if instead you checked ... if key is in scene_entities... # errors out always even if key present
616            tiled_cameras.append(scene_entities[tiled_camera_key])
617            standard_cameras.append(scene_entities[standard_camera_key])
618            ray_caster_cameras.append(scene_entities[ray_caster_camera_key])
619        except KeyError:
620            break
621
622    # Initialize camera counts
623    camera_lists = [tiled_cameras, standard_cameras, ray_caster_cameras]
624    camera_data_types = [tiled_camera_data_types, standard_camera_data_types, ray_caster_camera_data_types]
625    labels = ["tiled", "standard", "ray_caster"]
626
627    if sim is not None:
628        # Set camera world poses
629        for camera_list in camera_lists:
630            for camera in camera_list:
631                num_cameras = camera.data.intrinsic_matrices.size(0)
632                positions = torch.tensor([[2.5, 2.5, 2.5]], device=sim.device).repeat(num_cameras, 1)
633                targets = torch.tensor([[0.0, 0.0, 0.0]], device=sim.device).repeat(num_cameras, 1)
634                camera.set_world_poses_from_view(positions, targets)
635
636    # Initialize timing variables
637    timestep = 0
638    total_time = 0.0
639    valid_timesteps = 0
640    sim_step_time = 0.0
641
642    while simulation_app.is_running() and timestep < experiment_length:
643        print(f"On timestep {timestep} of {experiment_length}, with warm start of {warm_start_length}")
644        get_utilization_percentages()
645
646        # Measure the total simulation step time
647        step_start_time = time.time()
648
649        if sim is not None:
650            sim.step()
651
652        if env is not None:
653            with torch.inference_mode():
654                # compute zero actions
655                actions = torch.zeros(env.action_space.shape, device=env.unwrapped.device)
656                # apply actions
657                env.step(actions)
658
659        # Update cameras and process vision data within the simulation step
660        clouds = {}
661        images = {}
662        depth_images = {}
663
664        # Loop through all camera lists and their data_types
665        for camera_list, data_types, label in zip(camera_lists, camera_data_types, labels):
666            for cam_idx, camera in enumerate(camera_list):
667
668                if env is None:  # No env, need to step cams manually
669                    # Only update the camera if it hasn't been updated as part of scene_entities.update ...
670                    camera.update(dt=sim.get_physics_dt())
671
672                for data_type in data_types:
673                    data_label = f"{label}_{cam_idx}_{data_type}"
674
675                    if depth_predicate(data_type):  # is a depth image, want to create cloud
676                        depth = camera.data.output[data_type]
677                        depth_images[data_label + "_raw"] = depth
678                        if perspective_depth_predicate(data_type) and convert_depth_to_camera_to_image_plane:
679                            depth = convert_perspective_depth_to_orthogonal_depth(
680                                perspective_depth=camera.data.output[data_type],
681                                intrinsics=camera.data.intrinsic_matrices,
682                            )
683                            depth_images[data_label + "_undistorted"] = depth
684
685                        pointcloud = unproject_depth(depth=depth, intrinsics=camera.data.intrinsic_matrices)
686                        clouds[data_label] = pointcloud
687                    else:  # rgb image, just save it
688                        image = camera.data.output[data_type]
689                        images[data_label] = image
690
691        # End timing for the step
692        step_end_time = time.time()
693        sim_step_time += step_end_time - step_start_time
694
695        if timestep > warm_start_length:
696            get_utilization_percentages(reset=True)
697            total_time += step_end_time - step_start_time
698            valid_timesteps += 1
699
700        timestep += 1
701
702    # Calculate average timings
703    if valid_timesteps > 0:
704        avg_timestep_duration = total_time / valid_timesteps
705        avg_sim_step_duration = sim_step_time / experiment_length
706    else:
707        avg_timestep_duration = 0.0
708        avg_sim_step_duration = 0.0
709
710    # Package timing analytics in a dictionary
711    timing_analytics = {
712        "average_timestep_duration": avg_timestep_duration,
713        "average_sim_step_duration": avg_sim_step_duration,
714        "total_simulation_time": sim_step_time,
715        "total_experiment_duration": sim_step_time,
716    }
717
718    system_utilization_analytics = get_utilization_percentages()
719
720    print("--- Benchmark Results ---")
721    print(f"Average timestep duration: {avg_timestep_duration:.6f} seconds")
722    print(f"Average simulation step duration: {avg_sim_step_duration:.6f} seconds")
723    print(f"Total simulation time: {sim_step_time:.6f} seconds")
724    print("\nSystem Utilization Statistics:")
725    print(
726        f"| CPU:{system_utilization_analytics[0]}% | "
727        f"RAM:{system_utilization_analytics[1]}% | "
728        f"GPU Compute:{system_utilization_analytics[2]}% | "
729        f" GPU Memory: {system_utilization_analytics[3]:.2f}% |"
730    )
731
732    return {"timing_analytics": timing_analytics, "system_utilization_analytics": system_utilization_analytics}
733
734
735def main():
736    """Main function."""
737    # Load simulation context
738    if args_cli.num_tiled_cameras + args_cli.num_standard_cameras + args_cli.num_ray_caster_cameras <= 0:
739        raise ValueError("You must select at least one camera.")
740    if (
741        (args_cli.num_tiled_cameras > 0 and args_cli.num_standard_cameras > 0)
742        or (args_cli.num_ray_caster_cameras > 0 and args_cli.num_standard_cameras > 0)
743        or (args_cli.num_ray_caster_cameras > 0 and args_cli.num_tiled_cameras > 0)
744    ):
745        print("[WARNING]: You have elected to use more than one camera type.")
746        print("[WARNING]: For a benchmark to be meaningful, use ONLY ONE camera type at a time.")
747        print(
748            "[WARNING]: For example, if num_tiled_cameras=100, for a meaningful benchmark,"
749            "num_standard_cameras should be 0, and num_ray_caster_cameras should be 0"
750        )
751        raise ValueError("Benchmark one camera at a time.")
752
753    print("[INFO]: Designing the scene")
754    if args_cli.task is None:
755        print("[INFO]: No task environment provided, creating random scene.")
756        sim_cfg = sim_utils.SimulationCfg(device="cpu" if args_cli.cpu else "cuda")
757        sim = sim_utils.SimulationContext(sim_cfg)
758        # Set main camera
759        sim.set_camera_view([2.5, 2.5, 2.5], [0.0, 0.0, 0.0])
760        scene_entities = design_scene(
761            num_tiled_cams=args_cli.num_tiled_cameras,
762            num_standard_cams=args_cli.num_standard_cameras,
763            num_ray_caster_cams=args_cli.num_ray_caster_cameras,
764            tiled_camera_data_types=args_cli.tiled_camera_data_types,
765            standard_camera_data_types=args_cli.standard_camera_data_types,
766            ray_caster_camera_data_types=args_cli.ray_caster_camera_data_types,
767            height=args_cli.height,
768            width=args_cli.width,
769            num_objects=args_cli.num_objects,
770            mesh_prim_paths=args_cli.ray_caster_visible_mesh_prim_paths,
771        )
772        # Play simulator
773        sim.reset()
774        # Now we are ready!
775        print("[INFO]: Setup complete...")
776        # Run simulator
777        run_simulator(
778            sim=sim,
779            scene_entities=scene_entities,
780            warm_start_length=args_cli.warm_start_length,
781            experiment_length=args_cli.experiment_length,
782            tiled_camera_data_types=args_cli.tiled_camera_data_types,
783            standard_camera_data_types=args_cli.standard_camera_data_types,
784            ray_caster_camera_data_types=args_cli.ray_caster_camera_data_types,
785            convert_depth_to_camera_to_image_plane=args_cli.convert_depth_to_camera_to_image_plane,
786        )
787    else:
788        print("[INFO]: Using known task environment, injecting cameras.")
789        autotune_iter = 0
790        max_sys_util_thresh = [0.0, 0.0, 0.0]
791        max_num_cams = max(args_cli.num_tiled_cameras, args_cli.num_standard_cameras, args_cli.num_ray_caster_cameras)
792        cur_num_cams = max_num_cams
793        cur_sys_util = max_sys_util_thresh
794        interval = args_cli.autotune_camera_count_interval
795
796        if args_cli.autotune:
797            max_sys_util_thresh = args_cli.autotune_max_percentage_util
798            max_num_cams = args_cli.autotune_max_camera_count
799            print("[INFO]: Auto tuning until any of the following threshold are met")
800            print(f"|CPU: {max_sys_util_thresh[0]}% | RAM {max_sys_util_thresh[1]}% | GPU: {max_sys_util_thresh[2]}% |")
801            print(f"[INFO]: Maximum number of cameras allowed: {max_num_cams}")
802        # Determine which camera is being tested...
803        tiled_camera_cfg = create_tiled_camera_cfg("tiled_camera")
804        standard_camera_cfg = create_standard_camera_cfg("standard_camera")
805        ray_caster_camera_cfg = create_ray_caster_camera_cfg("ray_caster_camera")
806        camera_name_prefix = ""
807        camera_creation_callable = None
808        num_cams = 0
809        if tiled_camera_cfg is not None:
810            camera_name_prefix = "tiled_camera"
811            camera_creation_callable = create_tiled_camera_cfg
812            num_cams = args_cli.num_tiled_cameras
813        elif standard_camera_cfg is not None:
814            camera_name_prefix = "standard_camera"
815            camera_creation_callable = create_standard_camera_cfg
816            num_cams = args_cli.num_standard_cameras
817        elif ray_caster_camera_cfg is not None:
818            camera_name_prefix = "ray_caster_camera"
819            camera_creation_callable = create_ray_caster_camera_cfg
820            num_cams = args_cli.num_ray_caster_cameras
821
822        while (
823            all(cur <= max_thresh for cur, max_thresh in zip(cur_sys_util, max_sys_util_thresh))
824            and cur_num_cams <= max_num_cams
825        ):
826            cur_num_cams = num_cams + interval * autotune_iter
827            autotune_iter += 1
828
829            env = inject_cameras_into_task(
830                task=args_cli.task,
831                num_cams=cur_num_cams,
832                camera_name_prefix=camera_name_prefix,
833                camera_creation_callable=camera_creation_callable,
834                num_cameras_per_env=args_cli.task_num_cameras_per_env,
835            )
836            env.reset()
837            print(f"Testing with {cur_num_cams} {camera_name_prefix}")
838            analysis = run_simulator(
839                sim=None,
840                scene_entities=env.unwrapped.scene,
841                warm_start_length=args_cli.warm_start_length,
842                experiment_length=args_cli.experiment_length,
843                tiled_camera_data_types=args_cli.tiled_camera_data_types,
844                standard_camera_data_types=args_cli.standard_camera_data_types,
845                ray_caster_camera_data_types=args_cli.ray_caster_camera_data_types,
846                convert_depth_to_camera_to_image_plane=args_cli.convert_depth_to_camera_to_image_plane,
847                max_cameras_per_env=args_cli.task_num_cameras_per_env,
848                env=env,
849            )
850
851            cur_sys_util = analysis["system_utilization_analytics"]
852            print("Triggering reset...")
853            env.close()
854            create_new_stage()
855        print("[INFO]: DONE! Feel free to CTRL + C Me ")
856        print(f"[INFO]: If you've made it this far, you can likely simulate {cur_num_cams} {camera_name_prefix}")
857        print("Keep in mind, this is without any training running on the GPU.")
858        print("Set lower utilization thresholds to account for training.")
859
860        if not args_cli.autotune:
861            print("[WARNING]: GPU Util Statistics only correct while autotuning, ignore above.")
862
863
864if __name__ == "__main__":
865    # run the main function
866    main()
867    # close sim app
868    simulation_app.close()

Possible Parameters#

First, run

./isaaclab.sh -p source/standalone/tutorials/04_sensors/benchmark_cameras.py -h

to see all possible parameters you can vary with this utility.

See the command line parameters related to autotune for more information about automatically determining maximum camera count.

Compare Performance in Task Environments and Automatically Determine Task Max Camera Count#

Currently, tiled cameras are the most performant camera that can handle multiple dynamic objects.

For example, to see how your system could handle 100 tiled cameras in the cartpole environment, with 2 cameras per environment (so 50 environments total) only in RGB mode, run

./isaaclab.sh -p source/standalone/tutorials/04_sensors/benchmark_cameras.py \
--task Isaac-Cartpole-v0 --num_tiled_cameras 100 \
--task_num_cameras_per_env 2 \
--tiled_camera_data_types rgb

If you have pynvml installed, (./isaaclab.sh -p -m pip install pynvml), you can also find the maximum number of cameras that you could run in the specified environment up to a certain performance threshold (specified by max CPU utilization percent, max RAM utilization percent, max GPU compute percent, and max GPU memory percent). For example, to find the maximum number of cameras you can run with cartpole, you could run:

./isaaclab.sh -p source/standalone/tutorials/04_sensors/benchmark_cameras.py \
--task Isaac-Cartpole-v0 --num_tiled_cameras 100 \
--task_num_cameras_per_env 2 \
--tiled_camera_data_types rgb --autotune \
--autotune_max_percentage_util 100 80 50 50

Autotune may lead to the program crashing, which means that it tried to run too many cameras at once. However, the max percentage utilization parameter is meant to prevent this from happening.

The output of the benchmark doesn’t include the overhead of training the network, so consider decreasing the maximum utilization percentages to account for this overhead. The final output camera count is for all cameras, so to get the total number of environments, divide the output camera count by the number of cameras per environment.

Compare Camera Type and Performance (Without a Specified Task)#

This tool can also asses performance without a task environment. For example, to view 100 random objects with 2 standard cameras, one could run

./isaaclab.sh -p source/standalone/tutorials/04_sensors/benchmark_cameras.py \
--height 100 --width 100 --num_standard_cameras 2 \
--standard_camera_data_types instance_segmentation_fast normals --num_objects 100 \
--experiment_length 100

If your system cannot handle this due to performance reasons, then the process will be killed. It’s recommended to monitor CPU/RAM utilization and GPU utilization while running this script, to get an idea of how many resources rendering the desired camera requires. In Ubuntu, you can use tools like htop and nvtop to live monitor resources while running this script, and in Windows, you can use the Task Manager.

If your system has a hard time handling the desired cameras, you can try the following

  • Switch to headless mode (supply --headless)

  • Ensure you are using the GPU pipeline not CPU!

  • If you aren’t using Tiled Cameras, switch to Tiled Cameras

  • Decrease camera resolution

  • Decrease how many data_types there are for each camera.

  • Decrease the number of cameras

  • Decrease the number of objects in the scene

If your system is able to handle the amount of cameras, then the time statistics will be printed to the terminal. After the simulations stops it can be closed with CTRL C.