omni.isaac.lab.envs#
Sub-package for environment definitions.
Environments define the interface between the agent and the simulation. In the simplest case, the environment provides the agent with the current observations and executes the actions provided by the agent. However, the environment can also provide additional information such as the current reward, done flag, and information about the current episode.
There are two types of environment design workflows:
Manager-based: The environment is decomposed into individual components (or managers) for different aspects (such as computing observations, applying actions, and applying randomization). The users mainly configure the managers, and the environment coordinates the managers and calls their functions.
Direct: The user implements all the necessary functionality directly in a single class, without the need for additional managers.
Based on these workflows, there are the following environment classes for single and multi-agent RL:
Single-Agent RL:
ManagerBasedEnv: The manager-based workflow base environment, which only provides the agent with the current observations and executes the actions provided by the agent.
ManagerBasedRLEnv: The manager-based workflow RL task environment, which in addition to the functionality of the base environment also provides Markov Decision Process (MDP) related information such as the current reward, done flag, and information.
DirectRLEnv: The direct workflow RL task environment, which provides the framework for implementing scene setup, computing dones, performing resets, and computing rewards and observations.
Multi-Agent RL (MARL):
DirectMARLEnv: The direct workflow MARL task environment, which provides the framework for implementing scene setup, computing dones, performing resets, and computing rewards and observations.
For more information about the workflow design patterns, see the Task Design Workflows section.
Submodules
Sub-module with implementation of manager terms.
Sub-module providing UI window implementation for environments.
Classes
The base environment encapsulates the simulation scene and the environment managers for the manager-based workflow.
Base configuration of the environment.
The superclass for the manager-based workflow reinforcement learning-based environments.
Configuration for a reinforcement learning environment with the manager-based workflow.
The superclass for the direct workflow to design environments.
Configuration for an RL environment defined with the direct workflow.
The superclass for the direct workflow to design multi-agent environments.
Configuration for a MARL environment defined with the direct workflow.
Configuration of the scene viewport camera.
Manager Based Environment#
- class omni.isaac.lab.envs.ManagerBasedEnv[source]#
The base environment encapsulates the simulation scene and the environment managers for the manager-based workflow.
While a simulation scene or world comprises different components such as robots, objects, and sensors (cameras, lidars, etc.), the environment is a higher-level abstraction that provides an interface for interacting with the simulation. The environment comprises the following components:
Scene: The scene manager that creates and manages the virtual world in which the robot operates. This includes defining the robot, static and dynamic objects, sensors, etc.
Observation Manager: The observation manager that generates observations from the current simulation state and the data gathered from the sensors. These observations may include privileged information that is not available to the robot in the real world. Additionally, user-defined terms can be added to process the observations and generate custom observations. For example, a network can be used to embed high-dimensional observations into a lower-dimensional space.
Action Manager: The action manager that processes the raw actions sent to the environment and converts them to low-level commands that are sent to the simulation. It can be configured to accept raw actions at different levels of abstraction. For example, in case of a robotic arm, the raw actions can be joint torques, joint positions, or end-effector poses. Similarly for a mobile base, it can be the joint torques, or the desired velocity of the floating base.
Event Manager: The event manager orchestrates operations triggered based on simulation events. This includes resetting the scene to a default state, applying random pushes to the robot at different intervals of time, or randomizing properties such as mass and friction coefficients. This is useful for training and evaluating the robot in a variety of scenarios.
Recorder Manager: The recorder manager that handles recording data produced during different steps in the simulation. This includes recording at the beginning and end of a reset and a step. The recorded data is separated per episode and per environment, and can be exported to a file through a dataset file handler.
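The component list above can be illustrated with a small, framework-free sketch of how an environment might coordinate its managers. Nothing here is the Isaac Lab API: ToyObservationManager, ToyActionManager, and ToyEnv are hypothetical stand-ins, plain Python floats replace tensors, and the scale-and-offset action mapping is an assumption chosen only for illustration.

```python
from typing import Callable, Dict, List

State = Dict[str, float]


class ToyObservationManager:
    """Builds an observation vector from named term functions (hypothetical)."""

    def __init__(self, terms: Dict[str, Callable[[State], float]]):
        self.terms = terms

    def compute(self, state: State) -> List[float]:
        # Evaluate every configured term in insertion order.
        return [term(state) for term in self.terms.values()]


class ToyActionManager:
    """Maps a raw action to a low-level command via scale and offset (assumed)."""

    def __init__(self, scale: float, offset: float):
        self.scale = scale
        self.offset = offset

    def process(self, raw_action: float) -> float:
        return raw_action * self.scale + self.offset


class ToyEnv:
    """Coordinates the managers; the user only supplies their configuration."""

    def __init__(self, obs_manager: ToyObservationManager,
                 action_manager: ToyActionManager, dt: float):
        self.obs_manager = obs_manager
        self.action_manager = action_manager
        self.dt = dt
        self.state: State = {"pos": 0.0, "vel": 0.0}

    def step(self, raw_action: float) -> List[float]:
        # 1. The action manager turns the raw action into a velocity command.
        self.state["vel"] = self.action_manager.process(raw_action)
        # 2. Stand-in for the physics step: integrate the position.
        self.state["pos"] += self.state["vel"] * self.dt
        # 3. The observation manager produces the observation.
        return self.obs_manager.compute(self.state)


env = ToyEnv(
    ToyObservationManager({"pos": lambda s: s["pos"], "vel": lambda s: s["vel"]}),
    ToyActionManager(scale=2.0, offset=0.0),
    dt=0.5,
)
obs = env.step(1.0)
```

The point of the pattern is that changing the task means changing the term dictionary or the manager parameters, not the step logic.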
The environment provides a unified interface for interacting with the simulation. However, it does not include task-specific quantities such as the reward function, or the termination conditions. These quantities are often specific to defining Markov Decision Processes (MDPs) while the base environment is agnostic to the MDP definition.
The environment steps forward in time at a fixed time-step. The physics simulation is decimated at a lower time-step. This is to ensure that the simulation is stable. These two time-steps can be configured independently using the ManagerBasedEnvCfg.decimation (number of simulation steps per environment step) and the ManagerBasedEnvCfg.sim.dt (physics time-step) parameters. Based on these parameters, the environment time-step is computed as the product of the two. The two time-steps can be obtained by querying the physics_dt and the step_dt properties respectively.
Methods:
__init__(cfg): Initialize the environment.
load_managers(): Load the managers for the environment.
Creates live visualizers for manager terms.
reset([seed, env_ids, options]): Resets the specified environments and returns observations.
reset_to(state, env_ids[, seed, is_relative]): Resets specified environments to known states.
step(action): Execute one time-step of the environment's dynamics.
seed([seed]): Set the seed for the environment.
close(): Cleanup for the environment.
Attributes:
The number of instances of the environment that are running.
The physics time-step (in s).
The environment stepping time-step (in s).
The device on which the environment is running.
- __init__(cfg: ManagerBasedEnvCfg)[source]#
Initialize the environment.
- Parameters:
cfg – The configuration object for the environment.
- Raises:
RuntimeError – If a simulation context already exists. The environment must always create one since it configures the simulation context and controls the simulation.
- property physics_dt: float#
The physics time-step (in s).
This is the lowest time-decimation at which the simulation is happening.
- property step_dt: float#
The environment stepping time-step (in s).
This is the time-step at which the environment steps forward.
- property device#
The device on which the environment is running.
- load_managers()[source]#
Load the managers for the environment.
This function is responsible for creating the various managers (action, observation, events, etc.) for the environment. Since the managers require access to physics handles, they can only be created after the simulator is reset (i.e. played for the first time).
Note
In case of standalone application (when running simulator from Python), the function is called automatically when the class is initialized.
However, in case of extension mode, the user must call this function manually after the simulator is reset. This is because the simulator is only reset when the user calls
SimulationContext.reset_async()
and it isn’t possible to call async functions in the constructor.
- reset(seed: int | None = None, env_ids: Sequence[int] | None = None, options: dict[str, Any] | None = None) tuple[Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor]]], dict] [source]#
Resets the specified environments and returns observations.
This function calls the _reset_idx() function to reset the specified environments. However, certain operations, such as procedural terrain generation, that happened during initialization are not repeated.
- Parameters:
seed – The seed to use for randomization. Defaults to None, in which case the seed is not set.
env_ids – The environment ids to reset. Defaults to None, in which case all environments are reset.
options –
Additional information to specify how the environment is reset. Defaults to None.
Note
This argument is used for compatibility with Gymnasium environment definition.
- Returns:
A tuple containing the observations and extras.
- reset_to(state: dict[str, dict[str, dict[str, torch.Tensor]]], env_ids: Sequence[int] | None, seed: int | None = None, is_relative: bool = False) None [source]#
Resets specified environments to known states.
Note that this is different from the reset() function as it resets the environments to specific states.
- Parameters:
state – The state to reset the specified environments to.
env_ids – The environment ids to reset. Defaults to None, in which case all environments are reset.
seed – The seed to use for randomization. Defaults to None, in which case the seed is not set.
is_relative – If set to True, the state is considered relative to the environment origins. Defaults to False.
- step(action: torch.Tensor) tuple[Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor]]], dict] [source]#
Execute one time-step of the environment’s dynamics.
The environment steps forward at a fixed time-step, while the physics simulation is decimated at a lower time-step. This is to ensure that the simulation is stable. These two time-steps can be configured independently using the ManagerBasedEnvCfg.decimation (number of simulation steps per environment step) and the ManagerBasedEnvCfg.sim.dt (physics time-step). Based on these parameters, the environment time-step is computed as the product of the two.
- Parameters:
action – The actions to apply on the environment. Shape is (num_envs, action_dim).
- Returns:
A tuple containing the observations and extras.
- class omni.isaac.lab.envs.ManagerBasedEnvCfg[source]#
Base configuration of the environment.
Attributes:
Viewer configuration.
Physics simulation configuration.
The seed for the random number generator.
Number of control action updates @ sim dt per policy dt.
Scene settings.
Recorder settings.
Observation space settings.
Action space settings.
Event settings.
Whether a render step is performed again after at least one environment has been reset.
Classes:
alias of
BaseEnvWindow
- sim: SimulationCfg#
Physics simulation configuration. Default is SimulationCfg().
- ui_window_class_type#
alias of
BaseEnvWindow
- seed: int | None#
The seed for the random number generator. Defaults to None, in which case the seed is not set.
Note
The seed is set at the beginning of the environment initialization. This ensures that the environment creation is deterministic and behaves similarly across different runs.
- decimation: int#
Number of control action updates @ sim dt per policy dt.
For instance, if the simulation dt is 0.01s and the policy dt is 0.1s, then the decimation is 10. This means that the control action is updated every 10 simulation steps.
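The relation between the simulation time-step, the decimation, and the environment (policy) time-step can be checked with a few lines of plain Python. This is only a sketch of the arithmetic described above; env_step_dt and count_physics_steps are hypothetical helper names, not part of the package.

```python
def env_step_dt(sim_dt: float, decimation: int) -> float:
    """Environment time-step: `decimation` physics steps of `sim_dt` seconds each."""
    if decimation < 1:
        raise ValueError("decimation must be a positive integer")
    return sim_dt * decimation


def count_physics_steps(num_env_steps: int, decimation: int) -> int:
    """Number of physics steps executed over `num_env_steps` environment steps."""
    return num_env_steps * decimation


# Values from the example above: simulation dt of 0.01 s and decimation of 10.
step_dt = env_step_dt(sim_dt=0.01, decimation=10)
physics_steps = count_physics_steps(num_env_steps=3, decimation=10)
```

With these values the policy acts once every 0.1 s, and three environment steps advance the physics by 30 simulation steps.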
- scene: InteractiveSceneCfg#
Scene settings.
Please refer to the
omni.isaac.lab.scene.InteractiveSceneCfg
class for more details.
- recorders: object#
Recorder settings. Defaults to recording nothing.
Please refer to the
omni.isaac.lab.managers.RecorderManager
class for more details.
- observations: object#
Observation space settings.
Please refer to the
omni.isaac.lab.managers.ObservationManager
class for more details.
- actions: object#
Action space settings.
Please refer to the
omni.isaac.lab.managers.ActionManager
class for more details.
- events: object#
Event settings. Defaults to the basic configuration that resets the scene to its default state.
Please refer to the
omni.isaac.lab.managers.EventManager
class for more details.
- rerender_on_reset: bool#
Whether a render step is performed again after at least one environment has been reset. Defaults to False, which means no render step will be performed after reset.
When this is False, data collected from sensors after performing reset will be stale and will not reflect the latest states in simulation caused by the reset.
When this is True, an extra render step will be performed to update the sensor data to reflect the latest states from the reset. This comes at a cost of performance as an additional render step will be performed after each time an environment is reset.
Manager Based RL Environment#
- class omni.isaac.lab.envs.ManagerBasedRLEnv[source]#
Bases: ManagerBasedEnv, Env
The superclass for the manager-based workflow reinforcement learning-based environments.
This class inherits from ManagerBasedEnv and implements the core functionality for reinforcement learning-based environments. It is designed to be used with any RL library. The class is designed to be used with vectorized environments, i.e., the environment is expected to be run in parallel with multiple sub-environments. The number of sub-environments is specified using num_envs.
Each observation from the environment is a batch of observations, one for each sub-environment. The method step() is also expected to receive a batch of actions, one for each sub-environment.
While the environment itself is implemented as a vectorized environment, we do not inherit from gym.vector.VectorEnv. This is mainly because the class adds various methods (for wait and asynchronous updates) which are not required. Additionally, each RL library typically has its own definition for a vectorized environment. Thus, to reduce complexity, we directly use the gym.Env over here and leave it up to library-defined wrappers to take care of wrapping this environment for their agents.
Note
For vectorized environments, it is recommended to only call the reset() method once before the first call to step(), i.e. after the environment is created. After that, the step() function handles the reset of terminated sub-environments. This is because the simulator does not support resetting individual sub-environments in a vectorized environment.
Attributes:
Whether the environment is a vectorized environment.
Metadata for the environment.
Configuration for the environment.
Maximum episode length in seconds.
Maximum episode length in environment steps.
The device on which the environment is running.
Returns the environment's internal _np_random that if not set will initialise with a random seed.
Returns the environment's internal _np_random_seed that if not set will first initialise with a random int as seed.
The number of instances of the environment that are running.
The physics time-step (in s).
The environment stepping time-step (in s).
Returns the base non-wrapped environment.
Methods:
__init__(cfg[, render_mode]): Initialize the environment.
load_managers(): Load the managers for the environment.
Creates live visualizers for manager terms.
step(action): Execute one time-step of the environment's dynamics and reset terminated environments.
render([recompute]): Run rendering without stepping through the physics.
close(): Cleanup for the environment.
get_wrapper_attr(name): Gets the attribute name from the environment.
has_wrapper_attr(name): Checks if the attribute name exists in the environment.
reset([seed, env_ids, options]): Resets the specified environments and returns observations.
reset_to(state, env_ids[, seed, is_relative]): Resets specified environments to known states.
seed([seed]): Set the seed for the environment.
set_wrapper_attr(name, value): Sets the attribute name on the environment with value.
- metadata: ClassVar[dict[str, Any]] = {'isaac_sim_version': omni.isaac.version.get_version, 'render_modes': [None, 'human', 'rgb_array']}#
Metadata for the environment.
- cfg: ManagerBasedRLEnvCfg#
Configuration for the environment.
- __init__(cfg: ManagerBasedRLEnvCfg, render_mode: str | None = None, **kwargs)[source]#
Initialize the environment.
- Parameters:
cfg – The configuration for the environment.
render_mode – The render mode for the environment. Defaults to None, which is similar to "human".
- load_managers()[source]#
Load the managers for the environment.
This function is responsible for creating the various managers (action, observation, events, etc.) for the environment. Since the managers require access to physics handles, they can only be created after the simulator is reset (i.e. played for the first time).
Note
In case of standalone application (when running simulator from Python), the function is called automatically when the class is initialized.
However, in case of extension mode, the user must call this function manually after the simulator is reset. This is because the simulator is only reset when the user calls
SimulationContext.reset_async()
and it isn’t possible to call async functions in the constructor.
- step(action: torch.Tensor) tuple[Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor]]], torch.Tensor, torch.Tensor, torch.Tensor, dict] [source]#
Execute one time-step of the environment’s dynamics and reset terminated environments.
Unlike the ManagerBasedEnv.step method, the function performs the following operations:
Process the actions.
Perform physics stepping.
Perform rendering if gui is enabled.
Update the environment counters and compute the rewards and terminations.
Reset the environments that terminated.
Compute the observations.
Return the observations, rewards, resets and extras.
- Parameters:
action – The actions to apply on the environment. Shape is (num_envs, action_dim).
- Returns:
A tuple containing the observations, rewards, resets (terminated and truncated) and extras.
- render(recompute: bool = False) np.ndarray | None [source]#
Run rendering without stepping through the physics.
By convention, if mode is:
human: Render to the current display and return nothing. Usually for human consumption.
rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
- Parameters:
recompute – Whether to force a render even if the simulator has already rendered the scene. Defaults to False.
- Returns:
The rendered image as a numpy array if mode is “rgb_array”. Otherwise, returns None.
- Raises:
RuntimeError – If mode is set to “rgb_array” and simulation render mode does not support it. In this case, the simulation render mode must be set to RenderMode.PARTIAL_RENDERING or RenderMode.FULL_RENDERING.
NotImplementedError – If an unsupported rendering mode is specified.
- property device#
The device on which the environment is running.
- property np_random: numpy.random.Generator#
Returns the environment’s internal _np_random that if not set will initialise with a random seed.
- Returns:
Instances of np.random.Generator
- property np_random_seed: int#
Returns the environment’s internal _np_random_seed that if not set will first initialise with a random int as seed.
If np_random_seed was set directly instead of through reset() or set_np_random_through_seed(), the seed will take the value -1.
- Returns:
the seed of the current np_random or -1, if the seed of the rng is unknown
- Return type:
int
- property physics_dt: float#
The physics time-step (in s).
This is the lowest time-decimation at which the simulation is happening.
- reset(seed: int | None = None, env_ids: Sequence[int] | None = None, options: dict[str, Any] | None = None) tuple[Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor]]], dict] #
Resets the specified environments and returns observations.
This function calls the _reset_idx() function to reset the specified environments. However, certain operations, such as procedural terrain generation, that happened during initialization are not repeated.
- Parameters:
seed – The seed to use for randomization. Defaults to None, in which case the seed is not set.
env_ids – The environment ids to reset. Defaults to None, in which case all environments are reset.
options –
Additional information to specify how the environment is reset. Defaults to None.
Note
This argument is used for compatibility with Gymnasium environment definition.
- Returns:
A tuple containing the observations and extras.
- reset_to(state: dict[str, dict[str, dict[str, torch.Tensor]]], env_ids: Sequence[int] | None, seed: int | None = None, is_relative: bool = False) None #
Resets specified environments to known states.
Note that this is different from the reset() function as it resets the environments to specific states.
- Parameters:
state – The state to reset the specified environments to.
env_ids – The environment ids to reset. Defaults to None, in which case all environments are reset.
seed – The seed to use for randomization. Defaults to None, in which case the seed is not set.
is_relative – If set to True, the state is considered relative to the environment origins. Defaults to False.
- static seed(seed: int = -1) int #
Set the seed for the environment.
- Parameters:
seed – The seed for random generator. Defaults to -1.
- Returns:
The seed used for random generator.
- property step_dt: float#
The environment stepping time-step (in s).
This is the time-step at which the environment steps forward.
- property unwrapped: Env[ObsType, ActType]#
Returns the base non-wrapped environment.
- Returns:
The base non-wrapped gymnasium.Env instance
- Return type:
Env
- class omni.isaac.lab.envs.ManagerBasedRLEnvCfg[source]#
Bases:
ManagerBasedEnvCfg
Configuration for a reinforcement learning environment with the manager-based workflow.
Classes:
Attributes:
Whether the learning task is treated as a finite or infinite horizon problem for the agent.
Duration of an episode (in seconds).
Reward settings.
Viewer configuration.
Physics simulation configuration.
The seed for the random number generator.
Number of control action updates @ sim dt per policy dt.
Scene settings.
Recorder settings.
Observation space settings.
Action space settings.
Event settings.
Whether a render step is performed again after at least one environment has been reset.
Termination settings.
Curriculum settings.
Command settings.
- ui_window_class_type#
alias of
ManagerBasedRLEnvWindow
- is_finite_horizon: bool#
Whether the learning task is treated as a finite or infinite horizon problem for the agent. Defaults to False, which means the task is treated as an infinite horizon problem.
This flag handles the subtleties of finite and infinite horizon tasks:
Finite horizon: no penalty or bootstrapping value is required by the agent for running out of time. However, the environment still needs to terminate the episode after the time limit is reached.
Infinite horizon: the agent needs to bootstrap the value of the state at the end of the episode. This is done by sending a time-limit (or truncated) done signal to the agent, which triggers this bootstrapping calculation.
If True, then the environment is treated as a finite horizon problem and no time-out (or truncated) done signal is sent to the agent. If False, then the environment is treated as an infinite horizon problem and a time-out (or truncated) done signal is sent to the agent.
Note
The base
ManagerBasedRLEnv
class does not use this flag directly. It is used by the environment wrappers to determine what type of done signal to send to the corresponding learning agent.
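The difference between the two settings shows up in how a learning algorithm forms its value targets. The sketch below is not taken from any Isaac Lab wrapper: done_signals and td_target are hypothetical helpers, and the mapping from the flag to (terminated, truncated) is an assumption based on the description above. It uses a standard one-step temporal-difference target to show when the value of the next state is bootstrapped.

```python
from typing import Tuple


def done_signals(failed: bool, timed_out: bool, is_finite_horizon: bool) -> Tuple[bool, bool]:
    """Return (terminated, truncated) as a wrapper might report them (assumed logic)."""
    if is_finite_horizon:
        # No separate time-out signal: running out of time ends the episode for good.
        return (failed or timed_out, False)
    # Infinite horizon: the time-out is reported as a truncation.
    return (failed, timed_out)


def td_target(reward: float, next_value: float, terminated: bool, gamma: float) -> float:
    """One-step TD target; bootstraps from next_value unless the state is terminal."""
    if terminated:
        return reward
    return reward + gamma * next_value


# The same time-out event under both settings.
term_inf, trunc_inf = done_signals(failed=False, timed_out=True, is_finite_horizon=False)
term_fin, trunc_fin = done_signals(failed=False, timed_out=True, is_finite_horizon=True)

target_inf = td_target(reward=1.0, next_value=10.0, terminated=term_inf, gamma=0.9)
target_fin = td_target(reward=1.0, next_value=10.0, terminated=term_fin, gamma=0.9)
```

In the infinite-horizon case the target keeps the discounted value of the next state, while in the finite-horizon case the time-out is treated as a genuine end of the episode and the target is the reward alone.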
- episode_length_s: float#
Duration of an episode (in seconds).
Based on the decimation rate and physics time step, the episode length is calculated as:
episode_length_steps = ceil(episode_length_s / (decimation_rate * physics_time_step))
For example, if the decimation rate is 10, the physics time step is 0.01, and the episode length is 10 seconds, then the episode length in steps is 100.
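The formula above can be written directly in Python. This is a minimal sketch of the arithmetic only; episode_length_steps is a hypothetical helper name, and the second call uses made-up values to show the effect of rounding up.

```python
import math


def episode_length_steps(episode_length_s: float, decimation: int, physics_dt: float) -> int:
    """Number of environment steps in an episode, rounded up to a whole step."""
    return math.ceil(episode_length_s / (decimation * physics_dt))


# Values from the example above: 10 s episodes, decimation 10, physics dt 0.01 s.
steps = episode_length_steps(episode_length_s=10.0, decimation=10, physics_dt=0.01)

# When the episode length is not a multiple of the environment time-step,
# the result is rounded up (1.0 s at roughly 0.3 s per step needs 4 steps).
steps_rounded_up = episode_length_steps(episode_length_s=1.0, decimation=3, physics_dt=0.1)
```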
- rewards: object#
Reward settings.
Please refer to the
omni.isaac.lab.managers.RewardManager
class for more details.
- sim: SimulationCfg#
Physics simulation configuration. Default is SimulationCfg().
- seed: int | None#
The seed for the random number generator. Defaults to None, in which case the seed is not set.
Note
The seed is set at the beginning of the environment initialization. This ensures that the environment creation is deterministic and behaves similarly across different runs.
- decimation: int#
Number of control action updates @ sim dt per policy dt.
For instance, if the simulation dt is 0.01s and the policy dt is 0.1s, then the decimation is 10. This means that the control action is updated every 10 simulation steps.
- scene: InteractiveSceneCfg#
Scene settings.
Please refer to the
omni.isaac.lab.scene.InteractiveSceneCfg
class for more details.
- recorders: object#
Recorder settings. Defaults to recording nothing.
Please refer to the
omni.isaac.lab.managers.RecorderManager
class for more details.
- observations: object#
Observation space settings.
Please refer to the
omni.isaac.lab.managers.ObservationManager
class for more details.
- actions: object#
Action space settings.
Please refer to the
omni.isaac.lab.managers.ActionManager
class for more details.
- events: object#
Event settings. Defaults to the basic configuration that resets the scene to its default state.
Please refer to the
omni.isaac.lab.managers.EventManager
class for more details.
- rerender_on_reset: bool#
Whether a render step is performed again after at least one environment has been reset. Defaults to False, which means no render step will be performed after reset.
When this is False, data collected from sensors after performing reset will be stale and will not reflect the latest states in simulation caused by the reset.
When this is True, an extra render step will be performed to update the sensor data to reflect the latest states from the reset. This comes at a cost of performance as an additional render step will be performed after each time an environment is reset.
- terminations: object#
Termination settings.
Please refer to the
omni.isaac.lab.managers.TerminationManager
class for more details.
- curriculum: object | None#
Curriculum settings. Defaults to None, in which case no curriculum is applied.
Please refer to the
omni.isaac.lab.managers.CurriculumManager
class for more details.
- commands: object | None#
Command settings. Defaults to None, in which case no commands are generated.
Please refer to the
omni.isaac.lab.managers.CommandManager
class for more details.
Direct RL Environment#
- class omni.isaac.lab.envs.DirectRLEnv[source]#
Bases: Env
The superclass for the direct workflow to design environments.
This class implements the core functionality for reinforcement learning (RL) environments. It is designed to be used with any RL library. The class is designed to be used with vectorized environments, i.e., the environment is expected to be run in parallel with multiple sub-environments.
While the environment itself is implemented as a vectorized environment, we do not inherit from gym.vector.VectorEnv. This is mainly because the class adds various methods (for wait and asynchronous updates) which are not required. Additionally, each RL library typically has its own definition for a vectorized environment. Thus, to reduce complexity, we directly use the gym.Env over here and leave it up to library-defined wrappers to take care of wrapping this environment for their agents.
Note
For vectorized environments, it is recommended to only call the reset() method once before the first call to step(), i.e. after the environment is created. After that, the step() function handles the reset of terminated sub-environments. This is because the simulator does not support resetting individual sub-environments in a vectorized environment.
Attributes:
Whether the environment is a vectorized environment.
Metadata for the environment.
The number of instances of the environment that are running.
The physics time-step (in s).
The environment stepping time-step (in s).
The device on which the environment is running.
Maximum episode length in seconds.
The maximum episode length in steps adjusted from s.
Returns the environment's internal _np_random that if not set will initialise with a random seed.
Returns the environment's internal _np_random_seed that if not set will first initialise with a random int as seed.
Returns the base non-wrapped environment.
Methods:
__init__(cfg[, render_mode]): Initialize the environment.
reset([seed, options]): Resets all the environments and returns observations.
step(action): Execute one time-step of the environment's dynamics.
seed([seed]): Set the seed for the environment.
render([recompute]): Run rendering without stepping through the physics.
close(): Cleanup for the environment.
set_debug_vis(debug_vis): Toggles the environment debug visualization.
get_wrapper_attr(name): Gets the attribute name from the environment.
has_wrapper_attr(name): Checks if the attribute name exists in the environment.
set_wrapper_attr(name, value): Sets the attribute name on the environment with value.
- metadata: ClassVar[dict[str, Any]] = {'isaac_sim_version': omni.isaac.version.get_version, 'render_modes': [None, 'human', 'rgb_array']}#
Metadata for the environment.
- __init__(cfg: DirectRLEnvCfg, render_mode: str | None = None, **kwargs)[source]#
Initialize the environment.
- Parameters:
cfg – The configuration object for the environment.
render_mode – The render mode for the environment. Defaults to None, which is similar to "human".
- Raises:
RuntimeError – If a simulation context already exists. The environment must always create one since it configures the simulation context and controls the simulation.
- property physics_dt: float#
The physics time-step (in s).
This is the lowest time-decimation at which the simulation is happening.
- property step_dt: float#
The environment stepping time-step (in s).
This is the time-step at which the environment steps forward.
- property device#
The device on which the environment is running.
- property max_episode_length#
The maximum episode length in steps adjusted from s.
- reset(seed: int | None = None, options: dict[str, Any] | None = None) tuple[Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor]]], dict] [source]#
Resets all the environments and returns observations.
This function calls the _reset_idx() function to reset all the environments. However, certain operations, such as procedural terrain generation, that happened during initialization are not repeated.
- Parameters:
seed – The seed to use for randomization. Defaults to None, in which case the seed is not set.
options –
Additional information to specify how the environment is reset. Defaults to None.
Note
This argument is used for compatibility with Gymnasium environment definition.
- Returns:
A tuple containing the observations and extras.
- step(action: torch.Tensor) tuple[Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor]]], torch.Tensor, torch.Tensor, torch.Tensor, dict] [source]#
Execute one time-step of the environment’s dynamics.
The environment steps forward at a fixed time-step, while the physics simulation is decimated at a lower time-step. This is to ensure that the simulation is stable. These two time-steps can be configured independently using the DirectRLEnvCfg.decimation (number of simulation steps per environment step) and the DirectRLEnvCfg.sim.dt (physics time-step). Based on these parameters, the environment time-step is computed as the product of the two.
This function performs the following steps:
Pre-process the actions before stepping through the physics.
Apply the actions to the simulator and step through the physics in a decimated manner.
Compute the reward and done signals.
Reset environments that have terminated or reached the maximum episode length.
Apply interval events if they are enabled.
Compute observations.
- Parameters:
action – The actions to apply on the environment. Shape is (num_envs, action_dim).
- Returns:
A tuple containing the observations, rewards, resets (terminated and truncated) and extras.
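The time-step relationship and the order of operations described for step() can be sketched in plain Python. This is a minimal illustration, not the DirectRLEnv implementation: the ToyStepper class and its log strings are hypothetical stand-ins, and only the names decimation and physics_dt come from this page.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStepper:
    """Hypothetical stand-in that mimics the documented order of operations."""

    decimation: int = 2        # physics steps per environment step
    physics_dt: float = 0.005  # physics time-step (in s)
    log: list = field(default_factory=list)

    @property
    def step_dt(self) -> float:
        # The environment time-step is the product of the two settings.
        return self.decimation * self.physics_dt

    def step(self, action):
        self.log.append("pre-process actions")
        for _ in range(self.decimation):
            # Actions are applied and physics advances once per sub-step.
            self.log.append("apply actions + physics step")
        self.log.append("compute rewards and dones")
        self.log.append("reset finished environments")
        self.log.append("apply interval events")
        self.log.append("compute observations")
        return self.log


env = ToyStepper(decimation=2, physics_dt=0.005)
trace = env.step(action=None)
```

With these assumed values, one call to step() advances the toy physics twice and covers 0.01 s of simulated time.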
- static seed(seed: int = -1) int [source]#
Set the seed for the environment.
- Parameters:
seed – The seed for random generator. Defaults to -1.
- Returns:
The seed used for random generator.
- render(recompute: bool = False) np.ndarray | None [source]#
Run rendering without stepping through the physics.
By convention, if mode is:
human: Render to the current display and return nothing. Usually for human consumption.
rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
- Parameters:
recompute – Whether to force a render even if the simulator has already rendered the scene. Defaults to False.
- Returns:
The rendered image as a numpy array if mode is “rgb_array”. Otherwise, returns None.
- Raises:
RuntimeError – If mode is set to “rgb_array” and the simulation render mode does not support it. In this case, the simulation render mode must be set to RenderMode.PARTIAL_RENDERING or RenderMode.FULL_RENDERING.
NotImplementedError – If an unsupported rendering mode is specified.
- set_debug_vis(debug_vis: bool) bool [source]#
Toggles the environment debug visualization.
- Parameters:
debug_vis – Whether to visualize the environment debug visualization.
- Returns:
Whether the debug visualization was successfully set. False if the environment does not support debug visualization.
- property np_random: numpy.random.Generator#
Returns the environment’s internal _np_random, which is initialised with a random seed if it has not been set.
- Returns:
An instance of np.random.Generator
- property np_random_seed: int#
Returns the environment’s internal _np_random_seed, which is first initialised with a random int as seed if it has not been set.
If np_random_seed was set directly instead of through reset() or set_np_random_through_seed(), the seed will take the value -1.
- Returns:
The seed of the current np_random, or -1 if the seed of the RNG is unknown.
- Return type:
int
- property unwrapped: Env[ObsType, ActType]#
Returns the base non-wrapped environment.
- Returns:
The base non-wrapped gymnasium.Env instance
- Return type:
Env
- class omni.isaac.lab.envs.DirectRLEnvCfg[source]#
Bases:
object
Configuration for an RL environment defined with the direct workflow.
Please refer to the omni.isaac.lab.envs.direct_rl_env.DirectRLEnv class for more details.
Attributes:
Viewer configuration.
Physics simulation configuration.
The seed for the random number generator.
Number of control action updates @ sim dt per policy dt.
Whether the learning task is treated as a finite or infinite horizon problem for the agent.
Duration of an episode (in seconds).
Scene settings.
Event settings.
Observation space definition.
The dimension of the observation space from each environment instance.
State space definition.
The dimension of the state-space from each environment instance.
The noise model to apply to the computed observations from the environment.
Action space definition.
The dimension of the action space for each environment.
The noise model applied to the actions provided to the environment.
Whether a render step is performed again after at least one environment has been reset.
Classes:
alias of
BaseEnvWindow
- sim: SimulationCfg#
Physics simulation configuration. Default is SimulationCfg().
- ui_window_class_type#
alias of
BaseEnvWindow
- seed: int | None#
The seed for the random number generator. Defaults to None, in which case the seed is not set.
Note
The seed is set at the beginning of the environment initialization. This ensures that the environment creation is deterministic and behaves similarly across different runs.
- decimation: int#
Number of control action updates @ sim dt per policy dt.
For instance, if the simulation dt is 0.01s and the policy dt is 0.1s, then the decimation is 10. This means that the control action is updated every 10 simulation steps.
- is_finite_horizon: bool#
Whether the learning task is treated as a finite or infinite horizon problem for the agent. Defaults to False, which means the task is treated as an infinite horizon problem.
This flag handles the subtleties of finite and infinite horizon tasks:
Finite horizon: no penalty or bootstrapping value is required by the agent for running out of time. However, the environment still needs to terminate the episode after the time limit is reached.
Infinite horizon: the agent needs to bootstrap the value of the state at the end of the episode. This is done by sending a time-limit (or truncated) done signal to the agent, which triggers this bootstrapping calculation.
If True, then the environment is treated as a finite horizon problem and no time-out (or truncated) done signal is sent to the agent. If False, then the environment is treated as an infinite horizon problem and a time-out (or truncated) done signal is sent to the agent.
Note
The base
ManagerBasedRLEnv
class does not use this flag directly. It is used by the environment wrappers to determine what type of done signal to send to the corresponding learning agent.
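One way a wrapper might act on this flag is sketched below. The function done_signals is hypothetical and only illustrates the rule stated above (no truncated signal for finite-horizon tasks). Folding the time-out into the terminated signal in the finite-horizon case is an assumption of this sketch, not something this page specifies.

```python
from __future__ import annotations


def done_signals(terminated: bool, time_out: bool, is_finite_horizon: bool) -> tuple[bool, bool]:
    """Return (terminated, truncated) as a hypothetical wrapper might report them."""
    if is_finite_horizon:
        # Finite horizon: the episode still ends at the time limit, but no
        # truncated signal is sent, so the agent does not bootstrap.
        return terminated or time_out, False
    # Infinite horizon: the time limit is reported as a truncated signal,
    # which tells the agent to bootstrap the value of the last state.
    return terminated, time_out


finite = done_signals(terminated=False, time_out=True, is_finite_horizon=True)
infinite = done_signals(terminated=False, time_out=True, is_finite_horizon=False)
```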
- episode_length_s: float#
Duration of an episode (in seconds).
Based on the decimation rate and physics time step, the episode length is calculated as:
episode_length_steps = ceil(episode_length_s / (decimation_rate * physics_time_step))
For example, if the decimation rate is 10, the physics time step is 0.01, and the episode length is 10 seconds, then the episode length in steps is 100.
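The formula above can be checked with a short worked example. The helper name episode_length_steps is chosen here for illustration; the arithmetic follows the expression given in this section.

```python
import math


def episode_length_steps(episode_length_s: float, decimation: int, physics_dt: float) -> int:
    """Number of environment steps in one episode, per the formula above."""
    step_dt = decimation * physics_dt  # environment time-step (in s)
    return math.ceil(episode_length_s / step_dt)


# Values from the example: decimation 10, physics time step 0.01 s, episode of 10 s.
steps = episode_length_steps(episode_length_s=10.0, decimation=10, physics_dt=0.01)
```

Because the division is done in floating point, values that do not divide evenly may round up by one step; that is a property of this sketch and of ceil in general.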
- scene: InteractiveSceneCfg#
Scene settings.
Please refer to the
omni.isaac.lab.scene.InteractiveSceneCfg
class for more details.
- events: object | None#
Event settings. Defaults to None, in which case no events are applied through the event manager.
Please refer to the
omni.isaac.lab.managers.EventManager
class for more details.
- observation_space: SpaceType#
Observation space definition.
The space can be defined either using Gymnasium spaces (when a more detailed specification of the space is desired) or basic Python data types (for simplicity). Each Python data type corresponds to a Gymnasium space as follows:
Box: Integer or list of integers (e.g.: 7, [64, 64, 3])
Discrete: Single-element set (e.g.: {2})
MultiDiscrete: List of single-element sets (e.g.: [{2}, {5}])
Dict: Dictionary (e.g.: {"joints": 7, "rgb": [64, 64, 3], "gripper": {2}})
Tuple: Tuple (e.g.: (7, [64, 64, 3], {2}))
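To make the shorthand concrete, the sketch below classifies a plain-Python specification by the kind of Gymnasium space it would stand for. It assumes the usual correspondence (Box for an integer or list of integers, Discrete for a single-element set, MultiDiscrete for a list of single-element sets, Dict and Tuple for the containers). The helper gym_space_name is hypothetical and does not construct real Gymnasium objects.

```python
def gym_space_name(spec) -> str:
    """Name the Gymnasium space a plain-Python spec stands for (illustrative only)."""
    if isinstance(spec, int):
        return "Box"
    if isinstance(spec, set) and len(spec) == 1:
        return "Discrete"
    if isinstance(spec, list) and spec:
        if all(isinstance(item, int) for item in spec):
            return "Box"
        if all(isinstance(item, set) and len(item) == 1 for item in spec):
            return "MultiDiscrete"
    if isinstance(spec, dict):
        return "Dict"
    if isinstance(spec, tuple):
        return "Tuple"
    raise ValueError(f"Unsupported space specification: {spec!r}")


# The examples from the listing above.
names = [
    gym_space_name(7),
    gym_space_name([64, 64, 3]),
    gym_space_name({2}),
    gym_space_name([{2}, {5}]),
    gym_space_name({"joints": 7, "rgb": [64, 64, 3], "gripper": {2}}),
    gym_space_name((7, [64, 64, 3], {2})),
]
```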
- num_observations: int | None#
The dimension of the observation space from each environment instance.
Warning
This attribute is deprecated. Use
observation_space
instead.
- state_space: SpaceType | None#
State space definition.
This is useful for asymmetric actor-critic and defines the observation space for the critic.
The space can be defined either using Gymnasium spaces (when a more detailed specification of the space is desired) or basic Python data types (for simplicity). Each Python data type corresponds to a Gymnasium space as follows:
Box: Integer or list of integers (e.g.: 7, [64, 64, 3])
Discrete: Single-element set (e.g.: {2})
MultiDiscrete: List of single-element sets (e.g.: [{2}, {5}])
Dict: Dictionary (e.g.: {"joints": 7, "rgb": [64, 64, 3], "gripper": {2}})
Tuple: Tuple (e.g.: (7, [64, 64, 3], {2}))
- num_states: int | None#
The dimension of the state-space from each environment instance.
Warning
This attribute is deprecated. Use
state_space
instead.
- observation_noise_model: NoiseModelCfg | None#
The noise model to apply to the computed observations from the environment. Default is None, which means no noise is added.
Please refer to the
omni.isaac.lab.utils.noise.NoiseModel
class for more details.
- action_space: SpaceType#
Action space definition.
The space can be defined either using Gymnasium spaces (when a more detailed specification of the space is desired) or basic Python data types (for simplicity). Each Python data type corresponds to a Gymnasium space as follows:
Box: Integer or list of integers (e.g.: 7, [64, 64, 3])
Discrete: Single-element set (e.g.: {2})
MultiDiscrete: List of single-element sets (e.g.: [{2}, {5}])
Dict: Dictionary (e.g.: {"joints": 7, "rgb": [64, 64, 3], "gripper": {2}})
Tuple: Tuple (e.g.: (7, [64, 64, 3], {2}))
- num_actions: int | None#
The dimension of the action space for each environment.
Warning
This attribute is deprecated. Use
action_space
instead.
- action_noise_model: NoiseModelCfg | None#
The noise model applied to the actions provided to the environment. Default is None, which means no noise is added.
Please refer to the
omni.isaac.lab.utils.noise.NoiseModel
class for more details.
- rerender_on_reset: bool#
Whether a render step is performed again after at least one environment has been reset. Defaults to False, which means no render step will be performed after reset.
When this is False, data collected from sensors after performing reset will be stale and will not reflect the latest states in simulation caused by the reset.
When this is True, an extra render step will be performed to update the sensor data to reflect the latest states from the reset. This comes at a cost of performance as an additional render step will be performed after each time an environment is reset.
Direct Multi-Agent RL Environment#
- class omni.isaac.lab.envs.DirectMARLEnv[source]#
Bases:
Env
The superclass for the direct workflow to design multi-agent environments.
This class implements the core functionality for multi-agent reinforcement learning (MARL) environments. It is designed to be used with any RL library. The class is designed to be used with vectorized environments, i.e., the environment is expected to be run in parallel with multiple sub-environments.
The design of this class is based on the PettingZoo Parallel API. While the environment itself is implemented as a vectorized environment, we do not inherit from pettingzoo.ParallelEnv or gym.vector.VectorEnv. This is mainly because the class adds various attributes and methods that are inconsistent with them.
Note
For vectorized environments, it is recommended to only call the reset() method once before the first call to step(), i.e. after the environment is created. After that, the step() function handles the reset of terminated sub-environments. This is because the simulator does not support resetting individual sub-environments in a vectorized environment.
Attributes:
Metadata for the environment.
The number of instances of the environment that are running.
Number of current agents.
Number of all possible agents the environment can generate.
Get the unwrapped environment underneath all the layers of wrappers.
The physics time-step (in s).
The environment stepping time-step (in s).
The device on which the environment is running.
Maximum episode length in seconds.
The maximum episode length in steps, converted from seconds.
Returns the environment's internal _np_random, which is initialised with a random seed if it has not been set.
Returns the environment's internal _np_random_seed, which is first initialised with a random int as seed if it has not been set.
Methods:
__init__(cfg[, render_mode]): Initialize the environment.
observation_space(agent): Get the observation space for the specified agent.
action_space(agent): Get the action space for the specified agent.
reset([seed, options]): Resets all the environments and returns observations.
step(actions): Execute one time-step of the environment's dynamics.
state(): Returns the state for the environment.
seed([seed]): Set the seed for the environment.
render([recompute]): Run rendering without stepping through the physics.
close(): Cleanup for the environment.
set_debug_vis(debug_vis): Toggles the environment debug visualization.
get_wrapper_attr(name): Gets the attribute name from the environment.
has_wrapper_attr(name): Checks if the attribute name exists in the environment.
set_wrapper_attr(name, value): Sets the attribute name on the environment with value.
- metadata: ClassVar[dict[str, Any]] = {'isaac_sim_version': omni.isaac.version.get_version, 'render_modes': [None, 'human', 'rgb_array']}#
Metadata for the environment.
- __init__(cfg: DirectMARLEnvCfg, render_mode: str | None = None, **kwargs)[source]#
Initialize the environment.
- Parameters:
cfg – The configuration object for the environment.
render_mode – The render mode for the environment. Defaults to None, which is similar to "human".
- Raises:
RuntimeError – If a simulation context already exists. The environment must always create one since it configures the simulation context and controls the simulation.
- property num_agents: int#
Number of current agents.
The number of current agents may change as the environment progresses (e.g.: agents can be added or removed).
- property max_num_agents: int#
Number of all possible agents the environment can generate.
This value remains constant as the environment progresses.
- property unwrapped: DirectMARLEnv#
Get the unwrapped environment underneath all the layers of wrappers.
- property physics_dt: float#
The physics time-step (in s).
This is the smallest time-step at which the simulation advances.
- property step_dt: float#
The environment stepping time-step (in s).
This is the time-step at which the environment steps forward.
- property device#
The device on which the environment is running.
- property max_episode_length#
The maximum episode length in steps, converted from seconds.
- observation_space(agent: AgentID) Space [source]#
Get the observation space for the specified agent.
- Returns:
The agent’s observation space.
- action_space(agent: AgentID) Space [source]#
Get the action space for the specified agent.
- Returns:
The agent’s action space.
- reset(seed: int | None = None, options: dict[str, Any] | None = None) tuple[dict[AgentID, ObsType], dict[AgentID, dict]] [source]#
Resets all the environments and returns observations.
- Parameters:
seed – The seed to use for randomization. Defaults to None, in which case the seed is not set.
options –
Additional information to specify how the environment is reset. Defaults to None.
Note
This argument is used for compatibility with Gymnasium environment definition.
- Returns:
A tuple containing the observations and extras (keyed by the agent ID).
- step(actions: dict[AgentID, ActionType]) tuple[Dict[AgentID, ObsType], Dict[AgentID, torch.Tensor], Dict[AgentID, torch.Tensor], Dict[AgentID, torch.Tensor], Dict[AgentID, dict]] [source]#
Execute one time-step of the environment’s dynamics.
The environment steps forward at a fixed time-step, while the physics simulation runs at a smaller time-step, with several physics steps executed per environment step. This is to ensure that the simulation is stable. These two time-steps can be configured independently using DirectMARLEnvCfg.decimation (number of simulation steps per environment step) and DirectMARLEnvCfg.sim.physics_dt (physics time-step). Based on these parameters, the environment time-step is computed as the product of the two.
This function performs the following steps:
Pre-process the actions before stepping through the physics.
Apply the actions to the simulator and step through the physics in a decimated manner.
Compute the reward and done signals.
Reset environments that have terminated or reached the maximum episode length.
Apply interval events if they are enabled.
Compute observations.
- Parameters:
actions – The actions to apply on the environment (keyed by the agent ID). Shape of individual tensors is (num_envs, action_dim).
- Returns:
A tuple containing the observations, rewards, resets (terminated and truncated) and extras (keyed by the agent ID).
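The per-agent layout of the actions argument can be sketched as follows. The real environment expects torch tensors; nested lists stand in for tensors here so the example has no dependencies. The agent IDs "cart" and "pendulum", their action sizes, and both helper functions are hypothetical.

```python
def zero_actions(num_envs, action_dims):
    """Build one (num_envs, action_dim) batch of zeros per agent ID."""
    return {
        agent: [[0.0] * dim for _ in range(num_envs)]
        for agent, dim in action_dims.items()
    }


def batch_shape(batch):
    """Return (num_envs, action_dim) for a nested-list batch."""
    return (len(batch), len(batch[0]))


action_dims = {"cart": 1, "pendulum": 2}  # hypothetical agent IDs and action sizes
actions = zero_actions(num_envs=4, action_dims=action_dims)
shapes = {agent: batch_shape(batch) for agent, batch in actions.items()}
```

Each entry of the dictionary carries one row per sub-environment, matching the (num_envs, action_dim) shape described for the actions parameter.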
- state() StateType | None [source]#
Returns the state for the environment.
The state-space is used for centralized training or asymmetric actor-critic architectures. It is configured using the DirectMARLEnvCfg.state_space parameter.
- Returns:
The states for the environment, or None if the DirectMARLEnvCfg.state_space parameter is zero.
- static seed(seed: int = -1) int [source]#
Set the seed for the environment.
- Parameters:
seed – The seed for random generator. Defaults to -1.
- Returns:
The seed used for random generator.
- render(recompute: bool = False) np.ndarray | None [source]#
Run rendering without stepping through the physics.
By convention, if mode is:
human: Render to the current display and return nothing. Usually for human consumption.
rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
- Parameters:
recompute – Whether to force a render even if the simulator has already rendered the scene. Defaults to False.
- Returns:
The rendered image as a numpy array if mode is “rgb_array”. Otherwise, returns None.
- Raises:
RuntimeError – If mode is set to “rgb_array” and the simulation render mode does not support it. In this case, the simulation render mode must be set to RenderMode.PARTIAL_RENDERING or RenderMode.FULL_RENDERING.
NotImplementedError – If an unsupported rendering mode is specified.
- set_debug_vis(debug_vis: bool) bool [source]#
Toggles the environment debug visualization.
- Parameters:
debug_vis – Whether to visualize the environment debug visualization.
- Returns:
Whether the debug visualization was successfully set. False if the environment does not support debug visualization.
- property np_random: numpy.random.Generator#
Returns the environment’s internal _np_random, which is initialised with a random seed if it has not been set.
- Returns:
An instance of np.random.Generator
- property np_random_seed: int#
Returns the environment’s internal _np_random_seed, which is first initialised with a random int as seed if it has not been set.
If np_random_seed was set directly instead of through reset() or set_np_random_through_seed(), the seed will take the value -1.
- Returns:
The seed of the current np_random, or -1 if the seed of the RNG is unknown.
- Return type:
int
- class omni.isaac.lab.envs.DirectMARLEnvCfg[source]#
Bases:
object
Configuration for a MARL environment defined with the direct workflow.
Please refer to the omni.isaac.lab.envs.direct_marl_env.DirectMARLEnv class for more details.
Attributes:
Viewer configuration.
Physics simulation configuration.
The seed for the random number generator.
Number of control action updates @ sim dt per policy dt.
Whether the learning task is treated as a finite or infinite horizon problem for the agent.
Duration of an episode (in seconds).
Scene settings.
Event settings.
Observation space definition for each agent.
The dimension of the observation space for each agent.
State space definition.
The dimension of the state space from each environment instance.
The noise model to apply to the computed observations from the environment.
Action space definition for each agent.
The dimension of the action space for each agent.
The noise model applied to the actions provided to the environment.
A list of all possible agents the environment could generate.
Classes:
alias of
BaseEnvWindow
- sim: SimulationCfg#
Physics simulation configuration. Default is SimulationCfg().
- ui_window_class_type#
alias of
BaseEnvWindow
- seed: int | None#
The seed for the random number generator. Defaults to None, in which case the seed is not set.
Note
The seed is set at the beginning of the environment initialization. This ensures that the environment creation is deterministic and behaves similarly across different runs.
- decimation: int#
Number of control action updates @ sim dt per policy dt.
For instance, if the simulation dt is 0.01s and the policy dt is 0.1s, then the decimation is 10. This means that the control action is updated every 10 simulation steps.
- is_finite_horizon: bool#
Whether the learning task is treated as a finite or infinite horizon problem for the agent. Defaults to False, which means the task is treated as an infinite horizon problem.
This flag handles the subtleties of finite and infinite horizon tasks:
Finite horizon: no penalty or bootstrapping value is required by the agent for running out of time. However, the environment still needs to terminate the episode after the time limit is reached.
Infinite horizon: the agent needs to bootstrap the value of the state at the end of the episode. This is done by sending a time-limit (or truncated) done signal to the agent, which triggers this bootstrapping calculation.
If True, then the environment is treated as a finite horizon problem and no time-out (or truncated) done signal is sent to the agent. If False, then the environment is treated as an infinite horizon problem and a time-out (or truncated) done signal is sent to the agent.
Note
The base
ManagerBasedRLEnv
class does not use this flag directly. It is used by the environment wrappers to determine what type of done signal to send to the corresponding learning agent.
- episode_length_s: float#
Duration of an episode (in seconds).
Based on the decimation rate and physics time step, the episode length is calculated as:
episode_length_steps = ceil(episode_length_s / (decimation_rate * physics_time_step))
For example, if the decimation rate is 10, the physics time step is 0.01, and the episode length is 10 seconds, then the episode length in steps is 100.
- scene: InteractiveSceneCfg#
Scene settings.
Please refer to the
omni.isaac.lab.scene.InteractiveSceneCfg
class for more details.
- events: object#
Event settings. Defaults to None, in which case no events are applied through the event manager.
Please refer to the
omni.isaac.lab.managers.EventManager
class for more details.
- observation_spaces: dict[AgentID, SpaceType]#
Observation space definition for each agent.
The space can be defined either using Gymnasium spaces (when a more detailed specification of the space is desired) or basic Python data types (for simplicity). Each Python data type corresponds to a Gymnasium space as follows:
Box: Integer or list of integers (e.g.: 7, [64, 64, 3])
Discrete: Single-element set (e.g.: {2})
MultiDiscrete: List of single-element sets (e.g.: [{2}, {5}])
Dict: Dictionary (e.g.: {"joints": 7, "rgb": [64, 64, 3], "gripper": {2}})
Tuple: Tuple (e.g.: (7, [64, 64, 3], {2}))
- num_observations: dict[AgentID, int] | None#
The dimension of the observation space for each agent.
Warning
This attribute is deprecated. Use
observation_spaces
instead.
- state_space: SpaceType#
State space definition.
The following values are supported:
-1: All the observations from the different agents are automatically concatenated.
0: No state-space will be constructed (state_space is None). This is useful to save computational resources when the algorithm to be trained does not need it.
greater than 0: Custom state-space dimension to be provided by the task implementation.
The space can be defined either using Gymnasium spaces (when a more detailed specification of the space is desired) or basic Python data types (for simplicity). Each Python data type corresponds to a Gymnasium space as follows:
Box: Integer or list of integers (e.g.: 7, [64, 64, 3])
Discrete: Single-element set (e.g.: {2})
MultiDiscrete: List of single-element sets (e.g.: [{2}, {5}])
Dict: Dictionary (e.g.: {"joints": 7, "rgb": [64, 64, 3], "gripper": {2}})
Tuple: Tuple (e.g.: (7, [64, 64, 3], {2}))
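The three integer conventions for state_space (-1, 0, and greater than 0) can be summarised in a small helper. This is a sketch under stated assumptions: resolve_state_dim is a hypothetical function, the agent IDs are made up, and treating concatenation as a sum of flat per-agent observation sizes is an assumption that only holds for one-dimensional observations.

```python
from __future__ import annotations

from typing import Optional


def resolve_state_dim(state_space: int, observation_dims: dict[str, int]) -> Optional[int]:
    """Return the state dimension implied by an integer state_space value."""
    if state_space == -1:
        # Concatenate the observations of all agents.
        return sum(observation_dims.values())
    if state_space == 0:
        # No state-space is constructed.
        return None
    if state_space > 0:
        # Custom dimension provided by the task implementation.
        return state_space
    raise ValueError(f"Unsupported state_space value: {state_space}")


observation_dims = {"cart": 4, "pendulum": 3}  # hypothetical per-agent sizes
concatenated = resolve_state_dim(-1, observation_dims)
disabled = resolve_state_dim(0, observation_dims)
custom = resolve_state_dim(12, observation_dims)
```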
- num_states: int | None#
The dimension of the state space from each environment instance.
Warning
This attribute is deprecated. Use
state_space
instead.
- observation_noise_model: dict[AgentID, omni.isaac.lab.utils.noise.noise_cfg.NoiseModelCfg | None] | None#
The noise model to apply to the computed observations from the environment. Default is None, which means no noise is added.
Please refer to the
omni.isaac.lab.utils.noise.NoiseModel
class for more details.
- action_spaces: dict[AgentID, SpaceType]#
Action space definition for each agent.
The space can be defined either using Gymnasium spaces (when a more detailed specification of the space is desired) or basic Python data types (for simplicity). Each Python data type corresponds to a Gymnasium space as follows:
Box: Integer or list of integers (e.g.: 7, [64, 64, 3])
Discrete: Single-element set (e.g.: {2})
MultiDiscrete: List of single-element sets (e.g.: [{2}, {5}])
Dict: Dictionary (e.g.: {"joints": 7, "rgb": [64, 64, 3], "gripper": {2}})
Tuple: Tuple (e.g.: (7, [64, 64, 3], {2}))
- num_actions: dict[AgentID, int] | None#
The dimension of the action space for each agent.
Warning
This attribute is deprecated. Use
action_spaces
instead.
- action_noise_model: dict[AgentID, omni.isaac.lab.utils.noise.noise_cfg.NoiseModelCfg | None] | None#
The noise model applied to the actions provided to the environment. Default is None, which means no noise is added.
Please refer to the
omni.isaac.lab.utils.noise.NoiseModel
class for more details.
Common#
- class omni.isaac.lab.envs.ViewerCfg[source]#
Configuration of the scene viewport camera.
Attributes:
Initial camera position (in m).
Initial camera target position (in m).
The camera prim path to record images from.
The resolution (width, height) of the camera specified using cam_prim_path.
The frame in which the camera position (eye) and target (lookat) are defined in.
The environment index for frame origin.
The asset name in the interactive scene for the frame origin.
The name of the body in asset_name in the interactive scene for the frame origin.
- lookat: tuple[float, float, float]#
Initial camera target position (in m). Default is (0.0, 0.0, 0.0).
- cam_prim_path: str#
The camera prim path to record images from. Default is “/OmniverseKit_Persp”, which is the default camera in the viewport.
- resolution: tuple[int, int]#
The resolution (width, height) of the camera specified using cam_prim_path. Default is (1280, 720).
- origin_type: Literal['world', 'env', 'asset_root', 'asset_body']#
The frame in which the camera position (eye) and target (lookat) are defined in. Default is “world”.
Available options are:
"world": The origin of the world.
"env": The origin of the environment defined by env_index.
"asset_root": The center of the asset defined by asset_name in environment env_index.
"asset_body": The center of the body defined by body_name in asset defined by asset_name in environment env_index.
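How the chosen frame affects the camera placement can be sketched as below. The sketch assumes that eye and lookat are plain offsets added to the selected frame origin, which this page does not state explicitly. The helpers resolve_origin and to_world, and the example positions, are hypothetical.

```python
def resolve_origin(origin_type, env_origins, env_index=0,
                   asset_root_pos=None, asset_body_pos=None):
    """Pick the frame origin (x, y, z) for the chosen origin_type."""
    if origin_type == "world":
        return (0.0, 0.0, 0.0)
    if origin_type == "env":
        return env_origins[env_index]
    if origin_type == "asset_root":
        return asset_root_pos
    if origin_type == "asset_body":
        return asset_body_pos
    raise ValueError(f"Unknown origin_type: {origin_type!r}")


def to_world(offset, origin):
    """Shift a camera eye or lookat offset by the frame origin."""
    return tuple(o + d for o, d in zip(origin, offset))


# Two environments spaced 10 m apart along x (made-up layout).
env_origins = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
origin = resolve_origin("env", env_origins, env_index=1)
eye_world = to_world((1.0, 2.0, 3.0), origin)
```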
- env_index: int#
The environment index for frame origin. Default is 0.
This quantity is only effective if origin_type is set to “env” or “asset_root”.
- asset_name: str | None#
The asset name in the interactive scene for the frame origin. Default is None.
This quantity is only effective if origin_type is set to “asset_root”.
- body_name: str | None#
The name of the body in asset_name in the interactive scene for the frame origin. Default is None.
This quantity is only effective if origin_type is set to “asset_body”.