isaaclab_rl#

Package for environment wrappers to different learning frameworks.

Wrappers allow you to modify the behavior of an environment without modifying the environment itself. This is useful for modifying the observation space, action space, or reward function. Additionally, they can be used to cast a given environment into the respective environment class definition used by different learning frameworks. This operation may include handling asymmetric actor-critic observations, casting the data between different backends such as NumPy and PyTorch, or organizing the returned data into the data structure expected by the learning framework.

All wrappers work similarly to the gymnasium.Wrapper class. Using a wrapper is as simple as passing the initialized environment instance to the wrapper constructor. However, since learning frameworks expect different input and output data structures, their wrapper classes are not compatible with each other. Thus, each wrapper should only be used together with its respective learning framework.
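The wrapper pattern described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not code from isaaclab_rl: `ToyEnv` and `ScaleRewardWrapper` are hypothetical names, and the toy `step()` returns only an observation and a reward.

```python
class ToyEnv:
    """Hypothetical stand-in for an environment with a step() method."""

    def step(self, action: float) -> tuple[float, float]:
        # Return (observation, reward); the observation echoes the action.
        return action, 1.0


class ScaleRewardWrapper:
    """Hypothetical wrapper that rescales rewards without editing ToyEnv."""

    def __init__(self, env, scale: float):
        # The initialized environment is passed to the wrapper constructor.
        self.env = env
        self.scale = scale

    def step(self, action: float) -> tuple[float, float]:
        obs, reward = self.env.step(action)
        return obs, reward * self.scale

    @property
    def unwrapped(self):
        # Walk down the wrapper chain until the bare environment is reached.
        return getattr(self.env, "unwrapped", self.env)


env = ToyEnv()
wrapped = ScaleRewardWrapper(env, scale=0.5)
obs, reward = wrapped.step(2.0)
```

The environment class itself is never modified; only the object handed to the learning code changes.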

RL-Games Wrapper#

Wrapper to configure an environment instance as an RL-Games vectorized environment.

The following example shows how to wrap an environment for RL-Games and register the environment constructor with the RL-Games Runner class:

from rl_games.common import env_configurations, vecenv

from isaaclab_rl.rl_games import RlGamesGpuEnv, RlGamesVecEnvWrapper

# configuration parameters
rl_device = "cuda:0"
clip_obs = 10.0
clip_actions = 1.0

# wrap around environment for rl-games
env = RlGamesVecEnvWrapper(env, rl_device, clip_obs, clip_actions)

# register the environment to rl-games registry
# note: in agents configuration: environment name must be "rlgpu"
vecenv.register(
    "IsaacRlgWrapper", lambda config_name, num_actors, **kwargs: RlGamesGpuEnv(config_name, num_actors, **kwargs)
)
env_configurations.register("rlgpu", {"vecenv_type": "IsaacRlgWrapper", "env_creator": lambda **kwargs: env})

Classes:

`RlGamesVecEnvWrapper`	Wraps around Isaac Lab environment for RL-Games.
`RlGamesGpuEnv`	Thin wrapper to create instance of the environment to fit RL-Games runner.

class isaaclab_rl.rl_games.RlGamesVecEnvWrapper[source]#

Bases: IVecEnv

Wraps around Isaac Lab environment for RL-Games.

This class wraps around the Isaac Lab environment. Since RL-Games works directly on GPU buffers, the wrapper handles moving of buffers from the simulation environment to the same device as the learning agent. Additionally, it performs clipping of observations and actions.

For algorithms like asymmetric actor-critic, RL-Games expects a dictionary for observations. This dictionary contains “obs” and “states” which typically correspond to the actor and critic observations respectively.

To use asymmetric actor-critic, the environment observations from ManagerBasedRLEnv or DirectRLEnv must have the key or group name “critic”. The observation group is used to set the num_states (int) and state_space (gym.spaces.Box). These are used by the learning agent in RL-Games to allocate buffers in the trajectory memory. Since this is optional for some environments, the wrapper checks if these attributes exist. If they do not, the wrapper defaults to zero as the number of privileged observations.
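As a rough sketch of the observation handling described above, the function below clips observations and maps observation groups to the “obs” and “states” keys. It is an illustration only: plain lists stand in for GPU tensors, `to_rl_games_obs` and `clip` are hypothetical helpers, and the actor group name "policy" is an assumption of this sketch.

```python
def clip(values, limit):
    """Clamp every element of a flat list to the range [-limit, limit]."""
    return [max(-limit, min(limit, v)) for v in values]


def to_rl_games_obs(obs_dict, clip_obs):
    """Map observation groups to the layout RL-Games expects (sketch)."""
    # Assumption: the actor observations live under the "policy" group.
    obs = clip(obs_dict["policy"], clip_obs)
    if "critic" in obs_dict:
        # Asymmetric actor-critic: return actor and critic observations.
        return {"obs": obs, "states": clip(obs_dict["critic"], clip_obs)}
    # No "critic" group: there are no privileged observations to report.
    return obs


asymmetric = to_rl_games_obs(
    {"policy": [0.5, 20.0, -30.0], "critic": [1.0, 15.0]}, clip_obs=10.0
)
symmetric = to_rl_games_obs({"policy": [0.5, 20.0]}, clip_obs=10.0)
```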

Caution

This class must be the last wrapper in the wrapper chain. This is because the wrapper does not follow the gym.Wrapper interface. Any subsequent wrappers will need to be modified to work with this wrapper.

Reference:: Denys88/rl_games, NVIDIA-Omniverse/IsaacGymEnvs

Methods:

`__init__`(env, rl_device, clip_obs, clip_actions)	Initializes the wrapper instance.
`class_name`()	Returns the class name of the wrapper.
`get_number_of_agents`()	Returns number of actors in the environment.
`get_env_info`()	Returns the Gym spaces for the environment.

Attributes:

`render_mode`	Returns the `Env` `render_mode`.
`observation_space`	Returns the `Env` `observation_space`.
`action_space`	Returns the `Env` `action_space`.
`unwrapped`	Returns the base environment of the wrapper.
`num_envs`	Returns the number of sub-environment instances.
`device`	Returns the base environment simulation device.
`state_space`	Returns the state space for privileged (critic) observations, if any.

__init__(env: ManagerBasedRLEnv | DirectRLEnv, rl_device: str, clip_obs: float, clip_actions: float)[source]#

Initializes the wrapper instance.

Parameters:

env – The environment to wrap around.
rl_device – The device on which agent computations are performed.
clip_obs – The clipping value for observations.
clip_actions – The clipping value for actions.

Raises:

ValueError – The environment is not inherited from ManagerBasedRLEnv or DirectRLEnv.
ValueError – If specified, the privileged observations (critic) are not of type gym.spaces.Box.

property render_mode: str | None#: Returns the Env render_mode.

property observation_space: gym.spaces.Box#: Returns the Env observation_space.

property action_space: gym.Space#: Returns the Env action_space.

classmethod class_name() → str[source]#: Returns the class name of the wrapper.

property unwrapped: ManagerBasedRLEnv | DirectRLEnv#

Returns the base environment of the wrapper.

This will be the bare gymnasium.Env environment, underneath all layers of wrappers.

property num_envs: int#: Returns the number of sub-environment instances.

property device: str#: Returns the base environment simulation device.

property state_space: gym.spaces.Box | None#: Returns the state space for privileged (critic) observations, or None if the environment defines none.

get_number_of_agents() → int[source]#: Returns number of actors in the environment.

get_env_info() → dict[source]#: Returns the Gym spaces for the environment.

class isaaclab_rl.rl_games.RlGamesGpuEnv[source]#

Bases: IVecEnv

Thin wrapper to create instance of the environment to fit RL-Games runner.

Methods:

`__init__`(config_name, num_actors, **kwargs)	Initialize the environment.
`get_number_of_agents`()	Get number of agents in the environment.
`get_env_info`()	Get the Gym spaces for the environment.

__init__(config_name: str, num_actors: int, **kwargs)[source]#

Initialize the environment.

Parameters:

config_name – The name of the environment configuration.
num_actors – The number of actors in the environment. This is not used in this wrapper.

get_number_of_agents() → int[source]#

Get number of agents in the environment.

Returns:: The number of agents in the environment.

get_env_info() → dict[source]#

Get the Gym spaces for the environment.

Returns:: The Gym spaces for the environment.

RSL-RL Wrapper#

Wrappers and utilities to configure an environment for RSL-RL library.

The following example shows how to wrap an environment for RSL-RL:

from isaaclab_rl.rsl_rl import RslRlVecEnvWrapper

env = RslRlVecEnvWrapper(env)

Functions:

`export_policy_as_jit`(actor_critic, ...[, ...])	Export policy into a Torch JIT file.
`export_policy_as_onnx`(actor_critic, path[, ...])	Export policy into a Torch ONNX file.

Classes:

`RslRlOnPolicyRunnerCfg`	Configuration of the runner for on-policy algorithms.
`RslRlPpoActorCriticCfg`	Configuration for the PPO actor-critic networks.
`RslRlPpoAlgorithmCfg`	Configuration for the PPO algorithm.
`RslRlRndCfg`	Configuration for the Random Network Distillation (RND) module.
`RslRlSymmetryCfg`	Configuration for the symmetry-augmentation in the training.
`RslRlVecEnvWrapper`	Wraps around Isaac Lab environment for RSL-RL library.

isaaclab_rl.rsl_rl.export_policy_as_jit(actor_critic: object, normalizer: object | None, path: str, filename='policy.pt')[source]#

Export policy into a Torch JIT file.

Parameters:

actor_critic – The actor-critic torch module.
normalizer – The empirical normalizer module. If None, Identity is used.
path – The path to the saving directory.
filename – The name of exported JIT file. Defaults to “policy.pt”.

isaaclab_rl.rsl_rl.export_policy_as_onnx(actor_critic: object, path: str, normalizer: object | None = None, filename='policy.onnx', verbose=False)[source]#

Export policy into a Torch ONNX file.

Parameters:

actor_critic – The actor-critic torch module.
normalizer – The empirical normalizer module. If None, Identity is used.
path – The path to the saving directory.
filename – The name of exported ONNX file. Defaults to “policy.onnx”.
verbose – Whether to print the model summary. Defaults to False.

class isaaclab_rl.rsl_rl.RslRlOnPolicyRunnerCfg[source]#

Bases: object

Configuration of the runner for on-policy algorithms.

Methods:

__init__([seed, device, num_steps_per_env, ...])

Attributes:

`seed`	The seed for the experiment.
`device`	The device for the rl-agent.
`num_steps_per_env`	The number of steps per environment per update.
`max_iterations`	The maximum number of iterations.
`empirical_normalization`	Whether to use empirical normalization.
`policy`	The policy configuration.
`algorithm`	The algorithm configuration.
`clip_actions`	The clipping value for actions.
`save_interval`	The number of iterations between saves.
`experiment_name`	The experiment name.
`run_name`	The run name.
`logger`	The logger to use.
`neptune_project`	The neptune project name.
`wandb_project`	The wandb project name.
`resume`	Whether to resume.
`load_run`	The run directory to load.
`load_checkpoint`	The checkpoint file to load.

__init__(seed: int = <factory>, device: str = <factory>, num_steps_per_env: int = <factory>, max_iterations: int = <factory>, empirical_normalization: bool = <factory>, policy: ~isaaclab_rl.rsl_rl.rl_cfg.RslRlPpoActorCriticCfg = <factory>, algorithm: ~isaaclab_rl.rsl_rl.rl_cfg.RslRlPpoAlgorithmCfg = <factory>, clip_actions: float | None = <factory>, save_interval: int = <factory>, experiment_name: str = <factory>, run_name: str = <factory>, logger: ~typing.Literal['tensorboard', 'neptune', 'wandb'] = <factory>, neptune_project: str = <factory>, wandb_project: str = <factory>, resume: bool = <factory>, load_run: str = <factory>, load_checkpoint: str = <factory>) → None#

seed: int#: The seed for the experiment. Default is 42.

device: str#: The device for the rl-agent. Default is cuda

num_steps_per_env: int#: The number of steps per environment per update.

max_iterations: int#: The maximum number of iterations.

empirical_normalization: bool#: Whether to use empirical normalization.

policy: RslRlPpoActorCriticCfg#: The policy configuration.

algorithm: RslRlPpoAlgorithmCfg#: The algorithm configuration.

clip_actions: float | None#: The clipping value for actions. If None, then no clipping is done.

Note

This clipping is performed inside the RslRlVecEnvWrapper wrapper.

save_interval: int#: The number of iterations between saves.

experiment_name: str#: The experiment name.

run_name: str#

The run name. Default is empty string.

The name of the run directory is typically the time-stamp at execution. If the run name is not empty, then it is appended to the run directory’s name, i.e. the logging directory’s name will become {time-stamp}_{run_name}.
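The naming rule for the logging directory can be sketched as follows. This is a hedged illustration: `make_log_dir_name` is a hypothetical helper and the time-stamp format is an assumption, since the text only specifies the {time-stamp}_{run_name} pattern.

```python
from datetime import datetime


def make_log_dir_name(run_name: str, now: datetime) -> str:
    """Build the run directory name from a time-stamp and optional run name."""
    # Assumption: the time-stamp format below is chosen for illustration only.
    name = now.strftime("%Y-%m-%d_%H-%M-%S")
    if run_name:
        # A non-empty run name is appended after an underscore.
        name += f"_{run_name}"
    return name


stamp = datetime(2024, 1, 2, 3, 4, 5)
default_dir = make_log_dir_name("", stamp)
named_dir = make_log_dir_name("ablation", stamp)
```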

logger: Literal['tensorboard', 'neptune', 'wandb']#: The logger to use. Default is tensorboard.

neptune_project: str#: The neptune project name. Default is “isaaclab”.

wandb_project: str#: The wandb project name. Default is “isaaclab”.

resume: bool#: Whether to resume. Default is False.

load_run: str#

The run directory to load. Default is “.*” (all).

If this is a regular expression, the latest matching run (in alphabetical order) is loaded.

load_checkpoint: str#

The checkpoint file to load. Default is "model_.*.pt" (all).

If this is a regular expression, the latest matching file (in alphabetical order) is loaded.
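The selection rule for load_run and load_checkpoint can be illustrated with the standard library. This sketch assumes full-string regex matching and plain string ordering, as the text describes; `pick_latest` is a hypothetical helper and the actual loader may differ in both respects.

```python
import re


def pick_latest(names, pattern):
    """Return the alphabetically last name that matches the regex."""
    matches = sorted(n for n in names if re.fullmatch(pattern, n))
    if not matches:
        raise ValueError(f"No entry matches {pattern!r}")
    return matches[-1]


checkpoints = ["model_100.pt", "model_50.pt", "model_99.pt", "events.log"]
latest_checkpoint = pick_latest(checkpoints, r"model_.*.pt")
latest_run = pick_latest(["2024-01-01_run", "2024-02-01_run"], r".*")
```

Under plain string ordering, model_99.pt sorts after model_100.pt, so passing an explicit file name is the safer choice when a specific checkpoint is needed.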

class isaaclab_rl.rsl_rl.RslRlPpoActorCriticCfg[source]#

Bases: object

Configuration for the PPO actor-critic networks.

Attributes:

`class_name`	The policy class name.
`init_noise_std`	The initial noise standard deviation for the policy.
`noise_std_type`	The type of noise standard deviation for the policy.
`actor_hidden_dims`	The hidden dimensions of the actor network.
`critic_hidden_dims`	The hidden dimensions of the critic network.
`activation`	The activation function for the actor and critic networks.

Methods:

__init__([class_name, init_noise_std, ...])

class_name: str#: The policy class name. Default is ActorCritic.

init_noise_std: float#: The initial noise standard deviation for the policy.

noise_std_type: Literal['scalar', 'log']#: The type of noise standard deviation for the policy. Default is scalar.

actor_hidden_dims: list[int]#: The hidden dimensions of the actor network.

critic_hidden_dims: list[int]#: The hidden dimensions of the critic network.

activation: str#: The activation function for the actor and critic networks.

__init__(class_name: str = <factory>, init_noise_std: float = <factory>, noise_std_type: ~typing.Literal['scalar', 'log'] = <factory>, actor_hidden_dims: list[int] = <factory>, critic_hidden_dims: list[int] = <factory>, activation: str = <factory>) → None#

class isaaclab_rl.rsl_rl.RslRlPpoAlgorithmCfg[source]#

Bases: object

Configuration for the PPO algorithm.

Attributes:

`class_name`	The algorithm class name.
`value_loss_coef`	The coefficient for the value loss.
`use_clipped_value_loss`	Whether to use clipped value loss.
`clip_param`	The clipping parameter for the policy.
`entropy_coef`	The coefficient for the entropy loss.
`num_learning_epochs`	The number of learning epochs per update.
`num_mini_batches`	The number of mini-batches per update.
`learning_rate`	The learning rate for the policy.
`schedule`	The learning rate schedule.
`gamma`	The discount factor.
`lam`	The lambda parameter for Generalized Advantage Estimation (GAE).
`desired_kl`	The desired KL divergence.
`max_grad_norm`	The maximum gradient norm.
`normalize_advantage_per_mini_batch`	Whether to normalize the advantage per mini-batch.
`symmetry_cfg`	The symmetry configuration.
`rnd_cfg`	The configuration for the Random Network Distillation (RND) module.

Methods:

__init__([class_name, value_loss_coef, ...])

class_name: str#: The algorithm class name. Default is PPO.

value_loss_coef: float#: The coefficient for the value loss.

use_clipped_value_loss: bool#: Whether to use clipped value loss.

clip_param: float#: The clipping parameter for the policy.

entropy_coef: float#: The coefficient for the entropy loss.

num_learning_epochs: int#: The number of learning epochs per update.

num_mini_batches: int#: The number of mini-batches per update.

learning_rate: float#: The learning rate for the policy.

schedule: str#: The learning rate schedule.

gamma: float#: The discount factor.

lam: float#: The lambda parameter for Generalized Advantage Estimation (GAE).

desired_kl: float#: The desired KL divergence.

max_grad_norm: float#: The maximum gradient norm.

normalize_advantage_per_mini_batch: bool#

Whether to normalize the advantage per mini-batch. Default is False.

If True, the advantage is normalized over the mini-batches only. Otherwise, the advantage is normalized over the entire collected trajectories.
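The two modes selected by normalize_advantage_per_mini_batch can be compared on a toy rollout. This is a simplified sketch, not the RSL-RL implementation: it uses plain lists and the population standard deviation, and `normalize` is a hypothetical helper.

```python
from statistics import fmean, pstdev


def normalize(values, eps=1e-8):
    """Shift to zero mean and scale to unit standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / (std + eps) for v in values]


advantages = [1.0, 3.0, 10.0, 30.0]
mini_batches = [advantages[:2], advantages[2:]]

# Flag is False: one mean/std computed over the entire collected rollout.
whole = normalize(advantages)
global_norm = [whole[:2], whole[2:]]

# Flag is True: statistics are computed separately for each mini-batch.
local_norm = [normalize(batch) for batch in mini_batches]
```

With per-mini-batch statistics, every mini-batch ends up with zero mean, so differences in scale between mini-batches are removed.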

__init__(class_name: str = <factory>, value_loss_coef: float = <factory>, use_clipped_value_loss: bool = <factory>, clip_param: float = <factory>, entropy_coef: float = <factory>, num_learning_epochs: int = <factory>, num_mini_batches: int = <factory>, learning_rate: float = <factory>, schedule: str = <factory>, gamma: float = <factory>, lam: float = <factory>, desired_kl: float = <factory>, max_grad_norm: float = <factory>, normalize_advantage_per_mini_batch: bool = <factory>, symmetry_cfg: ~isaaclab_rl.rsl_rl.symmetry_cfg.RslRlSymmetryCfg | None = <factory>, rnd_cfg: ~isaaclab_rl.rsl_rl.rnd_cfg.RslRlRndCfg | None = <factory>) → None#

symmetry_cfg: RslRlSymmetryCfg | None#: The symmetry configuration. Default is None, in which case symmetry is not used.

rnd_cfg: RslRlRndCfg | None#: The configuration for the Random Network Distillation (RND) module. Default is None, in which case RND is not used.

class isaaclab_rl.rsl_rl.RslRlRndCfg[source]#

Bases: object

Configuration for the Random Network Distillation (RND) module.

For more information, please check the work from [SKB+23].

Classes:

`WeightScheduleCfg`	Configuration for the weight schedule.
`LinearWeightScheduleCfg`	Configuration for the linear weight schedule.
`StepWeightScheduleCfg`	Configuration for the step weight schedule.

Attributes:

`weight`	The weight for the RND reward (also known as intrinsic reward).
`weight_schedule`	The weight schedule for the RND reward.
`reward_normalization`	Whether to normalize the RND reward.
`state_normalization`	Whether to normalize the RND state.
`learning_rate`	The learning rate for the RND module.
`num_outputs`	The number of outputs for the RND module.
`predictor_hidden_dims`	The hidden dimensions for the RND predictor network.
`target_hidden_dims`	The hidden dimensions for the RND target network.

Methods:

__init__([weight, weight_schedule, ...])

class WeightScheduleCfg[source]#

Bases: object

Configuration for the weight schedule.

Attributes:

`mode`	The type of weight schedule.

Methods:

__init__([mode])

mode: str#: The type of weight schedule. Default is “constant”.

__init__(mode: str = <factory>) → None#

class LinearWeightScheduleCfg[source]#

Bases: WeightScheduleCfg

Configuration for the linear weight schedule.

This schedule decays the weight linearly from the initial value to the final value between initial_step and final_step.

Attributes:

`mode`	The type of weight schedule.
`final_value`	The final value of the weight parameter.
`initial_step`	The initial step of the weight schedule.
`final_step`	The final step of the weight schedule.

Methods:

__init__([mode, final_value, initial_step, ...])

mode: str#: The type of weight schedule. Default is “constant”.

final_value: float#: The final value of the weight parameter.

initial_step: int#

The initial step of the weight schedule.

For steps before this step, the weight is the initial value specified in RslRlRndCfg.weight.

final_step: int#

The final step of the weight schedule.

For steps after this step, the weight is the final value specified in final_value.
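The linear schedule can be written as a small function. This is a sketch derived from the attribute descriptions above, not the library's code; `linear_weight` is a hypothetical name and `initial_value` stands for RslRlRndCfg.weight.

```python
def linear_weight(step, initial_value, final_value, initial_step, final_step):
    """Interpolate the RND weight linearly between two steps (sketch)."""
    if step <= initial_step:
        # Before the schedule starts, keep the initial weight.
        return initial_value
    if step >= final_step:
        # After the schedule ends, keep the final weight.
        return final_value
    fraction = (step - initial_step) / (final_step - initial_step)
    return initial_value + fraction * (final_value - initial_value)


before = linear_weight(0, 1.0, 0.0, initial_step=100, final_step=200)
halfway = linear_weight(150, 1.0, 0.0, initial_step=100, final_step=200)
after = linear_weight(300, 1.0, 0.0, initial_step=100, final_step=200)
```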

__init__(mode: str = <factory>, final_value: float = <factory>, initial_step: int = <factory>, final_step: int = <factory>) → None#

class StepWeightScheduleCfg[source]#

Bases: WeightScheduleCfg

Configuration for the step weight schedule.

This schedule sets the weight to the value specified in final_value at step final_step.

Attributes:

`mode`	The type of weight schedule.
`final_step`	The final step of the weight schedule.
`final_value`	The final value of the weight parameter.

Methods:

__init__([mode, final_step, final_value])

mode: str#: The type of weight schedule. Default is “constant”.

final_step: int#

The final step of the weight schedule.

For steps after this step, the weight is the value specified in final_value.

final_value: float#: The final value of the weight parameter.
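For comparison, the step schedule switches the weight in a single jump. The sketch below assumes the switch happens at final_step itself; `step_weight` is a hypothetical name and `initial_value` stands for RslRlRndCfg.weight.

```python
def step_weight(step, initial_value, final_value, final_step):
    """Switch the RND weight to final_value once final_step is reached."""
    # Assumption: the new value applies from final_step onwards (inclusive).
    return final_value if step >= final_step else initial_value


early = step_weight(10, 1.0, 0.1, final_step=100)
late = step_weight(100, 1.0, 0.1, final_step=100)
```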

__init__(mode: str = <factory>, final_step: int = <factory>, final_value: float = <factory>) → None#

weight: float#

The weight for the RND reward (also known as intrinsic reward). Default is 0.0.

Similar to other reward terms, the RND reward is scaled by this weight.

__init__(weight: float = <factory>, weight_schedule: ~isaaclab_rl.rsl_rl.rnd_cfg.RslRlRndCfg.WeightScheduleCfg | None = <factory>, reward_normalization: bool = <factory>, state_normalization: bool = <factory>, learning_rate: float = <factory>, num_outputs: int = <factory>, predictor_hidden_dims: list[int] = <factory>, target_hidden_dims: list[int] = <factory>) → None#

weight_schedule: WeightScheduleCfg | None#: The weight schedule for the RND reward. Default is None, which means the weight is constant.

reward_normalization: bool#: Whether to normalize the RND reward. Default is False.

state_normalization: bool#: Whether to normalize the RND state. Default is False.

learning_rate: float#: The learning rate for the RND module. Default is 1e-3.

num_outputs: int#: The number of outputs for the RND module. Default is 1.

predictor_hidden_dims: list[int]#

The hidden dimensions for the RND predictor network. Default is [-1].

If the list contains -1, then the hidden dimensions are the same as the input dimensions.

target_hidden_dims: list[int]#

The hidden dimensions for the RND target network. Default is [-1].

If the list contains -1, then the hidden dimensions are the same as the input dimensions.
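One possible reading of the -1 placeholder is that each -1 entry is replaced by the network's input dimension. The sketch below shows that interpretation only; `resolve_hidden_dims` is a hypothetical helper and the RND module may resolve the placeholder differently.

```python
def resolve_hidden_dims(hidden_dims, input_dim):
    """Replace every -1 entry with the input dimension (assumed semantics)."""
    return [input_dim if dim == -1 else dim for dim in hidden_dims]


default_dims = resolve_hidden_dims([-1], input_dim=48)
mixed_dims = resolve_hidden_dims([256, -1], input_dim=48)
```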

class isaaclab_rl.rsl_rl.RslRlSymmetryCfg[source]#

Bases: object

Configuration for the symmetry-augmentation in the training.

When use_data_augmentation is True, the data_augmentation_func is used to generate augmented observations and actions. These are then used to train the model.

When use_mirror_loss is True, mirror_loss_coeff is used to weight the symmetry-mirror loss. This loss is directly added to the agent’s loss function.

If both use_data_augmentation and use_mirror_loss are False, then no symmetry-based training is enabled. However, the data_augmentation_func is still called to compute and log symmetry metrics. This is useful for performing ablations.

For more information, please check the work from [MRK+24].

Attributes:

`use_data_augmentation`	Whether to use symmetry-based data augmentation.
`use_mirror_loss`	Whether to use the symmetry-augmentation loss.
`data_augmentation_func`	The symmetry data augmentation function.
`mirror_loss_coeff`	The weight for the symmetry-mirror loss.

Methods:

__init__([use_data_augmentation, ...])

use_data_augmentation: bool#: Whether to use symmetry-based data augmentation. Default is False.

use_mirror_loss: bool#: Whether to use the symmetry-augmentation loss. Default is False.

__init__(use_data_augmentation: bool = <factory>, use_mirror_loss: bool = <factory>, data_augmentation_func: callable = <factory>, mirror_loss_coeff: float = <factory>) → None#

data_augmentation_func: callable#

The symmetry data augmentation function.

The function signature should be as follows:

Parameters:

env (VecEnv) – The environment object. This is used to access the environment’s properties.
obs (torch.Tensor | None) – The observation tensor. If None, the observation is not used.
action (torch.Tensor | None) – The action tensor. If None, the action is not used.
obs_type (str) – The name of the observation type. Defaults to “policy”. This is useful when handling augmentation for different observation groups.

Returns:

A tuple containing the augmented observation and action tensors. The tensors can be None, if their respective inputs are None.
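A function with this signature might look like the sketch below. Several things here are assumptions made for illustration: nested lists stand in for torch.Tensor batches so the example runs without PyTorch, the sign flip is a placeholder for a real robot-specific mirroring, and returning the original samples followed by the mirrored ones is an assumed convention. `mirror_augmentation` is a hypothetical name.

```python
def mirror_augmentation(env, obs, action, obs_type="policy"):
    """Hypothetical augmentation: append a sign-flipped copy of each batch."""

    def augment(batch):
        # A None input yields a None output, as the signature allows.
        if batch is None:
            return None
        # Placeholder mirroring: negate every entry of every sample.
        mirrored = [[-x for x in row] for row in batch]
        # Assumed convention: original samples first, mirrored ones after.
        return batch + mirrored

    # env and obs_type are unused here; a real function could use them to
    # look up joint ordering or to treat observation groups differently.
    return augment(obs), augment(action)


obs_aug, action_aug = mirror_augmentation(None, [[1.0, -2.0]], None)
```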

mirror_loss_coeff: float#: The weight for the symmetry-mirror loss. Default is 0.0.

class isaaclab_rl.rsl_rl.RslRlVecEnvWrapper[source]#

Bases: VecEnv

Wraps around Isaac Lab environment for RSL-RL library.

To use asymmetric actor-critic, the environment instance must have the attribute num_privileged_obs (int). This is used by the learning agent to allocate buffers in the trajectory memory. Additionally, the returned observations should have the key “critic”, which corresponds to the privileged observations. Since this is optional for some environments, the wrapper checks if these attributes exist. If they do not, the wrapper defaults to zero as the number of privileged observations.

Caution

This class must be the last wrapper in the wrapper chain. This is because the wrapper does not follow the gym.Wrapper interface. Any subsequent wrappers will need to be modified to work with this wrapper.

Reference:: leggedrobotics/rsl_rl

Methods:

`__init__`(env[, clip_actions])	Initializes the wrapper.
`class_name`()	Returns the class name of the wrapper.
`get_observations`()	Returns the current observations of the environment.

Attributes:

`cfg`	Returns the configuration class instance of the environment.
`render_mode`	Returns the `Env` `render_mode`.
`observation_space`	Returns the `Env` `observation_space`.
`action_space`	Returns the `Env` `action_space`.
`unwrapped`	Returns the base environment of the wrapper.
`episode_length_buf`	The episode length buffer.

__init__(env: ManagerBasedRLEnv | DirectRLEnv, clip_actions: float | None = None)[source]#

Initializes the wrapper.

Note

The wrapper calls reset() at the start since the RSL-RL runner does not call reset.

Parameters:

env – The environment to wrap around.
clip_actions – The clipping value for actions. If None, then no clipping is done.

Raises:

ValueError – When the environment is not an instance of ManagerBasedRLEnv or DirectRLEnv.

property cfg: object#: Returns the configuration class instance of the environment.

property render_mode: str | None#: Returns the Env render_mode.

property observation_space: Space#: Returns the Env observation_space.

property action_space: Space#: Returns the Env action_space.

classmethod class_name() → str[source]#: Returns the class name of the wrapper.

property unwrapped: ManagerBasedRLEnv | DirectRLEnv#

Returns the base environment of the wrapper.

This will be the bare gymnasium.Env environment, underneath all layers of wrappers.

get_observations() → tuple[torch.Tensor, dict][source]#: Returns the current observations of the environment.

property episode_length_buf: torch.Tensor#: The episode length buffer.

SKRL Wrapper#

Wrapper to configure an environment instance as a skrl environment.

The following example shows how to wrap an environment for skrl:

from isaaclab_rl.skrl import SkrlVecEnvWrapper

env = SkrlVecEnvWrapper(env, ml_framework="torch")  # or ml_framework="jax"

Or, equivalently, by directly calling the skrl library API as follows:

from skrl.envs.torch.wrappers import wrap_env  # for PyTorch, or...
from skrl.envs.jax.wrappers import wrap_env    # for JAX

env = wrap_env(env, wrapper="isaaclab")

Functions:

`SkrlVecEnvWrapper`(env[, ml_framework, wrapper])	Wraps around Isaac Lab environment for skrl.

isaaclab_rl.skrl.SkrlVecEnvWrapper(env: ManagerBasedRLEnv | DirectRLEnv | DirectMARLEnv, ml_framework: Literal['torch', 'jax', 'jax-numpy'] = 'torch', wrapper: Literal['auto', 'isaaclab', 'isaaclab-single-agent', 'isaaclab-multi-agent'] = 'isaaclab')[source]#

Wraps around Isaac Lab environment for skrl.

This function wraps around the Isaac Lab environment. Since the wrapping functionality is defined within the skrl library itself, this implementation is maintained for compatibility with the structure of the extension that contains it. Internally it calls the wrap_env() from the skrl library API.

Parameters:

env – The environment to wrap around.
ml_framework – The ML framework to use for the wrapper. Defaults to “torch”.
wrapper – The wrapper to use. Defaults to “isaaclab”, which leaves it to skrl to determine whether the environment is wrapped as single-agent or multi-agent.

Raises:

ValueError – When the environment is not an instance of any Isaac Lab environment interface.
ValueError – If the specified ML framework is not valid.

Reference:: https://skrl.readthedocs.io/en/latest/api/envs/wrapping.html

Stable-Baselines3 Wrapper#

Wrapper to configure an environment instance as a Stable-Baselines3 vectorized environment.

The following example shows how to wrap an environment for Stable-Baselines3:

from isaaclab_rl.sb3 import Sb3VecEnvWrapper

env = Sb3VecEnvWrapper(env)

Functions:

`process_sb3_cfg`(cfg)	Convert simple YAML types to Stable-Baselines classes/components.

Classes:

`Sb3VecEnvWrapper`	Wraps around Isaac Lab environment for Stable Baselines3.

isaaclab_rl.sb3.process_sb3_cfg(cfg: dict) → dict[source]#

Convert simple YAML types to Stable-Baselines classes/components.

Parameters:: cfg – A configuration dictionary.
Returns:: A dictionary containing the converted configuration.

Reference:: DLR-RM/rl-baselines3-zoo
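To give a flavor of such a conversion, the sketch below turns a string like "lin_0.001" into a linear schedule callable, following the convention used in rl-baselines3-zoo. It covers only this one case and is not the body of process_sb3_cfg; `convert_schedule` is a hypothetical helper.

```python
def convert_schedule(value):
    """Turn a 'lin_<float>' string into a linear schedule (sketch)."""
    if isinstance(value, str) and value.startswith("lin_"):
        initial_value = float(value.split("_", 1)[1])
        # progress_remaining goes from 1.0 (start) to 0.0 (end of training).
        return lambda progress_remaining: progress_remaining * initial_value
    # Plain numbers (constant values) are passed through unchanged.
    return value


schedule = convert_schedule("lin_0.001")
constant = convert_schedule(3e-4)
```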

class isaaclab_rl.sb3.Sb3VecEnvWrapper[source]#

Bases: VecEnv

Wraps around Isaac Lab environment for Stable Baselines3.

Isaac Sim internally implements a vectorized environment. However, since it is still considered a single environment instance, Stable Baselines tries to wrap around it using the DummyVecEnv. This is only done if the environment is not inheriting from their VecEnv. Thus, this class thinly wraps over the environment from ManagerBasedRLEnv or DirectRLEnv.

Note

While Stable-Baselines3 supports Gym 0.26+ API, their vectorized environment still uses the old API (i.e. it is closer to Gym 0.21). Thus, we implement the old API for the vectorized environment.

We also add monitoring functionality that computes the un-discounted episode return and length. This information is added to the info dicts under key episode.

In contrast to the Isaac Lab environment, Stable-Baselines3 expects the following:

numpy datatype for MDP signals
a list of info dicts, one for each sub-environment (instead of a single dict)
when an environment has terminated, the returned observation should be the one after reset. The “real” final observation is passed through the info dicts under the key terminal_observation.
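The info-dict layout listed above can be sketched with plain Python containers. This is an illustration under assumptions, not the wrapper's code: `split_infos` is a hypothetical helper, and the "r" and "l" keys inside the episode entry follow the Stable-Baselines3 Monitor convention.

```python
def split_infos(num_envs, dones, final_obs, returns, lengths):
    """Build one info dict per sub-environment (sketch)."""
    infos = []
    for i in range(num_envs):
        info = {}
        if dones[i]:
            # Assumption: "r" (return) and "l" (length) as in SB3's Monitor.
            info["episode"] = {"r": returns[i], "l": lengths[i]}
            # The true last observation, since the returned one is post-reset.
            info["terminal_observation"] = final_obs[i]
        infos.append(info)
    return infos


infos = split_infos(
    num_envs=2,
    dones=[False, True],
    final_obs=[[0.0], [1.5]],
    returns=[3.0, 7.0],
    lengths=[10, 25],
)
```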

Warning

By the nature of physics stepping in Isaac Sim, it is not possible to forward the simulation buffers without performing a physics step. Thus, the reset is performed inside the step() function after the actual physics step is taken. As a result, the observations returned for terminated environments are the ones after the reset.

Caution

This class must be the last wrapper in the wrapper chain. This is because the wrapper does not follow the gym.Wrapper interface. Any subsequent wrappers will need to be modified to work with this wrapper.

Reference:

Methods:

`__init__`(env)	Initialize the wrapper.
`class_name`()	Returns the class name of the wrapper.
`get_episode_rewards`()	Returns the rewards of all the episodes.
`get_episode_lengths`()	Returns the number of time-steps of all the episodes.

Attributes:

`unwrapped`	Returns the base environment of the wrapper.

__init__(env: ManagerBasedRLEnv | DirectRLEnv)[source]#

Initialize the wrapper.

Parameters:: env – The environment to wrap around.
Raises:: ValueError – When the environment is not an instance of ManagerBasedRLEnv or DirectRLEnv.

classmethod class_name() → str[source]#: Returns the class name of the wrapper.

property unwrapped: ManagerBasedRLEnv | DirectRLEnv#

Returns the base environment of the wrapper.

This will be the bare gymnasium.Env environment, underneath all layers of wrappers.

get_episode_rewards() → list[float][source]#: Returns the rewards of all the episodes.

get_episode_lengths() → list[int][source]#: Returns the number of time-steps of all the episodes.
