isaaclab_rl#

Package for environment wrappers to different learning frameworks.

Wrappers allow you to modify the behavior of an environment without modifying the environment itself. This is useful for modifying the observation space, action space, or reward function. Additionally, wrappers can cast a given environment into the environment class definition used by a particular learning framework. This may include handling asymmetric actor-critic observations, casting data between backends such as NumPy and PyTorch, or organizing the returned data into the structure the learning framework expects.

All wrappers work similarly to the gymnasium.Wrapper class. Using a wrapper is as simple as passing the initialized environment instance to the wrapper constructor. However, since learning frameworks expect different input and output data structures, their wrapper classes are not compatible with each other. Thus, each wrapper should always be used in conjunction with its respective learning framework.

RL-Games Wrapper#

Wrappers and utilities to configure an environment for rl-games library.

RSL-RL Wrapper#

Wrappers and utilities to configure an environment for RSL-RL library.

The following example shows how to wrap an environment for RSL-RL:

from isaaclab_rl.rsl_rl import RslRlVecEnvWrapper

env = RslRlVecEnvWrapper(env)

Functions:

export_policy_as_jit(policy, normalizer, path)

Export policy into a Torch JIT file.

export_policy_as_onnx(policy, path[, ...])

Export policy into a Torch ONNX file.

configclass(cls, **kwargs)

Wrapper around dataclass functionality to add extra checks and utilities.

handle_deprecated_rsl_rl_cfg(agent_cfg, ...)

Handle deprecated RSL-RL configurations across version boundaries.

Classes:

RslRlRndCfg

Configuration for the Random Network Distillation (RND) module.

RslRlSymmetryCfg

Configuration for the symmetry-augmentation in the training.

RslRlVecEnvWrapper

Wraps an Isaac Lab environment for use with the RSL-RL library.

RslRlBaseRunnerCfg

Base configuration of the runner.

RslRlCNNModelCfg

Configuration for CNN model.

RslRlDistillationAlgorithmCfg

Configuration for the distillation algorithm.

RslRlDistillationRunnerCfg

Configuration of the runner for distillation algorithms.

RslRlDistillationStudentTeacherCfg

Configuration for the distillation student-teacher networks.

RslRlDistillationStudentTeacherRecurrentCfg

Configuration for the distillation student-teacher recurrent networks.

RslRlMLPModelCfg

Configuration for the MLP model.

RslRlOnPolicyRunnerCfg

Configuration of the runner for on-policy algorithms.

RslRlPpoActorCriticCfg

Configuration for the PPO actor-critic networks.

RslRlPpoActorCriticRecurrentCfg

Configuration for the PPO actor-critic networks with recurrent layers.

RslRlPpoAlgorithmCfg

Configuration for the PPO algorithm.

RslRlRNNModelCfg

Configuration for RNN model.

isaaclab_rl.rsl_rl.export_policy_as_jit(policy: object, normalizer: object | None, path: str, filename='policy.pt')[source]#

Export policy into a Torch JIT file.

Parameters:
  • policy – The policy torch module.

  • normalizer – The empirical normalizer module. If None, Identity is used.

  • path – The path to the saving directory.

  • filename – The name of the exported JIT file. Defaults to “policy.pt”.

isaaclab_rl.rsl_rl.export_policy_as_onnx(policy: object, path: str, normalizer: object | None = None, filename='policy.onnx', verbose=False)[source]#

Export policy into a Torch ONNX file.

Parameters:
  • policy – The policy torch module.

  • normalizer – The empirical normalizer module. If None, Identity is used.

  • path – The path to the saving directory.

  • filename – The name of the exported ONNX file. Defaults to “policy.onnx”.

  • verbose – Whether to print the model summary. Defaults to False.

class isaaclab_rl.rsl_rl.RslRlRndCfg[source]#

Bases: object

Configuration for the Random Network Distillation (RND) module.

For more information, please check the work from [SKB+23].

Classes:

WeightScheduleCfg

Configuration for the weight schedule.

LinearWeightScheduleCfg

Configuration for the linear weight schedule.

StepWeightScheduleCfg

Configuration for the step weight schedule.

Attributes:

weight

The weight for the RND reward (also known as intrinsic reward).

weight_schedule

The weight schedule for the RND reward.

reward_normalization

Whether to normalize the RND reward.

state_normalization

Whether to normalize the RND state.

learning_rate

The learning rate for the RND module.

num_outputs

The number of outputs for the RND module.

predictor_hidden_dims

The hidden dimensions for the RND predictor network.

target_hidden_dims

The hidden dimensions for the RND target network.

Methods:

__init__([weight, weight_schedule, ...])

class WeightScheduleCfg[source]#

Bases: object

Configuration for the weight schedule.

Attributes:

mode

The type of weight schedule.

Methods:

__init__([mode])

mode: str#

The type of weight schedule. Default is “constant”.

__init__(mode: str = <factory>) None#
class LinearWeightScheduleCfg[source]#

Bases: WeightScheduleCfg

Configuration for the linear weight schedule.

This schedule decays the weight linearly from the initial value to the final value between initial_step and final_step.

Attributes:

mode

The type of weight schedule.

final_value

The final value of the weight parameter.

initial_step

The initial step of the weight schedule.

final_step

The final step of the weight schedule.

Methods:

__init__([mode, final_value, initial_step, ...])

mode: str#

The type of weight schedule. Default is “constant”.

final_value: float#

The final value of the weight parameter.

initial_step: int#

The initial step of the weight schedule.

For steps before this step, the weight is the initial value specified in RslRlRndCfg.weight.

final_step: int#

The final step of the weight schedule.

For steps after this step, the weight is the final value specified in final_value.

__init__(mode: str = <factory>, final_value: float = <factory>, initial_step: int = <factory>, final_step: int = <factory>) None#
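The decay described above can be sketched as a small stand-alone function (an illustration of the schedule's semantics, not the library's implementation):

```python
def linear_weight(step: int, initial_value: float, final_value: float,
                  initial_step: int, final_step: int) -> float:
    """Linearly decay from initial_value to final_value between the two steps.

    Before initial_step the weight stays at the initial value
    (RslRlRndCfg.weight); after final_step it stays at final_value.
    """
    if step <= initial_step:
        return initial_value
    if step >= final_step:
        return final_value
    frac = (step - initial_step) / (final_step - initial_step)
    return initial_value + frac * (final_value - initial_value)
```

For example, with an initial weight of 1.0 decaying to 0.0 between steps 100 and 200, step 150 yields a weight of 0.5.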
class StepWeightScheduleCfg[source]#

Bases: WeightScheduleCfg

Configuration for the step weight schedule.

This schedule sets the weight to the value specified in final_value at step final_step.

Attributes:

mode

The type of weight schedule.

final_step

The final step of the weight schedule.

final_value

The final value of the weight parameter.

Methods:

__init__([mode, final_step, final_value])

mode: str#

The type of weight schedule. Default is “constant”.

final_step: int#

The final step of the weight schedule.

For steps after this step, the weight is the value specified in final_value.

final_value: float#

The final value of the weight parameter.

__init__(mode: str = <factory>, final_step: int = <factory>, final_value: float = <factory>) None#
weight: float#

The weight for the RND reward (also known as intrinsic reward). Default is 0.0.

Similar to other reward terms, the RND reward is scaled by this weight.

__init__(weight: float = <factory>, weight_schedule: ~isaaclab_rl.rsl_rl.rnd_cfg.RslRlRndCfg.WeightScheduleCfg | None = <factory>, reward_normalization: bool = <factory>, state_normalization: bool = <factory>, learning_rate: float = <factory>, num_outputs: int = <factory>, predictor_hidden_dims: list[int] = <factory>, target_hidden_dims: list[int] = <factory>) None#
weight_schedule: WeightScheduleCfg | None#

The weight schedule for the RND reward. Default is None, which means the weight is constant.

reward_normalization: bool#

Whether to normalize the RND reward. Default is False.

state_normalization: bool#

Whether to normalize the RND state. Default is False.

learning_rate: float#

The learning rate for the RND module. Default is 1e-3.

num_outputs: int#

The number of outputs for the RND module. Default is 1.

predictor_hidden_dims: list[int]#

The hidden dimensions for the RND predictor network. Default is [-1].

If the list contains -1, then the hidden dimensions are the same as the input dimensions.

target_hidden_dims: list[int]#

The hidden dimensions for the RND target network. Default is [-1].

If the list contains -1, then the hidden dimensions are the same as the input dimensions.
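Putting these fields together, an RND configuration with a linearly decaying intrinsic-reward weight might look like the following (the field values are illustrative, not defaults):

```python
from isaaclab_rl.rsl_rl import RslRlRndCfg

# Illustrative RND configuration: intrinsic reward starts at 0.1 and
# decays linearly to 0.0 over the first 1000 steps.
rnd_cfg = RslRlRndCfg(
    weight=0.1,
    weight_schedule=RslRlRndCfg.LinearWeightScheduleCfg(
        final_value=0.0,
        initial_step=0,
        final_step=1000,
    ),
    state_normalization=True,
    learning_rate=1e-3,
)
```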

class isaaclab_rl.rsl_rl.RslRlSymmetryCfg[source]#

Bases: object

Configuration for the symmetry-augmentation in the training.

When use_data_augmentation() is True, the data_augmentation_func() is used to generate augmented observations and actions. These are then used to train the model.

When use_mirror_loss() is True, the mirror_loss_coeff() is used to weight the symmetry-mirror loss. This loss is directly added to the agent’s loss function.

If both use_data_augmentation() and use_mirror_loss() are False, then no symmetry-based training is enabled. However, the data_augmentation_func() is called to compute and log symmetry metrics. This is useful for performing ablations.

For more information, please check the work from [MRK+24].

Attributes:

use_data_augmentation

Whether to use symmetry-based data augmentation.

use_mirror_loss

Whether to use the symmetry-augmentation loss.

data_augmentation_func

The symmetry data augmentation function.

mirror_loss_coeff

The weight for the symmetry-mirror loss.

Methods:

__init__([use_data_augmentation, ...])

use_data_augmentation: bool#

Whether to use symmetry-based data augmentation. Default is False.

use_mirror_loss: bool#

Whether to use the symmetry-augmentation loss. Default is False.

__init__(use_data_augmentation: bool = <factory>, use_mirror_loss: bool = <factory>, data_augmentation_func: callable = <factory>, mirror_loss_coeff: float = <factory>) None#
data_augmentation_func: callable#

The symmetry data augmentation function.

The function signature should be as follows:

Parameters:
  • env (VecEnv) – The environment object. This is used to access the environment’s properties.

  • obs (tensordict.TensorDict | None) – The observation tensor dictionary. If None, the observation is not used.

  • action (torch.Tensor | None) – The action tensor. If None, the action is not used.

Returns:

A tuple containing the augmented observation dictionary and action tensors. The tensors can be None, if their respective inputs are None.
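As an illustration of this contract, here is a minimal pure-Python stand-in (plain nested lists in place of TensorDict/torch.Tensor, and a hypothetical sign-flip as the mirroring rule):

```python
def mirror_left_right(env, obs, actions):
    """Illustrative data-augmentation function (pure-Python stand-in).

    In the real signature, `obs` is a tensordict.TensorDict and `actions` is a
    torch.Tensor; plain nested lists stand in for tensors here. The contract:
    return the original batch concatenated with its mirrored counterpart, and
    pass None through when an input is None.
    """
    def mirror(batch):
        # Hypothetical mirroring rule: negate every entry. A real robot needs
        # a proper index permutation and sign map for its joints.
        return [[-x for x in sample] for sample in batch]

    aug_obs = None if obs is None else obs + mirror(obs)
    aug_actions = None if actions is None else actions + mirror(actions)
    return aug_obs, aug_actions
```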

mirror_loss_coeff: float#

The weight for the symmetry-mirror loss. Default is 0.0.

class isaaclab_rl.rsl_rl.RslRlVecEnvWrapper[source]#

Bases: VecEnv

Wraps an Isaac Lab environment for use with the RSL-RL library.

Caution

This class must be the last wrapper in the wrapper chain. This is because the wrapper does not follow the gym.Wrapper interface. Any subsequent wrappers will need to be modified to work with this wrapper.

Reference:

leggedrobotics/rsl_rl

Methods:

__init__(env[, clip_actions])

Initializes the wrapper.

class_name()

Returns the class name of the wrapper.

get_observations()

Returns the current observations of the environment.

Attributes:

cfg

Returns the configuration class instance of the environment.

render_mode

Returns the Env render_mode.

observation_space

Returns the Env observation_space.

action_space

Returns the Env action_space.

unwrapped

Returns the base environment of the wrapper.

episode_length_buf

The episode length buffer.

__init__(env: ManagerBasedRLEnv | DirectRLEnv, clip_actions: float | None = None)[source]#

Initializes the wrapper.

Note

The wrapper calls reset() at the start since the RSL-RL runner does not call reset.

Parameters:
  • env – The environment to wrap around.

  • clip_actions – The clipping value for actions. If None, then no clipping is done.

Raises:

ValueError – When the environment is not an instance of ManagerBasedRLEnv or DirectRLEnv.

property cfg: object#

Returns the configuration class instance of the environment.

property render_mode: str | None#

Returns the Env render_mode.

property observation_space: Space#

Returns the Env observation_space.

property action_space: Space#

Returns the Env action_space.

classmethod class_name() str[source]#

Returns the class name of the wrapper.

property unwrapped: ManagerBasedRLEnv | DirectRLEnv#

Returns the base environment of the wrapper.

This will be the bare gymnasium.Env environment, underneath all layers of wrappers.

property episode_length_buf: torch.Tensor#

The episode length buffer.

get_observations() tensordict.TensorDict[source]#

Returns the current observations of the environment.

class isaaclab_rl.rsl_rl.RslRlBaseRunnerCfg[source]#

Bases: object

Base configuration of the runner.

Methods:

__init__([seed, device, num_steps_per_env, ...])

Attributes:

seed

The seed for the experiment.

device

num_steps_per_env

The number of steps per environment per update.

max_iterations

The maximum number of iterations.

empirical_normalization

This parameter is deprecated and will be removed in the future.

obs_groups

A mapping from observation groups to observation sets.

clip_actions

The clipping value for actions.

check_for_nan

Whether to check for NaN values coming from the environment.

save_interval

The number of iterations between saves.

experiment_name

The experiment name.

run_name

The run name.

logger

The logger to use.

neptune_project

The neptune project name.

wandb_project

The wandb project name.

resume

Whether to resume a previous training.

load_run

The run directory to load.

load_checkpoint

The checkpoint file to load.

__init__(seed: int = <factory>, device: str = <factory>, num_steps_per_env: int = <factory>, max_iterations: int = <factory>, empirical_normalization: bool = <factory>, obs_groups: dict[str, list[str]] = <factory>, clip_actions: float | None = <factory>, check_for_nan: bool = <factory>, save_interval: int = <factory>, experiment_name: str = <factory>, run_name: str = <factory>, logger: ~typing.Literal['tensorboard', 'neptune', 'wandb'] = <factory>, neptune_project: str = <factory>, wandb_project: str = <factory>, resume: bool = <factory>, load_run: str = <factory>, load_checkpoint: str = <factory>) None#
seed: int#

The seed for the experiment. Default is 42.

device: str#

The device for the rl-agent. Default is "cuda".

num_steps_per_env: int#

The number of steps per environment per update.

max_iterations: int#

The maximum number of iterations.

empirical_normalization: bool#

This parameter is deprecated and will be removed in the future.

For rsl-rl < 4.0.0, use actor_obs_normalization and critic_obs_normalization of the policy instead. For rsl-rl >= 4.0.0, use obs_normalization of the model instead.

obs_groups: dict[str, list[str]]#

A mapping from observation groups to observation sets.

The keys of the dictionary are predefined observation sets used by the underlying algorithm and values are lists of observation groups provided by the environment.

For instance, if the environment provides a dictionary of observations with groups “policy”, “images”, and “privileged”, these can be mapped to algorithmic observation sets as follows:

obs_groups = {
    "actor": ["policy", "images"],
    "critic": ["policy", "privileged"],
}

This way, the actor will receive the “policy” and “images” observations, and the critic will receive the “policy” and “privileged” observations.

For more details, please check vec_env.py in the rsl_rl library.
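The resolution of this mapping can be sketched as follows (lists stand in for the tensors that the library would concatenate; the observation values are hypothetical):

```python
# Illustrative resolution of obs_groups: each algorithmic observation set
# gathers the environment observation groups listed for it.
obs_groups = {
    "actor": ["policy", "images"],
    "critic": ["policy", "privileged"],
}

env_obs = {  # hypothetical per-group observations from the environment
    "policy": [0.1, 0.2],
    "images": [0.9],
    "privileged": [1.5, 1.6],
}

def gather(set_name: str) -> list[float]:
    """Concatenate the observation groups mapped to an observation set."""
    out: list[float] = []
    for group in obs_groups[set_name]:
        out += env_obs[group]
    return out
```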

clip_actions: float | None#

The clipping value for actions. If None, then no clipping is done. Defaults to None.

Note

This clipping is performed inside the RslRlVecEnvWrapper wrapper.

check_for_nan: bool#

Whether to check for NaN values coming from the environment.

save_interval: int#

The number of iterations between saves.

experiment_name: str#

The experiment name.

run_name: str#

The run name. Default is empty string.

The name of the run directory is typically the time-stamp at execution. If the run name is not empty, then it is appended to the run directory’s name, i.e. the logging directory’s name will become {time-stamp}_{run_name}.

logger: Literal['tensorboard', 'neptune', 'wandb']#

The logger to use. Default is tensorboard.

neptune_project: str#

The neptune project name. Default is “isaaclab”.

wandb_project: str#

The wandb project name. Default is “isaaclab”.

resume: bool#

Whether to resume a previous training. Default is False.

This flag will be ignored for distillation.

load_run: str#

The run directory to load. Default is “.*” (all).

If regex expression, the latest (alphabetical order) matching run will be loaded.

load_checkpoint: str#

The checkpoint file to load. Default is "model_.*.pt" (all).

If regex expression, the latest (alphabetical order) matching file will be loaded.
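The documented selection rule, the latest match in alphabetical order, can be sketched as follows (illustrative only, not the library's code):

```python
import re

def select_checkpoint(files: list[str], pattern: str = r"model_.*.pt") -> str:
    """Pick the latest (alphabetical order) file matching the regex,
    mirroring the documented behavior of load_checkpoint."""
    matches = [name for name in files if re.fullmatch(pattern, name)]
    if not matches:
        raise FileNotFoundError(f"no file matches {pattern!r}")
    return sorted(matches)[-1]
```

Note that alphabetical order means "model_9.pt" sorts after "model_10.pt"; zero-padded iteration numbers (e.g. "model_0009.pt") keep alphabetical and numerical order aligned.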

class isaaclab_rl.rsl_rl.RslRlCNNModelCfg[source]#

Bases: RslRlMLPModelCfg

Configuration for CNN model.

Attributes:

class_name

The model class name.

cnn_cfg

The configuration for the CNN(s).

hidden_dims

The hidden dimensions of the MLP network.

activation

The activation function for the MLP network.

obs_normalization

Whether to normalize the observation for the model.

distribution_cfg

The configuration for the output distribution.

stochastic

Whether the model output is stochastic.

init_noise_std

The initial noise standard deviation for the model.

noise_std_type

The type of noise standard deviation for the model.

state_dependent_std

Whether to use state-dependent standard deviation for the policy.

Classes:

CNNCfg

Configuration for the convolutional layers of the CNN model.

Methods:

__init__([class_name, hidden_dims, ...])

class_name: str#

The model class name. Default is CNNModel.

class CNNCfg[source]#

Bases: object

Configuration for the convolutional layers of the CNN model.

Attributes:

output_channels

The number of output channels for each convolutional layer for the CNN.

kernel_size

The kernel size for the CNN.

stride

The stride for the CNN.

dilation

The dilation for the CNN.

padding

The padding for the CNN.

norm

The normalization for the CNN.

activation

The activation function for the CNN.

max_pool

Whether to use max pooling for the CNN.

global_pool

The global pooling for the CNN.

flatten

Whether to flatten the output of the CNN.

Methods:

__init__([output_channels, kernel_size, ...])

output_channels: tuple[int] | list[int]#

The number of output channels for each convolutional layer for the CNN.

kernel_size: int | tuple[int] | list[int]#

The kernel size for the CNN.

stride: int | tuple[int] | list[int]#

The stride for the CNN.

dilation: int | tuple[int] | list[int]#

The dilation for the CNN.

padding: Literal['none', 'zeros', 'reflect', 'replicate', 'circular']#

The padding for the CNN.

norm: Literal['none', 'batch', 'layer'] | tuple[str] | list[str]#

The normalization for the CNN.

activation: str#

The activation function for the CNN.

max_pool: bool | tuple[bool] | list[bool]#

Whether to use max pooling for the CNN.

global_pool: Literal['none', 'max', 'avg']#

The global pooling for the CNN.

flatten: bool#

Whether to flatten the output of the CNN.

__init__(output_channels: tuple[int] | list[int] = <factory>, kernel_size: int | tuple[int] | list[int] = <factory>, stride: int | tuple[int] | list[int] = <factory>, dilation: int | tuple[int] | list[int] = <factory>, padding: ~typing.Literal['none', 'zeros', 'reflect', 'replicate', 'circular'] = <factory>, norm: ~typing.Literal['none', 'batch', 'layer'] | tuple[str] | list[str] = <factory>, activation: str = <factory>, max_pool: bool | tuple[bool] | list[bool] = <factory>, global_pool: ~typing.Literal['none', 'max', 'avg'] = <factory>, flatten: bool = <factory>) None#
cnn_cfg: CNNCfg#

The configuration for the CNN(s).

__init__(class_name: str = <factory>, hidden_dims: list[int] = <factory>, activation: str = <factory>, obs_normalization: bool = <factory>, distribution_cfg: DistributionCfg | None = <factory>, stochastic: bool = <factory>, init_noise_std: float = <factory>, noise_std_type: Literal['scalar', 'log'] = <factory>, state_dependent_std: bool = <factory>, cnn_cfg: CNNCfg = <factory>) None#
hidden_dims: list[int]#

The hidden dimensions of the MLP network.

activation: str#

The activation function for the MLP network.

obs_normalization: bool#

Whether to normalize the observation for the model. Default is False.

distribution_cfg: DistributionCfg | None#

The configuration for the output distribution. Default is None, in which case no distribution is used.

stochastic: bool#

Whether the model output is stochastic.

For rsl-rl >= 5.0.0, this configuration is deprecated. Please use distribution_cfg instead: set it to None for deterministic output, or to a valid configuration class, e.g. GaussianDistributionCfg, for stochastic output.

init_noise_std: float#

The initial noise standard deviation for the model.

For rsl-rl >= 5.0.0, this configuration is deprecated. Please use distribution_cfg instead and use the init_std field of the distribution configuration to specify the initial noise standard deviation.

noise_std_type: Literal['scalar', 'log']#

The type of noise standard deviation for the model. Default is scalar.

For rsl-rl >= 5.0.0, this configuration is deprecated. Please use distribution_cfg instead and use the std_type field of the distribution configuration to specify the type of noise standard deviation.

state_dependent_std: bool#

Whether to use state-dependent standard deviation for the policy. Default is False.

For rsl-rl >= 5.0.0, this configuration is deprecated. Please use distribution_cfg instead, with HeteroscedasticGaussianDistributionCfg if state-dependent standard deviation is desired.
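A nested CNN-model configuration might look like the following (the field values are illustrative, not defaults):

```python
from isaaclab_rl.rsl_rl import RslRlCNNModelCfg

# Illustrative model configuration: two conv layers feeding an MLP head.
cnn_model = RslRlCNNModelCfg(
    cnn_cfg=RslRlCNNModelCfg.CNNCfg(
        output_channels=[16, 32],
        kernel_size=3,
        stride=2,
        activation="relu",
        global_pool="avg",
        flatten=True,
    ),
    hidden_dims=[128, 64],
    activation="elu",
    obs_normalization=False,
)
```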

class isaaclab_rl.rsl_rl.RslRlDistillationAlgorithmCfg[source]#

Bases: object

Configuration for the distillation algorithm.

Attributes:

class_name

The algorithm class name.

num_learning_epochs

The number of updates performed with each sample.

learning_rate

The learning rate for the student policy.

gradient_length

The number of environment steps the gradient flows back.

max_grad_norm

The maximum norm the gradient is clipped to.

optimizer

The optimizer to use for the student policy.

loss_type

The loss type to use for the student policy.

Methods:

__init__([class_name, num_learning_epochs, ...])

class_name: str#

The algorithm class name. Default is Distillation.

num_learning_epochs: int#

The number of updates performed with each sample.

learning_rate: float#

The learning rate for the student policy.

gradient_length: int#

The number of environment steps the gradient flows back.

max_grad_norm: None | float#

The maximum norm the gradient is clipped to.

optimizer: Literal['adam', 'adamw', 'sgd', 'rmsprop']#

The optimizer to use for the student policy.

loss_type: Literal['mse', 'huber']#

The loss type to use for the student policy.

__init__(class_name: str = <factory>, num_learning_epochs: int = <factory>, learning_rate: float = <factory>, gradient_length: int = <factory>, max_grad_norm: None | float = <factory>, optimizer: ~typing.Literal['adam', 'adamw', 'sgd', 'rmsprop'] = <factory>, loss_type: ~typing.Literal['mse', 'huber'] = <factory>) None#
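A distillation-algorithm configuration might look like the following (the field values are illustrative, not defaults):

```python
from isaaclab_rl.rsl_rl import RslRlDistillationAlgorithmCfg

# Illustrative distillation settings: one update per sample, MSE loss,
# gradients flowing back over 15 environment steps.
algorithm_cfg = RslRlDistillationAlgorithmCfg(
    num_learning_epochs=1,
    learning_rate=1e-3,
    gradient_length=15,
    max_grad_norm=1.0,
    optimizer="adam",
    loss_type="mse",
)
```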
class isaaclab_rl.rsl_rl.RslRlDistillationRunnerCfg[source]#

Bases: RslRlBaseRunnerCfg

Configuration of the runner for distillation algorithms.

Attributes:

class_name

The runner class name.

student

The student configuration.

teacher

The teacher configuration.

algorithm

The algorithm configuration.

policy

The policy configuration.

seed

The seed for the experiment.

device

num_steps_per_env

The number of steps per environment per update.

max_iterations

The maximum number of iterations.

empirical_normalization

This parameter is deprecated and will be removed in the future.

obs_groups

A mapping from observation groups to observation sets.

clip_actions

The clipping value for actions.

check_for_nan

Whether to check for NaN values coming from the environment.

save_interval

The number of iterations between saves.

experiment_name

The experiment name.

run_name

The run name.

logger

The logger to use.

neptune_project

The neptune project name.

wandb_project

The wandb project name.

resume

Whether to resume a previous training.

load_run

The run directory to load.

load_checkpoint

The checkpoint file to load.

Methods:

__init__([seed, device, num_steps_per_env, ...])

class_name: str#

The runner class name. Default is DistillationRunner.

student: RslRlMLPModelCfg#

The student configuration.

teacher: RslRlMLPModelCfg#

The teacher configuration.

algorithm: RslRlDistillationAlgorithmCfg#

The algorithm configuration.

policy: RslRlDistillationStudentTeacherCfg#

The policy configuration.

For rsl-rl >= 4.0.0, this configuration is deprecated. Please use student and teacher model configurations instead.

__init__(seed: int = <factory>, device: str = <factory>, num_steps_per_env: int = <factory>, max_iterations: int = <factory>, empirical_normalization: bool = <factory>, obs_groups: dict[str, list[str]] = <factory>, clip_actions: float | None = <factory>, check_for_nan: bool = <factory>, save_interval: int = <factory>, experiment_name: str = <factory>, run_name: str = <factory>, logger: ~typing.Literal['tensorboard', 'neptune', 'wandb'] = <factory>, neptune_project: str = <factory>, wandb_project: str = <factory>, resume: bool = <factory>, load_run: str = <factory>, load_checkpoint: str = <factory>, class_name: str = <factory>, student: ~isaaclab_rl.rsl_rl.rl_cfg.RslRlMLPModelCfg = <factory>, teacher: ~isaaclab_rl.rsl_rl.rl_cfg.RslRlMLPModelCfg = <factory>, algorithm: ~isaaclab_rl.rsl_rl.distillation_cfg.RslRlDistillationAlgorithmCfg = <factory>, policy: ~isaaclab_rl.rsl_rl.distillation_cfg.RslRlDistillationStudentTeacherCfg = <factory>) None#
seed: int#

The seed for the experiment. Default is 42.

device: str#

The device for the rl-agent. Default is "cuda".

num_steps_per_env: int#

The number of steps per environment per update.

max_iterations: int#

The maximum number of iterations.

empirical_normalization: bool#

This parameter is deprecated and will be removed in the future.

For rsl-rl < 4.0.0, use actor_obs_normalization and critic_obs_normalization of the policy instead. For rsl-rl >= 4.0.0, use obs_normalization of the model instead.

obs_groups: dict[str, list[str]]#

A mapping from observation groups to observation sets.

The keys of the dictionary are predefined observation sets used by the underlying algorithm and values are lists of observation groups provided by the environment.

For instance, if the environment provides a dictionary of observations with groups “policy”, “images”, and “privileged”, these can be mapped to algorithmic observation sets as follows:

obs_groups = {
    "actor": ["policy", "images"],
    "critic": ["policy", "privileged"],
}

This way, the actor will receive the “policy” and “images” observations, and the critic will receive the “policy” and “privileged” observations.

For more details, please check vec_env.py in the rsl_rl library.

clip_actions: float | None#

The clipping value for actions. If None, then no clipping is done. Defaults to None.

Note

This clipping is performed inside the RslRlVecEnvWrapper wrapper.

check_for_nan: bool#

Whether to check for NaN values coming from the environment.

save_interval: int#

The number of iterations between saves.

experiment_name: str#

The experiment name.

run_name: str#

The run name. Default is empty string.

The name of the run directory is typically the time-stamp at execution. If the run name is not empty, then it is appended to the run directory’s name, i.e. the logging directory’s name will become {time-stamp}_{run_name}.

logger: Literal['tensorboard', 'neptune', 'wandb']#

The logger to use. Default is tensorboard.

neptune_project: str#

The neptune project name. Default is “isaaclab”.

wandb_project: str#

The wandb project name. Default is “isaaclab”.

resume: bool#

Whether to resume a previous training. Default is False.

This flag will be ignored for distillation.

load_run: str#

The run directory to load. Default is “.*” (all).

If regex expression, the latest (alphabetical order) matching run will be loaded.

load_checkpoint: str#

The checkpoint file to load. Default is "model_.*.pt" (all).

If regex expression, the latest (alphabetical order) matching file will be loaded.

class isaaclab_rl.rsl_rl.RslRlDistillationStudentTeacherCfg[source]#

Bases: object

Configuration for the distillation student-teacher networks.

For rsl-rl >= 4.0.0, this configuration is deprecated. Please use RslRlMLPModelCfg instead.

Attributes:

class_name

The policy class name.

init_noise_std

The initial noise standard deviation for the student policy.

noise_std_type

The type of noise standard deviation for the policy.

student_obs_normalization

Whether to normalize the observation for the student network.

teacher_obs_normalization

Whether to normalize the observation for the teacher network.

student_hidden_dims

The hidden dimensions of the student network.

teacher_hidden_dims

The hidden dimensions of the teacher network.

activation

The activation function for the student and teacher networks.

Methods:

__init__([class_name, init_noise_std, ...])

class_name: str#

The policy class name. Default is StudentTeacher.

__init__(class_name: str = <factory>, init_noise_std: float = <factory>, noise_std_type: ~typing.Literal['scalar', 'log'] = <factory>, student_obs_normalization: bool = <factory>, teacher_obs_normalization: bool = <factory>, student_hidden_dims: list[int] = <factory>, teacher_hidden_dims: list[int] = <factory>, activation: str = <factory>) None#
init_noise_std: float#

The initial noise standard deviation for the student policy.

noise_std_type: Literal['scalar', 'log']#

The type of noise standard deviation for the policy. Default is scalar.

student_obs_normalization: bool#

Whether to normalize the observation for the student network.

teacher_obs_normalization: bool#

Whether to normalize the observation for the teacher network.

student_hidden_dims: list[int]#

The hidden dimensions of the student network.

teacher_hidden_dims: list[int]#

The hidden dimensions of the teacher network.

activation: str#

The activation function for the student and teacher networks.

class isaaclab_rl.rsl_rl.RslRlDistillationStudentTeacherRecurrentCfg[source]#

Bases: RslRlDistillationStudentTeacherCfg

Configuration for the distillation student-teacher recurrent networks.

For rsl-rl >= 4.0.0, this configuration is deprecated. Please use RslRlRNNModelCfg instead.

Methods:

__init__([class_name, init_noise_std, ...])

Attributes:

init_noise_std

The initial noise standard deviation for the student policy.

noise_std_type

The type of noise standard deviation for the policy.

student_obs_normalization

Whether to normalize the observation for the student network.

teacher_obs_normalization

Whether to normalize the observation for the teacher network.

student_hidden_dims

The hidden dimensions of the student network.

teacher_hidden_dims

The hidden dimensions of the teacher network.

activation

The activation function for the student and teacher networks.

class_name

The policy class name.

rnn_type

The type of the RNN network.

rnn_hidden_dim

The hidden dimension of the RNN network.

rnn_num_layers

The number of layers of the RNN network.

teacher_recurrent

Whether the teacher network is recurrent too.

__init__(class_name: str = <factory>, init_noise_std: float = <factory>, noise_std_type: ~typing.Literal['scalar', 'log'] = <factory>, student_obs_normalization: bool = <factory>, teacher_obs_normalization: bool = <factory>, student_hidden_dims: list[int] = <factory>, teacher_hidden_dims: list[int] = <factory>, activation: str = <factory>, rnn_type: str = <factory>, rnn_hidden_dim: int = <factory>, rnn_num_layers: int = <factory>, teacher_recurrent: bool = <factory>) None#
init_noise_std: float#

The initial noise standard deviation for the student policy.

noise_std_type: Literal['scalar', 'log']#

The type of noise standard deviation for the policy. Default is scalar.

student_obs_normalization: bool#

Whether to normalize the observation for the student network.

teacher_obs_normalization: bool#

Whether to normalize the observation for the teacher network.

student_hidden_dims: list[int]#

The hidden dimensions of the student network.

teacher_hidden_dims: list[int]#

The hidden dimensions of the teacher network.

activation: str#

The activation function for the student and teacher networks.

class_name: str#

The policy class name. Default is StudentTeacherRecurrent.

rnn_type: str#

The type of the RNN network. Either “lstm” or “gru”.

rnn_hidden_dim: int#

The hidden dimension of the RNN network.

rnn_num_layers: int#

The number of layers of the RNN network.

teacher_recurrent: bool#

Whether the teacher network is recurrent too.

class isaaclab_rl.rsl_rl.RslRlMLPModelCfg[source]#

Bases: object

Configuration for the MLP model.

Attributes:

class_name

The model class name.

hidden_dims

The hidden dimensions of the MLP network.

activation

The activation function for the MLP network.

obs_normalization

Whether to normalize the observation for the model.

distribution_cfg

The configuration for the output distribution.

stochastic

Whether the model output is stochastic.

init_noise_std

The initial noise standard deviation for the model.

noise_std_type

The type of noise standard deviation for the model.

state_dependent_std

Whether to use state-dependent standard deviation for the policy.

Classes:

DistributionCfg

Configuration for the output distribution.

GaussianDistributionCfg

Configuration for the Gaussian output distribution.

HeteroscedasticGaussianDistributionCfg

Configuration for the heteroscedastic Gaussian output distribution.

Methods:

__init__([class_name, hidden_dims, ...])

class_name: str#

The model class name. Default is MLPModel.

hidden_dims: list[int]#

The hidden dimensions of the MLP network.

activation: str#

The activation function for the MLP network.

obs_normalization: bool#

Whether to normalize the observation for the model. Default is False.

distribution_cfg: DistributionCfg | None#

The configuration for the output distribution. Default is None, in which case no distribution is used.

class DistributionCfg[source]#

Bases: object

Configuration for the output distribution.

Attributes:

class_name

The distribution class name.

Methods:

__init__([class_name])

class_name: str#

The distribution class name.

__init__(class_name: str = <factory>) None#
class GaussianDistributionCfg[source]#

Bases: DistributionCfg

Configuration for the Gaussian output distribution.

Attributes:

class_name

The distribution class name.

init_std

The initial standard deviation of the output distribution.

std_type

The parameterization type of the output distribution's standard deviation.

Methods:

__init__([class_name, init_std, std_type])

class_name: str#

The distribution class name. Default is GaussianDistribution.

init_std: float#

The initial standard deviation of the output distribution.

std_type: Literal['scalar', 'log']#

The parameterization type of the output distribution’s standard deviation. Default is scalar.

__init__(class_name: str = <factory>, init_std: float = <factory>, std_type: ~typing.Literal['scalar', 'log'] = <factory>) None#
class HeteroscedasticGaussianDistributionCfg[source]#

Bases: GaussianDistributionCfg

Configuration for the heteroscedastic Gaussian output distribution.

Attributes:

class_name

The distribution class name.

init_std

The initial standard deviation of the output distribution.

std_type

The parameterization type of the output distribution's standard deviation.

Methods:

__init__([class_name, init_std, std_type])

class_name: str#

The distribution class name. Default is HeteroscedasticGaussianDistribution.

__init__(class_name: str = <factory>, init_std: float = <factory>, std_type: ~typing.Literal['scalar', 'log'] = <factory>) None#
init_std: float#

The initial standard deviation of the output distribution.

std_type: Literal['scalar', 'log']#

The parameterization type of the output distribution’s standard deviation. Default is scalar.
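The two std_type options correspond to different parameterizations of the same quantity. A minimal sketch, assuming the conventional reading (with scalar, the parameter stores the standard deviation directly; with log, it stores its logarithm):

```python
import math


def std_from_param(param: float, std_type: str) -> float:
    """Recover the standard deviation from its stored parameter."""
    if std_type == "scalar":
        return param  # the parameter is the standard deviation itself
    if std_type == "log":
        return math.exp(param)  # the parameter is log(std); exp keeps std > 0
    raise ValueError(f"unknown std_type: {std_type}")


print(std_from_param(1.0, "scalar"))  # 1.0
print(std_from_param(0.0, "log"))     # 1.0
```

The log parameterization keeps the standard deviation positive without constraining the underlying learnable parameter.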

stochastic: bool#

Whether the model output is stochastic.

For rsl-rl >= 5.0.0, this configuration is deprecated. Please use distribution_cfg instead and set it to None for deterministic output or to a valid configuration class, e.g., GaussianDistributionCfg, for stochastic output.

init_noise_std: float#

The initial noise standard deviation for the model.

For rsl-rl >= 5.0.0, this configuration is deprecated. Please use distribution_cfg instead and use the init_std field of the distribution configuration to specify the initial noise standard deviation.

noise_std_type: Literal['scalar', 'log']#

The type of noise standard deviation for the model. Default is scalar.

For rsl-rl >= 5.0.0, this configuration is deprecated. Please use distribution_cfg instead and use the std_type field of the distribution configuration to specify the type of noise standard deviation.

state_dependent_std: bool#

Whether to use state-dependent standard deviation for the policy. Default is False.

For rsl-rl >= 5.0.0, this configuration is deprecated. Please use distribution_cfg instead and use HeteroscedasticGaussianDistributionCfg if state-dependent standard deviation is desired.

__init__(class_name: str = <factory>, hidden_dims: list[int] = <factory>, activation: str = <factory>, obs_normalization: bool = <factory>, distribution_cfg: DistributionCfg | None = <factory>, stochastic: bool = <factory>, init_noise_std: float = <factory>, noise_std_type: Literal['scalar', 'log'] = <factory>, state_dependent_std: bool = <factory>) None#
class isaaclab_rl.rsl_rl.RslRlOnPolicyRunnerCfg[source]#

Bases: RslRlBaseRunnerCfg

Configuration of the runner for on-policy algorithms.

Methods:

__init__([seed, device, num_steps_per_env, ...])

Attributes:

seed

The seed for the experiment.

device

num_steps_per_env

The number of steps per environment per update.

max_iterations

The maximum number of iterations.

empirical_normalization

This parameter is deprecated and will be removed in the future.

obs_groups

A mapping from observation groups to observation sets.

clip_actions

The clipping value for actions.

check_for_nan

Whether to check for NaN values coming from the environment.

save_interval

The number of iterations between saves.

experiment_name

The experiment name.

run_name

The run name.

logger

The logger to use.

neptune_project

The neptune project name.

wandb_project

The wandb project name.

resume

Whether to resume a previous training.

load_run

The run directory to load.

load_checkpoint

The checkpoint file to load.

class_name

The runner class name.

actor

The actor configuration.

critic

The critic configuration.

algorithm

The algorithm configuration.

policy

The policy configuration.

__init__(seed: int = <factory>, device: str = <factory>, num_steps_per_env: int = <factory>, max_iterations: int = <factory>, empirical_normalization: bool = <factory>, obs_groups: dict[str, list[str]] = <factory>, clip_actions: float | None = <factory>, check_for_nan: bool = <factory>, save_interval: int = <factory>, experiment_name: str = <factory>, run_name: str = <factory>, logger: ~typing.Literal['tensorboard', 'neptune', 'wandb'] = <factory>, neptune_project: str = <factory>, wandb_project: str = <factory>, resume: bool = <factory>, load_run: str = <factory>, load_checkpoint: str = <factory>, class_name: str = <factory>, actor: ~isaaclab_rl.rsl_rl.rl_cfg.RslRlMLPModelCfg = <factory>, critic: ~isaaclab_rl.rsl_rl.rl_cfg.RslRlMLPModelCfg = <factory>, algorithm: ~isaaclab_rl.rsl_rl.rl_cfg.RslRlPpoAlgorithmCfg = <factory>, policy: ~isaaclab_rl.rsl_rl.rl_cfg.RslRlPpoActorCriticCfg = <factory>) None#
seed: int#

The seed for the experiment. Default is 42.

device: str#

The device for the RL agent. Default is cuda.

num_steps_per_env: int#

The number of steps per environment per update.

max_iterations: int#

The maximum number of iterations.

empirical_normalization: bool#

This parameter is deprecated and will be removed in the future.

For rsl-rl < 4.0.0, use actor_obs_normalization and critic_obs_normalization of the policy instead. For rsl-rl >= 4.0.0, use obs_normalization of the model instead.

obs_groups: dict[str, list[str]]#

A mapping from observation groups to observation sets.

The keys of the dictionary are predefined observation sets used by the underlying algorithm and values are lists of observation groups provided by the environment.

For instance, if the environment provides a dictionary of observations with groups “policy”, “images”, and “privileged”, these can be mapped to algorithmic observation sets as follows:

obs_groups = {
    "actor": ["policy", "images"],
    "critic": ["policy", "privileged"],
}

This way, the actor will receive the “policy” and “images” observations, and the critic will receive the “policy” and “privileged” observations.

For more details, please check vec_env.py in the rsl_rl library.
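Conceptually, each algorithmic observation set is assembled by concatenating the listed environment groups in order. A minimal sketch with plain Python lists, using the hypothetical group names from the example above:

```python
# Hypothetical per-group observations as returned by the environment.
env_obs = {
    "policy": [0.1, 0.2],
    "images": [0.3],
    "privileged": [0.9],
}

obs_groups = {
    "actor": ["policy", "images"],
    "critic": ["policy", "privileged"],
}

# Concatenate each set's groups in the order they are listed.
algo_obs = {
    name: [x for group in groups for x in env_obs[group]]
    for name, groups in obs_groups.items()
}

print(algo_obs["actor"])   # [0.1, 0.2, 0.3]
print(algo_obs["critic"])  # [0.1, 0.2, 0.9]
```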

clip_actions: float | None#

The clipping value for actions. If None, then no clipping is done. Defaults to None.

Note

This clipping is performed inside the RslRlVecEnvWrapper wrapper.

check_for_nan: bool#

Whether to check for NaN values coming from the environment.

save_interval: int#

The number of iterations between saves.

experiment_name: str#

The experiment name.

run_name: str#

The run name. Default is empty string.

The name of the run directory is typically the time-stamp at execution. If the run name is not empty, then it is appended to the run directory’s name, i.e. the logging directory’s name will become {time-stamp}_{run_name}.

logger: Literal['tensorboard', 'neptune', 'wandb']#

The logger to use. Default is tensorboard.

neptune_project: str#

The neptune project name. Default is “isaaclab”.

wandb_project: str#

The wandb project name. Default is “isaaclab”.

resume: bool#

Whether to resume a previous training. Default is False.

This flag will be ignored for distillation.

load_run: str#

The run directory to load. Default is “.*” (all).

If regex expression, the latest (alphabetical order) matching run will be loaded.

load_checkpoint: str#

The checkpoint file to load. Default is "model_.*.pt" (all).

If regex expression, the latest (alphabetical order) matching file will be loaded.
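A minimal sketch of the "latest matching" selection described above, using hypothetical checkpoint file names:

```python
import re

# Hypothetical contents of a run directory.
files = ["model_100.pt", "model_50.pt", "model_900.pt", "events.out"]

# Keep only files matching the pattern, then take the alphabetically last one.
pattern = "model_.*.pt"
matches = sorted(f for f in files if re.fullmatch(pattern, f))
print(matches[-1])  # model_900.pt
```

Note that the order is alphabetical rather than numeric, so model_900.pt sorts after model_100.pt here; zero-padded checkpoint names avoid surprises.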

class_name: str#

The runner class name. Default is OnPolicyRunner.

actor: RslRlMLPModelCfg#

The actor configuration.

critic: RslRlMLPModelCfg#

The critic configuration.

algorithm: RslRlPpoAlgorithmCfg#

The algorithm configuration.

policy: RslRlPpoActorCriticCfg#

The policy configuration.

For rsl-rl >= 4.0.0, this configuration is deprecated. Please use actor and critic model configurations instead.

class isaaclab_rl.rsl_rl.RslRlPpoActorCriticCfg[source]#

Bases: object

Configuration for the PPO actor-critic networks.

For rsl-rl >= 4.0.0, this configuration is deprecated. Please use RslRlMLPModelCfg instead.

Methods:

__init__([class_name, init_noise_std, ...])

Attributes:

class_name

The policy class name.

init_noise_std

The initial noise standard deviation for the policy.

noise_std_type

The type of noise standard deviation for the policy.

state_dependent_std

Whether to use state-dependent standard deviation for the policy.

actor_obs_normalization

Whether to normalize the observation for the actor network.

critic_obs_normalization

Whether to normalize the observation for the critic network.

actor_hidden_dims

The hidden dimensions of the actor network.

critic_hidden_dims

The hidden dimensions of the critic network.

activation

The activation function for the actor and critic networks.

__init__(class_name: str = <factory>, init_noise_std: float = <factory>, noise_std_type: ~typing.Literal['scalar', 'log'] = <factory>, state_dependent_std: bool = <factory>, actor_obs_normalization: bool = <factory>, critic_obs_normalization: bool = <factory>, actor_hidden_dims: list[int] = <factory>, critic_hidden_dims: list[int] = <factory>, activation: str = <factory>) None#
class_name: str#

The policy class name. Default is ActorCritic.

init_noise_std: float#

The initial noise standard deviation for the policy.

noise_std_type: Literal['scalar', 'log']#

The type of noise standard deviation for the policy. Default is scalar.

state_dependent_std: bool#

Whether to use state-dependent standard deviation for the policy. Default is False.

actor_obs_normalization: bool#

Whether to normalize the observation for the actor network.

critic_obs_normalization: bool#

Whether to normalize the observation for the critic network.

actor_hidden_dims: list[int]#

The hidden dimensions of the actor network.

critic_hidden_dims: list[int]#

The hidden dimensions of the critic network.

activation: str#

The activation function for the actor and critic networks.

class isaaclab_rl.rsl_rl.RslRlPpoActorCriticRecurrentCfg[source]#

Bases: RslRlPpoActorCriticCfg

Configuration for the PPO actor-critic networks with recurrent layers.

For rsl-rl >= 4.0.0, this configuration is deprecated. Please use RslRlRNNModelCfg instead.

Methods:

__init__([class_name, init_noise_std, ...])

Attributes:

init_noise_std

The initial noise standard deviation for the policy.

noise_std_type

The type of noise standard deviation for the policy.

state_dependent_std

Whether to use state-dependent standard deviation for the policy.

actor_obs_normalization

Whether to normalize the observation for the actor network.

critic_obs_normalization

Whether to normalize the observation for the critic network.

actor_hidden_dims

The hidden dimensions of the actor network.

critic_hidden_dims

The hidden dimensions of the critic network.

activation

The activation function for the actor and critic networks.

class_name

The policy class name.

rnn_type

The type of RNN to use.

rnn_hidden_dim

The dimension of the RNN layers.

rnn_num_layers

The number of RNN layers.

__init__(class_name: str = <factory>, init_noise_std: float = <factory>, noise_std_type: ~typing.Literal['scalar', 'log'] = <factory>, state_dependent_std: bool = <factory>, actor_obs_normalization: bool = <factory>, critic_obs_normalization: bool = <factory>, actor_hidden_dims: list[int] = <factory>, critic_hidden_dims: list[int] = <factory>, activation: str = <factory>, rnn_type: str = <factory>, rnn_hidden_dim: int = <factory>, rnn_num_layers: int = <factory>) None#
init_noise_std: float#

The initial noise standard deviation for the policy.

noise_std_type: Literal['scalar', 'log']#

The type of noise standard deviation for the policy. Default is scalar.

state_dependent_std: bool#

Whether to use state-dependent standard deviation for the policy. Default is False.

actor_obs_normalization: bool#

Whether to normalize the observation for the actor network.

critic_obs_normalization: bool#

Whether to normalize the observation for the critic network.

actor_hidden_dims: list[int]#

The hidden dimensions of the actor network.

critic_hidden_dims: list[int]#

The hidden dimensions of the critic network.

activation: str#

The activation function for the actor and critic networks.

class_name: str#

The policy class name. Default is ActorCriticRecurrent.

rnn_type: str#

The type of RNN to use. Either “lstm” or “gru”.

rnn_hidden_dim: int#

The dimension of the RNN layers.

rnn_num_layers: int#

The number of RNN layers.

class isaaclab_rl.rsl_rl.RslRlPpoAlgorithmCfg[source]#

Bases: object

Configuration for the PPO algorithm.

Attributes:

class_name

The algorithm class name.

num_learning_epochs

The number of learning epochs per update.

num_mini_batches

The number of mini-batches per update.

learning_rate

The learning rate for the policy.

schedule

The learning rate schedule.

gamma

The discount factor.

lam

The lambda parameter for Generalized Advantage Estimation (GAE).

entropy_coef

The coefficient for the entropy loss.

desired_kl

The desired KL divergence.

max_grad_norm

The maximum gradient norm.

optimizer

The optimizer to use.

value_loss_coef

The coefficient for the value loss.

use_clipped_value_loss

Whether to use clipped value loss.

clip_param

The clipping parameter for the policy.

normalize_advantage_per_mini_batch

Whether to normalize the advantage per mini-batch.

share_cnn_encoders

Whether to share the CNN networks between actor and critic, in case CNNModels are used.

rnd_cfg

The RND configuration.

symmetry_cfg

The symmetry configuration.

Methods:

__init__([class_name, num_learning_epochs, ...])

class_name: str#

The algorithm class name. Default is PPO.

num_learning_epochs: int#

The number of learning epochs per update.

num_mini_batches: int#

The number of mini-batches per update.

learning_rate: float#

The learning rate for the policy.

schedule: str#

The learning rate schedule.

gamma: float#

The discount factor.

lam: float#

The lambda parameter for Generalized Advantage Estimation (GAE).

entropy_coef: float#

The coefficient for the entropy loss.

desired_kl: float#

The desired KL divergence.

max_grad_norm: float#

The maximum gradient norm.

optimizer: Literal['adam', 'adamw', 'sgd', 'rmsprop']#

The optimizer to use.

value_loss_coef: float#

The coefficient for the value loss.

use_clipped_value_loss: bool#

Whether to use clipped value loss.

__init__(class_name: str = <factory>, num_learning_epochs: int = <factory>, num_mini_batches: int = <factory>, learning_rate: float = <factory>, schedule: str = <factory>, gamma: float = <factory>, lam: float = <factory>, entropy_coef: float = <factory>, desired_kl: float = <factory>, max_grad_norm: float = <factory>, optimizer: ~typing.Literal['adam', 'adamw', 'sgd', 'rmsprop'] = <factory>, value_loss_coef: float = <factory>, use_clipped_value_loss: bool = <factory>, clip_param: float = <factory>, normalize_advantage_per_mini_batch: bool = <factory>, share_cnn_encoders: bool = <factory>, rnd_cfg: ~isaaclab_rl.rsl_rl.rnd_cfg.RslRlRndCfg | None = <factory>, symmetry_cfg: ~isaaclab_rl.rsl_rl.symmetry_cfg.RslRlSymmetryCfg | None = <factory>) None#
clip_param: float#

The clipping parameter for the policy.

normalize_advantage_per_mini_batch: bool#

Whether to normalize the advantage per mini-batch. Default is False.

If True, the advantage is normalized over the mini-batches only. Otherwise, the advantage is normalized over the entire collected trajectories.

share_cnn_encoders: bool#

Whether to share the CNN networks between actor and critic, in case CNNModels are used. Defaults to False.

rnd_cfg: RslRlRndCfg | None#

The RND configuration. Default is None, in which case RND is not used.

symmetry_cfg: RslRlSymmetryCfg | None#

The symmetry configuration. Default is None, in which case symmetry is not used.
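The gamma and lam fields above enter the advantage computation through the standard GAE recursion. A minimal sketch for a single trajectory without terminations (illustrative only; the library batches this over environments and handles resets):

```python
def gae_advantages(rewards, values, last_value, gamma, lam):
    """Backward GAE recursion: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    advantages = [0.0] * len(rewards)
    next_adv = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]
        next_adv = delta + gamma * lam * next_adv  # accumulate discounted deltas
        advantages[t] = next_adv
        next_value = values[t]
    return advantages


print(gae_advantages([1.0], [0.0], 0.0, 0.99, 0.95))  # [1.0]
```

With lam = 1.0 this reduces to full Monte-Carlo advantages; with lam = 0.0 it reduces to one-step TD errors.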

class isaaclab_rl.rsl_rl.RslRlRNNModelCfg[source]#

Bases: RslRlMLPModelCfg

Configuration for RNN model.

Attributes:

class_name

The model class name.

rnn_type

The type of RNN to use.

rnn_hidden_dim

The dimension of the RNN layers.

rnn_num_layers

The number of RNN layers.

hidden_dims

The hidden dimensions of the MLP network.

activation

The activation function for the MLP network.

obs_normalization

Whether to normalize the observation for the model.

distribution_cfg

The configuration for the output distribution.

stochastic

Whether the model output is stochastic.

init_noise_std

The initial noise standard deviation for the model.

noise_std_type

The type of noise standard deviation for the model.

state_dependent_std

Whether to use state-dependent standard deviation for the policy.

Methods:

__init__([class_name, hidden_dims, ...])

class_name: str#

The model class name. Default is RNNModel.

rnn_type: str#

The type of RNN to use. Either “lstm” or “gru”.

rnn_hidden_dim: int#

The dimension of the RNN layers.

rnn_num_layers: int#

The number of RNN layers.

__init__(class_name: str = <factory>, hidden_dims: list[int] = <factory>, activation: str = <factory>, obs_normalization: bool = <factory>, distribution_cfg: DistributionCfg | None = <factory>, stochastic: bool = <factory>, init_noise_std: float = <factory>, noise_std_type: Literal['scalar', 'log'] = <factory>, state_dependent_std: bool = <factory>, rnn_type: str = <factory>, rnn_hidden_dim: int = <factory>, rnn_num_layers: int = <factory>) None#
hidden_dims: list[int]#

The hidden dimensions of the MLP network.

activation: str#

The activation function for the MLP network.

obs_normalization: bool#

Whether to normalize the observation for the model. Default is False.

distribution_cfg: DistributionCfg | None#

The configuration for the output distribution. Default is None, in which case no distribution is used.

stochastic: bool#

Whether the model output is stochastic.

For rsl-rl >= 5.0.0, this configuration is deprecated. Please use distribution_cfg instead and set it to None for deterministic output or to a valid configuration class, e.g., GaussianDistributionCfg, for stochastic output.

init_noise_std: float#

The initial noise standard deviation for the model.

For rsl-rl >= 5.0.0, this configuration is deprecated. Please use distribution_cfg instead and use the init_std field of the distribution configuration to specify the initial noise standard deviation.

noise_std_type: Literal['scalar', 'log']#

The type of noise standard deviation for the model. Default is scalar.

For rsl-rl >= 5.0.0, this configuration is deprecated. Please use distribution_cfg instead and use the std_type field of the distribution configuration to specify the type of noise standard deviation.

state_dependent_std: bool#

Whether to use state-dependent standard deviation for the policy. Default is False.

For rsl-rl >= 5.0.0, this configuration is deprecated. Please use distribution_cfg instead and use HeteroscedasticGaussianDistributionCfg if state-dependent standard deviation is desired.

isaaclab_rl.rsl_rl.configclass(cls, **kwargs)[source]#

Wrapper around dataclass functionality to add extra checks and utilities.

As of Python 3.7, the standard dataclasses have two main issues which make them non-generic for configuration use cases. These include:

  1. Requiring a type annotation for all its members.

  2. Requiring explicit usage of field(default_factory=...) to reinitialize mutable variables.

This function provides a decorator that wraps around Python’s dataclass utility to deal with the above two issues. It also provides additional helper functions for dictionary <-> class conversion and easily copying class instances.

Usage:

from dataclasses import MISSING, field

from isaaclab.utils.configclass import configclass


@configclass
class ViewerCfg:
    eye: list = [7.5, 7.5, 7.5]  # field missing on purpose
    lookat: list = field(default_factory=lambda: [0.0, 0.0, 0.0])


@configclass
class EnvCfg:
    num_envs: int = MISSING
    episode_length: int = 2000
    viewer: ViewerCfg = ViewerCfg()


# create configuration instance
env_cfg = EnvCfg(num_envs=24)

# print information as a dictionary
print(env_cfg.to_dict())

# create a copy of the configuration
env_cfg_copy = env_cfg.copy()

# replace arbitrary fields using keyword arguments
env_cfg_copy = env_cfg_copy.replace(num_envs=32)
Parameters:
  • cls – The class to wrap around.

  • **kwargs – Additional arguments to pass to dataclass().

Returns:

The wrapped class.

isaaclab_rl.rsl_rl.handle_deprecated_rsl_rl_cfg(agent_cfg: RslRlBaseRunnerCfg, installed_version) RslRlBaseRunnerCfg[source]#

Handle deprecated RSL-RL configurations across version boundaries.

This function mutates agent_cfg to keep configurations compatible with the installed rsl-rl version:

  • For rsl-rl < 4.0.0, policy is required; new model configs (actor, critic, student, teacher) are ignored and cleared.

  • For rsl-rl >= 4.0.0, the deprecated policy can be used to infer missing model configs, then policy is cleared.

  • For rsl-rl >= 5.0.0, legacy stochastic parameters are migrated to distribution_cfg when needed; for 4.0.0 <= rsl-rl < 5.0.0, those legacy parameters are validated instead.

Raises:

ValueError – If required legacy parameters are missing for the selected rsl-rl version.
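A sketch of the version-gating logic described above, using a plain dictionary in place of the real runner config (field names follow this documentation; the actual function operates on RslRlBaseRunnerCfg instances and is more thorough):

```python
def migrate_cfg(cfg: dict, installed_version: tuple) -> dict:
    """Illustrative version-gated migration; not the library implementation."""
    cfg = dict(cfg)
    if installed_version < (4, 0, 0):
        # Legacy rsl-rl: "policy" is required; new model configs are cleared.
        if "policy" not in cfg:
            raise ValueError("policy is required for rsl-rl < 4.0.0")
        for key in ("actor", "critic", "student", "teacher"):
            cfg.pop(key, None)
    else:
        # Modern rsl-rl: a deprecated "policy" seeds missing model configs.
        if "policy" in cfg:
            cfg.setdefault("actor", cfg["policy"])
            cfg.setdefault("critic", cfg["policy"])
            del cfg["policy"]
    return cfg


old = migrate_cfg({"policy": "mlp", "actor": "a"}, (3, 1, 0))
new = migrate_cfg({"policy": "mlp"}, (4, 0, 0))
print(old)  # {'policy': 'mlp'}
print(new)  # {'actor': 'mlp', 'critic': 'mlp'}
```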

SKRL Wrapper#

Wrapper to configure an environment instance to skrl environment.

The following example shows how to wrap an environment for skrl:

from isaaclab_rl.skrl import SkrlVecEnvWrapper

env = SkrlVecEnvWrapper(env, ml_framework="torch")  # or ml_framework="jax"

Or, equivalently, by directly calling the skrl library API as follows:

from skrl.envs.torch.wrappers import wrap_env  # for PyTorch, or...
from skrl.envs.jax.wrappers import wrap_env  # for JAX

env = wrap_env(env, wrapper="isaaclab")

Functions:

SkrlVecEnvWrapper(env[, ml_framework, wrapper])

Wraps around Isaac Lab environment for skrl.

isaaclab_rl.skrl.SkrlVecEnvWrapper(env: ManagerBasedRLEnv | DirectRLEnv | DirectMARLEnv, ml_framework: Literal['torch', 'jax', 'jax-numpy'] = 'torch', wrapper: Literal['auto', 'isaaclab', 'isaaclab-single-agent', 'isaaclab-multi-agent'] = 'isaaclab')[source]#

Wraps around Isaac Lab environment for skrl.

This function wraps around the Isaac Lab environment. Since the wrapping functionality is defined within the skrl library itself, this implementation is maintained for compatibility with the structure of the extension that contains it. Internally, it calls wrap_env() from the skrl library API.

Parameters:
  • env – The environment to wrap around.

  • ml_framework – The ML framework to use for the wrapper. Defaults to “torch”.

  • wrapper – The wrapper to use. Defaults to “isaaclab”: leave it to skrl to determine if the environment will be wrapped as single-agent or multi-agent.

Raises:
  • ValueError – When the environment is not an instance of any Isaac Lab environment interface.

  • ValueError – If the specified ML framework is not valid.

Reference:

https://skrl.readthedocs.io/en/latest/api/envs/wrapping.html

Stable-Baselines3 Wrapper#

Wrapper to configure an environment instance to Stable-Baselines3 vectorized environment.

The following example shows how to wrap an environment for Stable-Baselines3:

from isaaclab_rl.sb3 import Sb3VecEnvWrapper

env = Sb3VecEnvWrapper(env)

Functions:

process_sb3_cfg(cfg, num_envs)

Convert simple YAML types to Stable-Baselines classes/components.

Classes:

Sb3VecEnvWrapper

Wraps around Isaac Lab environment for Stable Baselines3.

isaaclab_rl.sb3.process_sb3_cfg(cfg: dict, num_envs: int) dict[source]#

Convert simple YAML types to Stable-Baselines classes/components.

Parameters:
  • cfg – A configuration dictionary.

  • num_envs – The number of parallel environments (used to compute batch_size for a desired number of minibatches).

Returns:

A dictionary containing the converted configuration.

Reference:

DLR-RM/rl-baselines3-zoo

class isaaclab_rl.sb3.Sb3VecEnvWrapper[source]#

Bases: VecEnv

Wraps around Isaac Lab environment for Stable Baselines3.

Isaac Sim internally implements a vectorized environment. However, since it is still considered a single environment instance, Stable-Baselines3 would try to wrap around it using DummyVecEnv unless the environment inherits from its VecEnv. Thus, this class thinly wraps the environment from ManagerBasedRLEnv or DirectRLEnv.

Note

While Stable-Baselines3 supports the Gym 0.26+ API, its vectorized environment uses its own API (closer to Gym 0.21). Thus, we implement that vectorized environment API here.

We also add monitoring functionality that computes the un-discounted episode return and length. This information is added to the info dicts under key episode.

In contrast to the Isaac Lab environment, Stable-Baselines3 expects the following:

  1. numpy datatype for MDP signals

  2. a list of info dicts for each sub-environment (instead of a dict)

  3. when an environment has terminated, the returned observations should correspond to the ones after reset. The “real” final observation is passed using the info dicts under the key terminal_observation.

Warning

By the nature of physics stepping in Isaac Sim, it is not possible to forward the simulation buffers without performing a physics step. Hence, the reset is performed inside the step() function after the actual physics step is taken, and the returned observations for terminated environments are the ones after the reset.

Caution

This class must be the last wrapper in the wrapper chain. This is because the wrapper does not follow the gym.Wrapper interface. Any subsequent wrappers will need to be modified to work with this wrapper.

Reference:

  1. https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html

  2. https://stable-baselines3.readthedocs.io/en/master/common/monitor.html

Methods:

__init__(env[, fast_variant])

Initialize the wrapper.

class_name()

Returns the class name of the wrapper.

get_episode_rewards()

Returns the rewards of all the episodes.

get_episode_lengths()

Returns the number of time-steps of all the episodes.

Attributes:

unwrapped

Returns the base environment of the wrapper.

__init__(env: ManagerBasedRLEnv | DirectRLEnv, fast_variant: bool = True)[source]#

Initialize the wrapper.

Parameters:
  • env – The environment to wrap around.

  • fast_variant – Whether to use the fast variant for processing info (only episodic reward, length, and truncation info are included).

Raises:

ValueError – When the environment is not an instance of ManagerBasedRLEnv or DirectRLEnv.

classmethod class_name() str[source]#

Returns the class name of the wrapper.

property unwrapped: ManagerBasedRLEnv | DirectRLEnv#

Returns the base environment of the wrapper.

This will be the bare gymnasium.Env environment, underneath all layers of wrappers.

get_episode_rewards() list[float][source]#

Returns the rewards of all the episodes.

get_episode_lengths() list[int][source]#

Returns the number of time-steps of all the episodes.