Task Design Workflows#

A Task is defined by an environment with specific interfaces for passing observations to, and receiving actions from, a specific agent (robot). The environment provides the agent with its current observations and executes that agent's actions by stepping the simulation forward in time. Many components of simulating a robot in an environment are the same, regardless of what you might want that robot to do or how it might be trained to do it.

This is especially true of Reinforcement Learning (RL), where managing the actions, observations, rewards, etc. across a vectorized GPU simulation can be daunting to even think about! To meet this need, Isaac Lab provides the ability to build your RL environments within our Manager-based system, allowing you to entrust the various minutiae to the appropriate manager classes. However, we also recognize the need to exert granular control over an environment, especially during development. For this need, we also provide a Direct interface into the simulation, giving you full control!

  • Manager-based: The environment is decomposed into individual components (or managers) that handle different aspects of the environment (such as computing observations, applying actions, and applying randomization). The user defines configuration classes for each component and the environment is responsible for coordinating the managers and calling their functions.

  • Direct: The user defines a single class that implements the entire environment directly without the need for separate managers. This class is responsible for computing observations, applying actions, and computing rewards.

Both workflows have their own advantages and disadvantages. The manager-based workflow is more modular and allows different components of the environment to be swapped out easily. This is useful when prototyping the environment and experimenting with different configurations. On the other hand, the direct workflow is more efficient and allows for more fine-grained control over the environment logic. This is useful when optimizing the environment for performance or when implementing complex logic that is difficult to decompose into separate components.
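
Regardless of which workflow you choose, the resulting environment is driven the same way: reset it, then repeatedly step it with a batch of actions and read back batched observations and rewards. The snippet below is a minimal sketch of that shared interface; the import path assumes the isaaclab package (older releases use a different module path), MyTaskEnvCfg is a placeholder for a user-defined configuration class (a skeleton of one is sketched in the next section), and in a real script the simulation app must be launched via AppLauncher before importing any Isaac Lab modules.

import torch

from isaaclab.envs import ManagerBasedRLEnv  # or DirectRLEnv for the direct workflow

# MyTaskEnvCfg is a hypothetical user-defined configuration class
env = ManagerBasedRLEnv(cfg=MyTaskEnvCfg())
obs, extras = env.reset()

num_actions = 1  # e.g. one actuated joint for Cartpole; use your task's action dimension
for _ in range(1000):
    # zero actions for every vectorized environment; a trained policy would go here
    actions = torch.zeros(env.num_envs, num_actions, device=env.device)
    obs, rewards, terminated, truncated, extras = env.step(actions)

env.close()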

Manager-Based Environments#

Manager-based Task Workflow

Manager-based environments promote modular implementations of tasks by decomposing them into individually managed components. Each component of the task, such as computing rewards or observations, is specified as a configuration for a corresponding manager. These managers define configurable functions that are responsible for executing the specific computations as needed. Coordinating a collection of different managers is handled by an environment class that inherits from envs.ManagerBasedEnv. Configurations likewise must all inherit from envs.ManagerBasedEnvCfg.

When developing new training environments, it is often beneficial to break the environment into independent components. This can be highly effective for collaboration, as it lets individual developers focus on different aspects of the environment, while allowing those disparate efforts to be joined back together into a single runnable task. For example, you may have multiple robots with differing sensoriums, requiring different observation managers to process those sensory data into a form that’s useful for downstream components. You might have multiple members on the team with different ideas about what the reward should be to achieve your goals, and by having each one develop their own reward manager, you can swap and test as you see fit. The modular nature of the manager workflow is essential for more complex projects!

For reinforcement learning, much of this has been done for you already! In most cases, it will be enough to write your environment to inherit from envs.ManagerBasedRLEnv and your configuration from envs.ManagerBasedRLEnvCfg.
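
As a rough sketch of how the pieces fit together, the top-level configuration is just a @configclass whose attributes are the per-manager configuration groups; the environment reads this configuration and builds the corresponding managers for you. Only envs.ManagerBasedRLEnvCfg and configclass below are Isaac Lab names (import paths assume the isaaclab package); the per-manager Cfg classes are hypothetical user-defined groups like the RewardsCfg shown next.

from isaaclab.envs import ManagerBasedRLEnvCfg
from isaaclab.utils import configclass


@configclass
class MyTaskEnvCfg(ManagerBasedRLEnvCfg):
    """Top-level configuration that the environment uses to construct its managers."""

    # scene assets (robot, sensors, terrain) and vectorization settings
    scene: MySceneCfg = MySceneCfg(num_envs=4096, env_spacing=4.0)
    # one configuration group per manager
    observations: ObservationsCfg = ObservationsCfg()
    actions: ActionsCfg = ActionsCfg()
    rewards: RewardsCfg = RewardsCfg()
    terminations: TerminationsCfg = TerminationsCfg()
    events: EventsCfg = EventsCfg()

    def __post_init__(self):
        # general simulation settings
        self.decimation = 2          # number of physics steps per control (policy) step
        self.episode_length_s = 5.0  # episode length in seconds
        self.sim.dt = 1 / 120        # physics time-step in seconds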

Example for defining the reward function for the Cartpole task using the manager-style

The following class is a part of the Cartpole environment configuration class. The RewardsCfg class defines individual terms that compose the reward function. Each reward term is defined by its function implementation, weight, and any additional parameters to be passed to the function. Users can define multiple reward terms and their weights to be used in the reward function.

@configclass
class RewardsCfg:
    """Reward terms for the MDP."""

    # (1) Constant running reward
    alive = RewTerm(func=mdp.is_alive, weight=1.0)
    # (2) Failure penalty
    terminating = RewTerm(func=mdp.is_terminated, weight=-2.0)
    # (3) Primary task: keep pole upright
    pole_pos = RewTerm(
        func=mdp.joint_pos_target_l2,
        weight=-1.0,
        params={"asset_cfg": SceneEntityCfg("robot", joint_names=["cart_to_pole"]), "target": 0.0},
    )
    # (4) Shaping tasks: lower cart velocity
    cart_vel = RewTerm(
        func=mdp.joint_vel_l1,
        weight=-0.01,
        params={"asset_cfg": SceneEntityCfg("robot", joint_names=["slider_to_cart"])},
    )
    # (5) Shaping tasks: lower pole angular velocity
    pole_vel = RewTerm(
        func=mdp.joint_vel_l1,
        weight=-0.005,
        params={"asset_cfg": SceneEntityCfg("robot", joint_names=["cart_to_pole"])},
    )
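
Each func above points to a plain function that receives the environment instance (plus any entries from params) and returns one reward value per vectorized environment. As a hedged illustration, the sketch below re-creates an L1 joint-velocity penalty in the same style as the built-in mdp terms; import paths assume the isaaclab package, and the data attribute accesses should be checked against the version you are using.

import torch

from isaaclab.envs import ManagerBasedRLEnv
from isaaclab.managers import SceneEntityCfg


def joint_vel_l1_custom(env: ManagerBasedRLEnv, asset_cfg: SceneEntityCfg) -> torch.Tensor:
    """L1 norm of the selected joints' velocities, one value per environment."""
    # look up the articulation referenced by the scene-entity configuration
    asset = env.scene[asset_cfg.name]
    # sum absolute joint velocities over the selected joints (shape: (num_envs,))
    return torch.sum(torch.abs(asset.data.joint_vel[:, asset_cfg.joint_ids]), dim=1)

Such a function can then be wired into the configuration exactly like the built-in terms, e.g. RewTerm(func=joint_vel_l1_custom, weight=-0.01, params={"asset_cfg": SceneEntityCfg("robot", joint_names=["slider_to_cart"])}).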

See also

We provide a more detailed tutorial for setting up an environment using the manager-based workflow at Creating a Manager-Based RL Environment.

Direct Environments#

Direct-based Task Workflow

The direct-style environment aligns more closely with traditional implementations of environments from other libraries. A single class implements the reward function, observation function, resets, and all the other components of the environment. This approach does not require the manager classes. Instead, users are provided the complete freedom to implement their task through the APIs of either envs.DirectRLEnv or envs.DirectMARLEnv. All direct task environments must inherit from one of these two classes. Direct environments still require configurations to be defined, specifically by inheriting from either envs.DirectRLEnvCfg or envs.DirectMARLEnvCfg. This workflow may be the most familiar for users migrating from the IsaacGymEnvs and OmniIsaacGymEnvs frameworks.
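
In practice this means subclassing envs.DirectRLEnv and overriding a handful of callbacks that the base class invokes while stepping. The skeleton below is a sketch of the typical overrides; the method names follow the DirectRLEnv interface, the bodies are placeholders, and import paths assume the isaaclab package.

import torch

from isaaclab.envs import DirectRLEnv, DirectRLEnvCfg


class MyTaskEnv(DirectRLEnv):
    cfg: DirectRLEnvCfg

    def _setup_scene(self):
        # spawn the robot, sensors, and terrain, then replicate them across environments
        ...

    def _pre_physics_step(self, actions: torch.Tensor):
        # cache/process the policy actions once per control step
        self._actions = actions.clone()

    def _apply_action(self):
        # write the cached actions to the articulation (called every physics step)
        ...

    def _get_observations(self) -> dict:
        # assemble the observation dictionary, e.g. {"policy": obs_tensor}
        ...

    def _get_rewards(self) -> torch.Tensor:
        # compute one reward value per environment (see the Cartpole example below)
        ...

    def _get_dones(self) -> tuple[torch.Tensor, torch.Tensor]:
        # return (terminated, time_out) boolean tensors
        ...

    def _reset_idx(self, env_ids):
        # reset the state of the environments listed in env_ids
        super()._reset_idx(env_ids)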

Example for defining the reward function for the Cartpole task using the direct-style

The following function is a part of the Cartpole environment class and is responsible for computing the rewards.

def _get_rewards(self) -> torch.Tensor:
    total_reward = compute_rewards(
        self.cfg.rew_scale_alive,
        self.cfg.rew_scale_terminated,
        self.cfg.rew_scale_pole_pos,
        self.cfg.rew_scale_cart_vel,
        self.cfg.rew_scale_pole_vel,
        self.joint_pos[:, self._pole_dof_idx[0]],
        self.joint_vel[:, self._pole_dof_idx[0]],
        self.joint_pos[:, self._cart_dof_idx[0]],
        self.joint_vel[:, self._cart_dof_idx[0]],
        self.reset_terminated,
    )
    return total_reward

It calls the compute_rewards() function, which is Torch JIT-compiled for performance benefits.

@torch.jit.script
def compute_rewards(
    rew_scale_alive: float,
    rew_scale_terminated: float,
    rew_scale_pole_pos: float,
    rew_scale_cart_vel: float,
    rew_scale_pole_vel: float,
    pole_pos: torch.Tensor,
    pole_vel: torch.Tensor,
    cart_pos: torch.Tensor,
    cart_vel: torch.Tensor,
    reset_terminated: torch.Tensor,
) -> torch.Tensor:
    # constant reward while the episode is running, plus a term applied on termination
    rew_alive = rew_scale_alive * (1.0 - reset_terminated.float())
    rew_termination = rew_scale_terminated * reset_terminated.float()
    # shaping penalties on the pole angle and on the cart/pole velocities
    rew_pole_pos = rew_scale_pole_pos * torch.sum(torch.square(pole_pos).unsqueeze(dim=1), dim=-1)
    rew_cart_vel = rew_scale_cart_vel * torch.sum(torch.abs(cart_vel).unsqueeze(dim=1), dim=-1)
    rew_pole_vel = rew_scale_pole_vel * torch.sum(torch.abs(pole_vel).unsqueeze(dim=1), dim=-1)
    total_reward = rew_alive + rew_termination + rew_pole_pos + rew_cart_vel + rew_pole_vel
    return total_reward

This approach provides more transparency in the implementation of the environment, as the logic is defined within the task class instead of being abstracted behind managers. This may be beneficial when implementing complex logic that is difficult to decompose into separate components. Additionally, the direct-style implementation may bring performance benefits, as it allows large chunks of the environment logic to be implemented with optimized frameworks such as PyTorch JIT or Warp. This becomes valuable when scaling training up to the point where individual operations in the environment are worth optimizing.
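
For instance, a hot piece of the reward computation could be expressed as a Warp kernel instead of (or alongside) the TorchScript version above. The snippet below is a hedged sketch of that idea for the alive/termination terms only; it uses Warp's kernel and launch API but is not part of the Cartpole task.

import torch
import warp as wp

wp.init()


@wp.kernel
def alive_reward_kernel(
    reset_terminated: wp.array(dtype=wp.float32),
    rew_scale_alive: float,
    rew_scale_terminated: float,
    out: wp.array(dtype=wp.float32),
):
    # one thread per environment instance
    tid = wp.tid()
    out[tid] = rew_scale_alive * (1.0 - reset_terminated[tid]) + rew_scale_terminated * reset_terminated[tid]


def alive_reward(reset_terminated: torch.Tensor, scale_alive: float, scale_terminated: float) -> torch.Tensor:
    # share memory between the PyTorch tensor and a Warp array on the same device
    term = reset_terminated.float().contiguous()
    term_wp = wp.from_torch(term)
    out = wp.zeros(term.shape[0], dtype=wp.float32, device=term_wp.device)
    wp.launch(
        alive_reward_kernel,
        dim=term.shape[0],
        inputs=[term_wp, scale_alive, scale_terminated, out],
        device=term_wp.device,
    )
    return wp.to_torch(out)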

See also

We provide a more detailed tutorial for setting up an RL environment using the direct workflow at Creating a Direct Workflow RL Environment.