Creating a Manager-Based Base Environment#
Environments bring together different aspects of the simulation, such as the scene, observation and action spaces, and reset events, to create a coherent interface for various applications. In Isaac Lab, manager-based environments are implemented as the envs.ManagerBasedEnv and envs.ManagerBasedRLEnv classes.
The two classes are very similar, but envs.ManagerBasedRLEnv is useful for reinforcement learning tasks and contains rewards, terminations, curriculum and command generation. The envs.ManagerBasedEnv class is useful for traditional robot control and doesn’t contain rewards and terminations.
In this tutorial, we will look at the base class envs.ManagerBasedEnv and its corresponding configuration class envs.ManagerBasedEnvCfg for the manager-based workflow. We will use the cartpole environment from earlier to illustrate the different components in creating a new envs.ManagerBasedEnv environment.
The Code#
The tutorial corresponds to the create_cartpole_base_env.py script in the source/standalone/tutorials/03_envs directory.
Code for create_cartpole_base_env.py
# Copyright (c) 2022-2024, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause

"""
This script demonstrates how to create a simple environment with a cartpole. It combines the concepts of
scene, action, observation and event managers to create an environment.
"""

"""Launch Isaac Sim Simulator first."""


import argparse

from omni.isaac.lab.app import AppLauncher

# add argparse arguments
parser = argparse.ArgumentParser(description="Tutorial on creating a cartpole base environment.")
parser.add_argument("--num_envs", type=int, default=16, help="Number of environments to spawn.")

# append AppLauncher cli args
AppLauncher.add_app_launcher_args(parser)
# parse the arguments
args_cli = parser.parse_args()

# launch omniverse app
app_launcher = AppLauncher(args_cli)
simulation_app = app_launcher.app

"""Rest everything follows."""

import math
import torch

import omni.isaac.lab.envs.mdp as mdp
from omni.isaac.lab.envs import ManagerBasedEnv, ManagerBasedEnvCfg
from omni.isaac.lab.managers import EventTermCfg as EventTerm
from omni.isaac.lab.managers import ObservationGroupCfg as ObsGroup
from omni.isaac.lab.managers import ObservationTermCfg as ObsTerm
from omni.isaac.lab.managers import SceneEntityCfg
from omni.isaac.lab.utils import configclass

from omni.isaac.lab_tasks.manager_based.classic.cartpole.cartpole_env_cfg import CartpoleSceneCfg


@configclass
class ActionsCfg:
    """Action specifications for the environment."""

    joint_efforts = mdp.JointEffortActionCfg(asset_name="robot", joint_names=["slider_to_cart"], scale=5.0)


@configclass
class ObservationsCfg:
    """Observation specifications for the environment."""

    @configclass
    class PolicyCfg(ObsGroup):
        """Observations for policy group."""

        # observation terms (order preserved)
        joint_pos_rel = ObsTerm(func=mdp.joint_pos_rel)
        joint_vel_rel = ObsTerm(func=mdp.joint_vel_rel)

        def __post_init__(self) -> None:
            self.enable_corruption = False
            self.concatenate_terms = True

    # observation groups
    policy: PolicyCfg = PolicyCfg()


@configclass
class EventCfg:
    """Configuration for events."""

    # on startup
    add_pole_mass = EventTerm(
        func=mdp.randomize_rigid_body_mass,
        mode="startup",
        params={
            "asset_cfg": SceneEntityCfg("robot", body_names=["pole"]),
            "mass_distribution_params": (0.1, 0.5),
            "operation": "add",
        },
    )

    # on reset
    reset_cart_position = EventTerm(
        func=mdp.reset_joints_by_offset,
        mode="reset",
        params={
            "asset_cfg": SceneEntityCfg("robot", joint_names=["slider_to_cart"]),
            "position_range": (-1.0, 1.0),
            "velocity_range": (-0.1, 0.1),
        },
    )

    reset_pole_position = EventTerm(
        func=mdp.reset_joints_by_offset,
        mode="reset",
        params={
            "asset_cfg": SceneEntityCfg("robot", joint_names=["cart_to_pole"]),
            "position_range": (-0.125 * math.pi, 0.125 * math.pi),
            "velocity_range": (-0.01 * math.pi, 0.01 * math.pi),
        },
    )


@configclass
class CartpoleEnvCfg(ManagerBasedEnvCfg):
    """Configuration for the cartpole environment."""

    # Scene settings
    scene = CartpoleSceneCfg(num_envs=1024, env_spacing=2.5)
    # Basic settings
    observations = ObservationsCfg()
    actions = ActionsCfg()
    events = EventCfg()

    def __post_init__(self):
        """Post initialization."""
        # viewer settings
        self.viewer.eye = [4.5, 0.0, 6.0]
        self.viewer.lookat = [0.0, 0.0, 2.0]
        # step settings
        self.decimation = 4  # env step every 4 sim steps: 200Hz / 4 = 50Hz
        # simulation settings
        self.sim.dt = 0.005  # sim step every 5ms: 200Hz


def main():
    """Main function."""
    # parse the arguments
    env_cfg = CartpoleEnvCfg()
    env_cfg.scene.num_envs = args_cli.num_envs
    # setup base environment
    env = ManagerBasedEnv(cfg=env_cfg)

    # simulate physics
    count = 0
    while simulation_app.is_running():
        with torch.inference_mode():
            # reset
            if count % 300 == 0:
                count = 0
                env.reset()
                print("-" * 80)
                print("[INFO]: Resetting environment...")
            # sample random actions
            joint_efforts = torch.randn_like(env.action_manager.action)
            # step the environment
            obs, _ = env.step(joint_efforts)
            # print current orientation of pole
            print("[Env 0]: Pole joint: ", obs["policy"][0][1].item())
            # update counter
            count += 1

    # close the environment
    env.close()


if __name__ == "__main__":
    # run the main function
    main()
    # close sim app
    simulation_app.close()
The Code Explained#
The base class envs.ManagerBasedEnv wraps around many intricacies of the simulation interaction and provides a simple interface for the user to run the simulation and interact with it. It is composed of the following components:
scene.InteractiveScene - The scene that is used for the simulation.
managers.ActionManager - The manager that handles actions.
managers.ObservationManager - The manager that handles observations.
managers.EventManager - The manager that schedules operations (such as domain randomization) at specified simulation events, for instance at startup, on resets, or at periodic intervals.
By configuring these components, the user can create different variations of the same environment with minimal effort. In this tutorial, we will go through the different components of the envs.ManagerBasedEnv class and how to configure them to create a new environment.
Designing the scene#
The first step in creating a new environment is to configure its scene. For the cartpole environment, we will be using the scene from the previous tutorial. Thus, we omit the scene configuration here. For more details on how to configure a scene, see Using the Interactive Scene.
Defining actions#
In the previous tutorial, we directly input the action to the cartpole using the assets.Articulation.set_joint_effort_target() method. In this tutorial, we will use the managers.ActionManager to handle the actions.
The action manager can comprise multiple managers.ActionTerm instances. Each action term is responsible for applying control over a specific aspect of the environment. For instance, for a robotic arm, we can have two action terms: one for controlling the joints of the arm, and the other for controlling the gripper. This composition allows the user to define different control schemes for different aspects of the environment.
In the cartpole environment, we want to control the force applied to the cart to balance the pole. Thus, we will create an action term that controls the force applied to the cart.
@configclass
class ActionsCfg:
    """Action specifications for the environment."""

    joint_efforts = mdp.JointEffortActionCfg(asset_name="robot", joint_names=["slider_to_cart"], scale=5.0)
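To make the role of the scale parameter concrete, the following is a minimal sketch of the affine mapping that a joint-effort action term is assumed to apply to a raw action before it reaches the actuator. It assumes the term computes raw * scale + offset with a zero offset; the helper process_action is hypothetical and is not part of Isaac Lab.

```python
def process_action(raw: list[float], scale: float = 5.0, offset: float = 0.0) -> list[float]:
    """Apply an affine transform to each raw action value (hypothetical helper)."""
    return [value * scale + offset for value in raw]


# One raw action per environment, e.g. as sampled by a policy or a random generator.
raw_actions = [-1.0, 0.0, 0.5]
efforts = process_action(raw_actions)
print(efforts)
```

With scale=5.0, a unit-variance random action like the one sampled in the tutorial loop is stretched to a wider range of efforts on the slider_to_cart joint.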
Defining observations#
While the scene defines the state of the environment, the observations define the states that are observable by the agent. These observations are used by the agent to make decisions on what actions to take. In Isaac Lab, the observations are computed by the managers.ObservationManager class.
Similar to the action manager, the observation manager can comprise multiple observation terms. These are further grouped into observation groups, which are used to define different observation spaces for the environment. For instance, for hierarchical control, we may want to define two observation groups: one for the low-level controller and the other for the high-level controller. It is assumed that all the observation terms in a group have the same dimensions.
For this tutorial, we will only define one observation group named "policy". While not completely prescriptive, this group is a necessary requirement for various wrappers in Isaac Lab. We define a group by inheriting from the managers.ObservationGroupCfg class. This class collects different observation terms and helps define common properties for the group, such as enabling noise corruption or concatenating the observations into a single tensor.
The individual terms are defined by instantiating the managers.ObservationTermCfg class. This class takes in the managers.ObservationTermCfg.func attribute, which specifies the function or callable class that computes the observation for that term. It includes other parameters for defining the noise model, clipping, scaling, etc. However, we leave these parameters at their default values for this tutorial.
@configclass
class ObservationsCfg:
    """Observation specifications for the environment."""

    @configclass
    class PolicyCfg(ObsGroup):
        """Observations for policy group."""

        # observation terms (order preserved)
        joint_pos_rel = ObsTerm(func=mdp.joint_pos_rel)
        joint_vel_rel = ObsTerm(func=mdp.joint_vel_rel)

        def __post_init__(self) -> None:
            self.enable_corruption = False
            self.concatenate_terms = True

    # observation groups
    policy: PolicyCfg = PolicyCfg()
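As a rough illustration of what concatenate_terms = True implies, the sketch below joins per-term vectors in declaration order using plain Python lists instead of tensors. The numbers are made up, and the joint ordering [slider_to_cart, cart_to_pole] is an assumption; under that assumption, index 1 of the concatenated vector is the pole joint position, which is the entry the tutorial loop prints via obs["policy"][0][1].

```python
def concatenate_terms(terms: dict[str, list[float]]) -> list[float]:
    """Join per-term observation vectors in declaration order (dicts preserve insertion order)."""
    flat: list[float] = []
    for values in terms.values():
        flat.extend(values)
    return flat


# Hypothetical readings for one environment; joint order assumed to be
# [slider_to_cart, cart_to_pole].
env0_terms = {
    "joint_pos_rel": [0.20, -0.05],
    "joint_vel_rel": [0.00, 0.30],
}
policy_obs = concatenate_terms(env0_terms)
pole_joint_pos = policy_obs[1]  # second entry of the first term
print(policy_obs, pole_joint_pos)
```

Because the order of terms determines the layout of the final vector, reordering the term definitions in the group changes which index holds which quantity.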
Defining events#
At this point, we have defined the scene, actions and observations for the cartpole environment. The general idea for all these components is to define the configuration classes and then pass them to the corresponding managers. The event manager is no different.
The managers.EventManager class is responsible for events corresponding to changes in the simulation state. This includes resetting (or randomizing) the scene, randomizing physical properties (such as mass, friction, etc.), and varying visual properties (such as colors, textures, etc.). Each of these is specified through the managers.EventTermCfg class, which takes in the managers.EventTermCfg.func attribute that specifies the function or callable class that performs the event.
Additionally, it expects the mode of the event. The mode specifies when the event term should be applied. It is possible to specify your own mode. For this, you’ll need to adapt the ManagerBasedEnv class. However, out of the box, Isaac Lab provides three commonly used modes:
"startup" - Event that takes place only once at environment startup.
"reset" - Event that occurs on environment termination and reset.
"interval" - Events that are executed at a given interval, i.e., periodically after a certain number of steps.
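The idea of grouping event terms by mode can be sketched with a toy dispatcher. This is not the Isaac Lab managers.EventManager implementation; the class TinyEventManager and its register and apply methods are hypothetical and only show how a mode string selects which terms run.

```python
from collections import defaultdict
from typing import Callable


class TinyEventManager:
    """Toy stand-in that groups event callbacks by mode (not the Isaac Lab class)."""

    def __init__(self) -> None:
        self._terms: dict[str, list[Callable[[], None]]] = defaultdict(list)

    def register(self, mode: str, func: Callable[[], None]) -> None:
        self._terms[mode].append(func)

    def apply(self, mode: str) -> None:
        # Run every term registered under the requested mode, in registration order.
        for func in self._terms.get(mode, []):
            func()


log: list[str] = []
manager = TinyEventManager()
manager.register("startup", lambda: log.append("add_pole_mass"))
manager.register("reset", lambda: log.append("reset_cart_position"))
manager.register("reset", lambda: log.append("reset_pole_position"))

manager.apply("startup")  # once, when the environment is created
manager.apply("reset")    # on every reset
manager.apply("reset")
print(log)
```

In this sketch the startup term runs once, while both reset terms run each time the "reset" mode is applied.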
For this example, we define an event that randomizes the pole’s mass on startup. This is done only once since this operation is expensive and we don’t want to do it on every reset. We also create events that randomize the initial joint state of the cart and the pole at every reset.
@configclass
class EventCfg:
    """Configuration for events."""

    # on startup
    add_pole_mass = EventTerm(
        func=mdp.randomize_rigid_body_mass,
        mode="startup",
        params={
            "asset_cfg": SceneEntityCfg("robot", body_names=["pole"]),
            "mass_distribution_params": (0.1, 0.5),
            "operation": "add",
        },
    )

    # on reset
    reset_cart_position = EventTerm(
        func=mdp.reset_joints_by_offset,
        mode="reset",
        params={
            "asset_cfg": SceneEntityCfg("robot", joint_names=["slider_to_cart"]),
            "position_range": (-1.0, 1.0),
            "velocity_range": (-0.1, 0.1),
        },
    )

    reset_pole_position = EventTerm(
        func=mdp.reset_joints_by_offset,
        mode="reset",
        params={
            "asset_cfg": SceneEntityCfg("robot", joint_names=["cart_to_pole"]),
            "position_range": (-0.125 * math.pi, 0.125 * math.pi),
            "velocity_range": (-0.01 * math.pi, 0.01 * math.pi),
        },
    )
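The reset terms above can be read as "default joint state plus a random offset". The sketch below illustrates that reading in plain Python, assuming the offset is drawn uniformly from the configured range and that the default pole angle is zero. The helper reset_by_offset is hypothetical, and any clamping to joint limits that the real function may perform is left out.

```python
import math
import random


def reset_by_offset(default: float, offset_range: tuple[float, float], rng: random.Random) -> float:
    """Return the default value shifted by a uniformly sampled offset (hypothetical helper)."""
    low, high = offset_range
    return default + rng.uniform(low, high)


rng = random.Random(0)  # seeded so that the sketch is reproducible
pole_range = (-0.125 * math.pi, 0.125 * math.pi)  # +/- 22.5 degrees, as in reset_pole_position
samples = [reset_by_offset(0.0, pole_range, rng) for _ in range(1000)]
print(min(samples), max(samples))
```

Every sampled pole angle stays within 22.5 degrees of the default, so each reset starts the pole near upright but never in exactly the same pose.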
Tying it all together#
Having defined the scene and manager configurations, we can now define the environment configuration through the envs.ManagerBasedEnvCfg class. This class takes in the scene, action, observation and event configurations.
In addition to these, it also takes in the envs.ManagerBasedEnvCfg.sim attribute, which defines the simulation parameters such as the timestep, gravity, etc. This is initialized to the default values, but can be modified as needed. We recommend doing so by defining the __post_init__() method in your envs.ManagerBasedEnvCfg subclass, which is called after the configuration is initialized.
@configclass
class CartpoleEnvCfg(ManagerBasedEnvCfg):
    """Configuration for the cartpole environment."""

    # Scene settings
    scene = CartpoleSceneCfg(num_envs=1024, env_spacing=2.5)
    # Basic settings
    observations = ObservationsCfg()
    actions = ActionsCfg()
    events = EventCfg()

    def __post_init__(self):
        """Post initialization."""
        # viewer settings
        self.viewer.eye = [4.5, 0.0, 6.0]
        self.viewer.lookat = [0.0, 0.0, 2.0]
        # step settings
        self.decimation = 4  # env step every 4 sim steps: 200Hz / 4 = 50Hz
        # simulation settings
        self.sim.dt = 0.005  # sim step every 5ms: 200Hz
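The comments in the configuration compress a small piece of arithmetic. The sketch below spells it out with the values from this tutorial; the variable names are chosen for illustration and are not Isaac Lab attributes.

```python
sim_dt = 0.005       # physics step in seconds (self.sim.dt)
decimation = 4       # physics steps per environment step (self.decimation)
reset_every = 300    # environment steps between resets in the tutorial loop

step_dt = sim_dt * decimation          # seconds of simulated time per env.step()
sim_rate_hz = 1.0 / sim_dt             # physics update rate
control_rate_hz = 1.0 / step_dt        # rate at which actions are applied
seconds_between_resets = reset_every * step_dt

print(f"physics: {sim_rate_hz:.0f} Hz, control: {control_rate_hz:.0f} Hz")
print(f"simulated time between resets: {seconds_between_resets:.1f} s")
```

Each call to env.step() therefore advances the simulation by 0.02 s, and the 300-step reset interval used in the tutorial script corresponds to about 6 s of simulated time.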
Running the simulation#
Lastly, we revisit the simulation execution loop. This is now much simpler since we have abstracted away most of the details into the environment configuration. We only need to call the envs.ManagerBasedEnv.reset() method to reset the environment and the envs.ManagerBasedEnv.step() method to step the environment. Both of these functions return the observation and an info dictionary, which may contain additional information provided by the environment. These can be used by an agent for decision-making.
The envs.ManagerBasedEnv class does not have any notion of terminations since that concept is specific to episodic tasks. Thus, the user is responsible for defining the termination condition for the environment. In this tutorial, we reset the simulation at regular intervals.
def main():
    """Main function."""
    # parse the arguments
    env_cfg = CartpoleEnvCfg()
    env_cfg.scene.num_envs = args_cli.num_envs
    # setup base environment
    env = ManagerBasedEnv(cfg=env_cfg)

    # simulate physics
    count = 0
    while simulation_app.is_running():
        with torch.inference_mode():
            # reset
            if count % 300 == 0:
                count = 0
                env.reset()
                print("-" * 80)
                print("[INFO]: Resetting environment...")
            # sample random actions
            joint_efforts = torch.randn_like(env.action_manager.action)
            # step the environment
            obs, _ = env.step(joint_efforts)
            # print current orientation of pole
            print("[Env 0]: Pole joint: ", obs["policy"][0][1].item())
            # update counter
            count += 1

    # close the environment
    env.close()
An important thing to note above is that the entire simulation loop is wrapped inside the torch.inference_mode() context manager. This is because the environment uses PyTorch operations under the hood, and we want to ensure that the simulation is not slowed down by the overhead of PyTorch’s autograd engine and that gradients are not computed for the simulation operations.
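Separately from the inference-mode detail, the fixed-interval reset logic of the loop can be checked without a simulator. The sketch below replaces the environment with a hypothetical StubEnv that only mimics the (observation, info) return shape of reset() and step(), and bounds the loop at 650 iterations instead of polling simulation_app.is_running().

```python
class StubEnv:
    """Hypothetical stand-in exposing the same reset()/step() shape as the tutorial loop."""

    def __init__(self) -> None:
        self.num_resets = 0
        self.num_steps = 0

    def reset(self) -> tuple[dict, dict]:
        self.num_resets += 1
        return {"policy": [[0.0, 0.0, 0.0, 0.0]]}, {}

    def step(self, action: float) -> tuple[dict, dict]:
        self.num_steps += 1
        return {"policy": [[0.0, 0.0, 0.0, 0.0]]}, {}


env = StubEnv()
count = 0
for _ in range(650):  # bounded stand-in for `while simulation_app.is_running()`
    if count % 300 == 0:
        count = 0
        env.reset()
    obs, _info = env.step(0.0)
    count += 1

print(env.num_resets, env.num_steps)
```

The counter triggers a reset on the very first iteration and then after every 300 steps, so the environment is always reset before it is stepped for the first time.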
The Code Execution#
To run the base environment made in this tutorial, you can use the following command:
./isaaclab.sh -p source/standalone/tutorials/03_envs/create_cartpole_base_env.py --num_envs 32
This should open a stage with a ground plane, light source, and cartpoles. The simulation should be playing with random actions on the cartpole. Additionally, it opens a UI window on the bottom right corner of the screen named "Isaac Lab". This window contains different UI elements that can be used for debugging and visualization.
To stop the simulation, you can either close the window, or press Ctrl+C in the terminal where you started the simulation.
In this tutorial, we learned about the different managers that help define a base environment. We include more examples of defining the base environment in the source/standalone/tutorials/03_envs directory. For completeness, they can be run using the following commands:
# Floating cube environment with custom action term for PD control
./isaaclab.sh -p source/standalone/tutorials/03_envs/create_cube_base_env.py --num_envs 32
# Quadrupedal locomotion environment with a policy that interacts with the environment
./isaaclab.sh -p source/standalone/tutorials/03_envs/create_quadruped_base_env.py --num_envs 32
In the following tutorial, we will look at the envs.ManagerBasedRLEnv class and how to use it to create a Markov Decision Process (MDP).