Policy Inference in USD Environment#
Having learnt how to modify a task in Modifying an existing Direct RL Environment, we will now look at how to run a trained policy in a prebuilt USD scene.
In this tutorial, we will use the RSL-RL library and the trained policy from the Humanoid Rough Terrain task Isaac-Velocity-Rough-H1-v0 in a simple warehouse USD scene.
The Tutorial Code#
For this tutorial, we use the trained policy's checkpoint exported as jit (which is an offline, standalone version of the policy). The H1RoughEnvCfg_PLAY cfg encapsulates the configuration values of the inference environment, including the assets to be instantiated.
In order to use a prebuilt USD environment instead of the terrain generator specified in the task config, we make the following changes to the config before passing it to the ManagerBasedRLEnv.
Code for policy_inference_in_usd.py
# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause

"""
This script demonstrates policy inference in a prebuilt USD environment.

In this example, we use a locomotion policy to control the H1 robot. The robot was trained
using Isaac-Velocity-Rough-H1-v0. The robot is commanded to move forward at a constant velocity.

.. code-block:: bash

    # Run the script
    ./isaaclab.sh -p scripts/tutorials/03_envs/policy_inference_in_usd.py --checkpoint /path/to/jit/checkpoint.pt

"""

"""Launch Isaac Sim Simulator first."""


import argparse

from isaaclab.app import AppLauncher

# add argparse arguments
parser = argparse.ArgumentParser(description="Tutorial on inferencing a policy on an H1 robot in a warehouse.")
parser.add_argument("--checkpoint", type=str, help="Path to model checkpoint exported as jit.", required=True)

# append AppLauncher cli args
AppLauncher.add_app_launcher_args(parser)
# parse the arguments
args_cli = parser.parse_args()

# launch omniverse app
app_launcher = AppLauncher(args_cli)
simulation_app = app_launcher.app

"""Rest everything follows."""
import io
import os
import torch

import omni

from isaaclab.envs import ManagerBasedRLEnv
from isaaclab.terrains import TerrainImporterCfg
from isaaclab.utils.assets import ISAAC_NUCLEUS_DIR

from isaaclab_tasks.manager_based.locomotion.velocity.config.h1.rough_env_cfg import H1RoughEnvCfg_PLAY


def main():
    """Main function."""
    # load the trained jit policy
    policy_path = os.path.abspath(args_cli.checkpoint)
    file_content = omni.client.read_file(policy_path)[2]
    file = io.BytesIO(memoryview(file_content).tobytes())
    policy = torch.jit.load(file)
    env_cfg = H1RoughEnvCfg_PLAY()
    env_cfg.scene.num_envs = 1
    env_cfg.curriculum = None
    env_cfg.scene.terrain = TerrainImporterCfg(
        prim_path="/World/ground",
        terrain_type="usd",
        usd_path=f"{ISAAC_NUCLEUS_DIR}/Environments/Simple_Warehouse/warehouse.usd",
    )
    env_cfg.sim.device = "cpu"
    env_cfg.sim.use_fabric = False
    env = ManagerBasedRLEnv(cfg=env_cfg)
    obs, _ = env.reset()
    while simulation_app.is_running():
        action = policy(obs["policy"])  # run inference
        obs, _, _, _, _ = env.step(action)


if __name__ == "__main__":
    main()
    simulation_app.close()
Note that we have set the device to CPU and disabled the use of Fabric for inference. This is because, when simulating a small number of environments, CPU simulation can often be faster than GPU simulation.
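If you later scale the same setup back up to many parallel environments, you would typically revert these settings. The following is a minimal sketch using the same env_cfg object as in the script above; the number of environments is purely illustrative:

env_cfg.scene.num_envs = 1024  # illustrative value for a large batch of environments
env_cfg.sim.device = "cuda:0"  # move the simulation back to the GPU
env_cfg.sim.use_fabric = True  # re-enable Fabric (the default) for GPU simulation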
The Code Execution#
First, we need to train the Isaac-Velocity-Rough-H1-v0 task by running the following:
./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py --task Isaac-Velocity-Rough-H1-v0 --headless
When the training is finished, we can visualize the result with the following command. To stop the simulation, you can either close the window or press Ctrl+C in the terminal where you started the simulation.
./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/play.py --task Isaac-Velocity-Rough-H1-v0 --num_envs 64 --checkpoint logs/rsl_rl/h1_rough/EXPERIMENT_NAME/POLICY_FILE.pt
After running the play script, the policy will be exported to jit and onnx files under the experiment logs directory. Note that not all learning libraries support exporting the policy to a jit or onnx file. For libraries that don't currently support this functionality, please refer to the library's corresponding play.py script to learn how to initialize the policy. A sketch of this approach for RSL-RL is shown below.
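As an illustration only (not part of this tutorial's script), the policy can also be initialized directly from a regular training checkpoint instead of a jit export, loosely following what the RSL-RL play.py does. This is a minimal sketch: the wrapper, helper, and entry-point names are assumptions based on Isaac Lab's RSL-RL integration and may differ in your version, and env_cfg refers to the config object built in the script above.

from rsl_rl.runners import OnPolicyRunner

from isaaclab.envs import ManagerBasedRLEnv
from isaaclab_rl.rsl_rl import RslRlVecEnvWrapper
from isaaclab_tasks.utils import load_cfg_from_registry

# load the task's RSL-RL agent configuration from the gym registry (assumed entry point name)
agent_cfg = load_cfg_from_registry("Isaac-Velocity-Rough-H1-v0", "rsl_rl_cfg_entry_point")
# wrap the Isaac Lab environment so that rsl_rl can consume it
env = RslRlVecEnvWrapper(ManagerBasedRLEnv(cfg=env_cfg))
# create the runner, load a regular training checkpoint (not a jit export), and extract the policy
runner = OnPolicyRunner(env, agent_cfg.to_dict(), log_dir=None, device="cpu")
runner.load("logs/rsl_rl/h1_rough/EXPERIMENT_NAME/POLICY_FILE.pt")
policy = runner.get_inference_policy(device="cpu")  # callable mapping observations to actions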
We can then load the warehouse asset and run inference on the H1 robot using the exported jit policy.
./isaaclab.sh -p scripts/tutorials/03_envs/policy_inference_in_usd.py --checkpoint logs/rsl_rl/h1_rough/EXPERIMENT_NAME/exported/policy.pt
In this tutorial, we learnt how to make minor modifications to an existing environment config to run policy inference in a prebuilt USD environment.