Policy Training#
Docker Container: Base (see Docker Containers for more details)
./docker/run_docker.sh
Training Command#
Training uses IsaacLab’s RSL-RL training script directly. The --external_callback argument
points to an Arena function that runs before training starts — it reads the --task argument,
builds the environment, and registers it with gym so IsaacLab’s script can find it by name.
python submodules/IsaacLab/scripts/reinforcement_learning/rsl_rl/train.py \
--external_callback isaaclab_arena.environments.isaaclab_interop.environment_registration_callback \
--task lift_object \
--num_envs 512 \
--max_iterations 12000
Tip
Add --headless to suppress the GUI when running on a headless server.
Checkpoints are written to logs/rsl_rl/generic_experiment/<timestamp>/.
The agent configuration is saved alongside as params/agent.yaml,
which the evaluation script uses to reconstruct the policy at inference time.
Overriding Hyperparameters#
Hyperparameters come from RLPolicyCfg in isaaclab_arena_examples/policy/base_rsl_rl_policy.py
and can be overridden with Hydra syntax appended to the training command:
# Change network activation function to relu (default: elu)
agent.policy.activation=relu
# Adjust the learning rate (default: 0.0001)
agent.algorithm.learning_rate=0.001
# Save a checkpoint more frequently (default: every 200 iterations)
agent.save_interval=500
For example, to train with relu activation and a higher learning rate:
python submodules/IsaacLab/scripts/reinforcement_learning/rsl_rl/train.py \
--external_callback isaaclab_arena.environments.isaaclab_interop.environment_registration_callback \
--task lift_object \
--num_envs 512 \
--max_iterations 12000 \
agent.policy.activation=relu \
agent.algorithm.learning_rate=0.001
Monitoring Training#
Launch Tensorboard to monitor progress:
python -m tensorboard.main --logdir logs/rsl_rl
During training, each iteration prints a summary to the console:
Learning iteration 2000/12000
Computation: 308 steps/s (collection: 4.600s, learning 0.377s)
Mean action noise std: 1.00
Mean value_function loss: 0.0273
Mean surrogate loss: -0.0138
Mean entropy loss: 9.9339
Mean reward: 0.65
Mean episode length: 12.00
Episode_Reward/action_rate: -0.0000
Episode_Reward/joint_vel: -0.0001
Episode_Reward/reaching_object: 0.0000
Episode_Reward/lifting_object: 0.1050
Episode_Reward/object_goal_tracking: 0.0223
Episode_Reward/object_goal_tracking_fine_grained: 0.0000
Metrics/object_pose/position_error: 0.5721
Metrics/object_pose/orientation_error: 2.2834
Episode_Termination/time_out: 0.0423
Episode_Termination/object_dropped: 0.0000
Episode_Termination/success: 0.0000
================================================================================
Total timesteps: 1536
Iteration time: 4.98s
Time elapsed: 00:00:04
ETA: 00:00:49
Multi-GPU Training#
Add --distributed to spread environments across all available GPUs:
python submodules/IsaacLab/scripts/reinforcement_learning/rsl_rl/train.py \
--external_callback isaaclab_arena.environments.isaaclab_interop.environment_registration_callback \
--task lift_object \
--num_envs 512 \
--max_iterations 12000 \
--headless \
--distributed
Expected Results#
After 12,000 iterations (~6 hours on a single GPU with 512 environments), the trained policy should reliably grasp and lift objects to commanded target positions.
Note
Training performance depends on hardware, environment configuration, and random seed. For best results, use a powerful GPU (e.g., RTX 4090, A100, L40).