Closed-Loop Policy Inference and Evaluation#
This workflow demonstrates how to run the trained RSL-RL policy in closed loop and evaluate it in the Lift Object environment.
Docker Container: Base (see Docker Containers for more details)
./docker/run_docker.sh
Once inside the container, set the models directory if you plan to download pre-trained checkpoints:
export MODELS_DIR=models/isaaclab_arena/reinforcement_learning
mkdir -p $MODELS_DIR
Note that this tutorial assumes you have completed the preceding step (Policy Training) and have a trained checkpoint available. Alternatively, you can download a pre-trained checkpoint as described below.
Download Pre-trained Model (skip preceding steps)
Use the following commands to download a pre-trained RSL-RL policy checkpoint so that the preceding training step can be skipped.
hf download \
nvidia/IsaacLab-Arena-Lift-Object-RL \
model_11999.pt \
--local-dir $MODELS_DIR/lift_object_checkpoint
After downloading, you can use the checkpoint at:
$MODELS_DIR/lift_object_checkpoint/model_11999.pt
Replace checkpoint paths in the examples below with this path to evaluate the pre-trained model.
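To confirm the download succeeded, you can check that the checkpoint file is present (a quick sanity check, assuming MODELS_DIR is set as above):
ls -lh $MODELS_DIR/lift_object_checkpoint/model_11999.pt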
Evaluation Methods#
Isaac Lab Arena provides multiple ways to evaluate trained RL policies:
Quick Visualization (play.py): Fast visual inspection of policy behavior
Single Environment Evaluation (policy_runner.py): Detailed evaluation with metrics
Parallel Environment Evaluation (policy_runner.py): Large-scale statistical evaluation
Batch Evaluation (eval_runner.py): Automated evaluation of multiple checkpoints
Method 1: Quick Visualization#
The play.py script provides the fastest way to visually inspect your trained policy.
This is useful for debugging and quick quality checks.
python isaaclab_arena/scripts/reinforcement_learning/play.py \
--env_spacing 30.0 \
--num_envs 16 \
--checkpoint logs/rsl_rl/generic_experiment/2026-01-28_17-26-10/model_11999.pt \
lift_object
Key Features:
Fast startup with GUI enabled by default
Visualizes policy rollouts in real-time
No metrics computation (pure visualization)
Useful for debugging policy behavior
Command Arguments:
| Argument | Description |
|---|---|
| --env_spacing 30.0 | Larger spacing for visualization (avoids visual clutter) |
| --num_envs 16 | Number of parallel environments to visualize |
| --checkpoint | Path to the trained model checkpoint (.pt file) |
| lift_object | Environment name (must be last) |
You should see multiple Franka robots simultaneously attempting to lift objects to various target positions.
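If you downloaded the pre-trained checkpoint from Hugging Face instead of training your own, the same command works with the downloaded path (assuming MODELS_DIR is set as described above):
python isaaclab_arena/scripts/reinforcement_learning/play.py \
--env_spacing 30.0 \
--num_envs 16 \
--checkpoint $MODELS_DIR/lift_object_checkpoint/model_11999.pt \
lift_object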
Method 2: Single Environment Evaluation#
The policy_runner.py script provides comprehensive evaluation with task-specific metrics.
python isaaclab_arena/evaluation/policy_runner.py \
--policy_type rsl_rl \
--num_steps 1000 \
--checkpoint_path logs/rsl_rl/generic_experiment/2026-01-28_17-26-10/model_11999.pt \
lift_object \
--rl_training_mode False
Note
If you downloaded the pre-trained model from Hugging Face, replace the checkpoint path:
--checkpoint_path $MODELS_DIR/lift_object_checkpoint/model_11999.pt
Important: Argument Order
Policy-specific arguments (--policy_type, --checkpoint_path, etc.) must come before the environment name.
Environment-specific arguments (--rl_training_mode, --object, etc.) must come after the environment name.
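As a schematic (placeholders, not a runnable command), the ordering is:
python isaaclab_arena/evaluation/policy_runner.py \
<policy-specific args: --policy_type, --num_steps, --checkpoint_path, ...> \
<environment name: lift_object> \
<environment-specific args: --rl_training_mode, --object, ...>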
Command Breakdown:
| Argument | Description |
|---|---|
| --policy_type rsl_rl | Policy type to load (RSL-RL trained policy) |
| --num_steps 1000 | Total simulation steps to run |
| --checkpoint_path | Path to the model checkpoint |
| lift_object | Environment name |
| --rl_training_mode False | Enable success termination for evaluation |
Expected Output:
At the end of evaluation, you should see metrics similar to:
Metrics: {'success_rate': 0.85, 'num_episodes': 12}
This indicates that 85% of episodes successfully lifted the object to the target position, across 12 completed episodes in 1000 steps.
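To keep a record of a run, you can capture the console output and pull out the metrics line afterwards (a simple sketch that assumes the metrics are printed to stdout as shown above; substitute your own checkpoint path):
python isaaclab_arena/evaluation/policy_runner.py \
--policy_type rsl_rl \
--num_steps 1000 \
--checkpoint_path $MODELS_DIR/lift_object_checkpoint/model_11999.pt \
lift_object \
--rl_training_mode False | tee eval_single_env.log
grep "Metrics:" eval_single_env.log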
Method 3: Parallel Environment Evaluation#
For more statistically significant results, evaluate across many parallel environments:
python isaaclab_arena/evaluation/policy_runner.py \
--policy_type rsl_rl \
--num_steps 5000 \
--num_envs 64 \
--checkpoint_path logs/rsl_rl/generic_experiment/2026-01-28_17-26-10/model_11999.pt \
--headless \
lift_object \
--rl_training_mode False
Additional Arguments:
| Argument | Description |
|---|---|
| --num_envs 64 | Run 64 parallel environments simultaneously |
| --headless | Run without GUI for faster evaluation |
| --num_steps 5000 | More steps, so more episodes are completed |
Expected Output:
Metrics: {'success_rate': 0.83, 'num_episodes': 156}
Running more environments and steps provides better statistical estimates of policy performance.
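As a rough sanity check on how tight those estimates are, you can compute a normal-approximation 95% confidence interval from the reported numbers (this treats episodes as independent, which is an approximation):
python -c "import math; p, n = 0.83, 156; se = math.sqrt(p * (1 - p) / n); print(f'success_rate: {p:.2f} +/- {1.96 * se:.2f}')"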
Method 4: Batch Evaluation with JSON Configuration#
For systematic evaluation of multiple checkpoints or hyperparameter sweeps, use eval_runner.py
with a JSON configuration file.
1. Create Evaluation Configuration
Create a file eval_config.json:
{
"policy_runner_args": {
"policy_type": "rsl_rl",
"num_steps": 5000,
"num_envs": 64,
"headless": true
},
"evaluations": [
{
"checkpoint_path": "logs/rsl_rl/generic_experiment/2026-01-28_17-26-10/model_5999.pt",
"environment": "lift_object",
"environment_args": {
"rl_training_mode": false
}
},
{
"checkpoint_path": "logs/rsl_rl/generic_experiment/2026-01-28_17-26-10/model_11999.pt",
"environment": "lift_object",
"environment_args": {
"rl_training_mode": false
}
}
]
}
2. Run Batch Evaluation
python isaaclab_arena/evaluation/eval_runner.py --config eval_config.json
This will automatically evaluate all checkpoints listed in the configuration and output a summary of metrics for each.
Expected Output:
Evaluating checkpoint 1/2: model_5999.pt
Metrics: {'success_rate': 0.72, 'num_episodes': 152}
Evaluating checkpoint 2/2: model_11999.pt
Metrics: {'success_rate': 0.85, 'num_episodes': 156}
Summary:
========================================
model_5999.pt | Success: 72% | Episodes: 152
model_11999.pt | Success: 85% | Episodes: 156
Understanding the Metrics#
The Lift Object task reports the following metrics:
| Metric | Description |
|---|---|
| success_rate | Fraction of episodes where the object reached the target position within tolerance |
| num_episodes | Total number of episodes completed during evaluation |
A well-trained policy should achieve:
Success rate: 70-90% (depends on target range difficulty)
Consistent performance: Success rate stable across multiple evaluation runs
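One simple way to check consistency is to repeat the parallel evaluation from Method 3 a few times and compare the reported success rates (a sketch; substitute your own checkpoint path):
for i in 1 2 3; do
python isaaclab_arena/evaluation/policy_runner.py \
--policy_type rsl_rl \
--num_steps 5000 \
--num_envs 64 \
--checkpoint_path $MODELS_DIR/lift_object_checkpoint/model_11999.pt \
--headless \
lift_object \
--rl_training_mode False
done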
Troubleshooting#
Issue: Low success rate (<50%)
Increase training iterations (try 20,000+)
Check reward configuration in task definition
Verify command sampling ranges are reasonable
Try different random seeds
Issue: Policy gets stuck or drops object
Ensure object mass and friction are reasonable
Check gripper force limits
Visualize with play.py to diagnose behavior
Review episode recordings if --video was enabled during training
Issue: “Checkpoint not found” error
Verify checkpoint path is correct
Use absolute paths if relative paths fail
Check that training completed and saved checkpoints
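To see which checkpoints were actually written, you can list the experiment log directory (assuming the default log location used in the examples above):
ls logs/rsl_rl/generic_experiment/*/model_*.pt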
Note
When running evaluation, always set --rl_training_mode False to enable success termination.
During training, this flag is True by default to prevent early episode termination.