Evaluation Types#
Isaac Lab Arena supports two main ways to run policy evaluation: a single-job policy runner (single or multi-GPU) and a sequential batch eval runner for multiple jobs in one process. This section summarizes when to use each and how they work. Each section below links to the relevant concept docs: Policy Design, Environment Design, and Metrics Design.
Both runners support a server–client setup, where the simulation runs locally
(client) and policy inference runs in a separate process or machine (server).
This is the setup used by `Gr00tRemoteClosedloopPolicy`: the simulation
client ships observations to a GR00T
policy server over the network, receives action chunks, and applies them in the
sim. The split lets a heavyweight model (e.g. GR00T N1.6) live on a dedicated
GPU while the simulation client runs on its own GPU, and it is orthogonal to the
runner choice: pass the remote-policy class as `--policy_type` and add the
`--remote_host` / `--remote_port` flags. End-to-end commands (including
how to launch the GR00T server from the
`submodules/Isaac-GR00T` submodule) live in
Running a Real Policy and the
example workflows.
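The server–client exchange above can be sketched as a toy loop. This is an illustrative sketch only, not the actual `Gr00tRemoteClosedloopPolicy` implementation: `fake_policy_server` stands in for the remote GR00T server (a network round-trip in the real setup), and a single float stands in for real observations and simulation stepping.

```python
def fake_policy_server(observation):
    # Stand-in for the remote GR00T policy server: given one observation,
    # return a chunk of several actions (here, three toy values).
    return [observation * 0.1, observation * 0.2, observation * 0.3]

def closedloop_rollout(num_steps):
    # Client side: request a new action chunk only when the previous one
    # is exhausted, applying one action per simulation step.
    obs = 1.0
    applied = []
    chunk = []
    for _ in range(num_steps):
        if not chunk:
            chunk = fake_policy_server(obs)  # network round-trip in the real setup
        action = chunk.pop(0)
        applied.append(action)
        obs += action  # stand-in for stepping the simulation
    return applied

print(len(closedloop_rollout(5)))  # 5 actions applied across two chunks
```

The chunked pattern amortizes network latency: the client keeps stepping the simulation between server round-trips instead of blocking on every step.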
Summary#
| Type | Use case | Entry point | Multi-GPU |
|---|---|---|---|
| Policy runner | Single job, one env config, one policy | `isaaclab_arena/evaluation/policy_runner.py` | Yes (`torchrun`) |
| Sequential batch eval runner | Multiple jobs (env/policy combos) in sequence | `isaaclab_arena/evaluation/eval_runner.py` | No |
1. Policy runner — single job (single GPU and multi-GPU)#
The policy runner (`isaaclab_arena/evaluation/policy_runner.py`) runs one
evaluation job: one environment configuration and one policy. It is the right
choice for ad-hoc runs, debugging, or when you want to drive one scenario with
full control over CLI arguments.
Design context: For how policies are defined and integrated with environments, see Policy Design.
Features:
- Single environment configuration (scene, embodiment, task) and one policy.
- Heterogeneous objects: when the environment supports it, you can pass `--object_set` with a space-separated list of object names. Each parallel environment is assigned a different object from the set (e.g. env 0 gets the first object, env 1 the second, and so on). This allows evaluating one policy across multiple object types in a single run without changing the scene or task logic.
- Run length by steps (`--num_steps`) or episodes (`--num_episodes`); policies that define a length (e.g. `policy.has_length()`) can override this.
- Single GPU: one process, one Isaac Sim instance.
- Multi-GPU: use `torchrun` with `--distributed`; one process per GPU, each with its own Isaac Sim instance and device (e.g. `cuda:0`, `cuda:1`).
- Metrics are computed at the end if the environment registers metrics, and are logged to the console.
Single-GPU example
python isaaclab_arena/evaluation/policy_runner.py \
--viz kit \
--policy_type <policy_type> \
--num_steps 2000 \
--num_envs 10 \
<arena_environment> \
--embodiment <embodiment> \
--object <object>
...
Heterogeneous objects example (single or multi-GPU)
Use `--object_set` so each of the `--num_envs` parallel environments gets a
different object from the list. Object-to-environment mapping: with the default
deterministic assignment, environment `i` gets the object at index
`i % n` in the list (where `n = len(object_set)`), so when
`num_envs > len(object_set)` the assignment cycles (no truncation or
error). If the object set is created with `random_choice=True`, each environment
gets a randomly chosen object from the set. Some environments may require
`num_envs == len(object_set)`.
python isaaclab_arena/evaluation/policy_runner.py \
--viz kit \
--policy_type <policy_type> \
--num_steps 2000 \
--num_envs 4 \
--enable_cameras \
put_item_in_fridge_and_close_door \
--embodiment gr1_joint \
--object_set ketchup_bottle_hope_robolab ranch_dressing_hope_robolab bbq_sauce_bottle_hope_robolab mayonnaise_bottle_hope_robolab
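The object-to-environment mapping described above can be sketched in a few lines. This is an illustrative sketch, not the actual Isaac Lab Arena code; the function name `assign_objects` is hypothetical.

```python
import random

def assign_objects(object_set, num_envs, random_choice=False):
    # Illustrative sketch of the mapping described above (hypothetical
    # helper; not the actual Isaac Lab Arena implementation).
    if random_choice:
        # random_choice=True: each environment draws independently from the set.
        return [random.choice(object_set) for _ in range(num_envs)]
    # Deterministic default: env i gets object_set[i % n], cycling when
    # num_envs > len(object_set) (no truncation or error).
    return [object_set[i % len(object_set)] for i in range(num_envs)]

objects = ["ketchup_bottle", "ranch_dressing", "bbq_sauce_bottle"]
print(assign_objects(objects, 5))
# ['ketchup_bottle', 'ranch_dressing', 'bbq_sauce_bottle', 'ketchup_bottle', 'ranch_dressing']
```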
Multi-GPU example
Use `torch.distributed.run` (or `torchrun`) with `--nproc_per_node=<num_gpus>`
and pass `--distributed` so each process uses a different GPU (via `LOCAL_RANK`):
python -m torch.distributed.run --nnode=1 --nproc_per_node=<num_gpus> \
isaaclab_arena/evaluation/policy_runner.py \
--policy_type <policy_type> \
--num_steps 2000 \
--num_envs 10 \
--distributed \
--headless \
<arena_environment> \
...
Policy runner CLI (relevant flags)
- `--policy_type`: Registered policy name or dotted path to a policy class (e.g. `module.submodule.ClassName`).
- `--num_steps`: Total simulation steps (mutually exclusive with `--num_episodes`).
- `--num_episodes`: Total episodes (mutually exclusive with `--num_steps`).
- `--distributed`: Enable distributed mode; use with `torchrun` and set the device per rank (e.g. `cuda:{local_rank}`).

The rest of the arguments (environment, embodiment, object, etc.) come from the
Arena environments CLI and the policy's own `add_args_to_parser`.
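The mutual exclusivity of the two length flags is the standard argparse pattern. A minimal sketch (illustrative; not the actual policy runner parser):

```python
import argparse

# Declare --num_steps and --num_episodes as mutually exclusive, so passing
# both makes argparse exit with a usage error.
parser = argparse.ArgumentParser()
group = parser.add_mutually_exclusive_group()
group.add_argument("--num_steps", type=int, help="Total simulation steps")
group.add_argument("--num_episodes", type=int, help="Total episodes")

args = parser.parse_args(["--num_steps", "2000"])
print(args.num_steps)  # 2000
```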
2. Sequential batch eval runner — batch jobs#
The sequential batch eval runner (`isaaclab_arena/evaluation/eval_runner.py`)
runs a batch of evaluation jobs sequentially in a single process. Each job can have
a different environment (scene/object/embodiment), policy type, policy config,
and length (steps or episodes). This suits benchmarking many
configurations (e.g. many objects or tasks) without launching multiple processes
by hand. The simulation application persists between jobs.
Design context: For how environments are composed and how metrics are defined and computed, see Environment Design and Metrics Design. Policies used per job follow Policy Design.
Features:
- One JSON config file (`--eval_jobs_config`) listing all jobs.
- Jobs run one after another; each job builds its environment, creates the policy from the job config, runs `rollout_policy`, then tears down the env before the next job.
- If a job fails, the runner continues with the next job and marks the failed job accordingly.
- Metrics are aggregated and printed at the end (e.g. via `MetricsLogger`).
- Distributed evaluation is not supported: the sequential batch eval runner runs in a single process. For multi-GPU, use multiple policy runner invocations (e.g. with `torchrun`) or split the batch across machines.
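The build/run/tear-down/continue-on-failure cycle can be sketched as a simple loop. This is a toy sketch of the control flow only, with a hypothetical signature; it is not the actual `eval_runner` implementation, and the factory callables are stand-ins.

```python
def run_jobs_sequentially(jobs, build_env, make_policy, rollout):
    # Illustrative sketch of the sequential batch loop (hypothetical
    # signature; not the actual eval_runner code).
    results = {}
    for job in jobs:
        try:
            env = build_env(job["arena_env_args"])
            policy = make_policy(job["policy_type"], job.get("policy_config_dict", {}))
            results[job["name"]] = {"status": "ok", "metrics": rollout(env, policy)}
        except Exception as exc:
            # A failing job is recorded but does not abort the batch.
            results[job["name"]] = {"status": "failed", "error": str(exc)}
    return results

def make_policy(policy_type, config):
    # Stand-in factory: "broken" simulates a job whose policy fails to build.
    if policy_type == "broken":
        raise ValueError("cannot build policy " + policy_type)
    return policy_type

jobs = [
    {"name": "a", "arena_env_args": {}, "policy_type": "zero_action"},
    {"name": "b", "arena_env_args": {}, "policy_type": "broken"},
]
out = run_jobs_sequentially(jobs, lambda args: None, make_policy,
                            lambda env, pol: {"steps": 1})
print(out["a"]["status"], out["b"]["status"])  # ok failed
```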
Todo
Experiment with distributed evaluation in the sequential batch eval runner.
Jobs config format
The config file must be a JSON object with a `"jobs"` array. Each job is an
object with:

- `name`: Unique job name (for logging and metrics).
- `arena_env_args`: Environment arguments as a dict (e.g. `environment`, `num_envs`, `object`, `embodiment`, `enable_cameras`, etc.). Converted internally to the same CLI-style list the policy runner uses.
- `policy_type`: Same as the policy runner (registered name or dotted class path).
- `policy_config_dict`: Policy configuration (e.g. checkpoint path, model options). Used with `PolicyBase.from_dict` if the policy has a `config_class`, otherwise converted to CLI args and `from_args`.
- `num_steps` or `num_episodes` (optional): Simulation length for this job. If both are omitted, the runner uses the policy's length if defined, or a CLI default (e.g. `--num_steps`).
Example config structure
{
"jobs": [
{
"name": "gr1_open_microwave_cracker_box",
"arena_env_args": {
"environment": "gr1_open_microwave",
"object": "cracker_box",
"embodiment": "gr1_joint",
"num_envs": 4
},
"num_steps": 500,
"policy_type": "zero_action",
"policy_config_dict": {}
},
{
"name": "gr1_sequential_static_manipulation_put_ranch_dressing_bottle_in_fridge_and_close_door",
"arena_env_args": {
"enable_cameras": true,
"environment": "gr1_sequential_static_manipulation",
"object": "ranch_dressing_hope_robolab",
"embodiment": "gr1_joint"
},
"num_steps": 100,
"policy_type": "isaaclab_arena_gr00t.policy.gr00t_remote_closedloop_policy.Gr00tRemoteClosedloopPolicy",
"policy_config_dict": {
"policy_config_yaml_path": "isaaclab_arena_gr00t/policy/config/gr1_manip_ranch_bottle_gr00t_closedloop_config.yaml",
"policy_device": "cuda:0",
"remote_host": "127.0.0.1",
"remote_port": 5555
}
}
]
}
Running the sequential batch eval runner
python isaaclab_arena/evaluation/eval_runner.py \
--viz kit \
--eval_jobs_config path/to/eval_jobs_config.json \
--num_steps 1000
If any job needs cameras, set `enable_cameras: true` in that job's
`arena_env_args`; the sequential batch eval runner enables camera support for the whole run whenever at least one job requires it.
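The "enable cameras if any job needs them" check amounts to one scan over the jobs array. A minimal sketch (the helper name `any_job_needs_cameras` is hypothetical; not the actual `eval_runner` code):

```python
import json
import tempfile

def any_job_needs_cameras(config_path):
    # Hypothetical helper sketching the check described above: camera
    # support is enabled if any job sets enable_cameras in arena_env_args.
    with open(config_path) as f:
        config = json.load(f)
    return any(job.get("arena_env_args", {}).get("enable_cameras", False)
               for job in config["jobs"])

# Toy demonstration with a temporary jobs file:
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"jobs": [
        {"name": "a", "arena_env_args": {}},
        {"name": "b", "arena_env_args": {"enable_cameras": True}},
    ]}, f)
print(any_job_needs_cameras(f.name))  # True
```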
Choosing an evaluation type#
- One-off run, one setup: use the policy runner (single or multi-GPU); use `--object_set` for heterogeneous objects in one run.
- Many env/policy combinations in one go: use the sequential batch eval runner with a jobs JSON.