Policy
A policy in Arena is a standard interface between your model and the evaluation
pipeline. You implement one method — get_action(env, obs) — and the policy
plugs into both the single-job runner and the batch eval runner without any
changes to either. In bare IsaacLab you would write an ad-hoc inference loop
for each model; Arena’s PolicyBase gives a consistent contract that all
runners depend on.
policy = ZeroActionPolicy(config=ZeroActionPolicyArgs())
obs, _ = env.reset()
action = policy.get_action(env, obs)
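To illustrate why a single-method contract keeps runners independent of the model, here is a dependency-free sketch of a rollout loop. It is not Arena's runner: ToyEnv, ToyZeroPolicy, and run_episode are hypothetical stand-ins, and the environment only mimics the Gymnasium reset/step return shapes.

```python
from typing import Any, Protocol, Tuple


class Policy(Protocol):
    """Stand-in for the contract described above: one get_action method."""

    def get_action(self, env: Any, obs: Any) -> Any: ...


class ToyEnv:
    """Hypothetical environment that follows the Gymnasium reset/step shape."""

    def __init__(self, horizon: int = 5) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return {"t": self.t}, {}

    def step(self, action: float):
        self.t += 1
        reward = -abs(action)  # a zero action is optimal in this toy task
        truncated = self.t >= self.horizon
        return {"t": self.t}, reward, False, truncated, {}


class ToyZeroPolicy:
    """Toy counterpart of a zero-action policy (not Arena's class)."""

    def get_action(self, env: Any, obs: Any) -> float:
        return 0.0


def run_episode(env: ToyEnv, policy: Policy) -> Tuple[int, float]:
    """Roll out one episode; the loop only ever calls policy.get_action."""
    obs, _ = env.reset()
    steps, total_reward = 0, 0.0
    done = False
    while not done:
        action = policy.get_action(env, obs)
        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        steps += 1
        done = terminated or truncated
    return steps, total_reward


steps, total_reward = run_episode(ToyEnv(horizon=5), ToyZeroPolicy())
```

Any object with a compatible get_action can be passed to run_episode unchanged, which is the property the shared interface is meant to provide.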
Built-in policies
Arena ships with the following built-in policies:
- ZeroActionPolicy ("zero_action"): Returns a zero-filled action tensor. Useful for validating an environment without a trained model.
- ReplayActionPolicy ("replay"): Replays actions from a recorded episode stored in an HDF5 file.
- RslRlActionPolicy ("rsl_rl"): Runs inference with a trained RSL-RL checkpoint. Loads the checkpoint and its accompanying params/agent.yaml automatically.
Writing a custom policy
Subclass PolicyBase, set a name, decorate with @register_policy,
and implement get_action:
import gymnasium as gym
import torch
from gymnasium.spaces.dict import Dict as GymSpacesDict

from isaaclab_arena.assets.register import register_policy
from isaaclab_arena.policy.policy_base import PolicyBase


@register_policy
class MyPolicy(PolicyBase):
    name = "my_policy"

    def __init__(self, config):
        super().__init__(config)

    def get_action(self, env: gym.Env, observation: GymSpacesDict) -> torch.Tensor:
        # Your model inference here
        return torch.zeros(env.action_space.shape, device=torch.device(env.unwrapped.device))

    @staticmethod
    def add_args_to_parser(parser):
        # Add any CLI arguments your policy needs, then return the parser
        return parser

    @staticmethod
    def from_args(args):
        return MyPolicy(config=None)
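The two static methods above are left as placeholders. As a rough sketch of how they might be filled in, the snippet below adds one CLI flag and turns the parsed value into a config object. The --checkpoint_path flag, MyPolicyArgs, and MyPolicySketch are hypothetical names chosen for illustration, and the class deliberately omits PolicyBase so that it runs without Arena installed.

```python
import argparse
from dataclasses import dataclass


@dataclass
class MyPolicyArgs:
    """Hypothetical config object; the field name is an assumption."""

    checkpoint_path: str


class MyPolicySketch:
    """Stand-in that omits PolicyBase so the snippet runs without Arena."""

    name = "my_policy"

    def __init__(self, config: MyPolicyArgs) -> None:
        self.config = config

    @staticmethod
    def add_args_to_parser(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
        # Register policy-specific flags, then hand the parser back.
        parser.add_argument("--checkpoint_path", type=str, required=True)
        return parser

    @staticmethod
    def from_args(args: argparse.Namespace) -> "MyPolicySketch":
        # Translate parsed CLI values into the policy's config.
        return MyPolicySketch(config=MyPolicyArgs(checkpoint_path=args.checkpoint_path))


# Simulate the flow a CLI entry point could follow.
parser = MyPolicySketch.add_args_to_parser(argparse.ArgumentParser())
args = parser.parse_args(["--checkpoint_path", "/tmp/model.pt"])
policy = MyPolicySketch.from_args(args)
```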
Once registered, select the policy by name on the command line:
python isaaclab_arena/evaluation/policy_runner.py \
--policy_type my_policy \
...
For policies not registered by name, pass a dotted Python path instead
(e.g. --policy_type mypackage.mypolicy.MyPolicy). The runner will
import and instantiate the class directly.
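The two lookup modes described above (registered name, then dotted path) can be pictured with a small toy resolver. This is only an illustration of the idea, not Arena's implementation: register_policy_sketch, resolve_policy_class, and the registry dict are hypothetical, and the lookup order is an assumption.

```python
import importlib
from typing import Dict

# Hypothetical registry mapping policy names to classes.
_POLICY_REGISTRY: Dict[str, type] = {}


def register_policy_sketch(cls: type) -> type:
    """Toy decorator: store the class under its `name` attribute."""
    _POLICY_REGISTRY[cls.name] = cls
    return cls


def resolve_policy_class(policy_type: str) -> type:
    """Look up a registered name first, then fall back to a dotted path."""
    if policy_type in _POLICY_REGISTRY:
        return _POLICY_REGISTRY[policy_type]
    # Split "package.module.ClassName" into module path and class name.
    module_path, _, class_name = policy_type.rpartition(".")
    if not module_path:
        raise ValueError(f"Unknown policy type: {policy_type!r}")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


@register_policy_sketch
class MyPolicy:
    name = "my_policy"


by_name = resolve_policy_class("my_policy")
# A standard-library class stands in for mypackage.mypolicy.MyPolicy here.
by_path = resolve_policy_class("collections.OrderedDict")
```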
To use a custom policy in the batch eval runner’s JSON config, define a
config_class dataclass on the policy and implement from_dict().
This lets the runner instantiate the policy from a plain dict without
going through argparse. See Evaluation Types for details.
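A minimal sketch of that pattern follows, assuming from_dict is a classmethod that receives the policy's config as a plain dict. The exact signature Arena expects is not shown here, so treat MyPolicyConfig, its fields, and the unknown-key check as illustrative choices rather than the required API.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class MyPolicyConfig:
    """Hypothetical config; both fields are illustrative."""

    checkpoint_path: str
    action_scale: float = 1.0


class MyPolicySketch:
    """Stand-in that omits PolicyBase so the snippet runs without Arena."""

    name = "my_policy"
    config_class = MyPolicyConfig

    def __init__(self, config: MyPolicyConfig) -> None:
        self.config = config

    @classmethod
    def from_dict(cls, config_dict: Dict[str, Any]) -> "MyPolicySketch":
        # Reject unknown keys early so typos in the JSON fail loudly.
        known = {f.name for f in fields(cls.config_class)}
        unknown = set(config_dict) - known
        if unknown:
            raise ValueError(f"Unknown config keys: {sorted(unknown)}")
        return cls(config=cls.config_class(**config_dict))


# The dict would normally come from the parsed JSON config.
policy = MyPolicySketch.from_dict({"checkpoint_path": "/tmp/model.pt"})
```

Keeping the config in a dataclass means defaults live in one place and the same object can be built from either CLI arguments or a JSON entry.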