Policy Design#

Policies define how agents generate actions from observations. The policy system includes a registry for managing and selecting between different policy implementations.

Core Architecture#

Policies use the PolicyBase abstract class:

from abc import ABC, abstractmethod

import gymnasium as gym
import torch

# GymSpacesDict is the framework's observation-dictionary type alias.
class PolicyBase(ABC):
    @abstractmethod
    def get_action(self, env: gym.Env, observation: GymSpacesDict) -> torch.Tensor:
        """Compute an action given the environment and observation.

        Args:
            env: The environment instance
            observation: Observation dictionary from the environment

        Returns:
            torch.Tensor: The action to take
        """

This abstraction allows different policy implementations to be swapped in and out while keeping a consistent interface to IsaacLab Arena environments.

Policies in Detail#

Policy Categories

Three main policy types address different use cases:

  • Zero Action Policy: Baseline that returns zero actions for environment testing and physics validation

  • Replay Action Policy: Replays pre-recorded demonstrations from HDF5 datasets for analysis and evaluation

  • GR00T Neural Policies: Advanced foundation models for visuomotor control with multi-modal inputs
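The zero-action baseline is the simplest of the three and can be sketched in a few lines. This is an illustrative implementation of the pattern, not the actual IsaacLab Arena class, and it assumes the environment exposes a Gym-style `action_space` with a `shape` attribute:

```python
import torch

class ZeroActionPolicy:
    """Baseline policy that always returns zero actions.

    Useful for validating that an environment steps, renders, and
    simulates physics correctly without any control input.
    """

    def get_action(self, env, observation):
        # Ignore the observation entirely; emit a zero tensor shaped
        # to match the environment's action space.
        return torch.zeros(env.action_space.shape)
```

Because it is stateless and observation-independent, the same instance can be reused across resets and environments with matching action spaces.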

Implementation Patterns

Common policy implementation approaches:

  • Stateless Policies: Pure functions from observations to actions (ZeroActionPolicy)

  • Dataset-Driven: Load and replay recorded trajectories (ReplayActionPolicy)

  • Neural Networks: Process visual and proprioceptive inputs for learned behaviors (GR00T policies)
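The dataset-driven pattern can be illustrated with a minimal replay policy. This sketch takes a pre-loaded action tensor directly; the real ReplayActionPolicy reads its trajectories from HDF5 datasets, and the exact constructor signature is an assumption here:

```python
import torch

class SimpleReplayPolicy:
    """Replays a pre-recorded action sequence, one step per call.

    Illustrative sketch of the dataset-driven pattern: actions come
    from the recording, so the observation is ignored.
    """

    def __init__(self, actions: torch.Tensor):
        # actions has shape (num_steps, action_dim).
        self._actions = actions
        self._step = 0

    def get_action(self, env, observation):
        action = self._actions[self._step]
        # Hold the final action once the recording is exhausted.
        self._step = min(self._step + 1, len(self._actions) - 1)
        return action
```

Keeping the playback cursor inside the policy object is what distinguishes this pattern from the stateless case: each call advances internal state.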

Environment Integration#

# Policy creation from CLI arguments
arena_builder = get_arena_builder_from_cli(args)
env = arena_builder.make_registered()
policy, num_steps = create_policy(args)

# Standard execution loop
obs, _ = env.reset()
for step in range(num_steps):
    with torch.inference_mode():
        actions = policy.get_action(env, obs)
        obs, rewards, terminated, truncated, info = env.step(actions)
        if terminated.any() or truncated.any():
            obs, _ = env.reset()

Usage Examples#

Baseline Testing

# Zero action policy for environment validation
python policy_runner.py --policy_type zero_action kitchen_pick_and_place --num_steps 1000

Demonstration Replay

# Replay recorded demonstrations
python policy_runner.py --policy_type replay --replay_file_path demos.h5 kitchen_pick_and_place

Neural Policy Execution

# GR00T foundation model deployment
python policy_runner.py --policy_type gr00t_closedloop --policy_config_yaml_path config.yaml

Custom Policy Integration

class CustomPolicy(PolicyBase):
    def get_action(self, env, observation):
        # Replace with custom control logic; zero actions are a safe placeholder.
        return torch.zeros(env.action_space.shape)

policy = CustomPolicy()
actions = policy.get_action(environment, observations)
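The registry mentioned earlier, which lets `create_policy` resolve a `--policy_type` name to a class, typically follows a decorator-based pattern. The names below (`POLICY_REGISTRY`, `register_policy`, `create_policy_from_name`) are hypothetical; the actual factory in IsaacLab Arena may be organized differently:

```python
# Hypothetical registry mapping CLI policy names to classes.
POLICY_REGISTRY: dict = {}

def register_policy(name: str):
    """Decorator that records a policy class under a CLI name."""
    def wrap(cls):
        POLICY_REGISTRY[name] = cls
        return cls
    return wrap

@register_policy("custom")
class CustomPolicy:
    def get_action(self, env, observation):
        return None  # custom control logic goes here

def create_policy_from_name(name: str, **kwargs):
    """Instantiate the policy registered under ``name``."""
    return POLICY_REGISTRY[name](**kwargs)
```

With this structure, adding a new policy type only requires defining the class and decorating it; no dispatch code elsewhere needs to change.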