RL Tasks#

RL tasks extend their imitation learning counterparts with the components needed for reinforcement learning training: a command manager that samples a new goal each episode, reward terms, and goal-conditioned observations.

The pattern is straightforward: an RL task subclasses the corresponding IL task and implements the RL-specific parts:

class LiftObjectTaskRL(LiftObjectTask):

    def __init__(self, lift_object, background_scene, embodiment, rl_training_mode=True):
        super().__init__(lift_object, background_scene, embodiment)
        self.rl_training_mode = rl_training_mode

    def get_rewards_cfg(self):
        # Define the reward terms for the RL objective.
        pass

    def get_commands_cfg(self):
        # Define the command manager that samples a goal each episode.
        pass

This means the RL task inherits everything from the IL task (scene config, termination conditions, metrics) and adds the RL-specific parts on top. Note that the RL task is also likely to modify the IL task's configuration, for example by adding privileged information to the observations.
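To make the "privileged information" idea concrete, here is a minimal, self-contained sketch. The class bodies, method name get_observations, and observation keys are illustrative stand-ins, not the framework's real API; the point is only that the RL subclass reuses the IL observations and appends simulation-only state:

```python
class LiftObjectTask:
    """Stand-in for the IL base task (hypothetical, for illustration)."""

    def get_observations(self):
        # The IL task exposes only what a real robot could sense.
        return {"joint_pos": [0.0] * 7, "camera_rgb": None}


class LiftObjectTaskRL(LiftObjectTask):
    """RL variant: reuses the IL observations and adds privileged state."""

    def get_observations(self):
        obs = super().get_observations()
        # Privileged information available only in simulation, e.g. the
        # ground-truth object pose and the sampled goal position.
        obs["object_pos"] = [0.4, 0.0, 0.05]
        obs["goal_pos"] = [0.4, 0.0, 0.30]
        return obs
```

Because the override calls super(), the IL observation set stays intact; the RL-only keys are purely additive.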

Usage#

lift_object = asset_registry.get_asset_by_name("cracker_box")()

task = LiftObjectTaskRL(
    lift_object=lift_object,
    background_scene=table,
    embodiment=embodiment,
)

LiftObjectTaskRL adds a command manager that samples a random target position each episode; reward terms for reaching the object, lifting it, and tracking the goal; and goal-conditioned observations that tell the policy where the target is.
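The command manager and reward terms described above can be sketched as follows. This is a hypothetical illustration: the class name UniformGoalCommand, the sampling ranges, and the reward-term names and weights are assumptions, not the task's actual configuration:

```python
import random


class UniformGoalCommand:
    """Samples a new target position at the start of each episode."""

    def __init__(self, x_range=(0.3, 0.6), y_range=(-0.2, 0.2), z_range=(0.2, 0.5)):
        self.ranges = (x_range, y_range, z_range)
        self.goal = None

    def resample(self, rng=random):
        # Draw a fresh goal uniformly within the workspace bounds.
        self.goal = tuple(rng.uniform(lo, hi) for lo, hi in self.ranges)
        return self.goal


# Illustrative reward terms with weights: reach the object, lift it,
# and track the sampled goal.
REWARD_WEIGHTS = {
    "reach_object": 1.0,   # shaped on end-effector-to-object distance
    "lift_object": 10.0,   # shaped on object height above the table
    "track_goal": 5.0,     # shaped on object-to-goal distance
}
```

Resampling once per episode (rather than per step) keeps the goal fixed for the duration of an episode, which is what the goal-conditioned observations report to the policy.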

See Franka Lift Object Task for a complete example of how to use an RL task.

Training vs. evaluation mode#

RL tasks have an rl_training_mode flag (default True). During training, success does not terminate the episode; the robot keeps acting until the time limit. This is standard practice to avoid sparse termination signals early in training. For evaluation, set rl_training_mode=False so episodes end on success:

# Training
task = LiftObjectTaskRL(lift_object, table, embodiment, rl_training_mode=True)

# Evaluation
task = LiftObjectTaskRL(lift_object, table, embodiment, rl_training_mode=False)
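The termination behavior the flag controls can be summarized in a few lines. This helper is a hypothetical sketch of the gating logic, not the framework's real implementation:

```python
def episode_done(success: bool, time_out: bool, rl_training_mode: bool) -> bool:
    """During RL training, only the time limit ends an episode;
    in evaluation, success terminates the episode immediately as well."""
    if time_out:
        return True
    return success and not rl_training_mode
```

In training mode a successful grasp keeps the episode running, so the policy continues to collect experience until the time limit; in evaluation mode the same success ends the episode.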