Teleoperation and Imitation Learning#
Teleoperation#
We provide interfaces for issuing robot control commands in SE(2) and SE(3) space. For SE(2) teleoperation, the returned command is the linear x-y velocity and yaw rate; for SE(3), it is a 6-D vector representing the change in pose.
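To make the two command types concrete, here is a minimal sketch in plain Python. The function names, the body-frame interpretation of the SE(2) velocity, and the time step are assumptions for illustration only; they are not part of the Isaac Lab API.

```python
import math


def integrate_se2(pose, command, dt):
    """Advance a planar pose (x, y, yaw) by a command (vx, vy, yaw_rate) over dt seconds.

    Assumption: the linear velocity is expressed in the robot's body frame.
    """
    x, y, yaw = pose
    vx, vy, yaw_rate = command
    # Rotate the body-frame velocity into the world frame before integrating.
    x += (vx * math.cos(yaw) - vy * math.sin(yaw)) * dt
    y += (vx * math.sin(yaw) + vy * math.cos(yaw)) * dt
    yaw += yaw_rate * dt
    return (x, y, yaw)


def split_se3_delta(delta):
    """Split a 6-D pose change into its translational and rotational parts."""
    if len(delta) != 6:
        raise ValueError("an SE(3) delta command must have exactly 6 elements")
    return tuple(delta[:3]), tuple(delta[3:])
```

For example, `split_se3_delta([0.1, 0.0, 0.0, 0.0, 0.0, 0.2])` separates the position change from the rotation change, which is how a relative IK controller would typically consume such a command.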
To play inverse kinematics (IK) control with a keyboard device:
./isaaclab.sh -p scripts/environments/teleoperation/teleop_se3_agent.py --task Isaac-Lift-Cube-Franka-IK-Rel-v0 --num_envs 1 --teleop_device keyboard
For smoother operation and off-axis operation, we recommend using a SpaceMouse as the input device. Providing smoother demonstrations will make it easier for the policy to clone the behavior. To use a SpaceMouse, simply change the teleop device accordingly:
./isaaclab.sh -p scripts/environments/teleoperation/teleop_se3_agent.py --task Isaac-Lift-Cube-Franka-IK-Rel-v0 --num_envs 1 --teleop_device spacemouse
Note
If the SpaceMouse is not detected, you may need to grant additional user permissions by running sudo chmod 666 /dev/hidraw<#>, where <#> corresponds to the device index of the connected SpaceMouse.
To determine the device index, list all hidraw devices by running ls -l /dev/hidraw*.
Identify the device corresponding to the SpaceMouse by running cat /sys/class/hidraw/hidraw<#>/device/uevent on each of the devices listed in the prior step.
Only the SpaceMouse Wireless and SpaceMouse Compact models from 3Dconnexion are supported.
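The manual lookup described in this note can be scripted. The following is a small sketch, not an Isaac Lab utility: the helper names are hypothetical, and it assumes that the uevent file of the SpaceMouse contains a HID_NAME entry mentioning "SpaceMouse".

```python
from pathlib import Path


def parse_uevent(text):
    """Parse KEY=VALUE lines from a uevent file into a dictionary."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that do not contain '='
            fields[key.strip()] = value.strip()
    return fields


def find_spacemouse(uevents):
    """Return the hidraw device names whose HID_NAME mentions a SpaceMouse.

    Assumption: the device reports a HID_NAME containing 'SpaceMouse'.
    """
    matches = []
    for device, text in sorted(uevents.items()):
        name = parse_uevent(text).get("HID_NAME", "")
        if "spacemouse" in name.lower():
            matches.append(device)
    return matches


def read_uevents(root="/sys/class/hidraw"):
    """Read the uevent file of every hidraw device found under root."""
    uevents = {}
    for path in Path(root).glob("hidraw*/device/uevent"):
        # .../hidraw<#>/device/uevent -> the device name is two levels up.
        uevents[path.parent.parent.name] = path.read_text()
    return uevents
```

Calling `find_spacemouse(read_uevents())` returns a list such as `["hidraw1"]`, whose index can then be used in the chmod command above.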
The script prints the teleoperation events configured. For keyboard, these are as follows:
Keyboard Controller for SE(3): Se3Keyboard
Reset all commands: R
Toggle gripper (open/close): K
Move arm along x-axis: W/S
Move arm along y-axis: A/D
Move arm along z-axis: Q/E
Rotate arm along x-axis: Z/X
Rotate arm along y-axis: T/G
Rotate arm along z-axis: C/V
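As a rough sketch of how such key bindings can be turned into the 6-D SE(3) command, consider the snippet below. It is not the Se3Keyboard implementation: the sign convention (first key of each pair is positive) and the step sizes are assumptions chosen for illustration.

```python
# Each key maps to (index into the 6-D command, sign).
# Assumption: the first key of each pair moves in the positive direction.
KEY_BINDINGS = {
    "W": (0, 1.0), "S": (0, -1.0),  # translation along x
    "A": (1, 1.0), "D": (1, -1.0),  # translation along y
    "Q": (2, 1.0), "E": (2, -1.0),  # translation along z
    "Z": (3, 1.0), "X": (3, -1.0),  # rotation about x
    "T": (4, 1.0), "G": (4, -1.0),  # rotation about y
    "C": (5, 1.0), "V": (5, -1.0),  # rotation about z
}


def keys_to_delta(pressed_keys, pos_step=0.05, rot_step=0.5):
    """Accumulate the currently pressed keys into a 6-D pose delta.

    pos_step and rot_step are hypothetical per-step magnitudes.
    """
    delta = [0.0] * 6
    for key in pressed_keys:
        binding = KEY_BINDINGS.get(key.upper())
        if binding is None:
            continue  # keys such as R or K are handled elsewhere
        axis, sign = binding
        step = pos_step if axis < 3 else rot_step
        delta[axis] += sign * step
    return delta
```

Pressing two opposing keys at once (for example W and S) cancels out, which is one reason keyboard demonstrations tend to be axis-aligned and less smooth than SpaceMouse input.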
For SpaceMouse, these are as follows:
SpaceMouse Controller for SE(3): Se3SpaceMouse
Reset all commands: Right click
Toggle gripper (open/close): Click the left button on the SpaceMouse
Move arm along x/y-axis: Tilt the SpaceMouse
Move arm along z-axis: Push or pull the SpaceMouse
Rotate arm: Twist the SpaceMouse
The next section describes how teleoperation devices can be used for data collection for imitation learning.
Imitation Learning#
Using the teleoperation devices, it is also possible to collect data for learning from demonstrations (LfD). For this, we provide scripts to collect data into the open HDF5 format.
Collecting demonstrations#
To collect demonstrations with teleoperation for the environment Isaac-Stack-Cube-Franka-IK-Rel-v0, use the following commands:
# step a: create folder for datasets
mkdir -p datasets
# step b: collect data with a selected teleoperation device. Replace <teleop_device> with your preferred input device.
# Available options: spacemouse, keyboard
./isaaclab.sh -p scripts/tools/record_demos.py --task Isaac-Stack-Cube-Franka-IK-Rel-v0 --teleop_device <teleop_device> --dataset_file ./datasets/dataset.hdf5 --num_demos 10
# step c: replay the collected dataset
./isaaclab.sh -p scripts/tools/replay_demos.py --task Isaac-Stack-Cube-Franka-IK-Rel-v0 --dataset_file ./datasets/dataset.hdf5
Note
The order of the stacked cubes should be blue (bottom), red (middle), green (top).
About 10 successful demonstrations are required in order for the following steps to succeed.
Here are some tips to perform demonstrations that lead to successful policy training:
Keep demonstrations short. Shorter demonstrations mean fewer decisions for the policy, making training easier.
Take a direct path. Do not move along arbitrary axes; move straight toward the goal.
Do not pause. Perform smooth, continuous motions instead. It is not obvious to a policy why and when to pause, so continuous motions are easier to learn.
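These tips can be checked numerically on a recorded end-effector trajectory. The sketch below is only an illustration: the two metrics, their names, and the pause threshold are assumptions, not something computed by the Isaac Lab scripts.

```python
import math


def path_efficiency(positions):
    """Ratio of straight-line distance to travelled path length.

    A value of 1.0 means the end effector moved on a direct path.
    """
    if len(positions) < 2:
        return 1.0
    travelled = sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))
    direct = math.dist(positions[0], positions[-1])
    return direct / travelled if travelled > 0 else 1.0


def count_paused_steps(positions, threshold=1e-4):
    """Count steps in which the end effector moved less than threshold."""
    return sum(
        1 for a, b in zip(positions, positions[1:]) if math.dist(a, b) < threshold
    )
```

A demonstration that moves first along x and then along y toward a diagonal goal has an efficiency of roughly 0.71, whereas a straight-line motion scores 1.0; a nonzero pause count points to the hesitation the last tip warns about.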
If you make a mistake while performing a demonstration, or the current demonstration should not be recorded for some other reason, press the R key to discard it and reset to a new starting position.
Note
Non-determinism may be observed during replay, as physics in Isaac Lab is not deterministically reproducible when using env.reset.
Pre-recorded demonstrations#
We provide a pre-recorded dataset.hdf5 containing 10 human demonstrations for Isaac-Stack-Cube-Franka-IK-Rel-v0 here.
This dataset may be downloaded and used in the remaining tutorial steps if you do not wish to collect your own demonstrations.
Note
Use of the pre-recorded dataset is optional.
Generating additional demonstrations#
Additional demonstrations can be generated using Isaac Lab Mimic.
Isaac Lab Mimic is a feature in Isaac Lab that allows generation of additional demonstrations automatically, allowing a policy to learn successfully even from just a handful of manual demonstrations.
In order to use Isaac Lab Mimic with the recorded dataset, first annotate the subtasks in the recording:
./isaaclab.sh -p scripts/imitation_learning/isaaclab_mimic/annotate_demos.py --input_file ./datasets/dataset.hdf5 --output_file ./datasets/annotated_dataset.hdf5 --task Isaac-Stack-Cube-Franka-IK-Rel-Mimic-v0 --auto
Then, use Isaac Lab Mimic to generate some additional demonstrations:
./isaaclab.sh -p scripts/imitation_learning/isaaclab_mimic/generate_dataset.py --input_file ./datasets/annotated_dataset.hdf5 --output_file ./datasets/generated_dataset_small.hdf5 --num_envs 10 --generation_num_trials 10
Note
The output_file of the annotate_demos.py script is the input_file to the generate_dataset.py script.
Note
Isaac Lab Mimic is designed to work with manipulators with grippers. The gripper commands in the demonstrations are extracted separately and temporally replayed during the generation of additional demonstrations.
Inspect the output of generated data (filename: generated_dataset_small.hdf5), and if satisfactory, generate the full dataset:
./isaaclab.sh -p scripts/imitation_learning/isaaclab_mimic/generate_dataset.py --input_file ./datasets/annotated_dataset.hdf5 --output_file ./datasets/generated_dataset.hdf5 --num_envs 10 --generation_num_trials 1000 --headless
The number of demonstrations can be increased or decreased; 1000 demonstrations have been shown to provide good training results for this task.
Additionally, the number of environments in the --num_envs parameter can be adjusted to speed up data generation. The suggested number of 10 can be executed even on a laptop GPU. On a more powerful desktop machine, set it to 100 or higher for significant speedup of this step.
Robomimic setup#
As an example, we will train a behavior cloning (BC) agent implemented in Robomimic. Any other framework or training method could be used.
To install the robomimic framework, use the following commands:
# install the dependencies
sudo apt install cmake build-essential
# install python module (for robomimic)
./isaaclab.sh -i robomimic
Training an agent#
We can now train a BC agent for Isaac-Stack-Cube-Franka-IK-Rel-v0 using the Mimic generated data:
./isaaclab.sh -p scripts/imitation_learning/robomimic/train.py --task Isaac-Stack-Cube-Franka-IK-Rel-v0 --algo bc --dataset ./datasets/generated_dataset.hdf5
By default, the training script saves a model checkpoint every 100 epochs. The trained models and logs are saved to logs/robomimic/Isaac-Stack-Cube-Franka-IK-Rel-v0/bc.
Visualizing results#
By running inference with the trained model, we can visualize the results of the policy in the same environment:
./isaaclab.sh -p scripts/imitation_learning/robomimic/play.py --task Isaac-Stack-Cube-Franka-IK-Rel-v0 --num_rollouts 50 --checkpoint /PATH/TO/desired_model_checkpoint.pth
Common Pitfalls when Generating Data#
Demonstrations are too long:
A longer time horizon is harder for a policy to learn
Start close to the first object and minimize motions
Demonstrations are not smooth:
Irregular motion is hard for a policy to decipher
Better teleop devices result in better data (e.g. a SpaceMouse is better than a keyboard)
Pauses in demonstrations:
Pauses are difficult to learn
Keep the human motions smooth and fluid
Excessive number of subtasks:
Minimize the number of defined subtasks for completing a given task
Fewer subtasks result in less stitching of trajectories, yielding a higher data generation success rate
Lack of action noise:
Action noise makes policies more robust
Recording cropped too tight:
If recording stops on the frame the success term triggers, it may not re-trigger during replay
Allow for some buffer at the end of recording
Non-deterministic replay:
Physics in Isaac Lab is not deterministically reproducible when using env.reset, so demonstrations may fail on replay
Collect more human demos than needed, and use the ones that succeed during annotation
Every demo in an Isaac Lab Mimic generated HDF5 file is a successful one and can be used for training (even if non-determinism causes failure when replayed)
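To illustrate the action noise pitfall, the following sketch perturbs the pose part of each action with zero-mean Gaussian noise. It is a generic illustration, not the mechanism used by Isaac Lab Mimic; the action layout (pose delta followed by a single gripper element) and the function name are assumptions.

```python
import random


def add_action_noise(actions, noise_std, seed=0):
    """Add zero-mean Gaussian noise to every action element except the last.

    Assumption: the last element of each action is the gripper command,
    which is left untouched so that open/close decisions stay crisp.
    """
    rng = random.Random(seed)  # seeded for reproducible perturbations
    noisy = []
    for action in actions:
        *pose, gripper = action
        noisy.append([p + rng.gauss(0.0, noise_std) for p in pose] + [gripper])
    return noisy
```

With a small `noise_std`, the resulting trajectories visit states slightly off the demonstrated path, which gives a policy examples of recovering from small deviations.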
Creating Your Own Isaac Lab Mimic Compatible Environments#
How it works#
Isaac Lab Mimic works by splitting the input demonstrations into subtasks. Subtasks are user-defined segments that are common to all demonstrations. Examples of subtasks are “grasp an object”, “move end effector to some pre-defined position”, “release object”, etc. Note that most subtasks are defined with respect to some object that the robot interacts with.
Subtasks need to be defined, and then annotated for each input demonstration. Annotation can either happen algorithmically by defining heuristics for subtask detection, as was done in the example above, or it can be done manually.
With subtasks defined and annotated, Isaac Lab Mimic utilizes a small number of helper methods to then transform the subtask segments, and generate new demonstrations by stitching them together to match the new task at hand.
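The idea of transforming a subtask segment can be sketched in a simplified planar setting. The code below assumes an object-centric scheme in which the end-effector poses of a source segment keep their pose relative to the reference object when that object is placed somewhere new. Poses are (x, y, theta) tuples in SE(2) for brevity, whereas the real pipeline works with full SE(3) poses; all function names are hypothetical.

```python
import math


def compose(a, b):
    """Compose two planar poses: apply b in the frame of a."""
    ax, ay, at = a
    bx, by, bt = b
    return (
        ax + math.cos(at) * bx - math.sin(at) * by,
        ay + math.sin(at) * bx + math.cos(at) * by,
        at + bt,
    )


def inverse(a):
    """Invert a planar pose."""
    ax, ay, at = a
    return (
        -(math.cos(at) * ax + math.sin(at) * ay),
        -(-math.sin(at) * ax + math.cos(at) * ay),
        -at,
    )


def retarget_segment(eef_poses, src_object_pose, new_object_pose):
    """Move end-effector poses so they keep their pose relative to the object."""
    # offset maps the source object frame onto the new object frame.
    offset = compose(new_object_pose, inverse(src_object_pose))
    return [compose(offset, pose) for pose in eef_poses]
```

If the object moves from (1, 0) to (2, 1) without rotating, every end-effector pose in the segment is shifted by (1, 1); if the object is rotated, the segment is rotated with it.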
For each candidate demonstration generated this way, Isaac Lab Mimic uses a boolean success criterion to determine whether the demonstration succeeded in performing the task, and if so, adds it to the output dataset. The success rate of candidate demonstrations can be as high as 70% in simple cases, and as low as <1%, depending on the difficulty of the task and the complexity of the robot itself.
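The success rate directly determines how many candidates must be generated. As a back-of-the-envelope sketch (the helper name is hypothetical), the expected number of candidate demonstrations for a target number of successes is the target divided by the success rate:

```python
import math


def expected_attempts(target_successes, success_rate):
    """Expected number of candidates needed to collect target_successes demos."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in the interval (0, 1]")
    return math.ceil(target_successes / success_rate)
```

At a 70% success rate, 1000 successful demonstrations require about 1429 candidates; at a 1% success rate, the same dataset requires on the order of 100,000 candidates, which is why running many parallel environments pays off for harder tasks.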
Configuration and subtask definition#
Subtasks, among other configuration settings for Isaac Lab Mimic, are defined in a Mimic compatible environment configuration class that is created by extending the existing environment config with additional Mimic required parameters.
All Mimic required config parameters are specified in the MimicEnvCfg class.
The config class FrankaCubeStackIKRelMimicEnvCfg serves as an example of creating a Mimic compatible environment config class for the Franka stacking task that was used in the examples above.
The DataGenConfig member contains various parameters that influence how data is generated. It is initially sufficient to just set the name parameter, and revise the rest later.
Subtasks are a list of SubTaskConfig objects, of which the most important members are:
object_ref is the object that is being interacted with. This will be used to adjust motions relative to this object during data generation. Can be None if the current subtask does not involve any object.
subtask_term_signal is the ID of the signal indicating whether the subtask is active or not.
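The shape of such a subtask list can be sketched with a plain dataclass. The class below is an illustrative stand-in and not the real SubTaskConfig; only the two member names come from the description above, while the object and signal names are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubtaskSketch:
    """Illustrative stand-in for a subtask entry (not the real SubTaskConfig)."""

    object_ref: Optional[str]  # object the motion is expressed relative to, or None
    subtask_term_signal: str   # ID of the signal associated with this subtask


# Hypothetical subtask list for a pick-and-place style task.
subtasks = [
    SubtaskSketch(object_ref="object_a", subtask_term_signal="grasp_a"),
    SubtaskSketch(object_ref="object_b", subtask_term_signal="place_a_on_b"),
    SubtaskSketch(object_ref=None, subtask_term_signal="retreat"),
]


def check_unique_signals(entries):
    """Return True if every subtask uses a distinct termination signal ID."""
    signals = [entry.subtask_term_signal for entry in entries]
    return len(signals) == len(set(signals))
```

Keeping the list short matters: every additional entry adds another stitching point during data generation.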
Subtask annotation#
Once the subtasks are defined, they need to be annotated in the source data. There are two methods to annotate subtask boundaries in source demonstrations: manual annotation and heuristics.
It is often easiest to perform manual annotations, since the number of input demonstrations is usually very small. To perform manual annotations, use the annotate_demos.py script without the --auto flag. Then press B to pause, N to continue, and S to annotate a subtask boundary.
For more accurate boundaries, or to speed up repeated processing of a given task for experiments, heuristics can be implemented to perform the same task. Heuristics are observations in the environment. An example of how to add subtask terms can be found in source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/stack/stack_env_cfg.py, where they are added as an observation group called SubtaskCfg. This example uses prebuilt heuristics, but custom heuristics are easily implemented.
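Conceptually, a heuristic produces a per-step boolean signal, and the subtask boundary is the first step at which that signal becomes true. The sketch below shows this reduction on plain Python lists; the function names and signal names are hypothetical, and the actual annotation script may apply additional logic.

```python
def first_true_index(signal):
    """Return the first step at which the signal is true, or None if never."""
    for step, flag in enumerate(signal):
        if flag:
            return step
    return None


def subtask_boundaries(signals):
    """Map each subtask signal name to the step at which it first triggers."""
    return {name: first_true_index(values) for name, values in signals.items()}
```

For a demonstration where a hypothetical "grasp" signal first fires at step 2, `subtask_boundaries({"grasp": [False, False, True, True]})` yields `{"grasp": 2}`. A signal that never fires maps to `None`, which would mark the demonstration as unusable for automatic annotation.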
Helpers for demonstration generation#
Helpers needed for Isaac Lab Mimic are defined in the environment. All tasks that are to be used with Isaac Lab Mimic are derived from the ManagerBasedRLMimicEnv base class, and must implement the following functions:
get_robot_eef_pose: Returns the current robot end effector pose in the same frame as used by the robot end effector controller.
target_eef_pose_to_action: Takes a target pose and a gripper action for the end effector controller and returns an action which achieves the target pose.
action_to_target_eef_pose: Takes an action and returns a target pose for the end effector controller.
actions_to_gripper_actions: Takes a sequence of actions and returns the gripper actuation part of the actions.
get_object_poses: Returns the pose of each object in the scene that is used for data generation.
get_subtask_term_signals: Returns a dictionary of binary flags for each subtask in a task. A flag is true once the subtask has been completed and false otherwise.
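To show how the pose and action helpers relate to each other, here is a toy, position-only sketch for a relative (delta) end effector controller. It does not derive from ManagerBasedRLMimicEnv, and its simplified signatures are assumptions: the real methods operate on batched tensors and full poses. The method names mirror the list above only to make the correspondence clear.

```python
class ToyRelativeEefEnv:
    """Toy stand-in with a position-only, relative end effector controller."""

    def __init__(self, eef_position):
        self.eef_position = tuple(eef_position)

    def get_robot_eef_pose(self):
        # Current end effector position in the controller frame.
        return self.eef_position

    def target_eef_pose_to_action(self, target_position, gripper_action):
        # A relative controller expects the offset from the current position.
        delta = tuple(t - c for t, c in zip(target_position, self.eef_position))
        return delta + (gripper_action,)

    def action_to_target_eef_pose(self, action):
        # Inverse of the mapping above: current position plus commanded offset.
        return tuple(c + d for c, d in zip(self.eef_position, action[:3]))

    def actions_to_gripper_actions(self, actions):
        # Assumption: the gripper command is the last element of each action.
        return [action[-1] for action in actions]
```

The key property is that `action_to_target_eef_pose` undoes `target_eef_pose_to_action`: converting a target to an action and back returns the original target, which lets recorded actions be re-expressed as poses, transformed, and converted back to actions.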
The class FrankaCubeStackIKRelMimicEnv shows an example of creating a Mimic compatible environment from an existing Isaac Lab environment.
Registering the environment#
Once both the Mimic compatible environment and environment config classes have been created, a new Mimic compatible environment can be registered using gym.register. For the Franka stacking task in the examples above, the Mimic environment is registered as Isaac-Stack-Cube-Franka-IK-Rel-Mimic-v0.
The registered environment is now ready to be used with Isaac Lab Mimic.