Sub-module for data collection utilities.

All post-processed, robomimic-compatible datasets share the same data structure. Each dataset is a single HDF5 file whose contents follow the robomimic dataset structure.

The collector takes input data in batched form and stores it as separate demonstrations, one per environment index. The demonstrations are flushed to disk when RobomimicDataCollector.flush() is called for the respective environments. All the data is saved when RobomimicDataCollector.close() is called.
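The buffering described above can be pictured with a minimal sketch. It is a toy stand-in written for illustration, not the RobomimicDataCollector implementation: the class name ToyCollector, its attributes, and the use of plain lists instead of tensors are all assumptions, and nothing is written to disk.

```python
from collections import defaultdict


class ToyCollector:
    """Hypothetical stand-in that buffers batched data per environment index."""

    def __init__(self):
        # env index -> key -> list of per-step values
        self._buffers = defaultdict(lambda: defaultdict(list))
        # finished episodes, in the order they were flushed
        self.episodes = []

    def add(self, key, batch):
        # `batch` holds one entry per environment (first dimension is N)
        for env_id, value in enumerate(batch):
            self._buffers[env_id][key].append(value)

    def flush(self, env_ids=(0,)):
        # move the buffered steps of the selected environments into episodes
        for env_id in env_ids:
            if env_id in self._buffers:
                self.episodes.append(dict(self._buffers.pop(env_id)))


collector = ToyCollector()
# two steps of batched data for three environments
collector.add("actions", [0.1, 0.2, 0.3])
collector.add("actions", [0.4, 0.5, 0.6])
# only environment 1 finished its episode
collector.flush(env_ids=[1])
```

After the flush, the toy collector holds one finished episode containing the two actions of environment 1, while environments 0 and 2 keep accumulating steps.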

The following sample shows how to use the RobomimicDataCollector to store random data in a dataset.

import os
import torch

from omni.isaac.lab_tasks.utils.data_collector import RobomimicDataCollector

# name of the environment (needed by robomimic)
task_name = "Isaac-Franka-Lift-v0"
# specify directory for logging experiments
test_dir = os.path.dirname(os.path.abspath(__file__))
log_dir = os.path.join(test_dir, "logs", "demos")
# name of the file to save data
filename = "hdf_dataset.hdf5"
# number of episodes to collect
num_demos = 10
# number of environments to simulate
num_envs = 4

# create data-collector
collector_interface = RobomimicDataCollector(task_name, log_dir, filename, num_demos)

# reset the collector
collector_interface.reset()

while not collector_interface.is_stopped():
   # generate random data to store
   # -- obs
   obs = {
         "joint_pos": torch.randn(num_envs, 10),
         "joint_vel": torch.randn(num_envs, 10),
   }
   # -- actions
   actions = torch.randn(num_envs, 10)
   # -- rewards
   rewards = torch.randn(num_envs)
   # -- dones
   dones = torch.rand(num_envs) > 0.5

   # store signals
   # -- obs
   for key, value in obs.items():
         collector_interface.add(f"obs/{key}", value)
   # -- actions
   collector_interface.add("actions", actions)
   # -- next_obs
   for key, value in obs.items():
         collector_interface.add(f"next_obs/{key}", value.cpu().numpy())
   # -- rewards
   collector_interface.add("rewards", rewards)
   # -- dones
   collector_interface.add("dones", dones)

   # flush data from collector for successful environments
   # note: in this case we flush all the time
   reset_env_ids = dones.nonzero(as_tuple=False).squeeze(-1)
   collector_interface.flush(reset_env_ids)

# close collector
collector_interface.close()



RobomimicDataCollector

Data collection interface for robomimic.

Robomimic Data Collector

class omni.isaac.lab_tasks.utils.data_collector.RobomimicDataCollector

Bases: object

Data collection interface for robomimic.

This class implements a data collector interface for saving simulation states to disk. The data is stored in HDF5 binary data format. The class is useful for collecting demonstrations. The collected data follows the structure from robomimic.

All datasets in robomimic require the observations obtained before the environment step and the next observations obtained after it. These are stored as dictionaries of observations under the keys “obs” and “next_obs” respectively.
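The ordering of “obs” and “next_obs” around a step can be sketched with a toy loop. ToyEnv and its methods are hypothetical and exist only for this illustration; they are not part of robomimic or of this module.

```python
class ToyEnv:
    """Hypothetical one-dimensional environment used only for illustration."""

    def __init__(self):
        self.position = 0.0

    def observe(self):
        return {"position": self.position}

    def step(self, action):
        self.position += action
        return self.observe()


env = ToyEnv()
transitions = []
for action in (1.0, 0.5, -2.0):
    obs = env.observe()          # observation before the step
    next_obs = env.step(action)  # observation after the step
    transitions.append({"obs": obs, "next_obs": next_obs, "actions": action})
```

In this sketch the “next_obs” of one transition equals the “obs” of the following one, which is the relationship the two keys are meant to capture.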

For certain agents in robomimic, the episode data should have the following additional keys: “actions”, “rewards”, “dones”. This behavior can be altered by changing the dataset keys required in the training configuration for the respective learning agent.

For reference on datasets, please check the robomimic documentation.


__init__(env_name, directory_path[, ...])

Initializes the data collection wrapper.

is_stopped()

Whether data collection is stopped or not.

reset()

Reset the internals of data logger.

add(key, value)

Add a key-value pair to the dataset.

flush([env_ids])

Flush the episode data based on environment indices.

close()

Stop recording and save the file at its current state.

demo_count

The number of demos collected so far.

__init__(env_name: str, directory_path: str, filename: str = 'test', num_demos: int = 1, flush_freq: int = 1, env_config: dict | None = None)

Initializes the data collection wrapper.

Parameters:

  • env_name – The name of the environment.

  • directory_path – The path to store collected data.

  • filename – The basename of the saved file. Defaults to “test”.

  • num_demos – Number of demonstrations to record until stopping. Defaults to 1.

  • flush_freq – Frequency to dump data to disk. Defaults to 1.

  • env_config – The configuration for the environment. Defaults to None.

property demo_count: int

The number of demos collected so far.

is_stopped() -> bool

Whether data collection is stopped or not.

Returns:

True if data collection has stopped.
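One way to read the relationship between num_demos, demo_count, and is_stopped() is the following sketch. It assumes that collection stops once the number of flushed demonstrations reaches num_demos. The class DemoCounter and its record_flushed_demo method are hypothetical and do not reproduce the library's code.

```python
class DemoCounter:
    """Hypothetical sketch of the stop condition, not the library's code."""

    def __init__(self, num_demos=1):
        self._num_demos = num_demos
        self._demo_count = 0

    @property
    def demo_count(self):
        # number of demonstrations recorded so far
        return self._demo_count

    def is_stopped(self):
        # assumption: stop once the requested number of demos is reached
        return self._demo_count >= self._num_demos

    def record_flushed_demo(self):
        # stands in for one demonstration being flushed to disk
        self._demo_count += 1


counter = DemoCounter(num_demos=3)
iterations = 0
while not counter.is_stopped():
    counter.record_flushed_demo()
    iterations += 1
```

Under this assumption, a collection loop of the form `while not collector.is_stopped()` ends on its own once enough demonstrations have been flushed.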


Reset the internals of data logger.

add(key: str, value: np.ndarray | torch.Tensor)

Add a key-value pair to the dataset.

The key can be nested by using the “/” character. For example: “obs/joint_pos”. Currently only two-level nesting is supported.

Parameters:

  • key – The key name.

  • value – The corresponding value of shape (N, …), where N is the number of environments.

Raises:

ValueError – When the provided key has more than two levels of nesting. Example: “obs/joints/pos” instead of “obs/joint_pos”.
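The two-level nesting rule can be illustrated with a small helper. This is a sketch under the assumption that keys are split on “/”. The function split_key and its error message are hypothetical and are not taken from the library.

```python
def split_key(key):
    """Split a dataset key into (group, name); hypothetical helper."""
    parts = key.split("/")
    if len(parts) == 1:
        # flat key such as "actions": no enclosing group
        return None, parts[0]
    if len(parts) == 2:
        # nested key such as "obs/joint_pos"
        return parts[0], parts[1]
    # deeper nesting such as "obs/joints/pos" is rejected
    raise ValueError(f"Key '{key}' has more than two levels of nesting.")


flat = split_key("actions")
nested = split_key("obs/joint_pos")
```

Calling the helper with “obs/joints/pos” raises a ValueError, mirroring the behavior documented for add().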

flush(env_ids: Iterable[int] = (0,))

Flush the episode data based on environment indices.

Parameters:

env_ids – Environment indices to write data for. Defaults to (0,).


Stop recording and save the file at its current state.