Why Data Quality Matters More Than Quantity

A common misconception in robot learning is that more demonstrations automatically produce a better policy. This is false. Neural networks learn to imitate the distribution of behavior in your dataset. If your demonstrations are inconsistent — different paths, different speeds, different object positions — the network learns a blurry average that does not correspond to any actual successful strategy.
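The "blurry average" failure is easy to see in a toy example (illustrative numbers only, not from any real dataset): if half the demonstrations steer left of an obstacle and half steer right, a regressor trained to minimize mean-squared error converges toward the mean action, which matches neither strategy.

```python
# Toy illustration: averaging two valid but inconsistent strategies
# produces an action that matches neither. All numbers are made up.

left_demo  = [-0.4, -0.5, -0.4]   # lateral offsets steering left of an obstacle
right_demo = [+0.4, +0.5, +0.4]   # lateral offsets steering right

# An MSE-trained network regresses toward the mean of the two demos:
blurry = [(l + r) / 2 for l, r in zip(left_demo, right_demo)]
print(blurry)  # [0.0, 0.0, 0.0] -> a straight path through the obstacle
```

Consistent demonstrations avoid this by giving the network a single strategy to imitate.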

Fifty demonstrations in which the arm takes the same clean path, grips the object at the same location, and returns to the same home position will produce a policy that generalizes better than 500 demonstrations with high variance. The first collection session is your chance to establish that consistency. Take your time, and delete and re-record any demo that felt wrong.


For deeper grounding in pipeline design, read the Data Collection Pipeline overview in the Robotics Library.

The LeRobot Dataset Format

Your recordings will be saved in LeRobot format — the standard used by Hugging Face's LeRobot library and compatible with the ACT and Diffusion Policy trainers you will use in Unit 5.

What gets recorded

Joint positions (6 DOF + gripper), joint velocities, end-effector pose, camera frames (RGB + optional depth), timestamps, and task metadata.
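As a mental model, one recorded timestep can be pictured as a flat record like the sketch below. The field names are illustrative, not the exact LeRobot keys; check a real dataset for the actual schema.

```python
# Hypothetical sketch of one recorded timestep. Real LeRobot key names
# and dtypes differ; this only shows what information is captured.
timestep = {
    "observation.joint_pos": [0.0] * 7,           # 6 DOF + gripper
    "observation.joint_vel": [0.0] * 7,
    "observation.ee_pose":   [0.0] * 7,           # xyz position + quaternion
    "observation.image":     "frame_000123.png",  # stored as video in practice
    "action":                [0.0] * 7,           # commanded joint targets
    "timestamp":             2.46,                # seconds since episode start
    "task":                  "pick-and-place",
}
assert len(timestep["observation.joint_pos"]) == 7  # 6 joints + gripper
```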

File structure

One folder per episode. Each episode contains a data.parquet with state/action arrays and a video/ subfolder with camera streams.
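Under that layout, a loader only needs to assemble per-episode paths. A minimal sketch with pathlib, assuming folders are named episode_000, episode_001, and so on (verify against your actual recordings):

```python
from pathlib import Path

def episode_files(root: str, episode: int):
    """Return the expected (parquet, video dir) paths for one episode.

    Assumes folders are named episode_000, episode_001, ... -- check
    your own recordings before relying on this naming.
    """
    ep = Path(root) / f"episode_{episode:03d}"
    return ep / "data.parquet", ep / "video"

parquet, video = episode_files("~/openarm-datasets/pick-and-place", 0)
print(parquet)  # ~/openarm-datasets/pick-and-place/episode_000/data.parquet
```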

Sampling rate

50 Hz by default. Each second of teleoperation produces 50 timesteps, so a 10-second pick-and-place produces 500 (state, action) pairs.
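The arithmetic is worth internalizing when sizing a collection session. A one-line helper (hypothetical, not part of any library):

```python
def num_timesteps(duration_s: float, fps: int = 50) -> int:
    """(state, action) pairs produced by one episode at a given rate."""
    return int(duration_s * fps)

print(num_timesteps(10))        # 500 pairs for one 10-second episode
print(num_timesteps(10) * 50)   # 25000 pairs across a 50-episode dataset
```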

Compatibility

LeRobot format loads directly into lerobot.common.datasets. It is also convertible to RLDS for TF-Agents pipelines.

Your Recording Session

Launch the recording script with your task name and target episode count. The script handles episode management, file naming, and quality flagging automatically:

# Activate your environment first
source ~/openarm-env/bin/activate

# Launch recording — saves to ~/openarm-datasets/pick-and-place/
python -m openarm.collect \
    --task pick-and-place \
    --episodes 50 \
    --fps 50 \
    --output ~/openarm-datasets/pick-and-place

# Press ENTER to start a new episode, BACKSPACE to discard the last one
# The script prints a summary after every 10 episodes

Set up your workspace with the same object in the same starting position for every episode. Use tape marks on the table to keep the object position consistent. Your arm should return to the exact same home pose before each episode starts — the recording script enforces this with a homing check.
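The homing check can be approximated as follows. This is a sketch under the assumption that joint angles are compared in degrees against a fixed home pose; the actual openarm.collect implementation may differ, and the home pose shown is an example, not your arm's.

```python
# Example home pose in degrees -- replace with your arm's actual home pose.
HOME_POSE_DEG = [0.0, -90.0, 90.0, 0.0, 90.0, 0.0]

def is_homed(joint_deg, home=HOME_POSE_DEG, tol_deg=2.0):
    """True if every joint is within tol_deg of the home pose."""
    return all(abs(q - h) <= tol_deg for q, h in zip(joint_deg, home))

print(is_homed([0.5, -89.0, 90.3, -0.2, 91.1, 0.0]))  # True: all within 2 deg
print(is_homed([5.0, -90.0, 90.0, 0.0, 90.0, 0.0]))   # False: joint 1 is 5 deg off
```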

Practical tip: Aim for episodes between 8 and 15 seconds each. Too short (under 5 s) and the policy has no time to plan a smooth trajectory; too long (over 20 s) and small variations compound. For a standard pick-and-place, 10 seconds per episode is the target.
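That guideline is easy to enforce mechanically. A small helper (hypothetical, using the 8–15 s window from the tip above):

```python
def duration_ok(n_timesteps: int, fps: int = 50,
                min_s: float = 8.0, max_s: float = 15.0) -> bool:
    """Flag episodes outside the recommended duration window."""
    return min_s <= n_timesteps / fps <= max_s

print(duration_ok(500))   # True  (10 s at 50 Hz)
print(duration_ok(200))   # False (4 s: too short for a full trajectory)
print(duration_ok(1100))  # False (22 s: small variations compound)
```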

Data Quality Checklist

Before advancing to Unit 5, verify your dataset passes all five of these checks. Discard and re-record episodes that fail any item.

1. No failed grasps. Every episode ends with the object successfully placed at the target location. Episodes where the arm missed the grasp or dropped the object must be deleted — they teach the model to fail.
2. Consistent start pose. The arm starts from the homed position in every episode. Verify in the dataset viewer that joint angles at timestep 0 are within ±2° of each other across all 50 episodes.
3. No sudden jumps or jerks. Visualize 3–5 random episodes with python -m lerobot.visualize_dataset --dataset ~/openarm-datasets/pick-and-place. Action sequences should be smooth continuous curves, not spikes.
4. Camera frames are clear and in focus. Check that your workspace camera is not occluded, not blown out by direct light, and that the object is clearly visible throughout each episode. Blurry or dark video degrades policy performance.
5. 50 complete episodes saved. Run python -m lerobot.inspect ~/openarm-datasets/pick-and-place and confirm you have exactly 50 episodes with no corrupted files. The script will flag any episodes with missing frames or truncated action sequences.
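The start-pose and smoothness checks can be scripted instead of eyeballed. A stdlib-only sketch, assuming you can extract first-timestep joint angles (in degrees) and per-joint action sequences from your episodes; the 2° tolerance mirrors the checklist, and the jump threshold is an assumed value to tune for your arm:

```python
def start_pose_consistent(first_frames, tol_deg=2.0):
    """Joint angles at timestep 0 stay within +/- tol_deg of the midpoint
    across all episodes (one inner list of joint angles per episode)."""
    for joint in zip(*first_frames):          # iterate per-joint across episodes
        if max(joint) - min(joint) > 2 * tol_deg:
            return False
    return True

def has_jump(action_seq, max_step=0.1):
    """Flag a sudden spike between consecutive actions for one joint
    (max_step is an assumed threshold in radians -- tune it)."""
    return any(abs(b - a) > max_step for a, b in zip(action_seq, action_seq[1:]))

# Two episodes whose start poses agree to within the tolerance:
print(start_pose_consistent([[0.0, -90.0, 90.0], [1.5, -89.0, 90.5]]))  # True
# A smooth ramp vs. a sequence with a spike:
print(has_jump([0.00, 0.02, 0.04, 0.06]))        # False
print(has_jump([0.00, 0.02, 0.50, 0.06]))        # True
```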

Browse reference datasets for comparison

The SVRC dataset library contains example pick-and-place recordings you can use as a quality reference. If your data looks significantly different, investigate before training. Open SVRC Datasets →

Unit 4 Complete When...

You have 50 clean episodes saved in LeRobot format at ~/openarm-datasets/pick-and-place. All five quality checklist items pass. Running lerobot.inspect shows 50 episodes, no errors. You are ready to hand this dataset to the training pipeline in Unit 5.