PyCon 2026

From LeRobot to Real Robots

What They Don't Tell You About Making a Robot Arm Grab Things

Stefano Maestri — Software Engineer & Robotics Tinkerer

QR code to https://maeste.it

maeste.it

What to Expect in 30 Minutes

1. The Setup

LeRobot + SO101 hardware, assembly, cameras, recording pipeline

2. The Bug Parade

5 real bugs: CPU policy, action chunking, causal confounding, calibration drift, distribution shift

3. The Fix

Profiling, diagnostics, tools we built, training metrics that matter

4. Lessons & Beyond

What worked, what did not, and the easier path with Cyberwave

5 bugs · 5 fixes · 1 robot · 17.5 degrees

LeRobot + SO101

LeRobot

  • Open-source PyTorch robotics library by HuggingFace
  • Datasets, pretrained policies, training & eval tools
  • CLI: lerobot-record, lerobot-train, lerobot-rollout
  • Integrates with HF Hub for model/dataset sharing

SO101 Robot Arm

  • 6-DOF leader-follower arm pair
  • 3D-printed structure + STS3215 servos
  • Budget: ~300 EUR total
  • Open-source hardware design

The Task

"Grab the red block and put it in the box"

Sounds simple. It was not.

[Diagram: block (B) and BOX, pick & place]
The Setup

The Promise

YouTube: robots folding laundry, cooking eggs, sorting warehouse boxes.
Papers: "straightforward training pipeline."
LeRobot README: "simple, accessible, state-of-the-art."

My plan

  • Friday evening: unbox & assemble
  • Saturday: record data & train
  • Sunday morning: deploy & demo
  • Sunday lunch: done!

Narrator: it was not done by Sunday lunch.

Hardware Journey

Assembly surprises no tutorial prepares you for

Defective Part

3D-printed shoulder bracket had layer adhesion failure. Caused binding under load. Reprinted with higher infill.

Mismatched Firmware

Some servos shipped on different firmware versions. Motors on mismatched firmware don't work together — every servo had to be flashed to the same version.

Firmware: Windows Only

The FD debug tool for STS3215 firmware updates runs exclusively on Windows. Linux users: find a Windows machine or VM.

Calibration Procedure

Every joint must be manually aligned to its zero position. Millimeter precision required. One wrong offset propagates to the entire kinematic chain.

Lesson: Budget hardware + 3D printing = expect 2-3 weeks of hardware debugging before you write any ML code.
Day 3

It Moves!

The leader arm moves. The follower arm follows.

You move your wrist and a machine six inches away mirrors you in real time. It is, genuinely, magical.

And then...

  • Movements are jerky — some servos lag behind
  • USB cameras disconnect randomly after 10 minutes
  • Recording software crashes on Wayland
  • Display output drops camera FPS from 30 to 8

Welcome to embedded Linux.

The Software Pipeline

  1. Record: 3 cameras, teleop
  2. Train: ACT / SmolVLA / Pi0
  3. Evaluate: 30 Hz control loop
  4. Debug: diagnose, repeat

Recording Gotchas

  • 3 USB cameras = bandwidth limit
  • Wayland breaks camera ordering
  • MJPEG vs YUYV matters
  • Leader arm IN camera frame = poison

Training Constraints

  • GPU memory is the bottleneck
  • Batch size tuning is critical
  • 2-4 hours per training run
  • No early stopping by default

Inference Loop

  • 30 Hz = 33ms per step
  • Camera read + model + actuate
  • CPU fallback = 8 Hz = failure
  • Async camera read helps
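The 33 ms budget can be checked by timing each stage of one control step. A minimal sketch, assuming three hypothetical callables (`read_cameras`, `run_policy`, `send_action`) that stand in for the real camera, model, and servo calls; none of these names come from LeRobot, and the sleep durations are made up.

```python
import time


def step_budget_ms(fps: float) -> float:
    """Time available per control step, in milliseconds."""
    return 1000.0 / fps


def profile_step(stages):
    """Run each (name, callable) stage once and return its duration in ms.

    Each callable receives the previous stage's return value (None for the first).
    """
    timings = {}
    value = None
    for name, fn in stages:
        start = time.perf_counter()
        value = fn(value)
        timings[name] = (time.perf_counter() - start) * 1000.0
    return timings


# Hypothetical stand-ins for the real camera, policy, and servo calls.
def read_cameras(_):
    time.sleep(0.005)
    return "frames"


def run_policy(frames):
    time.sleep(0.010)
    return "action"


def send_action(action):
    time.sleep(0.002)
    return None


if __name__ == "__main__":
    budget = step_budget_ms(30)
    timings = profile_step(
        [("camera", read_cameras), ("policy", run_policy), ("actuate", send_action)]
    )
    for name, ms in timings.items():
        print(f"{name:8s} {ms:6.1f} ms")
    print(f"total    {sum(timings.values()):6.1f} ms of {budget:.1f} ms budget")
```

If the total exceeds the budget, the per-stage numbers show which stage to look at first.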
Timing: 3:30 - 5:30
Walk through the pipeline visually. "This looks linear but you'll loop through it dozens of times." Mention: recording is where most subtle bugs enter. Training is where you wait. Evaluation is where you cry. Debugging is where you live.

Camera Setup & USB Topology

3 Cameras, 1 Task

Camera   Position          Resolution
front    Top-down view     640x480 @ 30 Hz
right    Side angle        640x480 @ 30 Hz
wrist    Gripper-mounted   640x480 @ 30 Hz
Problem: All 3 cameras on USB 2.0 Bus 001 = 480 Mbps shared bandwidth. 3 raw streams need ~830 Mbps.

Fix: MJPEG at camera level + spread across USB controllers.

[Diagram: USB topology. USB Host (Bus 001), USB 2.0 Hub, front cam, right cam, wrist cam, SO101 Follower, SO101 Leader; shared 480 Mbps bandwidth]

Recording Pipeline

lerobot-record → Teleoperate → Save Episode → Reset Scene → Repeat

Problem: Wayland Breaks pynput

Wayland blocks global keyboard event snooping. The pynput.keyboard.Listener silently fails. No arrow-key controls during recording.

Our Fix: stdin + SIGQUIT

  • stdin prompt: [Y/n/q] between episodes
  • SIGQUIT (Ctrl+\): end episode early
  • Zero new threads
  • Sentinel: --dataset.reset_time_s=-1
# Activate interactive mode
lerobot-record \
  --robot.type=so101_follower \
  --dataset.reset_time_s=-1 \
  --dataset.single_task="Grab the red block" \
  ...

# Between episodes:
[INTERACTIVE RESET] Episode 3 recorded.
Keep scene and record next? [Y/n/q]: y

# During recording:
# Press Ctrl+\ to end episode early
[INTERACTIVE] Episode end requested via SIGQUIT
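The stdin + SIGQUIT idea can be sketched with the standard library alone. This is an illustration, not the code behind the session shown here: the function names and the `yes`/`no`/`quit` return values are assumptions, and SIGQUIT only exists on POSIX systems.

```python
import signal

# Flag flipped by the signal handler; the recording loop polls it (no extra threads).
end_episode_requested = False


def _on_sigquit(signum, frame):
    """Only set a flag; the recording loop does the real work."""
    global end_episode_requested
    end_episode_requested = True


def install_sigquit_handler():
    """Route SIGQUIT (Ctrl+Backslash) to the flag instead of killing the process."""
    signal.signal(signal.SIGQUIT, _on_sigquit)


def record_episode(max_steps, step_fn):
    """Call step_fn(i) until max_steps is reached or SIGQUIT asks for an early end."""
    global end_episode_requested
    end_episode_requested = False
    steps = 0
    while steps < max_steps and not end_episode_requested:
        step_fn(steps)
        steps += 1
    return steps


def parse_reset_answer(answer: str) -> str:
    """Map a [Y/n/q] reply to 'yes', 'no', or 'quit'. Empty input means yes."""
    reply = answer.strip().lower()
    if reply in ("", "y", "yes"):
        return "yes"
    if reply in ("n", "no"):
        return "no"
    if reply in ("q", "quit"):
        return "quit"
    raise ValueError(f"unrecognized answer: {answer!r}")
```

Between episodes, the reply would come from something like `parse_reset_answer(input("Keep scene and record next? [Y/n/q]: "))`.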
Week 1

Recording Episodes

50 attempts to grab a red block. Each recording session:

  1. Check cameras — USB ports shuffle on every reboot. /dev/video0 is now /dev/video4. Why? Nobody knows.
  2. Check FPS — too slow if a display is connected. Run headless.
  3. Check recording — are we capturing both action AND observation? All three cameras? Is the data actually being written to disk?
  4. Perform the grab — 15 seconds of careful teleoperation
  5. Verify the episode — replay the data. Was it clean?
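One way to cope with shuffled `/dev/videoN` numbers on Linux is to address cameras through the persistent symlinks udev usually creates under `/dev/v4l/by-id` (or `/dev/v4l/by-path`). A minimal sketch, assuming those symlinks exist on the machine; the helper names and the fake camera name in `demo()` are hypothetical.

```python
import os
import tempfile


def stable_camera_paths(by_id_dir="/dev/v4l/by-id"):
    """Map each persistent symlink name to the device node it points at right now."""
    mapping = {}
    if not os.path.isdir(by_id_dir):
        return mapping  # directory missing: no cameras, or not a udev-based system
    for name in sorted(os.listdir(by_id_dir)):
        mapping[name] = os.path.realpath(os.path.join(by_id_dir, name))
    return mapping


def demo():
    """Build a throwaway directory with one fake symlink and resolve it."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "video4")
        open(target, "w").close()
        by_id = os.path.join(tmp, "by-id")
        os.mkdir(by_id)
        os.symlink(target, os.path.join(by_id, "usb-front-cam-video-index0"))
        resolved = stable_camera_paths(by_id)
        return {name: os.path.basename(path) for name, path in resolved.items()}


if __name__ == "__main__":
    for name, node in stable_camera_paths().items():
        print(f"{name} -> {node}")
```

Identical camera models without distinct serial numbers can collide under `by-id`; in that case `by-path`, which is keyed on the physical port, may be the better choice.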

Lesson #1: Data collection IS the job.
Not a step before the job. Not a prerequisite. The job.

The Bugs That Don't Show Up in Tutorials

#1 Silent CPU Fallback

Symptom: GPU utilization at 2%. Training "works" but policy is garbage.

Fix: policy.to("cuda") explicitly after loading. Check next(policy.parameters()).device.
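The device check can be turned into a guard that fails loudly. A minimal sketch: `require_device` is a hypothetical helper, not a LeRobot or PyTorch function, and it only assumes the policy exposes `parameters()` the way a `torch.nn.Module` does.

```python
def first_param_device(policy) -> str:
    """Return the device type ('cpu', 'cuda', ...) of the policy's first parameter."""
    device = next(iter(policy.parameters())).device
    # torch.device exposes .type; fall back to str() for other objects.
    return getattr(device, "type", str(device))


def require_device(policy, expected: str = "cuda") -> None:
    """Raise instead of silently running on the wrong device."""
    actual = first_param_device(policy)
    if actual != expected:
        raise RuntimeError(f"policy is on {actual!r}, expected {expected!r}")


# With PyTorch, the intended use would be:
#   policy.to("cuda")
#   require_device(policy, "cuda")
```

Calling the guard once before the control loop starts turns a silent 8 Hz fallback into an immediate error.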

#3 Causal Confounding

Symptom: Policy works perfectly in replay, fails on real robot.

Fix: Remove leader arm from camera frame. It leaks future actions into observations.

#5 Distribution Shift

Symptom: Works at 10am, fails at 3pm. Same setup, same code.

Fix: Record across lighting conditions. Enable color augmentation (but not hue jitter for color tasks).

Common thread: the system never tells you something is wrong.

No errors. No warnings. Just a robot that doesn't work.

Timing: 8:30 - 11:00
Three bugs, one slide. Keep it punchy. For #1: "nvidia-smi showed 2% GPU. The model was on CPU. PyTorch didn't care." For #3: "The policy wasn't learning to grab — it was learning to follow the leader arm. Remove it from the camera frame." For #5: "Sunlight changes everything. Your dataset needs to be robust across conditions." The punchline: "No errors. No warnings. Just failure."
Week 2

"Just Train and Deploy"

Training

ACT policy. 1 hour on GPU.
Loss: 0.08
Looks great.

Deployment

Robot reaches for the block...
and misses.
Every. Single. Time.

Not randomly. Systematically. Always 6-7 cm to the right.

Maybe the model isn't big enough?
Try ACT → misses → Try Pi0.5 → misses → Try SmolVLA → misses

The problem isn't the model.

Policy Landscape

Choosing the right model for a budget setup

[Figure: policies plotted by parameters (log scale) against generalization, from Specialist to Generalist. ACT (80M params), SmolVLA (500M params, language-conditioned), Pi0 / Pi0-FAST (3B params, foundation model). Annotations: OUR PICK, 16 GB VRAM limit.]
ACT

chunk_size=100, n_action_steps=100

SmolVLA

Language prompt + vision encoder

Pi0

chunk_size=50, pretrained backbone

BUG #2 Action Chunking Confusion

When "GPU idle 90% of the time" is actually correct

[Timeline: GPU forward pass (FWD) at 0s, 3.3s, and 6.6s; between passes, 99 cached actions are replayed (3.3s)]

ACT Configuration

chunk_size = 100, n_action_steps = 100

1 forward pass = 100 actions = 3.3s of motion at 30 Hz
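Why the GPU sits idle becomes clearer with a toy version of the action queue. This is a sketch of the general chunking idea, not ACT's actual implementation; `ChunkedPolicy` and `fake_model` are hypothetical names.

```python
from collections import deque


class ChunkedPolicy:
    """Runs the model only when the queue of cached actions is empty."""

    def __init__(self, predict_chunk, n_action_steps):
        self.predict_chunk = predict_chunk    # callable: observation -> list of actions
        self.n_action_steps = n_action_steps  # how many actions of each chunk to execute
        self.queue = deque()
        self.forward_passes = 0

    def select_action(self, observation):
        if not self.queue:
            chunk = self.predict_chunk(observation)
            self.forward_passes += 1
            self.queue.extend(chunk[: self.n_action_steps])
        return self.queue.popleft()


def fake_model(observation):
    """Stand-in for a real forward pass: returns a chunk of 100 dummy actions."""
    return list(range(100))


if __name__ == "__main__":
    policy = ChunkedPolicy(fake_model, n_action_steps=100)
    fps = 30
    for step in range(200):  # 200 control steps, about 6.7 s at 30 Hz
        policy.select_action(observation=None)
    print(f"forward passes: {policy.forward_passes}")
    print(f"seconds of motion per pass: {100 / fps:.1f}")
```

Only one control step in every 100 touches the model; the other 99 pop a cached action.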

The False Alarm

Sporadic "running slower than requested fps" warnings = timing jitter, not a bug.

Cost us 2 days debugging a non-problem.

Tools We Built for Diagnosis

compare_leader_follower

Reads every episode in a dataset. Computes per-joint drift statistics between leader commands and follower positions.

Found the 17.5° wrist_flex drift

# Usage
python compare_leader_follower.py \
  --dataset ./data/pick_block
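The core of such a drift check is small. A minimal sketch, assuming each frame is available as a dict of joint name to angle in degrees; the data layout and function names are illustrative and not taken from the actual script.

```python
def joint_drift(leader_frames, follower_frames):
    """Return {joint: (mean |diff|, max |diff|)} in degrees over paired frames."""
    diffs = {}
    for lead, follow in zip(leader_frames, follower_frames):
        for joint, lead_deg in lead.items():
            diffs.setdefault(joint, []).append(abs(lead_deg - follow[joint]))
    return {joint: (sum(d) / len(d), max(d)) for joint, d in diffs.items()}


def flag_joints(stats, threshold_deg=5.0):
    """List joints whose mean absolute difference exceeds the threshold."""
    return [joint for joint, (mean_abs, _) in stats.items() if mean_abs > threshold_deg]


if __name__ == "__main__":
    # Synthetic frames for illustration only.
    leader = [{"wrist_flex": 10.0, "elbow_flex": 5.0}, {"wrist_flex": 20.0, "elbow_flex": 6.0}]
    follower = [{"wrist_flex": -7.5, "elbow_flex": 4.0}, {"wrist_flex": 2.5, "elbow_flex": 5.0}]
    stats = joint_drift(leader, follower)
    for joint, (mean_abs, max_abs) in stats.items():
        print(f"{joint:12s} mean {mean_abs:5.1f}  max {max_abs:5.1f}")
    print("recalibrate:", flag_joints(stats))
```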

evaluate_dataset_quality

Flags outlier episodes by trajectory smoothness, gripper timing, and completion metrics.

Identified 12% of episodes as corrupted

# Usage
python evaluate_dataset_quality.py \
  --dataset ./data/pick_block \
  --threshold 2.0
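Outlier flagging by smoothness can be sketched as a z-score test. This is one possible reading, not the script itself: treating `--threshold` as a z-score cutoff and measuring smoothness with second differences are both assumptions, and each trajectory needs at least three samples.

```python
import statistics


def roughness(trajectory):
    """Mean absolute second difference of a 1-D joint trajectory (higher = jerkier)."""
    second = [
        trajectory[i + 1] - 2 * trajectory[i] + trajectory[i - 1]
        for i in range(1, len(trajectory) - 1)
    ]
    return sum(abs(s) for s in second) / len(second)


def flag_outliers(episodes, threshold=2.0):
    """Return indices of episodes whose roughness z-score exceeds the threshold."""
    scores = [roughness(ep) for ep in episodes]
    mean = statistics.mean(scores)
    stdev = statistics.pstdev(scores)
    if stdev == 0:
        return []  # all episodes equally smooth, nothing stands out
    return [i for i, s in enumerate(scores) if (s - mean) / stdev > threshold]


if __name__ == "__main__":
    smooth = [list(range(10)) for _ in range(9)]  # linear ramps, roughness 0
    jerky = [[0, 10, 0, 10, 0, 10, 0, 10, 0, 10]]  # oscillating trajectory
    print(flag_outliers(smooth + jerky))
```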

read_*_pos

Static calibration check. Reads current joint positions in real time. Compare leader vs follower live.

Verify calibration before recording

# Usage
python read_leader_pos.py
python read_follower_pos.py

~600 lines of Python

that changed everything

Timing: 14:00 - 15:30
"These aren't sophisticated ML tools. They're data analysis scripts. But they saved us weeks." Quick walkthrough of each. Emphasize: these are small scripts anyone can write. The hard part isn't the code — it's knowing to look.

BUG #3 Causal Confounding

The policy learned to cheat

What Happened

Leader arm visible in camera during teleoperation recording. Policy learns a shortcut: track the leader arm, not the block.

At Inference

Leader arm is absent. Policy sees an unfamiliar scene. Output: random, erratic movements.

Fix

Reposition cameras so the leader arm is never in frame. Re-record entire dataset.

[Diagram: Training (recording): camera view with block (B) and LEADER ARM, labeled "attention". Inference (deployment): camera view with block (B), leader arm GONE (no leader), policy output "???".]

BUG #4 Calibration Drift (AHA MOMENT)

The root cause of everything

17.5°
wrist_flex offset (before)
~6-7 cm
systematic gripper error
0.85°
wrist_flex offset (after)
# compare_leader_follower.py output

Joint          Mean |diff|  Max |diff|
───────────────────────────────────
shoulder_pan     1.2°        3.1°
shoulder_lift    0.9°        2.4°
elbow_flex       1.1°        2.8°
wrist_flex      17.5°       22.3°
wrist_roll       0.7°        1.9°
gripper          2.3°        4.1°

WARNING: wrist_flex exceeds 5° threshold!
Recalibrate this joint.

Impact of Recalibration

Metric              Before     After
wrist_flex offset   17.5°      0.85°
Gripper accuracy    ±6-7 cm    ±0.3 cm
Grasp success       0%         80%
Lesson: No amount of training, bigger models, or more epochs can fix a calibration error. Always verify hardware first.
The Answer

Wrist motor offset between leader and follower

17.5°

Every single episode I recorded taught the robot
a physically incorrect mapping of the world.

A 40-line Python script found in 3 seconds what I couldn't find in 2 weeks.

0%
Before recalibration
80%
After recalibration

Same policy. Same code. Same hyperparameters.

BUG #5 Distribution Shift

Morning light training, afternoon deployment

Same Robot. Same Block. Different Light.

Policy trained with morning lighting fails in afternoon conditions. Shadows change, color temperature shifts, white balance differs.

Fix: Image Augmentation

# Enable built-in transforms
lerobot-train \
  --dataset.image_transforms.enable=true \
  ...

Default augmentations:

  • Brightness: (0.8, 1.2)
  • Contrast: (0.8, 1.2)
  • Saturation: (0.5, 1.5)
  • Hue: (-0.05, 0.05)
  • Sharpness jitter
[Diagram: TRAINING (AM): works. DEPLOYMENT (PM): fails.]
Caution: If your task relies on color (e.g., "red block"), disable hue jitter to prevent the model from becoming color-invariant.
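What "disable hue jitter" means can be shown by sampling jitter factors with one transform switched off. This is a conceptual sketch only: it is neither LeRobot's implementation nor its config syntax, and `sample_jitter` is a hypothetical helper. The ranges are the ones listed above; sharpness is omitted because no range is given for it.

```python
import random

# Ranges copied from the list of default augmentations.
DEFAULT_RANGES = {
    "brightness": (0.8, 1.2),
    "contrast": (0.8, 1.2),
    "saturation": (0.5, 1.5),
    "hue": (-0.05, 0.05),
}


def sample_jitter(ranges, rng, disabled=()):
    """Draw one factor per enabled transform, uniformly within its range."""
    return {
        name: rng.uniform(low, high)
        for name, (low, high) in ranges.items()
        if name not in disabled
    }


if __name__ == "__main__":
    rng = random.Random(0)
    # For a color-dependent task such as "red block", leave hue untouched.
    print(sample_jitter(DEFAULT_RANGES, rng, disabled=("hue",)))
```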

Reading Training Metrics

What l1_loss actually means for your robot

0.068
l1_loss (initial training)
~12°
per-joint error
~6-7 cm
gripper positional error
[Figure: l1_loss scale from 0.00 to 0.15. 0.03 = reliable grasp; 0.068 = our initial result; 0.10+ = random motion]
l1_loss x servo_range (180°) = per-joint error → kinematic chain → gripper error
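The conversion from loss to physical error is simple arithmetic. A rough sketch: the first function is the formula from the slide, while the second treats the arm as a single rigid lever, and the 30 cm lever length is an assumed illustrative value, not a measured SO101 dimension.

```python
import math


def per_joint_error_deg(l1_loss, servo_range_deg=180.0):
    """l1_loss x servo_range = per-joint error in degrees."""
    return l1_loss * servo_range_deg


def tip_error_cm(angle_error_deg, lever_cm):
    """Chord length swept at the end of a rigid lever by an angular error."""
    return 2.0 * lever_cm * math.sin(math.radians(angle_error_deg) / 2.0)


if __name__ == "__main__":
    error_deg = per_joint_error_deg(0.068)
    print(f"per-joint error: {error_deg:.1f} deg")
    # lever_cm=30 is an assumption for illustration only.
    print(f"tip error at 30 cm: {tip_error_cm(error_deg, lever_cm=30.0):.1f} cm")
```

With these assumptions a loss of 0.068 gives roughly 12° per joint and a tip error in the 6-7 cm range; a real arm chains several joints, so this single-lever estimate is only an order-of-magnitude check.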

Dataset Visualization

Your most powerful debugging tool: lerobot-dataset-viz

LIVE DEMO

Rerun visualization: 3 camera streams + joint positions + episode timeline

What to look for

  • Leader arm visibility in camera frames
  • Joint position discontinuities
  • Timing synchronization across cameras
  • Gripper open/close at correct moments

Command

lerobot-dataset-viz \
  --repo-id smaestri/so101_pick \
  --episode-index 0

Requires pip install 'lerobot[viz]'

Robot in Action

Success and failure — because both matter

Success: Horizontal block, centered, good light
Failure: Rotated block, 45° angle, before hard-example fix

Lessons Learned

What Worked

Profile first, optimize second

The 30 Hz loop breakdown revealed that compute was never the bottleneck.

Dataset as ground truth

Every bug was found by interrogating the recorded data, not the model.

Image augmentation

One flag (image_transforms.enable=true) fixed lighting sensitivity.

Build diagnostic tools

600 lines of Python saved weeks of guesswork.

What Didn't Work

Bigger models

Switched to SmolVLA when the problem was a 17.5° calibration offset. Model size is irrelevant if the data is wrong.

More epochs

Trained for 200k steps instead of 100k. Loss plateaued at 0.065. The problem was data distribution, not underfitting.

Removing wrist camera

Hypothesized fewer inputs = easier learning. Wrong: wrist perspective is critical for fine grasping.

Ignoring hardware

Spent 3 weeks on software when the answer was a recalibration that took 10 minutes.

Lessons Learned

"Investing in diagnostics beats investing in bigger models."

The Pattern

Every visible symptom had its root cause 2-3 layers below the surface.

  • Policy fails → data is wrong → calibration is off
  • Inconsistent results → lighting variance → no augmentation
  • Skill plateau → dataset imbalance → missing hard examples

Concrete Takeaways

1. Verify calibration before every recording session

2. Build dataset inspection tools before training tools

3. Start with ACT. Add complexity only when needed

4. Record across conditions (lighting, position, angle)

5. Check your GPU utilization. Always.

Timing: 19:00 - 20:30
Transition slide. Summarize the local journey. "If you take one thing from this talk: build diagnostic tools before you build training pipelines." The pattern observation is key — symptoms mislead you. Root causes hide. Now transition: "But what if there was a way to skip most of these pitfalls?"

What If There Was an Easier Way?

After fighting every layer of the stack, I found a platform that automates most of it.

Cyberwave

One API for any robot. Write Python once — it runs in simulation AND on real hardware.

  • Pre-configured digital twins for common robots
  • Cloud training with managed GPU infrastructure
  • Automated deployment and fleet management
  • Built-in observability and diagnostics
[Diagram: Your Python Code → CYBERWAVE PLATFORM (Digital Twin, Cloud Training, Observability, Deployment) → Real Robot]
Timing: 20:30 - 22:00
Transition carefully. "I'm not saying don't learn the hard way — I just showed you why the hard way is valuable. But if your goal is to ship a robotics application, there's now a platform that handles the infrastructure." Keep it genuine: "I found Cyberwave after fighting all these issues. Here's what it does."

Cyberwave: The Easier Way

What if the hard parts were automated?

Platform Overview

  • Digital Twins — pre-configured simulation environments
  • Cloud Training — managed GPU pipeline, no OOM debugging
  • Automated Deployment — sim-to-real transfer handled
  • SO101 Support — voice-controlled pick-and-place tutorial

Links

cyberwave.com

docs.cyberwave.com/tutorials/so101-voice-pick-and-place

Challenge           Us (time spent)   Cyberwave
Calibration         3 weeks           Auto-detected
Camera setup        2 days            Pre-configured
Training infra      1 day             Cloud GPU
Data augmentation   1 day             Built-in
Deployment          2 days            One-click
Diagnostics         1 week            Dashboard
The learning is valuable. But when you need production results, platforms like Cyberwave let you skip the pain and focus on the task.

When to Go Local vs Cloud

Go Local When...

  • You're learning how robot learning works
  • Doing custom research on novel policies
  • Working within a tight budget
  • Need full control over every parameter
  • Your robot isn't supported yet

Go Cloud When...

  • Building for production deployment
  • Working in a team (shared datasets, models)
  • Need to scale beyond one robot
  • You're time-constrained
  • Want built-in observability

"Start local to understand. Go cloud to ship."

Timing: 25:00 - 26:30
Balanced perspective. "Neither approach is wrong. They serve different goals." Emphasize: the local journey gives you intuition that's invaluable even if you later use a platform. The cloud approach is for when your goal shifts from learning to shipping. End with the tagline.
PyCon 2026

Thank You

Questions?

Stefano Maestri

github.com/huggingface/lerobot

cyberwave.com

QR code to https://maeste.it

maeste.it

"Built with love, frustration, and a 17.5° calibration drift."