PyCon 2026

From LeRobot to Real Robots

What They Don't Tell You About Making a Robot Arm Grab Things

Stefano Maestri — Software Engineer & Robotics Tinkerer

maeste.it

What to Expect in 30 Minutes

1. The Setup

LeRobot + SO101 hardware, assembly, cameras, recording pipeline

2. The Bug Parade

5 real bugs: silent CPU fallback, action chunking, causal confounding, calibration drift, distribution shift

3. The Fix

Profiling, diagnostics, tools we built, training metrics that matter

4. Lessons & Beyond

What worked, what did not, and the easier path with Cyberwave

5 bugs · 5 fixes · 1 robot · 17.5 degrees

LeRobot + SO101

LeRobot

Open-source PyTorch robotics library by HuggingFace
Datasets, pretrained policies, training & eval tools
CLI: lerobot-record, lerobot-train, lerobot-rollout
Integrates with HF Hub for model/dataset sharing

SO101 Robot Arm

6-DOF leader-follower arm pair
3D-printed structure + STS3215 servos
Budget: ~300 EUR total
Open-source hardware design

The Task

"Grab the red block and put it in the box"

Sounds simple. It was not.

The Setup

The Promise

YouTube: robots folding laundry, cooking eggs, sorting warehouse boxes.
Papers: "straightforward training pipeline."
LeRobot README: "simple, accessible, state-of-the-art."

My plan

Friday evening: unbox & assemble
→ Saturday: record data & train
→ Sunday morning: deploy & demo
→ Sunday lunch: done!

Narrator: it was not done by Sunday lunch.

Hardware Journey

Assembly surprises no tutorial prepares you for

Defective Part

3D-printed shoulder bracket had layer adhesion failure. Caused binding under load. Reprinted with higher infill.

Mismatched Firmware

Some servos shipped on different firmware versions. Motors on mismatched firmware don't work together — every servo had to be flashed to the same version.

Firmware: Windows Only

The FD debug tool for STS3215 firmware updates runs exclusively on Windows. Linux users: find a Windows machine or VM.

Calibration Procedure

Every joint must be manually aligned to its zero position. Millimeter precision required. One wrong offset propagates to the entire kinematic chain.

Lesson: Budget hardware + 3D printing = expect 2-3 weeks of hardware debugging before you write any ML code.

Day 3

It Moves!

The leader arm moves. The follower arm follows.

You move your wrist and a machine six inches away mirrors you in real time. It is, genuinely, magical.

And then...

Movements are jerky — some servos lag behind
USB cameras disconnect randomly after 10 minutes
Recording software crashes on Wayland
Display output drops camera FPS from 30 to 8

Welcome to embedded Linux.

The Software Pipeline

Record

3 cameras, teleop

→

Train

ACT / SmolVLA / Pi0

→

Evaluate

30 Hz control loop

→

Debug

diagnose, repeat

Recording Gotchas

3 USB cameras = bandwidth limit
Wayland breaks camera ordering
MJPEG vs YUYV matters
Leader arm IN camera frame = poison

Training Constraints

GPU memory is the bottleneck
Batch size tuning is critical
2-4 hours per training run
No early stopping by default

Inference Loop

30 Hz = 33ms per step
Camera read + model + actuate
CPU fallback = 8 Hz = failure
Async camera read helps
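
The 33 ms budget is easier to reason about once each stage is timed separately. The sketch below is a minimal illustration, not the code used in the talk: the three stage callables are hypothetical stand-ins (plain sleeps) for the real camera read, policy forward pass, and servo write.

```python
import time

CONTROL_HZ = 30
BUDGET_S = 1.0 / CONTROL_HZ  # about 33 ms per step at 30 Hz


def profile_step(stages):
    """Run each (name, callable) stage once and return seconds spent per stage."""
    timings = {}
    for name, fn in stages:
        start = time.perf_counter()
        fn()
        timings[name] = time.perf_counter() - start
    return timings


def over_budget(timings, budget_s=BUDGET_S):
    """True when the summed stage time exceeds the per-step budget."""
    return sum(timings.values()) > budget_s


# Hypothetical stand-ins; replace with the real camera, policy, and servo calls.
stages = [
    ("camera_read", lambda: time.sleep(0.005)),
    ("model", lambda: time.sleep(0.010)),
    ("actuate", lambda: time.sleep(0.002)),
]
timings = profile_step(stages)
for name, seconds in timings.items():
    print(f"{name:12s} {seconds * 1000:6.1f} ms")
print("over budget:", over_budget(timings))
```

A step that takes 125 ms corresponds to the 8 Hz CPU-fallback case and blows the budget almost four times over.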

Timing: 3:30 - 5:30
Walk through the pipeline visually. "This looks linear but you'll loop through it dozens of times." Mention: recording is where most subtle bugs enter. Training is where you wait. Evaluation is where you cry. Debugging is where you live.

Camera Setup & USB Topology

3 Cameras, 1 Task

Camera	Position	Resolution @ frame rate
`front`	Top-down view	640x480 @ 30 Hz
`right`	Side angle	640x480 @ 30 Hz
`wrist`	Gripper-mounted	640x480 @ 30 Hz

Problem: All 3 cameras on USB 2.0 Bus 001 = 480 Mbps shared bandwidth. 3 raw streams need ~830 Mbps.

Fix: MJPEG at the camera level + cameras spread across separate USB controllers.
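
A back-of-the-envelope check of the raw payload, as a rough sketch. The bytes-per-pixel values are assumptions (2 for YUYV, 3 for RGB24), and protocol overhead is ignored, so the exact total depends on the pixel format and overhead you count. The conclusion is the same either way: three uncompressed streams do not fit comfortably on one USB 2.0 bus, whose 480 Mbps is a theoretical ceiling that real transfers do not reach.

```python
USB2_THEORETICAL_MBPS = 480


def raw_stream_mbps(width, height, fps, bytes_per_pixel):
    """Uncompressed video payload in megabits per second (no protocol overhead)."""
    return width * height * bytes_per_pixel * 8 * fps / 1e6


yuyv = raw_stream_mbps(640, 480, 30, bytes_per_pixel=2)  # assumed YUYV 4:2:2
rgb = raw_stream_mbps(640, 480, 30, bytes_per_pixel=3)   # assumed RGB24

for label, per_stream in (("YUYV", yuyv), ("RGB24", rgb)):
    total = 3 * per_stream
    share = total / USB2_THEORETICAL_MBPS
    print(f"{label}: {per_stream:.0f} Mbps per camera, "
          f"{total:.0f} Mbps for 3 cameras ({share:.0%} of USB 2.0)")
```

MJPEG compresses each frame on the camera, which is why switching formats is the first fix to try before rearranging ports.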

Recording Pipeline

lerobot-record

→

Teleoperate

→

Save Episode

→

Reset Scene

→

Repeat

Problem: Wayland Breaks pynput

Wayland blocks global keyboard event snooping. The pynput.keyboard.Listener silently fails. No arrow-key controls during recording.

Our Fix: stdin + SIGQUIT

stdin prompt: [Y/n/q] between episodes
SIGQUIT (Ctrl+\): end episode early
Zero new threads
Sentinel: --dataset.reset_time_s=-1

# Activate interactive mode
lerobot-record \
  --robot.type=so101_follower \
  --dataset.reset_time_s=-1 \
  --dataset.single_task="Grab the red block" \
  ...

# Between episodes:
[INTERACTIVE RESET] Episode 3 recorded.
Keep scene and record next? [Y/n/q]: y

# During recording:
# Press Ctrl+\ to end episode early
[INTERACTIVE] Episode end requested via SIGQUIT

Week 1

Recording Episodes

50 attempts to grab a red block. Each recording session:

Check cameras — USB ports shuffle on every reboot. /dev/video0 is now /dev/video4. Why? Nobody knows.
Check FPS — too slow if a display is connected. Run headless.
Check recording — are we capturing both action AND observation? All three cameras? Is the data actually being written to disk?
Perform the grab — 15 seconds of careful teleoperation
Verify the episode — replay the data. Was it clean?
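
One way to take the guesswork out of the camera check: on most Linux distributions udev creates persistent symlinks under `/dev/v4l/by-id` (and `/dev/v4l/by-path`) that keep their names while the `/dev/videoN` numbers shuffle. The helper below is a small sketch, not part of LeRobot; the function name is made up for this example.

```python
from pathlib import Path


def stable_camera_paths(by_id_dir="/dev/v4l/by-id"):
    """Map each persistent symlink name to the device node it currently points at."""
    root = Path(by_id_dir)
    if not root.is_dir():
        return {}
    return {
        link.name: str(link.resolve())
        for link in sorted(root.iterdir())
        if link.is_symlink()
    }


for name, node in stable_camera_paths().items():
    print(f"{name} -> {node}")
```

Pointing the recording config at the symlink instead of the numbered node avoids the reboot shuffle. If the cameras are identical models without distinct serial numbers, the `by-path` directory, which is tied to the physical port, may be the more reliable choice.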

Lesson #1: Data collection IS the job.
Not a step before the job. Not a prerequisite. The job.

The Bugs That Don't Show Up in Tutorials

#1 Silent CPU Fallback

Symptom: GPU utilization at 2%. Training "works" but policy is garbage.

Fix: policy.to("cuda") explicitly after loading. Check next(policy.parameters()).device.
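
A guard like the one below turns the silent fallback into a loud failure. It is a minimal sketch: the helper names are invented here, and it relies only on the `parameters()` / `.device.type` interface that PyTorch modules expose, so the demo uses a stand-in object instead of a real policy.

```python
from types import SimpleNamespace


def device_type_of(policy):
    """Return the device type ('cuda', 'cpu', ...) of the policy's first parameter."""
    return next(policy.parameters()).device.type


def require_device(policy, expected="cuda"):
    """Raise instead of silently running on the wrong device."""
    actual = device_type_of(policy)
    if actual != expected:
        raise RuntimeError(f"policy is on {actual!r}, expected {expected!r}")
    return policy


class StandInPolicy:
    """Stand-in with the same parameters()/.device shape as a torch.nn.Module."""

    def __init__(self, kind):
        self._kind = kind

    def parameters(self):
        return iter([SimpleNamespace(device=SimpleNamespace(type=self._kind))])


# With a real PyTorch policy this would be:
#   policy.to("cuda")
#   require_device(policy, "cuda")
print(device_type_of(StandInPolicy("cpu")))
```

Calling the guard once right after loading the checkpoint, and again just before the control loop starts, costs nothing at runtime.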

#3 Causal Confounding

Symptom: Policy works perfectly in replay, fails on real robot.

Fix: Remove leader arm from camera frame. It leaks future actions into observations.

#5 Distribution Shift

Symptom: Works at 10am, fails at 3pm. Same setup, same code.

Fix: Record across lighting conditions. Enable color augmentation (but not hue jitter for color tasks).

Common thread: the system never tells you something is wrong.

No errors. No warnings. Just a robot that doesn't work.

Timing: 8:30 - 11:00
Three bugs, one slide. Keep it punchy. For #1: "nvidia-smi showed 2% GPU. The model was on CPU. PyTorch didn't care." For #3: "The policy wasn't learning to grab; it was learning to follow the leader arm. Remove it from the camera frame." For #5: "Sunlight changes everything. Your dataset needs to be robust across conditions." The punchline: "No errors. No warnings. Just failure."

Week 2

"Just Train and Deploy"

Training

ACT policy. 1 hour on GPU.
Loss: 0.08
Looks great.

Deployment

Robot reaches for the block...
and misses.
Every. Single. Time.

Not randomly. Systematically. Always 6-7 cm to the right.

Maybe the model isn't big enough?
Try ACT → misses → Try Pi0.5 → misses → Try SmolVLA → misses

The problem isn't the model.

Policy Landscape

Choosing the right model for a budget setup

ACT

chunk_size=100, n_action_steps=100

SmolVLA

Language prompt + vision encoder

Pi0

chunk_size=50, pretrained backbone

BUG #2 Action Chunking Confusion

When "GPU idle 90% of the time" is actually correct

ACT Configuration

chunk_size = 100, n_action_steps = 100

1 forward pass = 100 actions = 3.3s of motion at 30 Hz
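
The arithmetic behind the idle GPU, as a quick sketch. The 0.05 s forward-pass latency is a hypothetical number chosen for illustration, not a measurement from the talk.

```python
def chunk_duration_s(n_action_steps, control_hz):
    """Seconds of motion covered by one forward pass."""
    return n_action_steps / control_hz


def gpu_busy_fraction(inference_s, n_action_steps, control_hz):
    """Share of wall-clock time the GPU spends on inference."""
    return inference_s / chunk_duration_s(n_action_steps, control_hz)


duration = chunk_duration_s(n_action_steps=100, control_hz=30)
busy = gpu_busy_fraction(inference_s=0.05, n_action_steps=100, control_hz=30)

print(f"one chunk covers {duration:.2f} s of motion")
print(f"GPU busy about {busy:.1%} of the time")
```

With one inference every 3.3 seconds, a mostly idle GPU is the expected steady state rather than a symptom.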

The False Alarm

Sporadic "running slower than requested fps" warnings = timing jitter, not a bug.

Cost us 2 days debugging a non-problem.

Tools We Built for Diagnosis

compare_leader_follower

Reads every episode in a dataset. Computes per-joint drift statistics between leader commands and follower positions.

Found the 17.5° wrist_flex drift

# Usage
python compare_leader_follower.py \
  --dataset ./data/pick_block
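
The core of such a check fits in a handful of lines. What follows is a sketch of the idea, not the actual `compare_leader_follower.py`: it assumes the leader and follower positions have already been loaded as per-joint lists in degrees, uses synthetic numbers, and borrows the 5° threshold from the warning in the script output shown later in the talk.

```python
from statistics import mean

THRESHOLD_DEG = 5.0


def joint_drift(leader, follower):
    """Return {joint: (mean |diff|, max |diff|)} in degrees."""
    report = {}
    for joint, commanded in leader.items():
        diffs = [abs(c - f) for c, f in zip(commanded, follower[joint])]
        report[joint] = (mean(diffs), max(diffs))
    return report


def joints_over_threshold(report, threshold=THRESHOLD_DEG):
    """Joints whose mean absolute drift exceeds the threshold."""
    return [joint for joint, (mean_diff, _) in report.items() if mean_diff > threshold]


# Synthetic data for illustration only.
leader = {"shoulder_pan": [0.0, 10.0, 20.0], "wrist_flex": [10.0, 20.0, 30.0]}
follower = {"shoulder_pan": [1.0, 11.0, 21.5], "wrist_flex": [27.5, 37.5, 47.5]}

report = joint_drift(leader, follower)
for joint, (mean_diff, max_diff) in report.items():
    print(f"{joint:14s} {mean_diff:5.1f}  {max_diff:5.1f}")
print("recalibrate:", joints_over_threshold(report))
```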

evaluate_dataset_quality

Flags outlier episodes by trajectory smoothness, gripper timing, and completion metrics.

Identified 12% corrupted episodes

# Usage
python evaluate_dataset_quality.py \
  --dataset ./data/pick_block \
  --threshold 2.0
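
One plausible way to flag rough episodes is shown below. It is a sketch under assumptions: the talk does not say how smoothness is scored or what `--threshold` means, so this example scores each episode by its mean absolute second difference and treats the threshold as a z-score.

```python
from statistics import mean, pstdev


def roughness(trajectory):
    """Mean absolute second difference of a 1-D joint trajectory."""
    second_diffs = [
        trajectory[i + 1] - 2 * trajectory[i] + trajectory[i - 1]
        for i in range(1, len(trajectory) - 1)
    ]
    return mean(abs(value) for value in second_diffs)


def flag_outliers(episodes, threshold=2.0):
    """Indices of episodes whose roughness z-score exceeds the threshold."""
    scores = [roughness(episode) for episode in episodes]
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return []
    return [i for i, score in enumerate(scores) if (score - mu) / sigma > threshold]


# Synthetic data: nine smooth ramps and one jerky episode.
smooth = [list(range(20)) for _ in range(9)]
jerky = [[0, 5] * 10]
flagged = flag_outliers(smooth + jerky, threshold=2.0)
print("outlier episodes:", flagged)
```

Gripper timing and completion checks would follow the same shape: one score per episode, then a comparison against the rest of the dataset.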

read_*_pos

Static calibration check. Reads current joint positions in real-time. Compare leader vs follower live.

Verify calibration before recording

# Usage
python read_leader_pos.py
python read_follower_pos.py

~600 lines of Python

that changed everything

Timing: 14:00 - 15:30
"These aren't sophisticated ML tools. They're data analysis scripts. But they saved us weeks." Quick walkthrough of each. Emphasize: these are small scripts anyone can write. The hard part isn't the code — it's knowing to look.

BUG #3 Causal Confounding

The policy learned to cheat

What Happened

Leader arm visible in camera during teleoperation recording. Policy learns a shortcut: track the leader arm, not the block.

At Inference

Leader arm is absent. Policy sees an unfamiliar scene. Output: random, erratic movements.

Fix

Reposition cameras so the leader arm is never in frame. Re-record entire dataset.

BUG #4 Calibration Drift AHA MOMENT

The root cause of everything

17.5°

wrist_flex offset (before)

~6-7 cm

systematic gripper error

0.85°

wrist_flex offset (after)

# compare_leader_follower.py output

Joint          Mean |diff|  Max |diff|
───────────────────────────────────
shoulder_pan     1.2°        3.1°
shoulder_lift    0.9°        2.4°
elbow_flex       1.1°        2.8°
wrist_flex      17.5°       22.3°
wrist_roll       0.7°        1.9°
gripper          2.3°        4.1°

WARNING: wrist_flex exceeds 5° threshold!
Recalibrate this joint.

Impact of Recalibration

Metric	Before	After
wrist_flex offset	17.5°	0.85°
Gripper accuracy	±6-7 cm	±0.3 cm
Grasp success	0%	80%

Lesson: No amount of training, bigger models, or more epochs can fix a calibration error. Always verify hardware first.

The Answer

Wrist motor offset between leader and follower

17.5°

Every single episode I recorded taught the robot
a physically incorrect mapping of the world.

A 40-line Python script found in 3 seconds what I couldn't find in 2 weeks.

0%

Before recalibration

→

80%

After recalibration

Same policy. Same code. Same hyperparameters.

BUG #5 Distribution Shift

Morning light training, afternoon deployment

Same Robot. Same Block. Different Light.

Policy trained with morning lighting fails in afternoon conditions. Shadows change, color temperature shifts, white balance differs.

Fix: Image Augmentation

# Enable built-in transforms
lerobot-train \
  --dataset.image_transforms.enable=true \
  ...

Default augmentations:

Brightness: (0.8, 1.2)
Contrast: (0.8, 1.2)
Saturation: (0.5, 1.5)
Hue: (-0.05, 0.05)
Sharpness jitter

Caution: If your task relies on color (e.g., "red block"), disable hue jitter to prevent the model from becoming color-invariant.
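
To make the caution concrete, here is a small sketch of the sampling logic with hue jitter switched off for a color-dependent task. It does not use LeRobot's transform classes (those are built on torchvision); it only mimics how per-image factors would be drawn from the ranges listed above, and the function names are invented for this example.

```python
import random

DEFAULT_RANGES = {
    "brightness": (0.8, 1.2),
    "contrast": (0.8, 1.2),
    "saturation": (0.5, 1.5),
    "hue": (-0.05, 0.05),
}


def jitter_ranges(color_dependent_task):
    """Copy the default ranges, collapsing hue to zero when color carries meaning."""
    ranges = dict(DEFAULT_RANGES)
    if color_dependent_task:
        ranges["hue"] = (0.0, 0.0)
    return ranges


def sample_jitter(ranges, rng):
    """Draw one random factor per augmentation from its range."""
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}


rng = random.Random(0)
factors = sample_jitter(jitter_ranges(color_dependent_task=True), rng)
print(factors)
```

Brightness, contrast, and saturation still vary from image to image, which is what covers the morning-versus-afternoon gap, while the red block stays red.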

Reading Training Metrics

What l1_loss actually means for your robot

0.068

l1_loss (initial training)

~12°

per-joint error

~6-7 cm

gripper positional error
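
How a loss of 0.068 can turn into centimeters, as a rough sketch. Every constant here is an assumption rather than something stated in the talk: the targets are taken to be min-max scaled to [-1, 1] over a 360° joint range, and the 0.30 m lever arm is a hypothetical effective distance from the erring joints to the gripper tip.

```python
import math


def l1_to_degrees(l1_loss, joint_range_deg=360.0):
    """Assumes min-max scaling to [-1, 1], so one normalized unit is half the range."""
    return l1_loss * joint_range_deg / 2.0


def tip_error_cm(angle_error_deg, lever_arm_m):
    """Arc length swept at the gripper tip for a given angular error."""
    return lever_arm_m * math.radians(angle_error_deg) * 100.0


err_deg = l1_to_degrees(0.068)
err_cm = tip_error_cm(err_deg, lever_arm_m=0.30)  # hypothetical lever arm

print(f"{err_deg:.1f} degrees per joint")
print(f"{err_cm:.1f} cm at the gripper tip")
```

Under these assumptions a loss that looks small on a training curve is already too large for grasping a block.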

Dataset Visualization

Your most powerful debugging tool: lerobot-dataset-viz

LIVE DEMO

Rerun visualization: 3 camera streams + joint positions + episode timeline

What to look for

Leader arm visibility in camera frames
Joint position discontinuities
Timing synchronization across cameras
Gripper open/close at correct moments

Command

lerobot-dataset-viz \
  --repo-id smaestri/so101_pick \
  --episode-index 0

Requires pip install 'lerobot[viz]'

Robot in Action

Success and failure — because both matter

Success

Horizontal block, centered, good light

Failure

Rotated block, 45° angle, before hard-example fix

Lessons Learned

What Worked

Profile first, optimize second

The 30Hz loop breakdown revealed that compute was never the bottleneck.

Dataset as ground truth

Every bug was found by interrogating the recorded data, not the model.

Image augmentation

One flag (image_transforms.enable=true) fixed lighting sensitivity.

Build diagnostic tools

600 lines of Python saved weeks of guesswork.

What Didn't Work

Bigger models

Switched to SmolVLA when the problem was a 17.5° calibration offset. Model size is irrelevant if the data is wrong.

More epochs

Trained for 200k steps instead of 100k. Loss plateaued at 0.065. The problem was data distribution, not underfitting.

Removing wrist camera

Hypothesized fewer inputs = easier learning. Wrong: wrist perspective is critical for fine grasping.

Ignoring hardware

Spent 3 weeks on software when the answer was a recalibration that took 10 minutes.

Lessons Learned

"Investing in diagnostics beats investing in bigger models."

The Pattern

Every visible symptom had its root cause 2-3 layers below the surface.

Policy fails → data is wrong → calibration is off
Inconsistent results → lighting variance → no augmentation
Skill plateau → dataset imbalance → missing hard examples

Concrete Takeaways

1. Verify calibration before every recording session

2. Build dataset inspection tools before training tools

3. Start with ACT. Add complexity only when needed

4. Record across conditions (lighting, position, angle)

5. Check your GPU utilization. Always

Timing: 19:00 - 20:30
Transition slide. Summarize the local journey. "If you take one thing from this talk: build diagnostic tools before you build training pipelines." The pattern observation is key — symptoms mislead you. Root causes hide. Now transition: "But what if there was a way to skip most of these pitfalls?"

What If There Was an Easier Way?

After fighting every layer of the stack, I found a platform that automates most of it.

Cyberwave

One API for any robot. Write Python once — it runs in simulation AND on real hardware.

Pre-configured digital twins for common robots
Cloud training with managed GPU infrastructure
Automated deployment and fleet management
Built-in observability and diagnostics

Timing: 20:30 - 22:00
Transition carefully. "I'm not saying don't learn the hard way — I just showed you why the hard way is valuable. But if your goal is to ship a robotics application, there's now a platform that handles the infrastructure." Keep it genuine: "I found Cyberwave after fighting all these issues. Here's what it does."

Cyberwave: The Easier Way

What if the hard parts were automated?

Platform Overview

Digital Twins — pre-configured simulation environments
Cloud Training — managed GPU pipeline, no OOM debugging
Automated Deployment — sim-to-real transfer handled
SO101 Support — voice-controlled pick-and-place tutorial

Links

cyberwave.com

docs.cyberwave.com/tutorials/so101-voice-pick-and-place

Challenge	Us (time spent)	Cyberwave
Calibration	3 weeks	Auto-detected
Camera setup	2 days	Pre-configured
Training infra	1 day	Cloud GPU
Data augmentation	1 day	Built-in
Deployment	2 days	One-click
Diagnostics	1 week	Dashboard

The learning is valuable. But when you need production results, platforms like Cyberwave let you skip the pain and focus on the task.

When to Go Local vs Cloud

Go Local When...

You're learning how robot learning works

Doing custom research on novel policies

Working within a tight budget

Need full control over every parameter

Your robot isn't supported yet

Go Cloud When...

Building for production deployment

Working in a team (shared datasets, models)

Need to scale beyond one robot

You're time-constrained

Want built-in observability

"Start local to understand. Go cloud to ship."

Timing: 25:00 - 26:30
Balanced perspective. "Neither approach is wrong. They serve different goals." Emphasize: the local journey gives you intuition that's invaluable even if you later use a platform. The cloud approach is for when your goal shifts from learning to shipping. End with the tagline.

PyCon 2026

Thank You

Questions?

Stefano Maestri

github.com/huggingface/lerobot

cyberwave.com

maeste.it

"Built with love, frustration, and a 17.5° calibration drift."