Hands-on robot learning

From LeRobot to Real Robots

What they don't tell you about making a robot arm grab things

Stefano Maestri

Software Engineer & Robotics Tinkerer

maeste.it

What to Expect in 30 Minutes

Intro video: youtu.be/I44_zbEwz_w

Four acts, one budget robot arm, and a lot of debugging

1
The Setup

LeRobot + SO101 hardware, assembly, cameras, and the recording pipeline.

2
The Bug Parade

Five real bugs: CPU policy, action chunking, causal confounding, calibration drift, distribution shift.

3
The Fix

Profiling, diagnostics, the tools we built, and the training metrics that actually matter.

4
Lessons & Beyond

What worked, what didn't, and the easier path with Cyberwave.

5 real bugs
5 hard-won fixes
1 robot arm
17.5° of calibration drift
01

The Setup

LeRobot, an SO101 arm, and a deceptively simple task

LeRobot + SO101

The stack: an open-source library and a €300 arm

LeRobot
  • Open-source PyTorch robotics library by HuggingFace
  • Datasets, pretrained policies, training & eval tools
  • CLI: lerobot-record, -train, -rollout
  • Integrates with the HF Hub for sharing
SO101 Arm
  • 6-DOF leader–follower arm pair
  • 3D-printed structure + STS3215 servos
  • Budget: ~€300 total
  • Open-source hardware design
The Task

"Grab the red block and put it in the box."

Sounds simple. It was not.

The Promise

What the internet said robot learning would be

The pitch

YouTube: robots folding laundry, cooking eggs, sorting warehouse boxes.

Papers: "a straightforward training pipeline."

LeRobot README: "simple, accessible, state-of-the-art."

My weekend plan
  • Friday eve — unbox & assemble
  • Saturday — record data & train
  • Sunday morning — deploy & demo
  • Sunday lunch — done!
Narrator: It was not done by Sunday lunch.

Hardware Journey

Assembly surprises no tutorial prepares you for

Defective Part

A 3D-printed shoulder bracket had layer-adhesion failure, binding under load. Reprinted with higher infill.

Mismatched Firmware

Servos shipped on different firmware versions. Every motor had to be flashed to the same one to work together.

Windows-Only Tool

The FD debug tool for STS3215 firmware runs only on Windows. Linux users: find a VM.

Calibration

Every joint manually aligned to zero, to the millimetre. One bad offset propagates down the whole chain.

Lesson: Budget hardware + 3D printing = expect 2–3 weeks of hardware debugging before you write a line of ML code.

It Moves!

The leader arm moves. The follower follows.

You move your wrist and a machine six inches away mirrors you in real time. It is, genuinely, magical.

And then…
  • Movements are jerky — some servos lag behind
  • USB cameras disconnect randomly after 10 minutes
  • Recording software crashes on Wayland
  • Display output drops camera FPS from 30 to 8

Welcome to embedded Linux.

The Software Pipeline

It looks linear. You'll loop through it dozens of times.

Record: 3 cameras, teleop
Train: ACT / SmolVLA / Pi0
Evaluate: 30 Hz control loop
Debug: diagnose, repeat
Recording gotchas
  • 3 USB cameras = bandwidth limit
  • Wayland breaks camera ordering
  • MJPEG vs YUYV matters
  • Leader arm in frame = poison
Training constraints
  • GPU memory is the bottleneck
  • Batch-size tuning is critical
  • 2–4 hours per training run
  • No early stopping by default
Inference loop
  • 30 Hz = 33 ms per step
  • Camera read + model + actuate
  • CPU fallback = 8 Hz = failure
  • Async camera read helps

Camera Setup & USB Topology

Three cameras, one shared bus, not enough bandwidth

Camera   Position          Resolution
front    Top-down view     640×480 @ 30 Hz
right    Side angle        640×480 @ 30 Hz
wrist    Gripper-mounted   640×480 @ 30 Hz
The math doesn't close

All 3 cameras on USB 2.0 Bus 001 share 480 Mbps. Three raw streams need ~830 Mbps.

Fix: MJPEG at the camera level + spread cameras across separate USB controllers.
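The bandwidth argument can be checked with a few lines of arithmetic. This is a rough sketch that counts only the uncompressed pixel payload. The pixel formats (YUYV at 2 bytes per pixel, RGB24 at 3) are assumptions, and the function name is hypothetical. It does not try to reproduce the ~830 Mbps figure, which depends on format and overhead details the slide does not spell out. The conclusion is the same either way: three uncompressed streams do not fit comfortably on one USB 2.0 bus.

```python
USB2_NOMINAL_MBPS = 480.0  # nominal bus rate; usable throughput is lower in practice


def stream_mbps(width, height, fps, bytes_per_pixel):
    """Raw pixel payload of one uncompressed stream, in megabits per second."""
    return width * height * bytes_per_pixel * 8 * fps / 1e6


# Assumed formats: YUYV packs 2 bytes per pixel, RGB24 packs 3.
yuyv_total = 3 * stream_mbps(640, 480, 30, bytes_per_pixel=2)
rgb24_total = 3 * stream_mbps(640, 480, 30, bytes_per_pixel=3)

print(f"3 x YUYV : {yuyv_total:.1f} Mbps ({yuyv_total / USB2_NOMINAL_MBPS:.0%} of nominal)")
print(f"3 x RGB24: {rgb24_total:.1f} Mbps ({rgb24_total / USB2_NOMINAL_MBPS:.0%} of nominal)")
```

Even the cheaper YUYV case uses most of the nominal 480 Mbps before any protocol overhead, which is why MJPEG compression and separate controllers are both needed.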

Recording Pipeline

When Wayland silently breaks your keyboard controls

The problem

Wayland blocks global keyboard snooping. pynput's listener silently fails — no arrow-key controls during recording, and no error.

Our fix — stdin + SIGQUIT
  • stdin prompt [Y/n/q] between episodes
  • SIGQUIT (Ctrl+\) ends an episode early
  • Zero new threads
# activate interactive mode
lerobot-record \
  --robot.type=so101_follower \
  --dataset.reset_time_s=-1 \
  --dataset.single_task="Grab the red block"

# between episodes
[INTERACTIVE RESET] Episode 3 recorded.
Keep scene and record next? [Y/n/q]: y

# during recording
# press Ctrl+\ to end episode early
[INTERACTIVE] Episode end requested via SIGQUIT
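The stdin + SIGQUIT idea can be sketched in plain Python. This is not the code behind lerobot-record. The class and function names are hypothetical, and the meaning assigned to each [Y/n/q] answer is an assumption. The signal handler only flips a flag, and the recording loop polls that flag, so no extra thread is needed.

```python
import signal


class EpisodeControl:
    """Holds the 'end this episode' flag that the SIGQUIT handler sets."""

    def __init__(self):
        self.end_requested = False

    def on_sigquit(self, signum, frame):
        # Keep the handler trivial: only flip a flag.
        self.end_requested = True

    def install(self):
        # Ctrl+\ in the terminal sends SIGQUIT (POSIX only).
        signal.signal(signal.SIGQUIT, self.on_sigquit)


def record_episode(step_fn, max_steps, control):
    """Call step_fn until max_steps is reached or an early end is requested."""
    control.end_requested = False
    steps = 0
    while steps < max_steps and not control.end_requested:
        step_fn()
        steps += 1
    return steps


def parse_reset_answer(answer):
    """Interpret a [Y/n/q] reply; an empty reply takes the default (Y)."""
    reply = answer.strip().lower()
    if reply in ("", "y", "yes"):
        return "keep_scene"
    if reply in ("n", "no"):
        return "reset_scene"
    if reply in ("q", "quit"):
        return "quit"
    return "ask_again"
```

In a real session, `control.install()` would be called once at startup and `parse_reset_answer(input(...))` between episodes.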

Recording Episodes

50 attempts to grab one red block

  • Check cameras — USB ports shuffle on every reboot. /dev/video0 is now video4. Why? Nobody knows.
  • Check FPS — too slow with a display attached. Run headless.
  • Check recording — capturing action AND observation? Both cameras? Actually written to disk?
  • Perform the grab — 15 seconds of careful teleoperation.
  • Verify the episode — replay the data. Was it clean?

…then do it 49 more times.
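One way to soften the shuffling /dev/videoN problem is to address cameras by the persistent symlinks that udev creates on most Linux distributions under /dev/v4l/by-id (or /dev/v4l/by-path). The helper below is a minimal sketch with a hypothetical function name. It only lists which node each persistent name currently points to. Note that two identical camera models without serial numbers may collide in by-id, in which case by-path is the safer directory.

```python
import os
from pathlib import Path


def stable_camera_paths(link_dir="/dev/v4l/by-id"):
    """Map persistent symlink names to the device node they currently resolve to."""
    root = Path(link_dir)
    if not root.is_dir():
        return {}  # directory is absent when no camera is attached or udev differs
    mapping = {}
    for link in sorted(root.iterdir()):
        if link.is_symlink():
            mapping[link.name] = os.path.realpath(link)
    return mapping


if __name__ == "__main__":
    for name, node in stable_camera_paths().items():
        print(f"{name} -> {node}")
```

Passing the persistent path instead of /dev/video0 in the camera configuration keeps the front, right and wrist views from swapping after a reboot.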

Lesson #1: Data collection IS the job. Not a step before the job. Not a prerequisite. The job.
02

The Bug Parade

Five bugs that never printed a single error message

The Bugs That Don't Show Up in Tutorials

No errors. No warnings. Just a robot that doesn't work.

#1 Silent CPU Fallback

Symptom: GPU at 2%. Training "works," policy is garbage.

Fix: policy.to("cuda") explicitly; check the param device.

#3 Causal Confounding

Symptom: Perfect in replay, fails on the real robot.

Fix: Remove the leader arm from the camera frame.

#5 Distribution Shift

Symptom: Works at 10am, fails at 3pm. Same code.

Fix: Record across lighting; enable colour augmentation.

Common thread: The system never tells you something is wrong.
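For the silent CPU fallback (#1), a fail-fast device check can be sketched as below. The helper names are hypothetical. The code relies only on the object exposing `parameters()` whose items carry a `.device.type` attribute, as `torch.nn.Module` does, so it would be called right after `policy.to("cuda")`.

```python
def first_param_device_type(module):
    """Return the device type ('cpu', 'cuda', ...) of the module's first parameter."""
    try:
        param = next(iter(module.parameters()))
    except StopIteration:
        raise ValueError("module has no parameters to inspect") from None
    return param.device.type


def assert_on_device(module, expected="cuda"):
    """Raise instead of silently running on the wrong device."""
    actual = first_param_device_type(module)
    if actual != expected:
        raise RuntimeError(f"policy is on '{actual}', expected '{expected}'")
    return actual


# Intended usage with PyTorch (not executed here):
#   policy.to("cuda")
#   assert_on_device(policy, "cuda")
```

Turning the check into an exception gives the system the error message it never printed on its own.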

"Just Train and Deploy"

Loss 0.08. Looks great. Misses every single time.

Training

ACT policy. 1 hour on GPU. Loss 0.08. Looks great.

Deployment

The robot reaches for the block… and misses. Not randomly — systematically, always 6–7 cm to the right.

Maybe the model isn't big enough? Try ACT → misses. Try Pi0.5 → misses. Try SmolVLA → misses.

The catch: The problem isn't the model.

Policy Landscape

Choosing the right model for a 16 GB budget setup

Our pick
ACT

80M params · specialist

chunk_size=100, n_action_steps=100

SmolVLA

500M params · language-conditioned

Language prompt + vision encoder.

Pi0 / Pi0-FAST

3B params · foundation model

chunk_size=50, pretrained backbone.

Reality: Bigger and more general isn't free. Every parameter fights your 16 GB VRAM limit.

Bug #2

Action Chunking Confusion

When "GPU idle 90% of the time" is actually correct

ACT configuration

chunk_size = 100, n_action_steps = 100

1 forward pass = 100 actions = 3.3 s of motion at 30 Hz. The GPU runs once, then replays cached actions.

The false alarm

Sporadic "running slower than requested fps" warnings are timing jitter — not a bug.

The cost: Two days spent debugging a non-problem.
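Why the GPU sits idle becomes obvious once the action queue is written out. The sketch below is a simplified stand-in, not LeRobot's ACT implementation. `ChunkedPolicyRunner` and `predict_chunk` are hypothetical names, and the fake policy just returns 100 placeholder actions.

```python
from collections import deque


class ChunkedPolicyRunner:
    """Runs the expensive model once per chunk and serves cached actions in between."""

    def __init__(self, predict_chunk, n_action_steps):
        self.predict_chunk = predict_chunk    # expensive call (GPU forward pass)
        self.n_action_steps = n_action_steps  # how many actions of each chunk are executed
        self.queue = deque()
        self.forward_passes = 0

    def select_action(self, observation):
        if not self.queue:                    # cache empty: run the model again
            chunk = self.predict_chunk(observation)
            self.forward_passes += 1
            self.queue.extend(chunk[: self.n_action_steps])
        return self.queue.popleft()           # otherwise replay from the cache


# Fake policy returning a chunk of 100 placeholder actions.
runner = ChunkedPolicyRunner(lambda obs: list(range(100)), n_action_steps=100)
for _ in range(300):                          # 300 control steps = 10 s at 30 Hz
    runner.select_action(observation=None)

print(runner.forward_passes)                  # forward passes used for 300 steps
print(f"{100 / 30:.1f} s of motion per forward pass")
```

With chunk_size = n_action_steps = 100, only one control step in a hundred touches the model, so low GPU utilisation during rollout is expected behaviour.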

Tools We Built for Diagnosis

~600 lines of Python that changed everything

compare_leader_follower

Per-joint drift statistics between leader commands and follower positions, across every episode.

Found the 17.5° wrist_flex drift.

evaluate_dataset_quality

Flags outlier episodes by trajectory smoothness, gripper timing and completion metrics.

Identified 12% corrupted episodes.

read_*_pos

Static calibration check — reads live joint positions, leader vs follower, in real time.

Verify calibration before recording.

The point: These aren't sophisticated ML tools; they're small scripts. The hard part is knowing to look.
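To give a feel for how small such a script can be, here is a sketch of the smoothness part of an episode-quality check. It is not the actual evaluate_dataset_quality tool. The roughness metric (mean squared second difference), the median-based cutoff and the factor of 5 are all assumptions chosen for illustration, and each episode is reduced to a single joint trajectory.

```python
from statistics import median


def roughness(trajectory):
    """Mean squared second difference of a 1-D joint trajectory (higher = jerkier)."""
    second_diffs = [
        trajectory[i + 1] - 2 * trajectory[i] + trajectory[i - 1]
        for i in range(1, len(trajectory) - 1)
    ]
    return sum(d * d for d in second_diffs) / len(second_diffs)


def flag_outlier_episodes(episodes, factor=5.0):
    """Return indices of episodes whose roughness exceeds factor x the median roughness."""
    scores = [roughness(ep) for ep in episodes]
    cutoff = factor * median(scores)
    return [i for i, score in enumerate(scores) if score > cutoff]


episodes = [
    [0, 1, 4, 9, 16],    # smooth acceleration
    [0, 1, 4, 9, 16],
    [0, 1, 4, 9, 16],
    [0, 10, 0, 10, 0],   # jerky back-and-forth motion
]
print(flag_outlier_episodes(episodes))
```

A real check would combine this with the gripper timing and completion metrics mentioned above, and would look at every joint rather than one.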

Bug #3

Causal Confounding

The policy learned to cheat

What happened

The leader arm was visible in-frame during recording. The policy learned a shortcut: track the leader arm, not the block.

At inference

The leader arm is gone. The policy sees an unfamiliar scene and outputs random, erratic movements.

Fix

Reposition cameras so the leader arm is never in frame. Then re-record the entire dataset.

Insight: The model wasn't learning to grasp. It was learning to watch the puppeteer.
03

The Answer

A 40-line script found in 3 seconds what I couldn't find in 2 weeks

Bug #4 · Aha moment

Calibration Drift

The root cause of everything, hiding in one joint

# compare_leader_follower.py
Joint          Mean|diff|  Max|diff|
────────────────────────────────
shoulder_pan      1.2°       3.1°
shoulder_lift     0.9°       2.4°
elbow_flex        1.1°       2.8°
wrist_flex       17.5°      22.3°
wrist_roll        0.7°       1.9°
gripper           2.3°       4.1°

WARNING: wrist_flex exceeds 5° threshold!
Impact of recalibration
Metric              Before     After
wrist_flex offset   17.5°      0.85°
Gripper accuracy    ±6–7 cm    ±0.3 cm
Grasp success       0%         80%
Lesson: No model size or epoch count can fix a calibration error. Verify hardware first.
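The per-joint comparison behind that table can be sketched in a few lines. This is not the actual compare_leader_follower.py. It assumes leader and follower positions are already loaded as per-frame lists of joint angles in degrees, and it flags a joint when the mean absolute difference exceeds the 5° threshold (whether the original script tests the mean or the max is not stated, so that choice is an assumption).

```python
def joint_drift_stats(leader, follower, joint_names, threshold_deg=5.0):
    """Per-joint mean and max |leader - follower| over all frames, in degrees."""
    stats = {}
    for j, name in enumerate(joint_names):
        diffs = [abs(lead[j] - foll[j]) for lead, foll in zip(leader, follower)]
        mean_diff = sum(diffs) / len(diffs)
        stats[name] = {
            "mean": mean_diff,
            "max": max(diffs),
            "flagged": mean_diff > threshold_deg,
        }
    return stats


# Tiny made-up example: two frames, two joints.
leader = [[0.0, 10.0], [0.0, 12.0]]
follower = [[1.0, 30.0], [1.0, 28.0]]
for name, s in joint_drift_stats(leader, follower, ["shoulder_pan", "wrist_flex"]).items():
    warn = "  <-- exceeds threshold" if s["flagged"] else ""
    print(f"{name:14s} mean={s['mean']:.1f}  max={s['max']:.1f}{warn}")
```

Running a check like this before every recording session is cheap compared with re-recording a dataset.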

One Wrong Offset, Two Weeks Lost

Every episode taught the robot a physically incorrect world

0%
grasp success before
80%
grasp success after
17.5°
wrist_flex offset

Same policy. Same code. Same hyperparameters.

Bug #5

Distribution Shift

Morning-light training, afternoon deployment

Same robot, different light

A policy trained in morning light fails in the afternoon. Shadows move, colour temperature shifts, white balance drifts.

Caution: For colour-dependent tasks ("red block"), disable hue jitter, or the model goes colour-blind.
# enable built-in transforms
lerobot-train \
  --dataset.image_transforms.enable=true

# default augmentations
brightness : (0.8, 1.2)
contrast   : (0.8, 1.2)
saturation : (0.5, 1.5)
hue        : (-0.05, 0.05)  # off for colour tasks
sharpness  : jitter

Reading Training Metrics

What l1_loss actually means for your robot

0.068 l1_loss (initial)
~12° per-joint error
~6–7 cm gripper positional error
The l1_loss scale
  • 0.03 — reliable grasp
  • 0.068 — our initial result
  • 0.10+ — essentially random motion
Why it matters

l1_loss × servo_range (180°) = per-joint error → propagates down the kinematic chain → gripper error.

Takeaway: A "small" loss can still be centimetres off at the gripper.
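The conversion from loss to physical error can be worked through numerically. The first function is the formula from the slide. The second is a simplifying assumption added here: it treats the error as a single joint rotating a rigid lever, with a hypothetical 0.30 m distance from joint to gripper, and uses the arc length as the displacement. A real kinematic chain combines errors from several joints, so treat the result as an order-of-magnitude estimate.

```python
import math


def per_joint_error_deg(l1_loss, servo_range_deg=180.0):
    """Slide formula: normalised L1 loss times servo range gives degrees of error."""
    return l1_loss * servo_range_deg


def gripper_error_cm(joint_error_deg, lever_arm_m=0.30):
    """Arc length swept at the gripper by one joint's error (lever arm is assumed)."""
    return math.radians(joint_error_deg) * lever_arm_m * 100.0


for loss in (0.03, 0.068, 0.10):
    deg = per_joint_error_deg(loss)
    print(f"l1_loss={loss:<5}  per-joint={deg:5.2f} deg  gripper={gripper_error_cm(deg):4.1f} cm")
```

Under these assumptions the initial 0.068 loss lands in the same 6–7 cm range as the miss observed on the robot.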

Robot in Action

Success and failure — because both matter

Robot grasp demo: youtu.be/hs3DzuAmxSQ
Success

Horizontal block, centred, good light.

Failure

Rotated block, 45° angle, before the hard-example fix.

04

Lessons & Beyond

What worked, what didn't, and the easier path

Lessons Learned

Every symptom hid its root cause two layers below

What worked
  • Profile first, optimise second — compute was never the bottleneck
  • Dataset as ground truth — every bug was found in the data
  • Image augmentation — one flag fixed lighting sensitivity
  • Build diagnostic tools — 600 lines saved weeks
What didn't
  • Bigger models — irrelevant when the data is wrong
  • More epochs — loss plateaued; it wasn't underfitting
  • Removing the wrist camera — that view is critical
  • Ignoring hardware — the answer was a 10-minute recalibration
In one line: Investing in diagnostics beats investing in bigger models.

The easier way

What If the Hard Parts Were Automated?

Cyberwave — write Python once, run it in sim and on real hardware

Platform
  • Digital twins — pre-configured simulation environments
  • Cloud training — managed GPU, no OOM debugging
  • Automated deployment — sim-to-real handled
  • SO101 support — voice-controlled pick-and-place tutorial
Us vs the platform
Challenge        Us        Cyberwave
Calibration      3 weeks   auto-detected
Camera setup     2 days    pre-configured
Training infra   1 day     cloud GPU
Diagnostics      1 week    dashboard

When to Go Local vs Cloud

Neither is wrong — they serve different goals

Go local when…
  • You're learning how robot learning works
  • Doing custom research on novel policies
  • Working within a tight budget
  • You need full control of every parameter
Go cloud when…
  • Building for production deployment
  • Working in a team — shared datasets & models
  • You need to scale beyond one robot
  • You want built-in observability
Rule of thumb: Start local to understand. Go cloud to ship.

Thank You

Questions? Applause welcome.

"Built with love, frustration, and a 17.5° calibration drift."

maeste.it