01 Carmelo

Carmelo wants to keep
making his own toast.

Independent living isn’t about removing him from his kitchen.
It’s about handling the steps that have become hard — and asking for help on the rest.

02

The Problem

Robots fail. The question is how.

What generic VLA policies do

Mid-skill failure = silent stop
Hot toaster contact = no awareness
Hand in workspace = no detection
Partial press of lever = retry forever

What Carmelo’s Cucina does

Mid-skill failure = verify, retry once, then ask
Hot toaster contact = preflight refuses dispatch
Hand in workspace = halts, narrates pause
Partial press = “Could you finish that part?”

03

Why this matters here

Closing the gap MakerMods explicitly leaves open.

“AI automations are for convenience only,
not safety or security-critical use cases.”

— makermods.ai/modblocks

Carmelo’s Cucina is the safety layer that turns the MakerMods convenience stack
into something deployable around a vulnerable user.

04

What We Built

A safety-first orchestrator
around the SO-101.

01

Detect

SmolVLM watches both cameras at 1.5 Hz. Reports task state, safety state, presence as JSON. Debounced FSM filters out hallucinations — 3 consistent ticks required to commit.
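The 3-tick debounce can be sketched as below. This is a minimal illustration, not the project's code: the `Debouncer` class and its method names are hypothetical, and only the 3-tick threshold comes from the slide.

```python
class Debouncer:
    """Commit a new boolean value only after N consecutive agreeing ticks."""

    def __init__(self, initial: bool, required_ticks: int = 3) -> None:
        self.committed = initial            # value the FSM currently trusts
        self.required_ticks = required_ticks
        self._candidate = initial           # value currently building a streak
        self._streak = 0                    # 0 means "no candidate"

    def update(self, observed: bool) -> bool:
        """Feed one VLM observation and return the committed value."""
        if observed == self.committed:
            # Observation agrees with what we already believe: drop the streak.
            self._streak = 0
        elif self._streak > 0 and observed == self._candidate:
            self._streak += 1
        else:
            # A different value starts a new streak.
            self._candidate, self._streak = observed, 1
        if self._streak >= self.required_ticks:
            self.committed, self._streak = self._candidate, 0
        return self.committed
```

A single stray `lever_down: true` tick never reaches a streak of 3, so it cannot move the FSM on its own.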

02

Communicate

ElevenLabs voice agent narrates intent — startup question, every state transition, and help requests when a skill fails. Carmelo is never surprised.

03

Defer

Two failed attempts at the same skill → arm returns home, agent asks Carmelo to finish that step, FSM resumes when the VLM detects he’s done.
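The two-strikes rule above could be tracked with a small per-skill counter. A hedged sketch: `SkillRetryPolicy` and the action strings `RETRY` and `ASK_FOR_HELP` are hypothetical names; only the limit of two attempts comes from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict

MAX_ATTEMPTS = 2  # from the slide: two failed attempts, then ask


@dataclass
class SkillRetryPolicy:
    """Count failed attempts per skill and decide what to do next."""

    attempts: Dict[str, int] = field(default_factory=dict)

    def on_failure(self, skill: str) -> str:
        """Record a failed attempt and return the next action."""
        self.attempts[skill] = self.attempts.get(skill, 0) + 1
        if self.attempts[skill] >= MAX_ATTEMPTS:
            # Arm returns home and the voice agent asks Carmelo to finish.
            return "ASK_FOR_HELP"
        return "RETRY"

    def on_success(self, skill: str) -> None:
        """Clear the counter once the VLM confirms the step is done."""
        self.attempts.pop(skill, None)
```

Clearing the counter on success means a later run of the same skill starts with a fresh pair of attempts.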

05

Architecture

VLM as supervisor,
SmolVLA as executor.

📷

Wrist + Env Cameras

Two USB cameras stream into the orchestrator. Same views the policies were trained on.

↓ 1.5 Hz
🤖

SmolVLM — Scene Observer

One JSON per tick: bread_in_toaster, lever_down, human_hand_visible, confidence. Safe defaults on parse failure.
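"Safe defaults on parse failure" might look like the sketch below. The four keys are the ones named on the slide, and "assume hand present" comes from the safety slide; the other default values and the `parse_observation` helper are assumptions made for illustration.

```python
import json
from typing import Any, Dict

# Assumed fail-safe observation: a hand is present and nothing is trusted.
SAFE_DEFAULTS: Dict[str, Any] = {
    "bread_in_toaster": False,
    "lever_down": False,
    "human_hand_visible": True,
    "confidence": 0.0,
}


def parse_observation(raw: str) -> Dict[str, Any]:
    """Parse one VLM reply; fall back to SAFE_DEFAULTS on malformed input."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return dict(SAFE_DEFAULTS)
    if not isinstance(data, dict):
        return dict(SAFE_DEFAULTS)
    obs = dict(SAFE_DEFAULTS)
    for key, default in SAFE_DEFAULTS.items():
        value = data.get(key)
        if isinstance(default, bool):
            # Accept only real booleans; "yes" or 1 keeps the safe default.
            if isinstance(value, bool):
                obs[key] = value
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            obs[key] = float(value)
    return obs
```

A missing or mistyped key keeps its safe default, so a partial reply still reports a hand in the workspace.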

↓ debounced ×3

FSM + SafetyMonitor

Decides which skill to dispatch. Preflight gate refuses unsafe dispatches. Runtime watchdog can E-STOP an in-flight skill.

🧹

lerobot-record → SO-101

Subprocess wrapper launches ACT/SmolVLA per skill. Heartbeat watchdog kills it if anything goes wrong.
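A rough sketch of the subprocess wrapper with a heartbeat watchdog, under stated assumptions: `SkillProcess` is a hypothetical class, the real `lerobot-record` arguments are not shown (a sleeping Python child stands in), and the 5-second limit is taken from the safety slide. It assumes the orchestrator loop calls `beat()` every tick and a separate watchdog thread calls `check()`.

```python
import subprocess
import sys
import time
from typing import Callable, List

HEARTBEAT_TIMEOUT_S = 5.0  # from the safety slide: heartbeat stale > 5 s


class SkillProcess:
    """Run one skill as a child process; kill it if the heartbeat goes stale."""

    def __init__(self, cmd: List[str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._proc = subprocess.Popen(cmd)
        self._last_beat = clock()

    def beat(self) -> None:
        """Record that the orchestrator loop completed a tick."""
        self._last_beat = self._clock()

    def check(self) -> bool:
        """Return True while the child is alive and the heartbeat is fresh."""
        if self._clock() - self._last_beat > HEARTBEAT_TIMEOUT_S:
            self.kill()
            return False
        return self._proc.poll() is None

    def kill(self) -> None:
        """Terminate the child if it is still running, then reap it."""
        if self._proc.poll() is None:
            self._proc.kill()
        self._proc.wait()


if __name__ == "__main__":
    # Stand-in child process; not the real lerobot-record invocation.
    skill = SkillProcess([sys.executable, "-c", "import time; time.sleep(60)"])
    print(skill.check())
    skill.kill()
```

Injecting the clock keeps the timeout logic testable without waiting five real seconds.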

Either side can be swapped without retraining the other. VLM model, FSM rules, and policies are all config knobs.

06

Cloud TTS. Narrated intent.

The robot tells Carmelo what it’s doing.

🍜 Welcome

At startup, the agent asks: “Carmelo, would you like some toast?” Mic captures yes/no — no answer means no action.

🗣 Narration

Every state transition is announced in a warm voice. “Bread’s in. Now for the lever.” Carmelo isn’t startled by a moving arm.

🙋 Ask for help

After two failed attempts: “Could you give the lever a press for me? I can’t quite reach.” The VLM watches for Carmelo’s help, and the FSM resumes silently once the step is done.

ElevenLabs cloud TTS, voice Sarah. Embodied gestures (runmotion.ai) are follow-on work; voice carries the same intents in this build.

07

Safety

Five things can stop the robot.
All of them faster than Carmelo can.

Hand in workspace: VLM detects it; preflight refuses dispatch · runtime E-STOPs an in-flight skill
Hot toaster: bread-insert dispatch while the lever is down → refuse
VLM goes blind: 3 consecutive low-confidence ticks → halt and ask
Loop hangs: heartbeat stale > 5 s → immediate E-STOP, kill subprocess
Bad JSON: fail-safe defaults (assume hand present) → halt

Preflight gate · Runtime watchdog · Heartbeat · Audit log of every violation
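The preflight gate and runtime watchdog for the stoppers listed above could be sketched as follows. This is an illustration only: the `SafetyMonitor` interface, the reason strings, the skill name `bread_insert`, and the 0.5 confidence threshold are all assumptions; the 3-tick and 5-second limits come from the slide.

```python
from dataclasses import dataclass
from typing import List, Optional

LOW_CONFIDENCE = 0.5        # hypothetical threshold; the slide gives none
MAX_LOW_CONF_TICKS = 3      # from the slide
HEARTBEAT_TIMEOUT_S = 5.0   # from the slide


@dataclass
class Observation:
    bread_in_toaster: bool
    lever_down: bool
    human_hand_visible: bool
    confidence: float


class SafetyMonitor:
    def __init__(self) -> None:
        self._low_conf_streak = 0
        self.audit_log: List[str] = []   # every violation is recorded

    def _violation(self, reason: str) -> str:
        self.audit_log.append(reason)
        return reason

    def preflight(self, obs: Observation, skill: str) -> Optional[str]:
        """Return a refusal reason, or None if the skill may be dispatched."""
        if obs.human_hand_visible:
            return self._violation("hand_in_workspace")
        if obs.lever_down and skill == "bread_insert":
            return self._violation("hot_toaster")
        return None

    def runtime(self, obs: Observation, heartbeat_age_s: float) -> Optional[str]:
        """Return a stop reason for an in-flight skill, or None to continue."""
        low = obs.confidence < LOW_CONFIDENCE
        self._low_conf_streak = self._low_conf_streak + 1 if low else 0
        if heartbeat_age_s > HEARTBEAT_TIMEOUT_S:
            return self._violation("estop:heartbeat_stale")
        if obs.human_hand_visible:
            return self._violation("estop:hand_in_workspace")
        if self._low_conf_streak >= MAX_LOW_CONF_TICKS:
            return self._violation("halt:vlm_blind")
        return None
```

Bad JSON needs no branch of its own here if the parser already substitutes a hand-present observation, since that trips the hand check.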

08

Demo

See it run.

Full video: youtu.be/jTSO_XpUEP8

09

Proof

A real failure. Recovered courteously.

tick=42 state=PRESSING skill=lever_down
vlm: {lever_down: false, in_toaster: true}
action: WAIT (skill in flight)

tick=58 state=PRESSING skill=none
attempt 2/2 for lever_down
skill failed twice — asking Carmelo for help
awaiting_help_since=now · dispatch suspended

Policy B doesn’t fully depress the lever. The orchestrator detects it — the FSM keeps trying to leave PRESSING, but lever_down stays false.

[VOICE] “I’m having trouble pressing
  the lever down. Could you help me
  with that part?”

tick=71 vlm: lever_down: true ✓
action: debouncing PRESSING→TOASTING
tick=74 state=TOASTING

Carmelo presses the lever. The VLM detects it on the next tick.
After the 3-tick debounce, the FSM resumes in TOASTING. No reset, no restart, no lost state.

10

The Stack

Everything we shipped.

On the Robot

SO-101 dual-arm via LeRobot
ACT · SmolVLA policies (HF-hosted)
Wrist + environment cameras
MakerMods ModBlocks bus
lerobot-record subprocess dispatch

Orchestrator (Python)

SmolVLM @ 1.5 Hz scene observer
Debounced FSM (3-tick consistency)
Layered safety: preflight + runtime + heartbeat
Conversational agent (ElevenLabs)
Stub providers for every external dep
Unit tests for FSM + SafetyMonitor

~75 episodes / policy
3 reliability tiers
5090, trained on Brev overnight

Follow-on: embodied gestures (runmotion.ai) · MakerMods Display + Button hardware activation

11

Team Carmelo’s Cucina

Built remotely + locally
over one weekend.

Abhinaya

Data & training

Sameer

Policy & integration

Nico

Hardware & ops

Kumar

Data & training

Allison

Orchestrator & pitch

huggingface.co/ajkoder/smolvla-bread-toaster
huggingface.co/ajkoder/smolvla-toaster-on

12
Carmelo with family

Every elderly person living alone deserves a robot
that knows when to ask for help.

Not generic AI dropped on a kitchen.
A system built for Carmelo.

CARMELO’S CUCINA · MAKERMODS HACKATHON 2026 · ALLISONCOSSETTE/CARMELOS-CUCINA
