Independent living isn’t about removing him from his kitchen.
It’s about handling the steps that have become hard — and asking for help on the rest.
The Problem
Without a safety layer → with Carmelo’s Cucina
Mid-skill failure: silent stop → verify, retry once, then ask
Hot toaster contact: no awareness → preflight refuses dispatch
Hand in workspace: no detection → arm halts and narrates the pause
Partial lever press: retry forever → “Could you finish that part?”
Why this matters here
“AI automations are for convenience only,
not safety or security-critical use cases.”
— makermods.ai/modblocks
Carmelo’s Cucina is the safety layer that turns the MakerMods convenience stack
into something deployable around a vulnerable user.
What We Built
SmolVLM watches both cameras at 1.5 Hz and reports task state, safety state, and presence as JSON. A debounced FSM filters out hallucinations: a change commits only after 3 consistent ticks.
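The 3-tick debounce can be sketched as a small filter between the VLM and the FSM. This is a minimal illustration under stated assumptions, not the project’s code: the `Debouncer` name and interface are hypothetical, and it assumes each VLM field (here `lever_down`) is debounced independently.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Debouncer:
    """Commit a new value only after `required` consecutive identical readings.

    Hypothetical helper illustrating "3 consistent ticks required to commit".
    """

    committed: Any            # value the FSM currently trusts
    required: int = 3         # consecutive ticks needed before a change commits
    _candidate: Any = None    # reading that is trying to replace `committed`
    _count: int = 0           # consecutive ticks `_candidate` has been seen (0 = none)

    def update(self, reading: Any) -> Any:
        """Feed one VLM reading; return the (possibly unchanged) committed value."""
        if reading == self.committed:
            # Agreement with the trusted value cancels any pending change.
            self._count = 0
        elif self._count and reading == self._candidate:
            # Same new reading again: extend the streak.
            self._count += 1
        else:
            # A different new reading starts a fresh streak.
            self._candidate, self._count = reading, 1
        if self._count >= self.required:
            # Streak is long enough: commit it and reset.
            self.committed, self._count = self._candidate, 0
        return self.committed


# A one-tick hallucination of lever_down=True is ignored; three in a row commit.
lever = Debouncer(committed=False)
history = [lever.update(r) for r in [True, False, True, True, True]]
print(history)  # [False, False, False, False, True]
```

The cost of this design is latency: at 1.5 Hz, three ticks means a real change takes about two seconds to reach the FSM.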
ElevenLabs voice agent narrates intent — startup question, every state transition, and help requests when a skill fails. Carmelo is never surprised.
Two failed attempts at the same skill → arm returns home, agent asks Carmelo to finish that step, FSM resumes when the VLM detects he’s done.
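The two-attempt rule can be sketched as two functions: one that tries a skill and escalates, and one that polls the VLM until Carmelo has finished the step. This is a hedged illustration: `supervise_skill`, `wait_for_human`, the callables they take, the skill name `press_lever`, and the 600-tick give-up limit are all hypothetical stand-ins for policy dispatch, the debounced VLM state, and the voice agent.

```python
import time
from typing import Callable, Tuple


def supervise_skill(
    skill: str,
    run_policy: Callable[[str], None],
    step_verified: Callable[[], bool],
    return_home: Callable[[], None],
    ask_for_help: Callable[[str], None],
    max_attempts: int = 2,  # first try plus one retry
) -> Tuple[str, int]:
    """Try a skill up to `max_attempts` times; on repeated failure, park the arm and ask."""
    for attempt in range(1, max_attempts + 1):
        run_policy(skill)        # e.g. launch the ACT/SmolVLA policy for this skill
        if step_verified():      # e.g. debounced VLM now reports lever_down = true
            return "done", attempt
    return_home()                # move the arm out of Carmelo's way first
    ask_for_help(skill)          # then hand the step to the voice agent
    return "needs_help", max_attempts


def wait_for_human(
    step_verified: Callable[[], bool],
    tick_hz: float = 1.5,
    max_ticks: int = 600,        # hypothetical give-up limit (~400 s at 1.5 Hz)
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the VLM at `tick_hz` until the step is done or `max_ticks` elapse."""
    for _ in range(max_ticks):
        if step_verified():
            return True          # FSM can resume from the next state
        sleep(1.0 / tick_hz)
    return False
```

Because every side effect is a callable, the whole escalation path can be unit-tested with lambdas and no robot attached.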
Architecture
Cameras: two USB cameras (wrist and environment) stream into the orchestrator, giving it the same views the policies were trained on.
SmolVLM observer: one JSON object per tick with bread_in_toaster, lever_down, human_hand_visible, and confidence. Parse failures fall back to safe defaults.
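Parsing the per-tick JSON with safe defaults might look like the sketch below. The field names come from the deck; what counts as a "safe default" is an assumption here (no task progress, and a hand assumed present so nothing gets dispatched), as are the `SceneObservation` and `parse_observation` names.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObservation:
    bread_in_toaster: bool
    lever_down: bool
    human_hand_visible: bool
    confidence: float


# Assumed safe defaults: report no task progress and assume a hand may be in the
# workspace, so a garbled reply can neither advance the FSM nor clear a dispatch.
SAFE_DEFAULT = SceneObservation(
    bread_in_toaster=False, lever_down=False, human_hand_visible=True, confidence=0.0
)


def _strict_bool(value: object) -> bool:
    # Reject "yes", 1, null, etc.; only real JSON booleans count.
    if not isinstance(value, bool):
        raise TypeError(f"expected bool, got {type(value).__name__}")
    return value


def parse_observation(raw: str) -> SceneObservation:
    """Parse one VLM reply; any malformed reply yields SAFE_DEFAULT."""
    # VLMs often wrap JSON in prose or code fences, so keep only the outermost {...}.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        return SAFE_DEFAULT
    try:
        data = json.loads(raw[start : end + 1])
        confidence = float(data["confidence"])
        if not 0.0 <= confidence <= 1.0:  # also rejects NaN
            return SAFE_DEFAULT
        return SceneObservation(
            bread_in_toaster=_strict_bool(data["bread_in_toaster"]),
            lever_down=_strict_bool(data["lever_down"]),
            human_hand_visible=_strict_bool(data["human_hand_visible"]),
            confidence=confidence,
        )
    except (ValueError, KeyError, TypeError):
        # json.JSONDecodeError is a subclass of ValueError.
        return SAFE_DEFAULT
```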
Orchestrator FSM: decides which skill to dispatch. A preflight gate refuses unsafe dispatches, and a runtime watchdog can E-STOP an in-flight skill.
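A preflight gate and runtime watchdog in this spirit could be as small as two pure functions over the latest scene. This is a sketch, not the project’s SafetyMonitor: the confidence threshold, the skill names, and the rule "lever down means the toaster is hot" are all assumptions chosen to mirror the deck’s hot-toaster and hand-in-workspace cases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scene:
    """Latest debounced VLM view (fields as listed in the deck)."""
    bread_in_toaster: bool
    lever_down: bool
    human_hand_visible: bool
    confidence: float


MIN_CONFIDENCE = 0.6  # hypothetical threshold
# Hypothetical skill names whose motions bring the gripper near the toaster body.
TOASTER_CONTACT_SKILLS = frozenset({"place_bread", "press_lever"})


def preflight_check(skill: str, scene: Scene) -> Optional[str]:
    """Return a refusal reason, or None if `skill` may be dispatched."""
    if scene.confidence < MIN_CONFIDENCE:
        return "low VLM confidence"
    if scene.human_hand_visible:
        return "hand in workspace"
    if skill in TOASTER_CONTACT_SKILLS and scene.lever_down:
        # Assumption: lever down means the toaster is on, so its body may be hot.
        return "toaster is hot"
    return None


def should_estop(scene: Scene) -> bool:
    """Runtime watchdog rule: stop an in-flight skill the moment a hand appears."""
    return scene.human_hand_visible
```

Returning a reason string rather than a bare boolean lets the same value feed both the audit log and the voice agent’s explanation.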
Skill runner: a subprocess wrapper launches the ACT or SmolVLA policy for each skill, and a heartbeat watchdog kills it if its heartbeat stops.
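A heartbeat-watched subprocess can be sketched with the standard library alone. The real command line (for example a lerobot-record invocation) is not shown in the deck and is not reproduced here. Treating each stdout line as a heartbeat is an assumption, and the child must flush every line (the demo uses `python -u`). The name `run_with_heartbeat` is hypothetical.

```python
from __future__ import annotations

import subprocess
import sys
import threading
import time
from typing import Sequence


def run_with_heartbeat(cmd: Sequence[str], timeout_s: float, poll_s: float = 0.05) -> tuple[str, int]:
    """Run `cmd`; every stdout line counts as a heartbeat.

    If no line arrives for `timeout_s` seconds, the child is killed.
    Returns ("exited", returncode) or ("killed", returncode).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    last_beat = time.monotonic()
    lock = threading.Lock()

    def pump() -> None:
        # Background reader: refresh the heartbeat timestamp on every line.
        nonlocal last_beat
        assert proc.stdout is not None
        for _line in proc.stdout:
            with lock:
                last_beat = time.monotonic()

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()
    status = "exited"
    while proc.poll() is None:
        with lock:
            stale = time.monotonic() - last_beat > timeout_s
        if stale:
            proc.kill()  # hard stop of the policy process
            status = "killed"
            break
        time.sleep(poll_s)
    proc.wait()
    reader.join(timeout=1.0)
    if proc.stdout is not None:
        proc.stdout.close()
    return status, proc.returncode


if __name__ == "__main__":
    # Stand-in for a policy process that sends one heartbeat and then hangs.
    demo = [sys.executable, "-u", "-c", "import time; print('beat'); time.sleep(30)"]
    print(run_with_heartbeat(demo, timeout_s=1.0))
```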
Perception and control swap independently: neither side needs retraining when the other changes. The VLM model, FSM rules, and policies are all config knobs.
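One way to make "either side swaps" concrete is to keep perception and control settings in separate frozen config objects. In this sketch, the rate, tick count, and attempt count come from the deck, and the policy repos are the two Hugging Face repos listed on the team slide. The SmolVLM checkpoint ID, the skill names, and the skill-to-repo pairing are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class PerceptionConfig:
    vlm_model: str = "HuggingFaceTB/SmolVLM-Instruct"  # assumed SmolVLM checkpoint
    tick_hz: float = 1.5        # VLM observation rate
    debounce_ticks: int = 3     # consistent ticks required to commit


@dataclass(frozen=True)
class ControlConfig:
    max_attempts: int = 2       # attempts before asking Carmelo for help
    # Hypothetical skill names mapped to the policy repos listed in the deck.
    policies: dict[str, str] = field(
        default_factory=lambda: {
            "place_bread": "ajkoder/smolvla-bread-toaster",
            "press_lever": "ajkoder/smolvla-toaster-on",
        }
    )


@dataclass(frozen=True)
class OrchestratorConfig:
    perception: PerceptionConfig = field(default_factory=PerceptionConfig)
    control: ControlConfig = field(default_factory=ControlConfig)


def swap_vlm(cfg: OrchestratorConfig, model_id: str) -> OrchestratorConfig:
    """Return a copy with a different VLM; the control side is reused untouched."""
    return replace(cfg, perception=replace(cfg.perception, vlm_model=model_id))
```

Because `replace` builds a new object and copies untouched fields by reference, swapping the VLM provably leaves the policy table alone.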
Cloud TTS. Narrated intent.
At startup, the agent asks: “Carmelo, would you like some toast?” Mic captures yes/no — no answer means no action.
Every state transition is announced in a warm voice. “Bread’s in. Now for the lever.” Carmelo isn’t startled by a moving arm.
After two failed attempts: “Could you give the lever a press for me? I can’t quite reach.” The VLM watches for Carmelo’s help, and the FSM resumes silently once the step is done.
ElevenLabs cloud TTS, voice Sarah. Embodied gestures (runmotion.ai) are follow-on work; voice carries the same intents in this build.
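The text side of the voice agent (consent gating and per-transition lines) can be kept separate from the TTS call itself. This sketch shows no ElevenLabs API usage: the returned strings are what would be handed to TTS. Keyword matching is a deliberate simplification of speech understanding. The `PLACING` state, the word lists, and the fallback phrase are hypothetical. The startup question and "Bread's in. Now for the lever." are quoted from the deck.

```python
from typing import Optional

STARTUP_QUESTION = "Carmelo, would you like some toast?"

# Narration keyed by (from_state, to_state). PRESSING and TOASTING appear in the
# deck; PLACING is a hypothetical name for the bread-loading state.
TRANSITION_LINES = {
    ("PLACING", "PRESSING"): "Bread's in. Now for the lever.",
}

YES_WORDS = {"yes", "yeah", "yep", "sure", "please"}
NO_WORDS = {"no", "nope", "not", "later"}


def heard_yes(transcript: Optional[str]) -> bool:
    """Consent gate: only a clear yes starts the task; silence or doubt means no."""
    if not transcript:
        return False  # no answer means no action
    words = {w.strip(".,!?").lower() for w in transcript.split()}
    return bool(words & YES_WORDS) and not (words & NO_WORDS)


def narration(prev_state: str, new_state: str) -> str:
    """Line to speak on a state transition, with a generic (hypothetical) fallback."""
    return TRANSITION_LINES.get((prev_state, new_state), f"Moving on: {new_state.lower()}.")
```

Treating any reply that contains a "no" word as a refusal errs toward inaction, which matches the deck’s rule that no answer means no action.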
Safety
Preflight gate · Runtime watchdog · Heartbeat · Audit log of every violation
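An audit log of every violation can be an append-only JSON Lines file, one record per refusal, E-STOP, or heartbeat kill. The file format, the field names, and the `kind` labels below are assumptions for illustration. The deck does not specify how the log is stored.

```python
from __future__ import annotations

import json
import time
from pathlib import Path
from typing import Optional


def log_violation(
    path: Path,
    kind: str,
    detail: str,
    skill: Optional[str] = None,
    now: Optional[float] = None,
) -> dict:
    """Append one violation as a JSON line; returns the record that was written."""
    record = {
        "ts": time.time() if now is None else now,  # Unix seconds
        "kind": kind,      # e.g. "preflight_refusal", "runtime_estop", "heartbeat_timeout"
        "skill": skill,    # None when no skill was in flight
        "detail": detail,
    }
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def read_violations(path: Path) -> list[dict]:
    """Load every logged violation, oldest first."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Appending one line per event means a crash mid-run loses at most the record being written, and the log can be read with any JSON tool.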
Proof
Policy B doesn’t fully depress the lever. The orchestrator detects it: lever_down stays false, so the FSM cannot leave PRESSING.
Carmelo presses the lever. The VLM detects it on the next tick.
The FSM resumes from TOASTING. No reset, no restart, no lost state.
The Stack
SO-101 dual-arm via LeRobot
ACT · SmolVLA policies (HF-hosted)
Wrist + environment cameras
MakerMods ModBlocks bus
lerobot-record subprocess dispatch
SmolVLM @ 1.5 Hz scene observer
Debounced FSM (3-tick consistency)
Layered safety: preflight + runtime + heartbeat
Conversational agent (ElevenLabs)
Stub providers for every external dep
Unit tests for FSM + SafetyMonitor
Follow-on: embodied gestures (runmotion.ai) · MakerMods Display + Button hardware activation
Team Carmelo’s Cucina
Data & training
Policy & integration
Hardware & ops
Data & training
Orchestrator & pitch
huggingface.co/ajkoder/smolvla-bread-toaster
huggingface.co/ajkoder/smolvla-toaster-on
Every elderly person living alone deserves a robot
that knows when to ask for help.
Not generic AI dropped into a kitchen.
A system built for Carmelo.
CARMELO’S CUCINA · MAKERMODS HACKATHON 2026 · ALLISONCOSSETTE/CARMELOS-CUCINA