Sunday AI Skips Robot Puppets, Teaches Chores by Hand

The dirty secret of modern robotics is that most impressive demos are just high-tech puppet shows. An army of human operators, strapped into complex and costly teleoperation rigs, remotely guides a robot's every move to generate the data needed to teach it anything useful. It's a slow, expensive and, frankly, unscalable process. Stanford PhD dropouts Tony Zhao and Cheng Chi of Sunday AI looked at this "scaling deadlock" and decided to skip it entirely.

Their solution, powering a new foundation model called ACT-1, is deceptively simple: if you want a robot to learn a task, just do it yourself. Instead of a $20,000 teleop rig, Sunday’s engineers use a $200 “Skill Capture Glove.” This glove, co-designed to match the geometry and sensors of their Memo robot’s hand, captures the subtle, contact-rich data of human motion. The premise is audacious: if a human can do it wearing the glove, the robot can learn it, no puppeteering required.

The Data Bottleneck and the Glove Solution

Sunday’s core belief is that robotics isn’t held back by hardware, compute, or funding, but by a single, definitive constraint: data. While large language models could ingest the entire internet, robotics has no such corpus of real-world interaction data. Companies like Tesla can leverage millions of cars for data collection, but robotics startups don’t have that luxury. Teleoperation was the industry’s answer, but it’s a brute-force approach that is both capital-intensive and slow.

The Skill Capture Glove by Sunday AI, which mirrors the Memo robot's hand.

The Skill Capture Glove is Sunday’s elegant end-run around this problem. By decentralizing data collection, anyone, anywhere, can contribute to the training set without needing a physical robot present. This provides two key advantages:

  • Capital Efficiency: Sunday claims the glove is two orders of magnitude cheaper than a standard teleop setup, dramatically lowering the cost of data acquisition.
  • Data Quality: For tasks that rely on feel—like determining the force needed to fold a sock or seat a wine glass in a dishwasher rack—the glove provides natural force feedback that remote teleoperation simply can’t replicate.

This approach allows Sunday to capture data from hundreds of messy, real-world homes, building a dataset that reflects the “long tail of living,” as they put it—cats in dishwashers and all.
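What a glove recording actually contains hasn't been published, but presumably each session streams timestamped hand poses, finger joint angles, and the per-fingertip contact forces that make "feel"-based tasks learnable. A minimal sketch of what such a demonstration record might look like (the field names and structure here are illustrative assumptions, not Sunday's actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class GloveFrame:
    """One timestep of hypothetical Skill Capture Glove data."""
    t: float               # seconds since recording start
    wrist_pose: tuple      # (x, y, z, qx, qy, qz, qw) in a world frame
    joint_angles: list     # finger joint angles, radians
    fingertip_force: list  # per-fingertip normal force, newtons

@dataclass
class Demonstration:
    """A full human demonstration, replayable for imitation learning."""
    task: str
    frames: list = field(default_factory=list)

    def add(self, frame: GloveFrame) -> None:
        self.frames.append(frame)

    def peak_force(self) -> float:
        """Max fingertip force observed anywhere in the demo."""
        return max(max(f.fingertip_force) for f in self.frames)

# Record two frames of a sock-folding demo.
demo = Demonstration(task="fold_sock")
demo.add(GloveFrame(0.00, (0, 0, 0, 0, 0, 0, 1), [0.1] * 16, [0.0] * 5))
demo.add(GloveFrame(0.05, (0, 0, 0.01, 0, 0, 0, 1), [0.3] * 16,
                    [0.8, 0.6, 0.0, 0.0, 0.0]))
print(demo.peak_force())  # highest contact force captured in the demo
```

The point of a schema like this is that it needs no robot in the loop: a human wearing the glove generates exactly the observation-and-contact stream a policy would later be trained to reproduce.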

From the Dining Table to the Dishwasher

To prove ACT-1’s mettle, Sunday showcased what it calls “the most complex task ever done by a robot autonomously”: clearing a dinner table and loading a dishwasher. This isn’t just picking and placing. The task involves 68 dexterous interactions (33 of them unique) with 21 different objects—from delicate, transparent wine glasses to ceramic plates and metal utensils.

Throughout the long-horizon task, the Memo robot navigates over 130 feet, dumps food waste, and even operates the dishwasher. It’s a symphony of fine-grained manipulation and room-scale navigation controlled by a single end-to-end model. Co-founder Tony Zhao admits they shattered plenty of glasses during development but managed zero breaks over more than 20 live demos, a testament to the model’s learned sensitivity.

Zero-Shot Generalization in the Wild

A robot that only works in its own lab is just a science project. To prove ACT-1’s adaptability, the team deployed Memo in six unfamiliar Airbnbs. The goal: clear the table and load the dishwasher with zero environment-specific training.

Sunday AI's Memo robot performing tasks in a real-world home environment.

By conditioning the model on 3D maps during training, ACT-1 learns to interpret new layouts rather than memorizing specific ones. When dropped into a new house, it uses the provided map to navigate to key locations, demonstrating a crucial capability for any robot intended for the chaos of a real home. To date, ACT-1 is the first foundation model to combine this level of long-horizon manipulation with map-conditioned navigation.
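One way to picture map conditioning is that the policy receives a per-house lookup from semantic locations to coordinates, so the same fixed model can navigate any layout once given its map. A toy illustration of that idea (my own simplification for intuition, not ACT-1's actual architecture):

```python
import math

# Hypothetical per-house maps: semantic label -> (x, y) position in meters.
house_a = {"dining_table": (1.0, 2.0), "dishwasher": (4.0, 0.5), "trash": (3.0, 3.0)}
house_b = {"dining_table": (0.5, 5.0), "dishwasher": (2.0, 1.0), "trash": (6.0, 2.0)}

def next_waypoint(robot_xy, house_map, goal, step=0.5):
    """Move a fixed step toward the goal location named in the map.

    The policy logic stays identical across houses; only the map input
    changes, which is the essence of map-conditioned generalization.
    """
    gx, gy = house_map[goal]
    dx, dy = gx - robot_xy[0], gy - robot_xy[1]
    dist = math.hypot(dx, dy)
    if dist <= step:
        return (gx, gy)  # close enough: snap to the goal
    return (robot_xy[0] + step * dx / dist, robot_xy[1] + step * dy / dist)

# Same code, different houses: the map carries the layout knowledge.
print(next_waypoint((0.0, 0.0), house_a, "dishwasher"))
print(next_waypoint((0.0, 0.0), house_b, "dishwasher"))
```

In the real system the "map" is a 3D representation fed to a learned model rather than a dictionary, but the division of labor is the same: the map encodes where things are, and the policy encodes how to act.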

Pushing the Frontiers of Dexterity

Beyond the marathon dishwasher task, Sunday is also showing off ACT-1’s finesse with two notoriously difficult challenges: folding socks and pulling an espresso shot. While other robots have folded large, predictable items, socks are a nightmare of deformability and self-occlusion. ACT-1 successfully identifies pairs from a cluttered pile, balls them using multi-finger movements, and deposits them in a basket.

Operating an espresso machine, meanwhile, demonstrates a combination of millimeter-level precision and brute force. The robot performs a mid-air tamp, inserts the portafilter, and generates the high torque needed to lock it in before pressing the button. These aren’t just flashy demos; they’re carefully chosen proofs of the high-quality, nuanced data the Skill Capture Glove can provide.

Sunday’s approach is a bold gamble. By betting everything on a novel data collection method, it has bypassed the industry’s biggest bottleneck and produced a model with startling capabilities. The wheeled Memo robot may not have the sci-fi appeal of a bipedal humanoid, but its practical intelligence is undeniable. Sunday has quietly thrown down a gauntlet, suggesting that the future of robotics may not be built by puppeteers, but by simply showing a robot how it’s done.