For years, robotics has been a story of brilliant hardware waiting for a brain. We’ve seen mechanical dogs do backflips and factory arms perform with hypnotic precision, but they were mostly just repeating a script. Ask them to do something new, and you’d be met with the silent, metallic equivalent of a blank stare. That era, it seems, is grinding to a screeching, unceremonious halt.
Enter the new class of robots from Google DeepMind, which are less pre-programmed automatons and more… thoughtful collaborators. In a recent tour of its California lab, the company showcased a fleet of machines that don’t just see and do; they understand, plan, and even think before they act. The secret sauce isn’t better gears or motors, but the infusion of the same powerful AI that fuels its Gemini models. The result is robots that can pack your lunch with unnerving dexterity and then, with amusing literal-mindedness, decline to do it as Batman.
The Two-Part Brain Behind the Brawn
The fundamental shift, as explained by Kanishka Rao, Director of Robotics at Google DeepMind, is building robots on top of large Vision-Language-Action (VLA) models. Instead of being programmed for one specific task, these robots are given a general understanding of the world. They leverage the vast knowledge embedded in models like Gemini to comprehend concepts, objects, and instructions in a way that was previously science fiction.
Google’s architecture effectively gives the robot a two-part brain:
- Gemini Robotics-ER (Embodied Reasoning): This is the strategic planner. When given a complex, long-horizon task—like “clean up this table according to local recycling rules”—this model acts as the high-level brain. It can even use tools like Google Search to look up the necessary information before creating a step-by-step plan.
- Gemini Robotics VLA (Vision-Language-Action): This is the executor. It takes the simple, sequential instructions from the reasoning model and translates them into the precise motor commands needed to perform the physical action.
This division of labor allows the robots to move beyond simple, short-horizon actions like “pick up the block” and tackle multi-step, complex goals that require genuine problem-solving.
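DeepMind hasn’t published the interface between the two models, but the division of labor is easy to picture as a planner that emits language-level steps and an executor that turns each step into motion. The sketch below is purely illustrative: every class, method, and canned plan here is hypothetical, not DeepMind’s actual API.

```python
# Hypothetical sketch of the planner/executor split described above.
# None of these classes or methods are real DeepMind APIs; they only
# illustrate a reasoning model handing steps to a VLA controller.

from dataclasses import dataclass


@dataclass
class Step:
    instruction: str  # e.g. "pick up the plastic bottle"


class EmbodiedReasoner:
    """High-level planner (the 'Gemini Robotics-ER' role)."""

    def plan(self, goal: str, scene_description: str) -> list[Step]:
        # In the real system this would be a large-model call that can
        # also consult tools such as web search (e.g. recycling rules).
        # Here we return a canned plan for illustration.
        return [
            Step("pick up the plastic bottle"),
            Step("place it in the recycling bin"),
            Step("pick up the banana peel"),
            Step("place it in the compost bin"),
        ]


class VLAExecutor:
    """Low-level controller (the 'Gemini Robotics VLA' role)."""

    def execute(self, step: Step, camera_image: bytes) -> None:
        # A real VLA model maps (image, language instruction) to motor
        # commands at a fixed control rate. We only log the intent.
        print(f"executing: {step.instruction}")


def run_task(goal: str, scene: str, image: bytes) -> None:
    planner, executor = EmbodiedReasoner(), VLAExecutor()
    for step in planner.plan(goal, scene):  # strategic plan
        executor.execute(step, image)       # physical execution


run_task("clean up this table according to local recycling rules",
         "a table with a bottle and a banana peel", b"")
```

The appealing design choice, as described in the tour, is that the planner never touches motors and the executor never reasons about the overall goal, so each half can improve independently.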
Thinking Makes It So
Perhaps the most fascinating breakthrough is the application of “chain of thought” reasoning to physical actions. We’ve seen this in language models, where asking an AI to “think step-by-step” improves its output. DeepMind has now given its robots an “inner monologue.” Before a robot moves, it generates a sequence of its reasoning in natural language.
“We’re making the robot think about the action that it’s about to take before it takes it,” Rao explains in the video tour. “Just this act of outputting its thoughts makes it more general and more performant.”
This isn’t just an academic exercise. Forcing the robot to articulate its plan—“Okay, I need to pick up the bread and place it gently inside the tiny Ziploc bag opening”—helps it structure complex actions that humans perform intuitively. It’s a bizarre but effective emergent property: to make a robot better at physical tasks, you first teach it to talk to itself.
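To make the “inner monologue” idea concrete, here is a generic sketch of how a controller might prompt a model to verbalize its reasoning before committing to a single action line, then split the two apart. The prompt wording and the `ACTION:` format are invented for illustration; DeepMind hasn’t published its own.

```python
# Generic illustration of "chain of thought before action": the model is
# asked to emit its reasoning first, then one structured action line.
# The prompt and the parsing format are invented for this example.

PROMPT_TEMPLATE = """You control a robot arm.
Task: {task}
First explain, step by step, what you will do and why.
Then output exactly one line starting with ACTION: describing the next motion.
"""


def parse_response(text: str) -> tuple[str, str]:
    """Split a model response into (reasoning, action)."""
    reasoning_lines, action = [], ""
    for line in text.splitlines():
        if line.startswith("ACTION:"):
            action = line.removeprefix("ACTION:").strip()
        else:
            reasoning_lines.append(line)
    return "\n".join(reasoning_lines).strip(), action


# A hard-coded response stands in for an actual model call.
fake_response = (
    "The bag opening is narrow, so I should grip the bread by its edge\n"
    "and tilt it to slide it in without crushing it.\n"
    "ACTION: grasp bread at left edge, tilt 30 degrees, insert into bag"
)
reasoning, action = parse_response(fake_response)
print("inner monologue:", reasoning)
print("motor-level instruction:", action)
```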
Lunch Is Served… Eventually
The proof, as they say, is in the pudding—or in this case, the packed lunch. One of the most compelling demos involved an Aloha robot arm tasked with preparing a lunchbox. This is a task requiring what the team calls “millimeter-level precision,” especially when dealing with a flimsy Ziploc bag.
Watching the robot work is a masterclass in the current state of the art. It’s incredibly impressive, yet charmingly imperfect. The robot deftly pinches the bag open, carefully places a sandwich inside, and then adds a chocolate bar and grapes. It fumbles slightly, corrects itself, and keeps trying—a far cry from the brittle, error-prone robots of just a few years ago that, as host Hannah Fry recalled, mostly just made piles of broken Lego. This dexterity is learned not from rigid code, but from human demonstration via teleoperation, where an operator “embodies” the robot to teach it the correct movements.
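The video doesn’t spell out the training pipeline, but teaching by teleoperation is typically framed as imitation learning: record what the operator sees and does, then train the policy to reproduce those actions from vision alone. The snippet below is a minimal behavior-cloning sketch under that assumption, with random arrays standing in for real sensor and joint data.

```python
# Minimal sketch of learning from teleoperated demonstrations
# (behavior cloning). Data shapes and the loss are generic
# illustrations, not DeepMind's actual pipeline.

import numpy as np


def record_demonstration(num_steps: int = 100) -> list[tuple[np.ndarray, np.ndarray]]:
    """Log (camera_observation, operator_action) pairs while a human
    teleoperates the arm. Random arrays stand in for real data."""
    demo = []
    for _ in range(num_steps):
        observation = np.random.rand(64, 64, 3)  # camera frame
        action = np.random.rand(7)               # 7-DoF joint command from the operator
        demo.append((observation, action))
    return demo


def behavior_cloning_loss(predicted: np.ndarray, demonstrated: np.ndarray) -> float:
    """Mean squared error between the policy's action and the human's."""
    return float(np.mean((predicted - demonstrated) ** 2))


# Training would repeatedly minimize this loss over many demonstrations,
# so the policy learns to reproduce the operator's movements from vision.
demo = record_demonstration()
obs, act = demo[0]
print(behavior_cloning_loss(np.zeros(7), act))
```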
“I Cannot Perform Actions as a Specific Character”
While one demo showcased dexterity, another highlighted the system’s generalization and its amusingly literal interpretation of language. When asked to “put the green block in the orange tray, but do it as Batman would,” the robot paused.
Its response, delivered in a deadpan robotic voice, was priceless: “I cannot perform actions as a specific character. However, I can put the green block in the orange tray for you.”
The exchange perfectly captures the power and current limitations of these systems. The robot understood the core instruction perfectly and discarded the nonsensical, stylistic flourish. It has a world-class understanding of actions and objects, but zero grasp of cultural personas. It’s a general-purpose robot, not a method actor.
This peek inside DeepMind’s lab reveals that the field of robotics is finally getting its “software” moment. By leveraging the monumental advances in large-scale AI, Google is creating a platform for robots that can learn, adapt, and reason in the real world. They may not be ready to impersonate superheroes, but they’re already packing our lunches. And for anyone who’s ever rushed out the door in the morning, that might be the most heroic feat of all.