In a move that should make the entire robotics industry sit up and spit out its coffee, Ant Group—yes, the fintech giant affiliated with Alibaba—has just dropped an entire foundational stack for embodied intelligence on an unsuspecting world. And the best part? It’s all open-source under the astonishingly permissive Apache 2.0 license. This isn’t just another model; it’s a three-piece combo of perception, action, and imagination designed to be the universal brain for the next generation of robots.
While the rest of the world was busy watching humanoid robots do backflips, Ant Group’s Robbyant unit was quietly building the software that will actually make them useful. They’ve released not one, but three interconnected foundation models under the LingBot banner, targeting the core challenges of making robots that can see, act, and even plan ahead in the messy, unpredictable real world. It’s a bold, strategic play that signals a shift from building bespoke robot brains to creating a standardized, Android-like platform for anyone to build upon.
The Three-Course Meal for Embodied AI
Ant Group has structured its release as a complete toolkit for embodied intelligence, covering what it calls perception, action, and imagination. It’s a comprehensive approach that addresses the full pipeline from sensing the world to interacting with it.
First, there’s LingBot-Depth, a model for spatial perception. Then comes LingBot-VLA, a Vision-Language-Action model that translates commands into physical motion. And finally, the pièce de résistance: LingBot-World, an interactive world model that can simulate reality for training and planning. Together, they represent a serious attempt to solve the embodied AI problem from end to end.
LingBot-VLA: A Brain Trained on 2.2 Years of Reality
The headline grabber is LingBot-VLA, and for good reason. It’s been trained on a staggering 20,000 hours of real-world robot data. To put that in perspective, that’s over 2.2 years of a robot continuously performing tasks, learning from its mistakes, and figuring out how the physical world works. This isn’t simulation; it’s hard-won experience.
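For the curious, that conversion is easy to sanity-check in a couple of lines of Python:

```python
# Convert 20,000 hours of robot data into years of continuous operation.
hours_of_data = 20_000
hours_per_year = 24 * 365  # 8,760 hours in a non-leap year
print(f"{hours_of_data / hours_per_year:.2f} years")  # -> 2.28 years
```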
This massive dataset was collected from nine different popular dual-arm robot configurations, which is critical for generalization. The goal of a VLA is to create a single “universal brain” that can operate different types of robots without expensive retraining for each new piece of hardware. Ant Group claims LingBot-VLA can be adapted to single-arm, dual-arm, and even humanoid platforms, a longstanding challenge in the field.
The results speak for themselves. On the GM-100 real-robot benchmark, LingBot-VLA outperformed competing models, especially when paired with its sibling, LingBot-Depth, to improve spatial awareness. It also demonstrated training speeds 1.5 to 2.8 times faster than existing frameworks, a crucial factor for developers on a budget.
A Mind’s Eye and a Digital Sandbox
Perceiving the world is half the battle, and that’s where LingBot-Depth comes in. It’s a foundation model designed to generate metric-accurate 3D perception from noisy, incomplete, and sparse sensor data. It can apparently work with less than 5% of the depth information available, a scenario all too common when dealing with reflective surfaces or transparent objects that confound standard sensors. This is the kind of robust perception needed for a robot to function outside of a pristine lab.
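To make that concrete, here is a minimal sketch of what such an input looks like, assuming the "less than 5%" figure refers to the fraction of pixels with a valid depth reading. The `complete` call at the end is a hypothetical placeholder, not the released API.

```python
import numpy as np

# Simulate one 480x640 depth frame from a consumer RGB-D sensor (in metres).
h, w = 480, 640
dense_depth = np.random.uniform(0.5, 4.0, size=(h, w)).astype(np.float32)

# Keep only ~5% of pixels; reflective or transparent surfaces, range limits,
# and sensor noise leave the rest as holes (encoded here as 0.0 = "no reading").
valid_mask = np.random.rand(h, w) < 0.05
sparse_depth = np.where(valid_mask, dense_depth, 0.0)

print(f"Valid depth pixels: {valid_mask.mean() * 100:.1f}%")  # ~5% coverage

# A depth-completion model would take the RGB frame plus this sparse map and
# predict dense, metric depth. Hypothetical call, for illustration only:
# dense_prediction = lingbot_depth.complete(rgb_frame, sparse_depth)
```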
But the most mind-bending part of this release is LingBot-World. It’s an interactive world model that functions as a “digital sandbox” for AI. It can generate nearly 10 minutes of stable, controllable, physics-grounded simulation in real time. This directly tackles the “long-term drift” problem that plagues most video generation models, where scenes devolve into a surrealist nightmare after a few seconds.
Even more impressively, LingBot-World is interactive. It runs at around 16 frames per second with less than a second of latency, allowing users to control characters or change the environment with text prompts and see instant feedback. It also features zero-shot generalization: feed it a single photo of a real place, and it can generate a fully interactive world from it without any scene-specific training.
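To give a feel for what “around 16 frames per second with sub-second latency” implies on the client side, here is a hypothetical interaction loop. The `WorldModel` class, its `reset` and `step` methods, and the timing stub are all invented for illustration; the released interface may look quite different.

```python
import time

class WorldModel:
    """Stand-in for an interactive world-model client (hypothetical API)."""

    def reset(self, image_path: str) -> None:
        """Seed the world from a single photo (the zero-shot case above)."""

    def step(self, action: str):
        """Advance one frame given a text/control action and return it."""
        time.sleep(1 / 16)  # pretend rendering takes ~62 ms per frame
        return object()     # placeholder frame

world = WorldModel()
world.reset("my_living_room.jpg")  # one photo in, interactive world out

start, frames = time.time(), 0
while time.time() - start < 2.0:        # drive the world for two seconds
    frame = world.step("walk forward")  # prompt-driven control every tick
    frames += 1

print(f"~{frames / (time.time() - start):.1f} fps")  # hovers around 16
```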
The Android Strategy for Robotics
So, why is a fintech company pouring resources into building free robot brains? The answer lies with its affiliate, Alibaba. As an e-commerce and logistics titan, Alibaba stands to benefit enormously from widespread, cheap, and intelligent automation. By open-sourcing the foundational layer under a permissive Apache 2.0 license, Ant Group is inviting the entire world to build the next generation of robotics on its platform. It’s a classic ecosystem play.
This release on Hugging Face isn’t just a data dump; it includes a full, production-ready codebase with tools for data processing, fine-tuning, and evaluation. Ant Group isn’t just giving away a fish; it’s giving away the entire fishing fleet and the schematics to build more.
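If you want to kick the tires, pulling the weights down should look roughly like the snippet below, using the standard `huggingface_hub` client. The repository IDs are guesses for illustration; check the actual organization page on Hugging Face for the real names.

```python
from huggingface_hub import snapshot_download

# Hypothetical repo IDs -- substitute the real ones from the release page.
for repo_id in ("robbyant/LingBot-VLA", "robbyant/LingBot-Depth", "robbyant/LingBot-World"):
    local_dir = snapshot_download(repo_id=repo_id)
    print(f"Downloaded {repo_id} -> {local_dir}")
```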
While competitors have their own impressive models, many are kept behind closed APIs or restrictive licenses. Ant Group’s decision to go fully open and commercially friendly could be the catalyst that unlocks a Cambrian explosion of innovation in robotics. The race is no longer just about who has the smartest AI, but who can build the most vibrant and productive ecosystem around it. With the LingBot trilogy, Ant Group has just made a powerful opening move.