NVIDIA ENPIRE: AI agents run robot research labs

For years, the dream of recursively self-improving AI (machines that can sharpen their own digital wits) has been largely a sandbox affair, confined to the sterile safety of simulations. It is one thing for an AI to master a video game; it is quite another to let it loose on expensive hardware in the messy, unpredictable reality of a physical lab. Now, researchers at NVIDIA, in collaboration with Carnegie Mellon University and UC Berkeley, have effectively handed over the keys to the workshop. Their new framework, ENPIRE, establishes what is essentially a self-governing robot research programme, and the results are as impressive as they are slightly bruising for the egos of human robotics engineers.

ENPIRE allows “agentic” AI—coding agents capable of autonomous reasoning and action—to take the wheel of the physically embodied learning process. The system achieved a staggering 99% success rate on fiddly, dexterous tasks that would typically require weeks of human-led trial and error. We’re talking about inserting pins into a box, seating a GPU into a motherboard, and even using a tool to snip a cable tie. This isn’t merely about tweaking a few settings; these AI agents are rewriting their own algorithms based on real-world feedback, effectively outsourcing the entire R&D cycle to themselves.

The Automated Feedback Loop

The primary bottleneck in robotics has long been the labour-intensive nature of human supervision and algorithmic engineering. ENPIRE bypasses this by creating a closed, repeatable feedback loop that an AI can manage from start to finish. The framework is split into four elegant modules that give it its name:

Environment (EN): This module automates the most tedious aspects of physical testing: resetting the scene for the next attempt and verifying the result. Before the AI even attempts the main task, a separate agent works out how to reset the workspace automatically, exploiting the insight that resetting is often a simpler robotics problem than the task itself.
Policy Improvement (PI): This is where the AI agents get to work. They propose and implement a variety of strategies to improve performance, ranging from simple heuristics to sophisticated methods like behaviour cloning or reinforcement learning (RL).
Rollout (R): This is where the silicon meets the steel. The module executes the agent’s proposed policy on one or more physical robots, gathering vital real-world data.
Evolution (E): The AI agents scrutinise the logs from the rollouts, pore over scientific literature for fresh ideas, and then refine the code for the next iteration. It is a relentless, automated version of the scientific method, running 24/7.

This structure converts the chaotic process of real-world robot learning into a streamlined, controllable optimisation problem that requires almost zero human intervention once the initial kit is set up.
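The four modules can be read as a propose, run, keep-if-better loop. The following is a minimal sketch of that reading and not ENPIRE's actual code: every name here (`Environment`, `rollout`, `propose`, `evolve`) is hypothetical, the "robot" is a random number generator, and the "agent" merely nudges a single success probability.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-ins: none of these names come from the ENPIRE codebase.
Policy = Callable[[random.Random], bool]  # True if one attempt succeeds


@dataclass
class Environment:
    """EN: resets the scene and verifies the outcome of each attempt."""
    resets: int = 0

    def reset(self) -> None:
        self.resets += 1  # a real system would drive the robot to a start state

    def verify(self, outcome: bool) -> bool:
        return outcome  # a real system would check sensors or camera images


def make_policy(skill: float) -> Policy:
    """Toy policy whose chance of success on each attempt equals `skill`."""
    return lambda rng: rng.random() < skill


def propose(skill: float, rng: random.Random) -> float:
    """PI: an agent proposes a changed policy (here, a random nudge)."""
    return min(1.0, max(0.0, skill + rng.uniform(-0.1, 0.2)))


def rollout(env: Environment, policy: Policy, episodes: int,
            rng: random.Random) -> float:
    """R: run the policy for several episodes and return the success rate."""
    successes = 0
    for _ in range(episodes):
        env.reset()
        successes += env.verify(policy(rng))
    return successes / episodes


def evolve(iterations: int = 20, episodes: int = 50,
           seed: int = 0) -> Tuple[List[float], int]:
    """E: keep a proposal only if its measured success rate beats the best."""
    rng = random.Random(seed)
    env = Environment()
    best_skill = 0.3
    best_rate = rollout(env, make_policy(best_skill), episodes, rng)
    history = [best_rate]
    for _ in range(iterations):
        candidate = propose(best_skill, rng)
        rate = rollout(env, make_policy(candidate), episodes, rng)
        if rate > best_rate:
            best_skill, best_rate = candidate, rate
        history.append(best_rate)
    return history, env.resets
```

Calling `evolve()` returns the best success rate after each iteration, which can only stay flat or rise, along with the number of automatic resets performed. In the real system the proposal step is a coding agent rewriting the policy code, not a random nudge.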

A diagram showing the ENPIRE framework's architecture and real-world task examples.

From Intern to Principal Investigator

What marks ENPIRE as a genuine leap forward is the level of autonomy granted to the AI. This is what NVIDIA researcher Jim Fan describes as “real autoresearch.” The agents aren’t just fiddling with the knobs of a pre-written algorithm; they are actively exploring different programming paradigms, rewriting their own training objectives, and even modifying their data loaders.

In one notable instance, while learning a pin-insertion task, an agent independently decided that fine-tuning RL parameters was a dead end. Instead, it wrote its own contact-force safety controller from scratch, which proved to be a far more effective solution. It’s the AI equivalent of a research intern promoting themselves to lead scientist and then solving a problem that had the senior staff stumped.
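The article does not describe how the agent's controller works, so the following is only a generic illustration of what a contact-force safety rule for insertion can look like. The class name, the thresholds, and the linear slow-down are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class ForceGuard:
    """Generic force-limited insertion step; every value is illustrative."""
    max_force_n: float = 10.0  # assumed contact-force limit, in newtons
    step_mm: float = 0.5       # nominal advance per control tick
    retreat_mm: float = 1.0    # back-off distance when the limit is exceeded

    def command(self, measured_force_n: float) -> float:
        """Return the next move along the insertion axis, in millimetres."""
        force = max(measured_force_n, 0.0)  # ignore negative sensor readings
        if force > self.max_force_n:
            return -self.retreat_mm  # too much force: back off
        # Slow down linearly as the force approaches the limit.
        scale = 1.0 - force / self.max_force_n
        return self.step_mm * scale
```

With the default values, a free-space reading of 0 N yields a full 0.5 mm step, 5 N halves it, and anything above 10 N triggers a 1 mm retreat. A rule of this kind wraps whatever policy is driving the arm, so it can be added without retraining.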

The project’s “hillclimb timeline” illustrates this beautifully, showing how various agent-proposed ideas—such as adding regularisation or compensating the controller—incrementally nudged the success rate toward that near-perfect 99% mark in a matter of hours.

Scaling the Robotic Workforce

ENPIRE is built to scale. The framework can manage an entire fleet of robots operating in parallel, massively accelerating the learning curve. To measure the efficiency of this multi-robot, multi-agent system, the researchers introduced two new metrics: Mean Robot Utilisation (MRU) and Mean Token Utilisation (MTU). These track, respectively, how effectively the system keeps the hardware busy and how efficiently it spends its AI models’ token budget.
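The article does not give formulas for either metric. One plausible reading, offered purely as an assumption, treats both as an average of per-unit "used over available" ratios. The function name and the sample figures below are invented for illustration.

```python
from typing import Sequence


def mean_utilisation(used: Sequence[float],
                     available: Sequence[float]) -> float:
    """Average of per-unit used/available ratios, each capped at 1.0."""
    if len(used) != len(available) or len(used) == 0:
        raise ValueError("need equally sized, non-empty sequences")
    ratios = [min(u / a, 1.0) for u, a in zip(used, available)]
    return sum(ratios) / len(ratios)


# Assumed reading of MRU: seconds each robot spent executing rollouts,
# divided by the wall-clock seconds it was available.
mru = mean_utilisation(used=[3000.0, 1800.0], available=[3600.0, 3600.0])

# Assumed reading of MTU: tokens each agent actually consumed,
# divided by the tokens its budget allowed over the same window.
mtu = mean_utilisation(used=[40_000, 10_000], available=[50_000, 50_000])
```

Under these assumed definitions, a robot that sits idle while an agent is still thinking drags MRU down, and an agent that waits on a busy robot drags MTU down, so the pair exposes which side of the loop is the bottleneck.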

The implications of this research are profound. By automating the physical feedback loop, the bottleneck in robotics could shift from the painstaking design of algorithms to the design of self-contained, auto-resetting environments for AI agents to conquer.

NVIDIA has announced plans to open-source the entire ENPIRE framework, a move that could democratise access to high-end robotics research. Before long, anyone with a robotic arm and a decent GPU might be able to run their own self-improving lab at home. The era of AI teaching itself in the real world is no longer a simulation—it’s live, it’s snipping cable ties, and it’s rewriting its own code for the job.

You can dive deeper into the technical details by reading the full paper on the NVIDIA Research page.
