NIST Unveils New Benchmark to See if Humanoid Robots Are Actually Useful

The U.S. National Institute of Standards and Technology (NIST) has decided it’s time to find out if the current crop of shiny humanoid robots can do more than just star in slick marketing videos. The agency has proposed a new “Baseline Performance Benchmark,” a standardized obstacle course designed to measure the real-world capabilities of humanoids, nearly a decade after the DARPA Robotics Challenge (DRC) last put these machines to a serious, humbling test.

Back in 2013-2015, the DRC gave us a treasure trove of robotic bloopers and a stark reminder of how hard tasks like opening a door actually are. NIST, which designed those original tests, is now proposing a modern equivalent. The goal is to establish a common set of quantifiable tasks that any self-respecting commercial humanoid should be able to perform. The proposed tests cover four key areas: Mobility (stairs, ramps), Manipulation (turning knobs, using tools), Loco-manipulation (carrying a tote through a doorway), and Cognition (multi-step task planning).

[Figure: Task list for the proposed NIST humanoid robot benchmark]

NIST is developing the test apparatus in collaboration with industry and plans to distribute a limited number of the physical testbeds free of charge to participating U.S. robot manufacturers. The agency is actively seeking input from the robotics community on the test design, essentially asking companies like Boston Dynamics, Figure AI, and Tesla to help build the very yardstick they will be measured against.

Why is this important?

For years, the robotics industry has been dominated by carefully curated demo videos that showcase flawless performance under perfect conditions. There is no standardized way to compare the capabilities of one company’s robot to another’s, leaving customers and investors to guess who has substance and who just has a great video editor. This NIST benchmark could finally cut through the hype.

By creating a common set of repeatable, measurable tasks, NIST is providing a level playing field. It will allow for a direct, apples-to-apples comparison of robot performance, separating the truly capable machines from the lab-bound prototypes. For an industry on the cusp of commercial deployment, this kind of objective validation is not just useful—it’s essential for building trust and steering genuine progress. You can find more details in the official proposal.