Humanoid Robot Benchmarks: What Buyers Should Ask

That is why two recent benchmark efforts matter. NIST's Humanoid Robot Baseline Performance Benchmark is aimed at minimum expected physical capabilities for commercially available humanoids. Fraunhofer IPA's humanoid benchmark goes further into application criteria such as functional safety, cybersecurity, cleanroom suitability, and energy efficiency. Both are written for a market that has outgrown edited highlight reels.

For home buyers, the useful question is not "which benchmark will crown the winner?" It is: which benchmark questions should you demand before letting a bipedal robot walk near furniture, pets, children, stairs, glass doors, and expensive stuff?

[Chart: Humanoid robot benchmark stack showing baseline tasks, application fit, and buyer proof]

The short version: do not buy a home humanoid because it can do one chore once. Buy only when the maker can show repeatable mobility, repeatable manipulation, measured contact forces, real battery behavior, and a clear failure mode for when the robot gets confused.

Why Benchmarks Suddenly Matter

NIST says the last period when humanoid performance was rigorously measured across robots was the DARPA Robotics Challenge era in 2013 and 2014. That matters because the market in 2026 is no longer just university labs. Buyers can already find real products, preorders, rentals, and enterprise humanoids with home-style promises attached.

In the ui44 database, Unitree G1 is the clearest example of a humanoid you can actually point to as available. It starts at $13,500, stands 132 cm, weighs 35 kg, and lists about 2 hours of battery life. 1X NEO is a $20,000 preorder at 167 cm, 30 kg, and roughly 4 hours of battery. AGIBOT Expedition A3 is listed around $45,000, stands 173 cm, weighs 55 kg, and claims up to 10 hours with a dual hot-swappable battery system.

Those specs are useful. They are not enough. A spec sheet can tell you whether a robot is small enough for a hallway or heavy enough to be a concern. It cannot tell you whether the robot can recover from a bad grasp, stop before a collision becomes painful, or repeat a task on a Tuesday after a software update.

[Chart: Home humanoid robot benchmark scorecard mapping lab tests to buyer questions]

What NIST Is Trying To Standardize

NIST's proposal is important because it starts with a modest but powerful idea: establish a low-footprint baseline for locomotion and manipulation. The point is not to prove that a humanoid can run a perfect household. It is to make basic physical capability measurable across robots.

The proposed task set exercises four areas that translate directly to homes:

Benchmark area	Home buyer translation
Mobility and manipulation	Can the robot move and use its hands without needing a custom stage?
Loco-manipulation	Can it carry, reach, and move at the same time?
Whole-body awareness	Does it understand where its limbs and torso are in tight spaces?
Minimal reasoning and scene understanding	Can it choose a sensible action when the scene is not exactly like the demo?

The word "baseline" is doing a lot of work here. A home robot buyer should not expect NIST's first baseline to answer every consumer question. It will not tell you whether the robot is kind to a nervous dog or whether it can put away your exact mugs. But a baseline can make weak claims easier to spot. If a humanoid cannot pass a public low-footprint mobility and manipulation test, it has no business being marketed as a general-purpose home helper.

The most useful buyer demand is simple: show the score, not just the clip. A clip shows what happened once. A score should show how often it happened, under what setup, and what counted as a failure.
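One way to make "show the score, not just the clip" checkable is to put an interval around a reported success rate. The sketch below is an illustration only and is not part of either benchmark: it uses the standard Wilson score interval, and the function name and the attempt counts are hypothetical.

```python
from math import sqrt


def wilson_interval(successes: int, attempts: int, z: float = 1.96) -> tuple:
    """Approximate 95% Wilson score interval for a task success rate."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    if not 0 <= successes <= attempts:
        raise ValueError("successes must be between 0 and attempts")
    p = successes / attempts
    denom = 1 + z ** 2 / attempts
    centre = (p + z ** 2 / (2 * attempts)) / denom
    half = z * sqrt(p * (1 - p) / attempts + z ** 2 / (4 * attempts ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical counts, not taken from any published benchmark.
one_clip = wilson_interval(successes=1, attempts=1)     # a single demo clip
many_runs = wilson_interval(successes=45, attempts=50)  # a counted test series

print("one clip :", one_clip)
print("50 runs  :", many_runs)
```

With one successful attempt the interval stays very wide, which is the statistical version of the point above: a single clip says little about how often the task works.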

What Fraunhofer Adds: Safety, Security, And Runtime

Fraunhofer IPA's benchmark is more application-focused. It groups tests into technologies and basic abilities, complex abilities, cleanroom suitability, functional safety, cybersecurity, and energy efficiency. For homes, three of those categories deserve special attention: contact safety, remote-control security, and runtime under motion.

The Fraunhofer example is concrete because the institute applied the benchmark to a Unitree G1 EDU-4 delivered in May 2025 with Dex3-1 three-finger hands and firmware version 1.04. The results were not a generic thumbs-up or thumbs-down. They exposed the kind of detail buyers usually never see.

Fraunhofer reported collision forces over 500 N in some cases, a critical Bluetooth remote-control vulnerability in the software version tested, and maximum battery times of 2 hours 49 minutes while standing and 1 hour 49 minutes in a stand-walk scenario. Fraunhofer says the Bluetooth issue has since been fixed, which is exactly the point: benchmarking is useful because it finds problems early enough for vendors to fix them.

For a home buyer, those numbers change the conversation. A robot that weighs 35 kg and can generate high collision forces is not just a gadget with legs. It is moving machinery in a shared space. A robot with a remote-control weakness is not just a privacy issue. It is a physical safety issue. A robot that lasts under two hours while standing and walking is not ready to be treated like an all-day household worker unless it can dock, pause, or hand off safely.
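The runtime figures above can be compared with a few lines of arithmetic. This is a rough sketch: the listed time is the ui44 figure for the G1, the measured times are the ones Fraunhofer reported for the G1 EDU-4 it tested, and the 8-hour day with back-to-back runs and no charging downtime is an assumption made purely for illustration.

```python
from math import ceil


def to_minutes(hours: int, minutes: int) -> int:
    """Convert an hours-and-minutes duration to whole minutes."""
    return hours * 60 + minutes


listed = to_minutes(2, 0)        # "about 2 hours" from the ui44 listing
standing = to_minutes(2, 49)     # Fraunhofer maximum while standing
stand_walk = to_minutes(1, 49)   # Fraunhofer maximum in the stand-walk scenario

# Share of the standing runtime that remains once the robot also walks.
walk_share = stand_walk / standing

# Assumption for illustration: an 8-hour day of back-to-back stand-walk runs,
# ignoring charging or battery-swap time entirely.
HOUSEHOLD_DAY_MIN = 8 * 60
runs_needed = ceil(HOUSEHOLD_DAY_MIN / stand_walk)

print(f"stand-walk runtime: {stand_walk} min ({walk_share:.0%} of standing)")
print(f"shortfall vs listing: {listed - stand_walk} min")
print(f"battery runs for an 8-hour day: {runs_needed}")
```

Even under these generous assumptions the robot would need several full battery runs to cover a working day, which is why docking, pausing, and safe handoff behavior matter as much as the headline battery figure.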

The Evidence Ladder Buyers Should Use

Not all proof is equal. A five-second clip, a shipping page, an internal test, a public benchmark, and an independent lab result should not carry the same weight.

[Chart: Humanoid robot evidence ladder from demo clips to independent benchmark results]

Use this ladder when comparing humanoids:

Edited demo: Good for curiosity, weak for purchase decisions.
Shipping spec: Better, because price, size, weight, payload, and battery can be compared.
Repeated task evidence: Stronger, especially if failures are counted.
Standardized benchmark: Stronger again, because the method can be repeated.
Independent benchmark: Best, because neutral testing reduces the chance that the environment was designed around one robot's strengths.
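The ladder above is an ordering, so it can be sketched as an ordered enum. This is a minimal illustration: the class name, the helper function, and the two example vendors are hypothetical, and the five rungs are the only part taken from the text.

```python
from enum import IntEnum
from typing import Optional, Sequence


class Evidence(IntEnum):
    """Rungs of the evidence ladder, weakest (1) to strongest (5)."""
    EDITED_DEMO = 1
    SHIPPING_SPEC = 2
    REPEATED_TASK = 3
    STANDARDIZED_BENCHMARK = 4
    INDEPENDENT_BENCHMARK = 5


def strongest(evidence: Sequence[Evidence]) -> Optional[Evidence]:
    """Return the highest rung a vendor has reached, or None if it has shown nothing."""
    return max(evidence, default=None)


# Hypothetical vendors, used only to show the comparison.
vendor_a = [Evidence.EDITED_DEMO, Evidence.SHIPPING_SPEC]
vendor_b = [Evidence.SHIPPING_SPEC, Evidence.INDEPENDENT_BENCHMARK]

print(strongest(vendor_a).name)
print(strongest(vendor_b).name)
print(strongest(vendor_b) > strongest(vendor_a))
```

Using an ordered type keeps the comparison honest: a stack of demo clips never adds up to a higher rung than one independent result.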

This is why Figure 02 and UBTECH Walker S2 should be evaluated differently from a home preorder. Figure 02 has industrial proof and a large 20 kg payload, but it is not publicly priced and is now marked discontinued in the ui44 database. Walker S2 has a 15 kg payload and autonomous battery swapping aimed at continuous enterprise operation, but no public consumer price. Those facts are meaningful, yet they do not prove a robot can safely work in a home.

How This Changes The Robot Comparison

Benchmarks force a buyer to ask different questions. "Which humanoid is cheapest?" becomes "which humanoid can prove safe contact, repeatable manipulation, and predictable failures at its price?"

Robot	ui44 database signal	Benchmark question buyers should ask
Unitree G1	Available, $13,500, 132 cm, 35 kg, about 2 hours battery	What are the measured contact forces, fall behaviors, and manipulation success rates?
1X NEO	Preorder, $20,000, 167 cm, 30 kg, about 4 hours battery	Which home tasks are autonomous, which are supervised, and how are failures reported?
AGIBOT Expedition A3	Active, about $45,000, 173 cm, 55 kg, up to 10 hours battery	Which public tasks beat baselines on real robots, not only in simulation?
Figure 02	Industrial model, 168 cm, 70 kg, 20 kg payload, not publicly priced	Which factory capabilities transfer to unstructured homes?
UBTECH Walker S2	Active, 15 kg payload, autonomous battery swapping, no public price	Can continuous-duty autonomy stay safe around non-experts?

The table is deliberately skeptical. A robot can be impressive and still be a bad home purchase. A low price can make experimentation possible, but it does not remove the need for safety data. A long battery claim can sound useful, but it does not prove that the robot can work for hours while manipulating real objects.

Where Competitions Fit In

Benchmarks are not only coming from standards labs. AGIBOT's World Challenge 2026, announced alongside ICRA 2026, drew 526 teams from 27 countries across Reasoning-to-Action and World Model tracks. The useful signal is not the trophy. It is the shift from simulation-only scores toward real-robot finals, standardized metrics, task planning, physical execution, and failure cases such as object drops and grasping problems.

That kind of competition is still not the same as a home safety certification. But it can reveal whether the software stack is improving on the messy middle: language understanding, spatial reasoning, disturbance adaptation, long-horizon task reliability, and mobile manipulation under physical constraints.

For buyers, competitions should be treated as supporting evidence. If a company says its model did well in a real-robot benchmark, ask what robot body was used, what tasks were run, how many attempts were allowed, and whether the task looked anything like a home chore.

What should home buyers ask before trusting a humanoid robot?

Before putting a deposit on a humanoid robot, ask for evidence in five areas.

Mobility: Can it walk on ordinary flooring, turn in a hallway, handle thresholds, and stop safely near stairs? A home robot that only walks on open showroom floors has not proved enough.

Manipulation: Can it pick, carry, place, and release common objects with measured success rates? Hands and arms matter more than dance moves. A useful home robot needs to handle light, awkward, and fragile things without relying on perfect placement.

Contact safety: What forces were measured during collisions? What happens if a person bumps the robot, a pet crosses its path, or a child grabs an arm? Fraunhofer's G1 test is a reminder that contact force needs numbers, not adjectives.

Security: How are remote control, Bluetooth, Wi-Fi, updates, logs, and cloud access protected? A vulnerability on a screen is bad. A vulnerability on a moving robot is worse.

Runtime and recovery: How long does the robot run while moving and manipulating, not just standing? Can it dock itself, swap batteries, pause safely, and recover after a failed grasp?

If a seller cannot answer these questions, the honest conclusion is not "the robot is fake." It is "the robot has not yet earned home trust."

What To Watch Next

The most useful future benchmark will combine NIST-style baseline tasks with Fraunhofer-style safety, security, and energy tests. For homes, it should also add plain-language reporting: number of attempts, success rate, maximum contact force, time to complete, human interventions, battery used, and the exact environment.
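The reporting fields listed above map naturally onto a small record type. The sketch below is one possible shape, not a format defined by NIST or Fraunhofer: the class name, field names, units, and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskReport:
    """One plain-language benchmark result for a single task (hypothetical format)."""
    task: str
    environment: str             # the exact test setup, described in words
    attempts: int
    successes: int
    max_contact_force_n: float   # newtons
    mean_time_s: float           # seconds per completed attempt
    human_interventions: int
    battery_used_pct: float      # percent of a full charge

    def __post_init__(self) -> None:
        if self.attempts <= 0:
            raise ValueError("attempts must be positive")
        if not 0 <= self.successes <= self.attempts:
            raise ValueError("successes must be between 0 and attempts")

    @property
    def success_rate(self) -> float:
        return self.successes / self.attempts

    def summary(self) -> str:
        return (
            f"{self.task} in {self.environment}: "
            f"{self.successes}/{self.attempts} succeeded ({self.success_rate:.0%}), "
            f"max contact force {self.max_contact_force_n:.0f} N, "
            f"mean time {self.mean_time_s:.0f} s, "
            f"{self.human_interventions} human interventions, "
            f"{self.battery_used_pct:.0f}% battery used"
        )


# Illustrative values only; no real robot was measured here.
report = TaskReport(
    task="carry mug to table",
    environment="furnished test kitchen, tile floor",
    attempts=20,
    successes=16,
    max_contact_force_n=85.0,
    mean_time_s=42.0,
    human_interventions=3,
    battery_used_pct=12.0,
)
print(report.summary())
```

A record like this makes missing evidence visible: a vendor that cannot fill in attempts, interventions, or contact force has not yet answered the question.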

That would let buyers compare a $13,500 Unitree G1, a $20,000 1X NEO preorder, a $45,000 AGIBOT platform, and enterprise humanoids on something more meaningful than edited video. It would also help separate two very different claims: "this robot can do a task once" and "this robot can be trusted to repeat that task around people."

Until then, use benchmarks as a filter. The best home humanoid is not the one with the most cinematic demo. It is the one whose maker can say, in public, how it was tested, how often it failed, how hard it hit, how long it ran, and what it did when the task went wrong.

That is the point of benchmark culture. It turns "look what our robot can do" into "here is what our robot can prove."

Related in the database

Use this article as a privacy verification workflow

Turn the article into a privacy verification pass grounded in the robots, manufacturers, and components it actually references.

Humanoid Robot Benchmarks: What Buyers Should Ask already points you toward 5 linked robots, 5 manufacturers, and 3 countries inside the ui44 database. That matters because strong buyer guidance is easier to apply when you can move immediately from a claim or warning into concrete product pages, manufacturer directories, component explainers, and country-level context instead of treating the article as an isolated opinion piece. The fastest next step is to turn the article into a shortlist workflow: open the linked robot pages, verify which specs are actually published for those models, then compare the surrounding manufacturer and component context before you decide whether the underlying claim changes your buying plan.

For this topic, the useful discipline is to separate the editorial lesson from the catalog evidence. The article gives you the framing, but the robot pages tell you what each product actually ships with today: sensor stack, connectivity methods, listed price, release timing, category, and support-relevant compatibility notes. The manufacturer pages then show whether you are looking at a one-off launch, a broader lineup pattern, or a company that spans multiple categories. That layered workflow reduces the risk of buying on a single marketing phrase or a single support FAQ.

Use the robot pages to confirm which products actually expose cameras, microphones, Wi-Fi, or voice systems, then use the manufacturer pages to decide how much of the privacy question seems product-specific versus brand-wide. On this route cluster, G1, NEO, and Expedition A3 form the fastest reality check. If you want a quick working shortlist, open Compare G1, NEO, and Expedition A3 next, then keep this article open as the reasoning layer while you compare structured data side by side.

Practical Takeaway

Every robot, manufacturer, category, component, and country reference below resolves to a real ui44 page, keeping the follow-up path grounded in database records rather than generic advice.

Suggested next steps in ui44

Open G1 and note the listed sensors, connectivity methods, and voice stack before you interpret any policy claim.
Cross-check the wider brand context on Unitree so you can see whether the privacy question touches one model or a broader lineup.
Use the linked component pages to confirm how common the relevant sensors and connectivity layers are across the database.
Keep a short note of which policy layers you checked, which device features are actually present on the robot page, and which items still depend on region- or app-level confirmation.
Finish with Compare G1, NEO, and Expedition A3 so the policy reading sits next to structured product data.

Robot profiles worth opening next

Use the linked product pages as the evidence layer

The linked robot pages are where this article becomes operational. Instead of asking whether the headline is interesting, use the robot entries to inspect the actual mix of sensors, connectivity options, batteries, pricing, release timing, and stated capabilities attached to the products mentioned in the article. That is the easiest way to see whether the warning or opportunity described here affects one product family, a specific design pattern, or an entire buying lane.

G1

Unitree · Humanoid · Available

$13,500

G1 is tracked on ui44 as an available humanoid robot from Unitree. The database currently records a listed price of $13,500, a release date of 2024-05-13, about 2 hours of battery life, an undisclosed charging time, and a published stack that includes a depth camera, 3D LiDAR, and a 4-microphone array, plus Wi-Fi 6 and Bluetooth 5.2.

For privacy-focused reading, this page matters because it shows the concrete device surface behind the policy discussion. Use it to verify whether G1 combines sensors and connectivity in a way that could change the in-home data footprint, and compare the listed capabilities such as Bipedal Walking, Object Manipulation, and Dexterous Hands (optional Dex3-1) with any cloud, app, or voice layers.

NEO

1X Technologies · Humanoid · Pre-order

$20,000

NEO is tracked on ui44 as a pre-order humanoid robot from 1X Technologies. The database currently records a listed price of $20,000, a release date of 2025-10-28, about 4 hours of battery life, an undisclosed charging time, and a published stack that includes RGB cameras, depth sensors, and tactile skin, plus Wi-Fi and Bluetooth.

For privacy-focused reading, this page matters because it shows the concrete device surface behind the policy discussion. Use it to verify whether NEO combines sensors and connectivity in a way that could change the in-home data footprint, and compare the listed capabilities such as Household Chores, Tidying Up, and Safe Human Interaction with any cloud, app, or voice layers.

Expedition A3

AGIBOT · Humanoid · Active

Price TBA

Expedition A3 is tracked on ui44 as an active humanoid robot from AGIBOT. The database currently records no listed price (Price TBA), a release date of 2026-04, battery life of up to 10 hours (dual 1,152 Wh hot-swappable battery system), a charging time of ≤2 hours with support for direct charging and battery swapping (a 10-second battery swap, per the product page), and a published stack that includes an RGB binocular camera, GPS, and UWB positioning, plus Wi-Fi and a 4G/5G mobile network.

For privacy-focused reading, this page matters because it shows the concrete device surface behind the policy discussion. Use it to verify whether Expedition A3 combines sensors and connectivity in a way that could change the in-home data footprint, and compare the listed capabilities such as Bipedal Walking & Running, Aerial Kicks & Dynamic Maneuvers, and 31-57 DOF Whole-Body Articulation Depending on Configuration with any cloud, app, or voice layers.

Figure 02

Figure AI · Humanoid · Discontinued

Price TBA

Figure 02 is tracked on ui44 as a discontinued humanoid robot from Figure AI. The database currently records no listed price (Price TBA), a release date of 2024-08-06, an undisclosed battery life (50% greater capacity than Figure 01), an undisclosed charging time, and a published stack that includes 6 RGB cameras, an onboard vision language model, and microphones, plus Wi-Fi and Bluetooth.

For privacy-focused reading, this page matters because it shows the concrete device surface behind the policy discussion. Use it to verify whether Figure 02 combines sensors and connectivity in a way that could change the in-home data footprint, and compare the listed capabilities such as Autonomous Task Execution, Speech-to-Speech Conversation, and Pick and Place with any cloud, app, or voice layers, including OpenAI Custom Model.

Walker S2

UBTECH · Humanoid · Active

Price TBA

Walker S2 is tracked on ui44 as an active humanoid robot from UBTECH. The database currently records no listed price (Price TBA), a release date of 2025-07-17, a battery system designed for 24/7 continuous operation with autonomous battery swapping, an autonomous battery swap of about 3 minutes in place of a charging time, and a published stack that includes a pure RGB binocular stereo vision system, a stereo depth estimation system, and real-time battery monitoring, plus a Gigabit Ethernet port and USB 3.0.

For privacy-focused reading, this page matters because it shows the concrete device surface behind the policy discussion. Use it to verify whether Walker S2 combines sensors and connectivity in a way that could change the in-home data footprint, and compare the listed capabilities such as Autonomous Battery Swapping, 24/7 Continuous Operation, and Industrial Handling and Assembly with any cloud, app, or voice layers.

Manufacturer context behind the article

Check whether this is one product story or a broader company pattern

Manufacturer pages add the privacy context that individual product pages cannot show on their own. They help you check whether cameras, microphones, cloud accounts, app controls, and policy assumptions appear across a broader lineup or stay tied to one specific product story.

Unitree

ui44 currently tracks 2 robots from Unitree across 1 category. The company is grouped under China, and the current catalog footprint on ui44 includes H1, G1.

That wider brand context matters because privacy questions rarely stop at one FAQ page. A manufacturer route helps you see whether the article is centered on one premium model or on a company that has several relevant products and therefore more than one place where the same policy or app assumptions might matter. The category mix here currently points toward Humanoid as the most useful next route if you want to see whether this article reflects a wider pattern inside the brand.

1X Technologies

ui44 currently tracks 2 robots from 1X Technologies across 1 category. The company is grouped under Norway, and the current catalog footprint on ui44 includes NEO, EVE.

AGIBOT

ui44 currently tracks 10 robots from AGIBOT across 3 categories. The company is grouped under China, and the current catalog footprint on ui44 includes A2 Ultra, X2, Expedition A3.

That wider brand context matters because privacy questions rarely stop at one FAQ page. A manufacturer route helps you see whether the article is centered on one premium model or on a company that has several relevant products and therefore more than one place where the same policy or app assumptions might matter. The category mix here currently points toward Humanoid, Quadruped, Commercial as the most useful next route if you want to see whether this article reflects a wider pattern inside the brand.

Figure AI

ui44 currently tracks 2 robots from Figure AI across 1 category. The company is grouped under USA, and the current catalog footprint on ui44 includes Figure 03, Figure 02.

Broaden the scan without leaving the database

Categories, components, and countries add the wider context

Category framing

Category pages are useful when the article touches a buying pattern that shows up across brands. A category route helps you confirm whether the linked products sit in a narrow niche or whether the same question should be tested across a larger field of alternatives.

Humanoid

The Humanoid category page currently groups 129 tracked robots from 92 manufacturers. ui44 describes this lane as: Full-size bipedal humanoid robots built to work alongside people — from factory floors to household tasks. Compare the cutting edge of humanoid robotics.

That makes the category route a practical follow-up when you want to check whether the products linked in this article are typical for the lane or whether they sit at one edge of the market. Useful starting examples currently include NEO, EVE, Mornine M1.

Country and ecosystem context

Country pages give extra context when support practices, launch sequencing, regulatory posture, or manufacturer mix matter. They are not a substitute for model-level verification, but they do help you see which ecosystems cluster together and which manufacturers sit in the same regional field when you broaden the search beyond the article headline.

China

The China route currently groups 189 tracked robots from 87 manufacturers in ui44. That gives you a useful regional lens when the article points toward support practices, launch sequencing, or brand clusters that may share similar ecosystem assumptions.

On the current route, manufacturers like AGIBOT, Dreame, Unitree Robotics make the page a good way to broaden the scan without losing the regional context that often shapes availability, documentation style, and adjacent alternatives.

Norway

The Norway route currently groups 2 tracked robots from 1 manufacturer in ui44. That gives you a useful regional lens when the article points toward support practices, launch sequencing, or brand clusters that may share similar ecosystem assumptions.

On the current route, 1X Technologies is the only listed manufacturer, which still makes the page a good way to broaden the scan without losing the regional context that often shapes availability, documentation style, and adjacent alternatives.

USA

The USA route currently groups 89 tracked robots from 69 manufacturers in ui44. That gives you a useful regional lens when the article points toward support practices, launch sequencing, or brand clusters that may share similar ecosystem assumptions.

On the current route, manufacturers like Faraday Future, iRobot, Boston Dynamics make the page a good way to broaden the scan without losing the regional context that often shapes availability, documentation style, and adjacent alternatives.

Questions to answer before you move from reading to buying

A follow-up FAQ built from the entities already linked in this article

Frequently Asked Questions

Which page should I open first after reading “Humanoid Robot Benchmarks: What Buyers Should Ask”?

Start with G1. That gives you a concrete product anchor for the article’s main claim. From there, branch into the manufacturer and component pages so you can tell whether the article is describing one specific model, a repeated brand pattern, or a wider technology issue that affects multiple shortlist options.

How do the manufacturer pages change the buying decision?

Manufacturer pages such as Unitree's help you zoom out from one article and one product. On ui44 they show lineup breadth, category spread, and the neighboring robots tied to the same company. That context is useful when you are deciding whether a risk belongs to a single model, whether it shows up across a brand’s portfolio, and whether you should keep looking at alternatives before committing.

When should I switch from reading to side-by-side comparison?

Move into Compare G1, NEO, and Expedition A3 as soon as you understand the article’s main warning or promise. The article explains what to watch for, but the compare view is where you can check whether price, status, battery life, connectivity, sensors, and category fit still make the robot a good match for your own home and budget.

Where to go next in ui44

Keep the research chain inside the database

If you want to keep going, these follow-on pages give you the cleanest expansion path from article to research session. Open the comparison route first if you are deciding between products today. Open the manufacturer, category, and component routes if you still need to understand the broader pattern behind the claim.

Compare G1, NEO, and Expedition A3
G1
NEO
Expedition A3
Unitree manufacturer page
1X Technologies manufacturer page
Humanoid category
China country page

Written by

ui44 Team

Published June 8, 2026
