NVIDIA Introduces Cosmos Policy: World Foundation Models That Teach Robots to Act

NVIDIA unveiled Cosmos Policy, a robot control approach that fine-tunes its Cosmos Predict world foundation model to generate robot actions directly. Instead of bolting separate perception and control modules together, Cosmos Policy encodes robot actions, physical states, and success scores as latent video frames. This lets a single model handle visuomotor control, future-state prediction, and action planning at once. The system achieved state-of-the-art results on the LIBERO and RoboCasa manipulation benchmarks, outperforming both diffusion policies trained from scratch and vision-language-action (VLA) models. Because it inherits the pretrained model's understanding of physics and temporal dynamics, Cosmos Policy is a significant step toward general-purpose robot intelligence built on video foundation models.
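The article does not describe how non-image data becomes a "latent video frame," but the general idea can be illustrated. The following is a minimal, hypothetical sketch in plain Python, not NVIDIA's implementation or API. It assumes a latent frame is a channels × height × width grid of floats. It normalizes a flat vector, such as an action chunk, to [-1, 1], repeats the vector until it fills one frame, and recovers the vector by averaging the repeated copies. The function names, the 4×8×8 frame shape, the 7-value action, and the frame ordering in `build_latent_sequence` are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# Hypothetical latent frame: channels x height x width grid of floats.
# Real Cosmos latent dimensions are not given in the article.
Frame = List[List[List[float]]]


def normalize(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Linearly map each value from [lo, hi] to [-1, 1]."""
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def denormalize(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Inverse of normalize: map [-1, 1] back to [lo, hi]."""
    return [lo + (v + 1.0) * (hi - lo) / 2.0 for v in values]


def pack_into_frame(vec: Sequence[float], channels: int, height: int, width: int) -> Frame:
    """Repeat a flat vector cyclically until it fills one latent frame."""
    size = channels * height * width
    if not vec or len(vec) > size:
        raise ValueError("vector must be non-empty and fit in one frame")
    flat = [vec[i % len(vec)] for i in range(size)]
    # Reshape the flat list into channel-major, row-major order.
    return [
        [[flat[(c * height + h) * width + w] for w in range(width)] for h in range(height)]
        for c in range(channels)
    ]


def unpack_from_frame(frame: Frame, length: int) -> List[float]:
    """Recover `length` values by averaging every repeated copy of each one."""
    flat = [x for channel in frame for row in channel for x in row]
    sums = [0.0] * length
    counts = [0] * length
    for i, x in enumerate(flat):
        sums[i % length] += x
        counts[i % length] += 1
    return [s / n for s, n in zip(sums, counts)]


def build_latent_sequence(
    obs_frames: Sequence[Frame],
    action: Sequence[float],
    future_state: Sequence[float],
    value: float,
    shape: Tuple[int, int, int],
) -> List[Frame]:
    """Append action, future-state, and value frames after the observation frames.

    This ordering is an illustrative assumption, not the published layout.
    """
    c, h, w = shape
    return list(obs_frames) + [
        pack_into_frame(action, c, h, w),
        pack_into_frame(future_state, c, h, w),
        pack_into_frame([value], c, h, w),
    ]


# Usage: a hypothetical 7-value action in [-0.5, 0.5], packed into a 4x8x8 frame.
action = [0.1, -0.2, 0.3, 0.0, 0.25, -0.5, 0.5]
norm = normalize(action, -0.5, 0.5)
frame = pack_into_frame(norm, 4, 8, 8)
recovered = denormalize(unpack_from_frame(frame, len(action)), -0.5, 0.5)
sequence = build_latent_sequence([frame], norm, [0.1, 0.2], 0.8, (4, 8, 8))
print([round(x, 6) for x in recovered])
```

In this sketch, averaging on decode is a deliberate choice. If a model's predicted frame is noisy, pooling many redundant copies of each value should give a steadier estimate than reading any single cell. Whether Cosmos Policy actually decodes this way is not stated in the article.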
