NVIDIA Introduces Cosmos Policy: World Foundation Models That Teach Robots to Act
NVIDIA unveiled Cosmos Policy, a new approach to robot control that fine-tunes its Cosmos Predict world foundation model to directly generate robot actions. Instead of bolting separate perception and control modules together, Cosmos Policy encodes robot actions, physical states, and success scores as latent video frames — allowing a single model to handle visuomotor control, future-state prediction, and action planning simultaneously. The system achieved state-of-the-art results on LIBERO and RoboCasa manipulation benchmarks, outperforming both diffusion policies trained from scratch and vision-language-action models. By inheriting the pretrained model's understanding of physics and temporal dynamics, Cosmos Policy represents a significant step toward general-purpose robot intelligence built on video foundation models.

