Meta unveils V-JEPA 2

Meta has announced V-JEPA 2 (Video Joint Embedding Predictive Architecture 2), its most advanced AI model to date, designed to perceive, predict, and interact with the physical world in a human-like way. The model represents a significant step toward Meta's vision of Advanced Machine Intelligence (AMI): AI that can learn, reason, and plan by observing the world around it.
Trained on over a million hours of video, V-JEPA 2 learns how objects and environments respond to movement and action. Just as a person instinctively knows that a thrown tennis ball will fall, V-JEPA 2 can predict such outcomes because it has watched similar events unfold on video. This training gives the model a kind of 'common-sense' understanding of the physical world, an internal model it can use to anticipate real-world interactions.
Unlike traditional AI systems that recognise images or follow commands, V-JEPA 2 predicts how a scene will evolve from its current state. In keeping with the JEPA approach, it makes these predictions in an abstract representation space rather than generating raw pixels. With 1.2 billion parameters, it offers significant improvements in planning and decision-making over its predecessor.
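Meta's announcement does not spell out the training objective in detail, but the core JEPA idea of predicting in embedding space rather than pixel space can be sketched in a few lines of PyTorch. Everything below (the module sizes, the stop-gradient target, the MSE loss) is illustrative, not Meta's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JEPASketch(nn.Module):
    """Minimal JEPA-style sketch: predict future observations in
    embedding space, not pixel space. All sizes are illustrative."""

    def __init__(self, frame_dim=3 * 64 * 64, embed_dim=256):
        super().__init__()
        # Encoder maps (flattened) frames to compact embeddings.
        self.encoder = nn.Sequential(
            nn.Linear(frame_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )
        # Predictor guesses the embedding of the future frame
        # from the embedding of the current one.
        self.predictor = nn.Sequential(
            nn.Linear(embed_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, current_frame, future_frame):
        z_now = self.encoder(current_frame)
        # Target embedding; gradients are stopped here as a crude
        # stand-in for the EMA target encoder real JEPA models use.
        with torch.no_grad():
            z_future = self.encoder(future_frame)
        z_pred = self.predictor(z_now)
        # The loss lives entirely in embedding space.
        return F.mse_loss(z_pred, z_future)

model = JEPASketch()
now = torch.randn(8, 3 * 64 * 64)     # batch of flattened frames
later = torch.randn(8, 3 * 64 * 64)   # frames from slightly later
loss = model(now, later)
loss.backward()
```

Predicting embeddings rather than pixels is what makes this kind of world model tractable: the model only has to anticipate what matters about a scene, not every visual detail.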
Meta has also integrated V-JEPA 2 into robotic systems in its labs. These robots completed simple manipulation tasks such as picking up and placing unfamiliar objects, even in environments they had not seen before. Using the model, a robot assesses the scene, compares it against a goal image, and selects the best action one step at a time, as the sketch below illustrates.
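Meta has not published the exact control loop behind these demos, but the behaviour described matches a simple form of goal-image planning. In the sketch below, `encode` and `predict` are entirely hypothetical stand-ins for the model; it shows one way such step-by-step action selection can work.

```python
import torch

def plan_step(encode, predict, obs, goal_image,
              num_candidates=64, action_dim=4):
    """Pick one action by scoring candidates against a goal image.

    `encode` and `predict` are hypothetical stand-ins for a
    V-JEPA-2-style encoder and action-conditioned predictor; their
    interfaces here are assumptions, not Meta's published API.
    """
    z_obs = encode(obs)            # embed the current camera view
    z_goal = encode(goal_image)    # embed the desired end state
    # Sample random candidate actions; a real system would use a
    # smarter optimiser (e.g. the cross-entropy method).
    actions = torch.randn(num_candidates, action_dim)
    # Predict where each action would lead, in embedding space.
    z_next = torch.stack([predict(z_obs, a) for a in actions])
    # Score by distance to the goal embedding; execute only the
    # best first action, then re-plan from the new observation.
    dists = (z_next - z_goal).pow(2).sum(dim=-1)
    return actions[dists.argmin()]

# Toy stand-ins so the sketch runs end to end.
encode = lambda img: img.flatten()[:16]
predict = lambda z, a: z + 0.1 * a.sum()
action = plan_step(encode, predict,
                   torch.randn(3, 32, 32), torch.randn(3, 32, 32))
```

Re-planning after every executed action is what lets a robot recover when an object slips or the scene changes; each step starts from a fresh observation rather than a stale plan.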
To further accelerate AI research, Meta is releasing three new video-based benchmarks to help standardise the evaluation of AI world models. These tools aim to advance physical reasoning and long-term planning in AI.
Looking ahead, Meta envisions models that incorporate additional senses such as touch and sound, enabling AI to handle longer and more complex tasks.