Yann LeCun’s billion-dollar bet: Why JEPA and world models will conquer AI robotics

How JEPA will overtake VLA in the AI race

The AI landscape is facing a paradigm shift as Yann LeCun champions Joint Embedding Predictive Architectures (JEPA) over mainstream Vision-Language-Action (VLA) models. By training AI to predict world dynamics in learned embedding spaces, rather than relying entirely on behavioral cloning that scales poorly, this approach promises functional world models for complex planning. This move toward explicit hierarchical planning could redefine how we control autonomous agents, industrial systems, and robots over the next five years.

Key points

Physical Intelligence recently showcased PI07, a highly impressive Vision-Language-Action (VLA) model capable of executing complex physical household tasks through generalized behavioral cloning.
Yann LeCun predicts mainstream VLA approaches are fundamentally “doomed” due to their heavy reliance on massive human demonstrations and a complete lack of explicit forward-planning mechanisms.
V-JEPA 2 was trained by Meta on 1 million hours of video, with up to 1 billion parameters, to learn the physical rules of the world without being restricted by language supervision.
In late 2025, a research team at Meta revealed VL-JEPA, which reached 35% video classification accuracy after 5 million training examples, compared with 20% for traditional VLM counterparts.
VL-JEPA outperformed standard 7-billion-parameter language models on the GQA compositional reasoning benchmark while using only 1.6 billion parameters.
Unlike VLAs that act as black boxes directly translating inputs into action, JEPA builds an embedded “world model” that acts as a simulated video game to predict the physical consequences of actions.
The Layworld model framework demonstrated that hierarchical JEPA architectures can effectively extend a robot’s predictive planning horizon for object manipulation tasks like “PushT” from 5 to 15 execution steps.
Using a cross-entropy method (CEM) planner, the JEPA world model scores candidate action sequences by the Euclidean distance between their predicted trajectory embeddings and the target goal embedding.
Omni Labs, founded by Yann LeCun, plans to apply these predictive JEPA world models to complex phenomenological systems, including chemical plants, jet engines, and diabetes management, within the next two to five years.
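
To make the CEM planning point above more concrete, here is a minimal sketch of the idea, not Meta's implementation. It assumes a toy world model in which the next embedding is simply the current embedding plus the action; a real JEPA predictor would be a learned network. The names toy_world_model, rollout, and cem_plan, along with every hyperparameter, are hypothetical and chosen only for illustration. The planner samples action sequences, rolls each one through the world model, scores it by the Euclidean distance between the final predicted embedding and the goal embedding, and refits its sampling distribution to the best candidates.

```python
import numpy as np


def toy_world_model(z, action):
    """Hypothetical stand-in for a learned JEPA predictor.

    Assumption: the next embedding is the current embedding plus the action.
    """
    return z + action


def rollout(z0, actions):
    """Apply the world model step by step; return the final predicted embedding."""
    z = z0
    for a in actions:
        z = toy_world_model(z, a)
    return z


def cem_plan(z0, z_goal, horizon=5, pop=200, n_elite=20, iters=15, seed=0):
    """Cross-entropy method over action sequences of shape (horizon, dim)."""
    rng = np.random.default_rng(seed)
    dim = z0.shape[0]
    mean = np.zeros((horizon, dim))
    std = np.ones((horizon, dim))
    for _ in range(iters):
        # Sample candidate action sequences and keep actions within [-1, 1].
        candidates = mean + std * rng.standard_normal((pop, horizon, dim))
        candidates = np.clip(candidates, -1.0, 1.0)
        # Cost: Euclidean distance between predicted final embedding and goal.
        costs = np.array(
            [np.linalg.norm(rollout(z0, seq) - z_goal) for seq in candidates]
        )
        # Refit the sampling distribution to the lowest-cost (elite) candidates.
        elite = candidates[np.argsort(costs)[:n_elite]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6
    return mean


z0 = np.zeros(2)                 # current state embedding (toy, 2-D)
z_goal = np.array([2.0, -1.0])   # goal embedding (toy, 2-D)
initial_distance = float(np.linalg.norm(z0 - z_goal))
plan = cem_plan(z0, z_goal)
final_distance = float(np.linalg.norm(rollout(z0, plan) - z_goal))
print(f"distance to goal: {initial_distance:.3f} -> {final_distance:.3f}")
```

In this sketch the planner never executes an action in the real world while searching: every candidate is evaluated inside the embedding space, which is the property the article contrasts with VLAs that map inputs directly to actions.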

Takeaway

If you are planning to build a robotic butler soon, do not just teach it to blindly mimic human chores like a highly sophisticated parrot. Take a page out of Yann LeCun’s playbook and give it a functional “world model” so it can actually predict the catastrophic consequences of knocking over your coffee. After all, it is incredibly reassuring to know that the billion-parameter AI running our future chemical plants might actually pause to think ahead, rather than aggressively hallucinating its way into the next industrial disaster.
