AI researcher identifies six major paradigm shifts in large language models during 2025

The development of large language models underwent fundamental changes in 2025, marked by new training methods and surprising capabilities that reveal a form of intelligence quite different from what was expected. AI researcher Andrej Karpathy writes on his blog about six major shifts that defined the year.

The most significant change involves a new training technique called Reinforcement Learning from Verifiable Rewards (RLVR), which has become a standard stage in developing production models. Unlike previous training methods that required human feedback, RLVR trains models against automatically verifiable rewards in environments like math and coding puzzles. This approach allows models to spontaneously develop reasoning strategies, breaking down problems into intermediate steps. OpenAI’s o3 model, released in early 2025, marked the obvious inflection point for this technology.
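The core idea behind a verifiable reward is that correctness can be checked mechanically, with no human grader in the loop. A minimal sketch of that idea, using a toy arithmetic task (the reward function, prompts, and sampled completions here are illustrative, not from Karpathy's post or any real training pipeline):

```python
import re

def verifiable_reward(completion: str, expected: int) -> float:
    """Score 1.0 if the completion's final number matches the known
    answer, else 0.0 -- checkable automatically, no human feedback."""
    numbers = re.findall(r"-?\d+", completion)
    if numbers and int(numbers[-1]) == expected:
        return 1.0
    return 0.0

# A hypothetical rollout: the trainer samples several completions
# for "12 * 7 = ?" and reinforces the ones that earn reward.
completions = [
    "Let me break it down: 12 * 7 = 10 * 7 + 2 * 7 = 84. Answer: 84.",
    "12 * 7 is 74.",
]
rewards = [verifiable_reward(c, 84) for c in completions]
# rewards → [1.0, 0.0]
```

Because the reward is computed rather than judged, environments like math and coding puzzles can supply training signal at scale, which is what lets intermediate-step reasoning strategies emerge on their own.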

The new training method consumed computing resources originally intended for other purposes, resulting in similar-sized models trained over significantly longer runs. RLVR also introduced a new capability: models can now generate longer reasoning traces, effectively increasing their “thinking time” to solve harder problems.

Karpathy describes a crucial conceptual shift in understanding what these models actually are. “We’re not evolving animals, we are summoning ghosts,” he writes. LLMs display what he calls “jagged intelligence,” spiking in capability in verifiable domains while remaining surprisingly weak in others. A model can simultaneously act as a genius polymath and a confused grade schooler, seconds away from falling for a simple trick.

This jagged nature has created what Karpathy calls a “loss of trust in benchmarks.” Because benchmarks are verifiable environments, labs can easily optimize models specifically for them, creating spikes in capability that don’t reflect general intelligence.

The year also saw new application layers emerge. Cursor, a coding assistant, revealed what Karpathy describes as a distinct category of “LLM app” that bundles and orchestrates multiple model calls for specific tasks. These applications handle context engineering, balance performance and cost tradeoffs, and provide specialized interfaces for human oversight.
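The orchestration pattern described above can be sketched as a small pipeline: a cheap model distills raw context, and a stronger model answers against that distilled context. This is a hypothetical illustration of the pattern, not Cursor's actual architecture; `call_model`, the model names, and the helper functions are all stand-ins:

```python
from dataclasses import dataclass

@dataclass
class ModelCall:
    model: str   # hypothetical model identifier
    prompt: str

def call_model(call: ModelCall) -> str:
    """Stub standing in for a real API call; an actual LLM app would
    dispatch to a provider SDK here."""
    return f"[{call.model}] response to: {call.prompt[:40]}"

def answer_with_context(question: str, files: list[str]) -> str:
    # Context engineering: a small, cheap model summarizes each
    # file first, keeping the expensive model's prompt short...
    summaries = [
        call_model(ModelCall("small-fast-model", f"Summarize: {f}"))
        for f in files
    ]
    # ...then a stronger model answers using the distilled context,
    # trading cost against quality on a per-call basis.
    context = "\n".join(summaries)
    return call_model(
        ModelCall("large-capable-model", f"{context}\n\nQuestion: {question}")
    )
```

The division of labor is the point: the app, not the user, decides which model sees which context, which is also where the specialized interfaces for human oversight attach.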

Claude Code introduced a new paradigm by running directly on users’ computers rather than in the cloud. This approach gives the AI access to local context, data, and configuration, transforming it from a website into what Karpathy calls “a little spirit that lives on your computer.”

The capability improvements enabled what Karpathy terms “vibe coding,” where people can build software using plain English without seeing the underlying code. This approach empowers both non-programmers and professionals to create applications that would otherwise never be written.

Google’s Gemini “Nano Banana” image model hints at another paradigm shift. Karpathy compares current text-based chat interfaces to issuing commands to a 1980s computer console. Just as graphical user interfaces transformed traditional computing, he argues, LLMs will need to communicate visually through images, infographics, and animations rather than text alone.

Karpathy concludes that LLMs represent a new form of intelligence, simultaneously smarter and dumber than expected, and that the industry has realized less than ten percent of their potential at current capability levels.
