Chinese AI company DeepSeek has released two new open-source language models under the name DeepSeek-V4. The models, called V4-Pro and V4-Flash, are available for download and via API. Both support a context window of one million tokens, meaning they can process roughly eight times the length of a long novel in a single interaction. DeepSeek’s previous flagship model supported 128,000 tokens.
DeepSeek-V4-Pro contains 1.6 trillion parameters in total, making it the company’s largest model to date and, according to technology blogger Simon Willison, the largest open-weights model currently available. Only 49 billion of those parameters are active for any given token, keeping computational costs far lower than the total figure suggests. DeepSeek-V4-Flash is smaller, with 284 billion total parameters and 13 billion active.
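The gap between total and active parameters is easiest to grasp as a ratio. The following back-of-the-envelope calculation is purely illustrative and uses only the figures quoted above:

```python
# Illustrative arithmetic: share of a model's total parameters that are
# active for a single token, using the figures reported for V4.

def active_fraction(total_params: float, active_params: float) -> float:
    """Fraction of total parameters used per token."""
    return active_params / total_params

v4_pro = active_fraction(1.6e12, 49e9)    # 49B of 1.6T
v4_flash = active_fraction(284e9, 13e9)   # 13B of 284B

print(f"V4-Pro active share:   {v4_pro:.1%}")    # → 3.1%
print(f"V4-Flash active share: {v4_flash:.1%}")  # → 4.6%
```

In other words, only about 3 percent of V4-Pro’s weights do work on any given token, which is why its inference cost is far closer to that of a mid-sized dense model than its headline parameter count implies.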
How DeepSeek reduced costs
A central theme of the release is efficiency. DeepSeek says that in a one-million-token context, V4-Pro requires only 27 percent of the computational operations and 10 percent of the memory cache that its predecessor, DeepSeek-V3.2, needed for the same task. V4-Flash pushes this further, using just 10 percent of the computational operations and 7 percent of the cache. The company attributes the savings to a new hybrid attention mechanism it calls Compressed Sparse Attention and Heavily Compressed Attention, designed specifically to handle very long inputs more cheaply.
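Taking DeepSeek’s reported percentages at face value, the figures can be expressed as relative cost multipliers, with V3.2 normalised to 1.0 (an illustrative restatement, not DeepSeek’s own accounting):

```python
# Relative compute and KV-cache cost at a one-million-token context,
# normalised to DeepSeek-V3.2, per the percentages DeepSeek reports.
costs = {
    "V3.2":     {"ops": 1.00, "cache": 1.00},  # baseline
    "V4-Pro":   {"ops": 0.27, "cache": 0.10},
    "V4-Flash": {"ops": 0.10, "cache": 0.07},
}

for model, c in costs.items():
    print(f"{model:9s} ops x{c['ops']:.2f}  cache x{c['cache']:.2f}")
```

The cache reductions matter at least as much as the compute reductions: at million-token contexts, the memory needed to hold attention state, not raw arithmetic, is often what limits how many requests a GPU can serve at once.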
These efficiency gains translate directly into price. DeepSeek charges $0.14 per million input tokens and $0.28 per million output tokens for V4-Flash. V4-Pro is priced at $1.74 per million input tokens and $3.48 per million output tokens. Willison, who compiled a comparison table of current AI model prices, notes that V4-Flash undercuts even OpenAI’s GPT-5.4 Nano, while V4-Pro is the cheapest among larger frontier models.
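To make the listed rates concrete, here is a minimal cost estimator using the per-million-token prices quoted above (a sketch for illustration; actual billing may include caching discounts or other adjustments not covered here):

```python
# Per-request cost at the listed API rates: (input $/M tokens, output $/M tokens).
PRICES = {
    "V4-Flash": (0.14, 0.28),
    "V4-Pro":   (1.74, 3.48),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one API call at the listed per-million-token rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: a full one-million-token prompt with a 2,000-token answer.
print(f"${request_cost('V4-Flash', 1_000_000, 2_000):.4f}")  # → $0.1406
print(f"${request_cost('V4-Pro',   1_000_000, 2_000):.4f}")  # → $1.7470
```

Filling the entire context window of V4-Flash thus costs about 14 cents per call at these rates, which is what makes the million-token window practical rather than merely a headline number.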
Both models were pre-trained on more than 32 trillion tokens of text. DeepSeek says the models then underwent a post-training process that first developed specialised capabilities in separate domains, then merged those capabilities into a single model.
Performance claims
DeepSeek reports that its V4-Pro model, when run in its highest reasoning mode, matches or comes close to models from OpenAI, Google DeepMind, and Anthropic on several standard benchmarks, including coding competitions and mathematical reasoning tests. The company’s own technical report states, however, that V4-Pro “falls marginally short of GPT-5.4 and Gemini-3.1-Pro, suggesting a developmental trajectory that trails state-of-the-art frontier models by approximately 3 to 6 months.”
Shortly after the release, Chinese technology company Huawei announced support for the V4 models on its Ascend chip hardware. Chip manufacturer Cambricon Technologies also announced compatibility. Analysts at Huatai Securities noted that the release explicitly mentions compatibility with domestic Chinese chips and said this could accelerate their adoption.
The model weights are released under the MIT license, allowing broad commercial and research use. Both models are available on Hugging Face and ModelScope.
Sources: Model Card, South China Morning Post, Simon Willison