DeepSeek releases new reasoning models and introduces distilled versions

Chinese AI company DeepSeek has announced the release of its new reasoning-focused language models DeepSeek-R1-Zero and DeepSeek-R1, along with six smaller distilled versions. The main models, built on DeepSeek’s V3 architecture, feature 671 billion total parameters with 37 billion activated per token and a context length of 128,000 tokens. According to company statements, DeepSeek-R1 achieves performance comparable to OpenAI’s o1 model across mathematics, coding, and reasoning tasks.

The company has also introduced distilled versions of the model ranging from 1.5 billion to 70 billion parameters, based on both Llama and Qwen architectures. These smaller models are designed to be more accessible for researchers and developers with limited computing resources. Evaluation results shared by DeepSeek show that their 32B distilled model outperforms several existing models on specific benchmarks, particularly in mathematical reasoning and coding tasks.
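For readers who want to experiment with one of the distilled checkpoints, a minimal sketch using the Hugging Face transformers library is shown below. The repository name, prompt, and generation settings are illustrative assumptions rather than details taken from the announcement; check DeepSeek’s Hugging Face organization for the exact published names.

```python
# Sketch: loading a distilled DeepSeek-R1 checkpoint with Hugging Face transformers.
# The repository id below is an assumption; verify the exact name before use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed repo name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick the checkpoint's native precision
    device_map="auto",    # requires the accelerate package
)

# Reasoning models emit a chain of thought before the final answer,
# so allow a generous generation budget.
prompt = "What is 27 * 43? Think step by step."
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```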

All models have been released under permissive open-source licenses, with the main models under MIT, though questions remain about licensing compatibility for the Llama-based distilled versions. The new models feature chain-of-thought capabilities and were developed using a combination of reinforcement learning and supervised fine-tuning. DeepSeek notes that while the R1-Zero version shows strong reasoning capabilities, it struggles with repetition and language mixing, issues the company says it addressed in the refined R1 version.

Source: Simon Willison’s Blog
