Open model DeepSeek-V3 performs similarly to closed competition

Chinese AI startup DeepSeek has launched DeepSeek-V3, a powerful new AI model that outperforms existing open-source alternatives. According to reporting by Shubham Sharma at VentureBeat, the model features 671 billion parameters but activates only 37 billion per token through its mixture-of-experts architecture. The model was trained on 14.8 trillion diverse tokens and demonstrates superior performance across multiple benchmarks, particularly on mathematics and code tasks.
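
To make the mixture-of-experts idea concrete, here is a minimal sketch of top-k expert routing in PyTorch: the layer holds many expert networks, but each token is processed by only a few of them. All sizes, names, and routing details below are illustrative assumptions for a toy layer, not DeepSeek-V3's actual configuration.

```python
# Toy top-k mixture-of-experts layer: many parameters stored,
# only k experts' worth activated per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                            # (tokens, n_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)           # mix only the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e              # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(10, 64)          # 10 tokens
print(TopKMoE()(x).shape)        # torch.Size([10, 64])
```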

The model introduces two key innovations: an auxiliary loss-free load-balancing strategy that optimizes expert utilization, and a multi-token prediction capability that enables three times faster token generation. DeepSeek achieved these results with remarkable cost efficiency, completing the training process for approximately $5.57 million, significantly less than competitors like Meta's Llama 3.1, which reportedly cost over $500 million to train.
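
The article does not spell out how auxiliary loss-free load balancing works, but one way such a strategy can operate is to nudge a per-expert routing bias after each batch, penalizing overloaded experts instead of adding a balancing term to the training loss. The sketch below illustrates that general idea on simulated router scores; the expert count, step size, and skew values are hypothetical, not DeepSeek-V3's settings.

```python
# Bias-based load balancing without an auxiliary loss: after each batch,
# overloaded experts get their selection bias reduced, underloaded ones raised.
import numpy as np

rng = np.random.default_rng(0)
n_experts, k, gamma = 8, 2, 0.01          # experts, top-k, bias update step
bias = np.zeros(n_experts)                # selection-only bias, one per expert
skew = np.linspace(-1.0, 1.0, n_experts)  # simulate a router that favors later experts

for step in range(500):
    scores = rng.normal(size=(256, n_experts)) + skew   # router scores per token
    # Top-k selection uses biased scores; the bias steers load only.
    chosen = np.argsort(scores + bias, axis=-1)[:, -k:]
    load = np.bincount(chosen.ravel(), minlength=n_experts)
    # Nudge biases toward a uniform per-expert load.
    bias -= gamma * np.sign(load - load.mean())

print("final per-expert load:", load)     # roughly uniform after adaptation
```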

Benchmark testing shows DeepSeek-V3 surpassing other open-source models including Meta's Llama 3.1-405B and Qwen 2.5-72B. It performs comparably to closed-source models like GPT-4o and Claude 3.5 Sonnet, though each maintains advantages in specific areas.

The model is now available through Hugging Face under the company's license agreement, and enterprises can access it through DeepSeek Chat or via API. Commercial API pricing is set at $0.27 per million input tokens and $1.10 per million output tokens once promotional pricing ends on February 8.
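
At those rates, per-request costs are easy to estimate. The helper below is a hypothetical convenience function for back-of-the-envelope math, not part of any DeepSeek SDK.

```python
# Cost estimate at the listed post-February-8 rates:
# $0.27 per million input tokens, $1.10 per million output tokens.
INPUT_RATE = 0.27 / 1_000_000    # USD per input token
OUTPUT_RATE = 1.10 / 1_000_000   # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single API call at the listed rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. a 2,000-token prompt with a 500-token completion:
print(f"${request_cost(2_000, 500):.6f}")  # $0.001090
```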
