Microsoft expands Phi language model family with new reasoning capabilities

Microsoft has introduced three new small language models (SLMs) focused on complex reasoning tasks: Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning. These models represent a significant advancement in what small AI models can accomplish, particularly in mathematical reasoning and multi-step problem solving.

The flagship Phi-4-reasoning-plus, a 14-billion parameter model, demonstrates performance that rivals much larger AI systems. According to Microsoft’s benchmarks, it outperforms OpenAI’s o1-mini and DeepSeek-R1-Distill-Llama-70B on various reasoning tasks, despite being significantly smaller. On the AIME 2025 test—a qualifier for the USA Math Olympiad—it even achieves better results than the full DeepSeek-R1 model with 671 billion parameters.

What sets these models apart is their training methodology. Microsoft employed supervised fine-tuning using carefully selected reasoning demonstrations, followed by reinforcement learning to enhance reasoning capabilities. The models use special tokens to separate their step-by-step reasoning process from final answers, improving transparency and coherence.
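Microsoft has not published the exact delimiter format here, but in comparable reasoning models the chain of thought is wrapped in `<think>`/`</think>` tags before the final answer. A minimal sketch of separating the two, assuming that tag convention (the actual Phi-4 delimiters may differ):

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    """Split model output into (reasoning, final answer).

    Assumes the chain of thought is delimited by <think>...</think>
    tags, a convention used by several reasoning models; this is an
    assumption, not Microsoft's documented format.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        # No reasoning block found: treat everything as the answer.
        return "", output.strip()
    reasoning = match.group(1).strip()
    answer = output[match.end():].strip()
    return reasoning, answer

sample = "<think>2 + 2 groups pairs, so the sum is 4.</think>The answer is 4."
reasoning, answer = split_reasoning(sample)
```

Separating the trace this way is what lets an application show or hide the intermediate steps independently of the answer itself.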

Key features of the new models

  • Phi-4-reasoning-plus: The most capable model at 14 billion parameters, trained with both supervised fine-tuning and reinforcement learning
  • Phi-4-reasoning: Also 14 billion parameters but without the additional RL training
  • Phi-4-mini-reasoning: A compact 3.8-billion parameter model optimized for educational applications

All three models support context lengths of 32,000 tokens by default, with testing showing stable performance up to 64,000 tokens. They’re available on Azure AI Foundry and Hugging Face under a permissive MIT license, allowing for commercial use and customization.
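To get a feel for what a 32,000-token default window means in practice, the sketch below checks and trims input against a token budget using the crude rule of thumb of roughly four characters per token. This heuristic and the function names are illustrative assumptions; real budgeting should use the model's own tokenizer from Hugging Face.

```python
def fits_context(text: str, max_tokens: int = 32_000,
                 chars_per_token: int = 4) -> bool:
    """Rough check that `text` fits the context window.

    Uses a ~4-characters-per-token heuristic (an assumption for
    illustration) instead of the model's actual tokenizer.
    """
    return len(text) // chars_per_token <= max_tokens

def trim_to_context(text: str, max_tokens: int = 32_000,
                    chars_per_token: int = 4) -> str:
    """Truncate text to approximately `max_tokens` tokens."""
    return text[: max_tokens * chars_per_token]
```

Under this estimate, the 32,000-token default corresponds to roughly 128,000 characters of input, and the 64,000-token extended range to about twice that.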

Microsoft emphasizes that these models demonstrate how careful data curation and training techniques can enable smaller models to compete with much larger ones. This has significant implications for enterprise applications, as smaller models require fewer computational resources and can run efficiently even on resource-limited devices.

The company has also conducted extensive safety testing on these models, including adversarial evaluations by Microsoft's AI Red Team and benchmarking with tools such as ToxiGen.

These new Phi models will eventually be integrated into Windows 11 devices, particularly on Copilot+ PCs, where they can leverage the NPU (Neural Processing Unit) for efficient local AI processing. Microsoft plans to use them in features like Click to Do and in productivity applications such as Outlook for offline summary generation.

Sources: Microsoft, VentureBeat, TechCrunch
