Microsoft has released Phi-4-reasoning-vision-15B, a compact AI model that processes both images and text and can solve complex math and science problems. Michael Nuñez reports for VentureBeat that the 15-billion-parameter model matches or exceeds the performance of much larger systems while using significantly less computing power and training data.
The model is available now on Microsoft Foundry, Hugging Face, and GitHub under a permissive open license.
One of its most notable features is how it handles reasoning. Many AI tasks, such as solving equations or interpreting charts, benefit from step-by-step thinking. Others, like reading text from an image or writing a photo caption, do not. Microsoft trained the model on a mix of data where 20 percent included explicit reasoning steps and 80 percent required direct answers. This allows the model to apply deep reasoning only where it helps, saving time and computing resources everywhere else.
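Microsoft has not published the details of its data pipeline, but the 20/80 split described above can be illustrated with a minimal sampling sketch. The function and pool names below are hypothetical, purely for illustration:

```python
import random

def build_training_mix(reasoning_pool, direct_pool, n_samples,
                       reasoning_frac=0.2, seed=0):
    """Assemble a training mix in which `reasoning_frac` of the samples
    carry explicit step-by-step traces and the rest are direct answers.
    (Hypothetical sketch; not Microsoft's actual pipeline.)"""
    rng = random.Random(seed)
    n_reasoning = round(n_samples * reasoning_frac)
    samples = (
        [{"text": rng.choice(reasoning_pool), "reasoning": True}
         for _ in range(n_reasoning)]
        + [{"text": rng.choice(direct_pool), "reasoning": False}
           for _ in range(n_samples - n_reasoning)]
    )
    rng.shuffle(samples)  # interleave the two kinds of examples
    return samples

# Tiny placeholder pools standing in for real datasets
mix = build_training_mix(["Step 1: ... Step 2: ... Answer: ..."],
                         ["Answer: ..."], n_samples=1000)
reasoning_share = sum(s["reasoning"] for s in mix) / len(mix)
```

With `n_samples=1000`, exactly 200 of the sampled examples carry reasoning traces, matching the reported 20 percent share.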
The model also required far less training data than competitors. Microsoft used around 200 billion training tokens in total, while rival models from companies including Alibaba, Google, and SenseTime each used more than one trillion.
On standard benchmarks, Phi-4-reasoning-vision-15B scores competitively against models of similar size but trails larger systems on the hardest tasks.
Microsoft achieved this partly through rigorous data quality control. Team members manually reviewed training samples and corrected errors found in widely used public datasets.