Pleias launches small reasoning models optimized for RAG with built-in citations

French AI startup Pleias has released two open-source small reasoning models specifically designed for retrieval-augmented generation (RAG) with native citation support. As reported by Carl Franzen for VentureBeat, the new models—Pleias-RAG-350M and Pleias-RAG-1B—are available under the Apache 2.0 license, allowing commercial use. Despite their small size, the models outperform many larger alternatives on multi-hop reasoning …

OpenAI’s reasoning models show increased hallucination rates

OpenAI’s new reasoning AI models, o3 and o4-mini, hallucinate more frequently than their predecessors, according to internal testing. Maxwell Zeff from TechCrunch reports that o3 hallucinated in 33% of questions on OpenAI’s PersonQA benchmark, approximately double the rate of previous models. o4-mini performed even worse, with a 48% hallucination rate. OpenAI acknowledged in its …

Google introduces Gemini 2.5 Flash with adjustable “thinking” capabilities

Google has released Gemini 2.5 Flash in preview, offering developers unprecedented control over the AI model’s reasoning capabilities. This new version allows users to toggle “thinking” on or off and set specific “thinking budgets” to balance quality, cost, and response time. The pricing structure reveals the cost impact of reasoning: input costs $0.15 per million …

OpenAI launches o3 and o4-mini with enhanced reasoning and visual capabilities

OpenAI has released two new AI models, o3 and o4-mini, designed to advance reasoning capabilities and introduce novel features like “thinking with images.” These models represent the company’s latest development in its o-series, coming just days after the release of GPT-4.1. The models’ most distinctive feature is their ability to not just recognize images but …

Google introduces efficient Gemini 2.5 Flash AI model for developers

Google has launched Gemini 2.5 Flash, a new AI model designed for efficiency and strong performance. According to Kyle Wiggers of TechCrunch, the model will soon be available on Google’s Vertex AI development platform. The new model offers “dynamic and controllable” computing that allows developers to adjust processing time based on query complexity. As a …

Deep Cogito releases new open-source AI models with hybrid reasoning capabilities

Deep Cogito, a San Francisco-based AI startup, has emerged from stealth with the release of Cogito v1, a new line of open-source large language models featuring hybrid reasoning capabilities. Carl Franzen from VentureBeat reports that the models, fine-tuned from Meta’s Llama 3.2, can either answer immediately or engage in “self-reflection” similar to OpenAI’s “o” …

Nvidia releases powerful Llama-3.1 Nemotron Ultra language model

Nvidia has launched Llama-3.1-Nemotron-Ultra-253B, a fully open-source language model that outperforms the larger DeepSeek R1 on several benchmarks despite having fewer than half the parameters. Carl Franzen of VentureBeat reports the model is now available on Hugging Face with open weights and training data. The 253-billion-parameter model features a unique toggle for “reasoning on” …

OpenAI announces plans for first open-source language model in years

OpenAI intends to release its first “open” language model since GPT-2 in the coming months, according to a feedback form published on the company’s website. Kyle Wiggers reports that OpenAI is inviting developers, researchers, and community members to provide input on what they’d like to see in this new model. The company plans to host …

New AGI benchmark shows major gap between human and AI reasoning abilities

The ARC Prize Foundation has released ARC-AGI-2, a new artificial general intelligence benchmark that has proven extremely difficult for even the most advanced AI systems. This second-generation test specifically evaluates test-time reasoning abilities, requiring AI to adapt to novel, never-before-seen tasks rather than relying on memorization. The results reveal a stark …
