Should you walk or drive 50 meters to a car wash? Most AI models get it wrong

A deceptively simple question has exposed a widespread reasoning failure across the artificial intelligence industry. Felix Wunderlich writes for opper.ai that 42 out of 53 leading AI models answered incorrectly when asked: “I want to wash my car. The car wash is 50 meters away. Should I walk or drive?” The correct answer is, of …

Read more

Google releases Gemini 3.1 Pro with much improved reasoning

Google has released Gemini 3.1 Pro, an updated version of its Gemini 3 Pro AI model. The company describes it as a step forward in core reasoning, intended for complex tasks where straightforward answers fall short. The model is now available to consumers through the Gemini app and NotebookLM, though access on those platforms is …

Read more

Claude Sonnet 4.6: near-flagship performance at mid-tier pricing

Anthropic has released Claude Sonnet 4.6, a significant upgrade to its mid-tier AI model. The company says it outperforms its predecessor across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. Sonnet 4.6 is now the default model in claude.ai and Claude Cowork and carries the same price as its predecessor, Sonnet 4.5, …

Read more

OpenAI releases GPT-5.3-Codex with 25% speed boost and expanded capabilities

OpenAI has launched GPT-5.3-Codex, a coding model that the company describes as its most capable agentic coding tool to date. The model runs 25% faster than its predecessor and combines advanced coding performance with reasoning capabilities in a single system. According to OpenAI, GPT-5.3-Codex marks a significant milestone as the first model that helped create …

Read more

Anthropic releases Claude Opus 4.6 with expanded context window and agent teams

Anthropic has released Claude Opus 4.6, an upgraded version of its flagship AI model that can handle longer conversations and coordinate multiple AI agents working simultaneously on complex tasks. The company claims the model outperforms competitors including OpenAI’s GPT-5.2 on several professional benchmarks. The release introduces a one-million-token context window for the first time in …

Read more

Google’s Gemini 3 Flash wants to end the compromise between performance and cost

Google has released Gemini 3 Flash, positioning the model as a solution to what the company describes as a longstanding compromise in artificial intelligence between speed and capability. The model combines what Google calls “PhD-level reasoning” with faster processing speeds and lower costs compared to larger models. Gemini 3 Flash is now the default model …

Read more

ChatGPT’s smartest brain yet: GPT-5.2 arrives with fewer hallucinations and stronger coding skills

OpenAI has released GPT-5.2, its latest artificial intelligence model series, on December 11, 2025. The company describes it as the most capable model for professional knowledge work to date. The release includes three variants: GPT-5.2 Instant for faster responses, GPT-5.2 Thinking for complex tasks, and GPT-5.2 Pro for the most demanding questions. According to OpenAI, …

Read more

Google’s Gemini gets new eyes and a bigger brain

Google’s latest and most capable AI model, Gemini 3 Pro, has advanced capabilities when it comes to tasks that require visual understanding. In a post on the Google Blog, the company outlined how the model processes and reasons about visual information from various sources. According to Google, the model demonstrates strong performance in several key …

Read more

Mistral 3 brings frontier AI to your pocket with unprecedented openness

Mistral AI has launched Mistral 3, a collection of 10 open-source AI models designed to run on devices ranging from smartphones to enterprise cloud systems. The French startup released all models under the Apache 2.0 license, allowing unrestricted commercial use. The release includes Mistral Large 3, the company’s flagship model, and the Ministral 3 series …

Read more

DeepSeeks new open-source AI models claim performance comparable to GPT-5

Chinese AI startup DeepSeek released two new language models on December 1, 2025, claiming they match the performance of OpenAI’s GPT-5 and Google’s Gemini-3.0-Pro on multiple benchmarks. The company made both models freely available under an open-source MIT license. DeepSeek-V3.2 serves as an everyday reasoning assistant, while DeepSeek-V3.2-Speciale focuses on advanced mathematical and coding tasks. …

Read more