One model to rule them all: Google bets on Gemini Omni to reinvent content creation

Google has launched Gemini Omni, a new artificial intelligence model that accepts text, images, audio, and video as input and produces video output. The company describes it as natively multimodal, meaning a single model handles all content types rather than passing tasks between separate systems. The first release in the family, Gemini Omni Flash, is …

Thinking Machines launches research preview of real-time interaction AI model

Thinking Machines, the AI startup co-founded by former OpenAI CTO Mira Murati, has announced a research preview of what it calls “interaction models” — AI systems designed to perceive and respond in real time rather than waiting for a user to finish speaking or typing. Current AI models work in turns: the user sends an …

Nvidia releases Nemotron 3 Nano Omni, a unified multimodal AI model

Nvidia has launched Nemotron 3 Nano Omni, an open AI model that combines text, vision and audio processing in a single system. Most existing AI agent systems rely on separate models for each modality, which increases latency and cost. Nvidia says its new model eliminates that fragmentation. The model uses a hybrid mixture-of-experts architecture with …

Anthropic releases Claude Opus 4.7 with stronger coding and vision capabilities

Anthropic has released Claude Opus 4.7, its most capable publicly available AI model. The company says the model performs better than its predecessor, Claude Opus 4.6, across software engineering, document analysis, and visual tasks. One of the model’s key traits is self-verification. In internal tests, Opus 4.7 built a text-to-speech engine in the Rust programming …

Meta launches proprietary AI model Muse Spark

Meta has released Muse Spark, a new proprietary artificial intelligence model built by its internal division Meta Superintelligence Labs. The model is available through the Meta AI app and website, with a private API preview for select users. Unlike Meta’s previous Llama models, Muse Spark is not open source. Muse Spark can process text and …

Google releases Gemma 4, its most capable open AI model family

Google has launched Gemma 4, a new family of open-weight AI models that the company describes as its most capable to date. The models are built on the same research and technology as Google DeepMind’s proprietary Gemini 3 system and are released under an Apache 2.0 open-source license, which allows developers to use and modify …

Think less, do more: Microsoft’s new tiny AI knows when to skip the hard thinking

Microsoft has released Phi-4-reasoning-vision-15B, a compact AI model that processes both images and text and can solve complex math and science problems. Michael Nuñez reports for VentureBeat that the 15-billion-parameter model matches or exceeds the performance of much larger systems while using significantly less computing power and training data. The model is available now on …

GPT‑5.4 aims to handle real professional work as OpenAI expands agent-style AI

OpenAI has released GPT‑5.4, a new AI model designed for professional tasks such as coding, document creation, spreadsheet analysis, and multi‑step workflows. The company positions the model as its most capable system for knowledge work and software development so far. The model is available across ChatGPT, the OpenAI API, and the company’s coding tool Codex. …

Google releases Gemini 3.1 Pro with much improved reasoning

Google has released Gemini 3.1 Pro, an updated version of its Gemini 3 Pro AI model. The company describes it as a step forward in core reasoning, intended for complex tasks where straightforward answers fall short. The model is now available to consumers through the Gemini app and NotebookLM, though access on those platforms is …

Alibaba releases Qwen3.5, a multimodal AI model with 397 billion parameters

Alibaba has launched Qwen3.5, a new artificial intelligence model designed to function as a multimodal agent capable of processing text, images, and video. The Qwen team announced the release on the company’s website. The model contains 397 billion parameters but activates only 17 billion per task, which the team says optimizes both speed and cost. This …
