Multimodal - ✦ Smart Content Report

Googles Gemma 4 12B is powerful and runs locally on laptops with just 16 GB of memory

June 4, 2026

Google has released Gemma 4 12B, an open-weights multimodal AI model designed to run entirely on a standard laptop with 16 GB of VRAM or unified memory. The model is available now for free download and marks a notable step toward powerful AI that operates offline, without sending data to the cloud. The model is …

One model to rule them all: Google bets on Gemini Omni to reinvent content creation

May 19, 2026

Google has launched Gemini Omni, a new artificial intelligence model that accepts text, images, audio, and video as input and produces video output. The company describes it as natively multimodal, meaning a single model handles all content types rather than passing tasks between separate systems. The first release in the family, Gemini Omni Flash, is …

Thinking Machines launches research preview of real-time interaction AI model

May 18, 2026

Thinking Machines, the AI startup co-founded by former OpenAI CTO Mira Murati, has announced a research preview of what it calls “interaction models” — AI systems designed to perceive and respond in real time rather than waiting for a user to finish speaking or typing. Current AI models work in turns: the user sends an …

Nvidia releases Nemotron 3 Nano Omni, a unified multimodal AI model

April 29, 2026

Nvidia has launched Nemotron 3 Nano Omni, an open AI model that combines text, vision and audio processing in a single system. Most existing AI agent systems rely on separate models for each modality, which increases latency and cost. Nvidia says its new model eliminates that fragmentation. The model uses a hybrid mixture-of-experts architecture with …

Anthropic releases Claude Opus 4.7 with stronger coding and vision capabilities

April 17, 2026

Anthropic has released Claude Opus 4.7, its most capable publicly available AI model. The company says the model performs better than its predecessor, Claude Opus 4.6, across software engineering, document analysis, and visual tasks. One of the model’s key traits is self-verification. In internal tests, Opus 4.7 built a text-to-speech engine in the Rust programming …

Meta launches proprietary AI model Muse Spark

April 9, 2026

Meta has released Muse Spark, a new proprietary artificial intelligence model built by its internal division Meta Superintelligence Labs. The model is available through the Meta AI app and website, with a private API preview for select users. Unlike Meta’s previous Llama models, Muse Spark is not open source. Muse Spark can process text and …

Google releases Gemma 4, its most capable open AI model family

April 2, 2026

Google has launched Gemma 4, a new family of open-weight AI models that the company describes as its most capable to date. The models are built on the same research and technology as Google DeepMind’s proprietary Gemini 3 system and are released under an Apache 2.0 open-source license, which allows developers to use and modify …

Think less, do more: Microsoft’s new tiny AI knows when to skip the hard thinking

March 10, 2026

Microsoft has released Phi-4-reasoning-vision-15B, a compact AI model that processes both images and text and can solve complex math and science problems. Michael Nuñez reports for VentureBeat that the 15-billion-parameter model matches or exceeds the performance of much larger systems while using significantly less computing power and training data. The model is available now on …

GPT‑5.4 aims to handle real professional work as OpenAI expands agent-style AI

March 5, 2026

OpenAI has released GPT‑5.4, a new AI model designed for professional tasks such as coding, document creation, spreadsheet analysis, and multi‑step workflows. The company positions the model as its most capable system for knowledge work and software development so far. The model is available across ChatGPT, the OpenAI API, and the company’s coding tool Codex. …

Google releases Gemini 3.1 Pro with much improved reasoning

February 19, 2026

Google has released Gemini 3.1 Pro, an updated version of its Gemini 3 Pro AI model. The company describes it as a step forward in core reasoning, intended for complex tasks where straightforward answers fall short. The model is now available to consumers through the Gemini app and NotebookLM, though access on those platforms is …