Name: One model to rule them all: Google bets on Gemini Omni to reinvent content creation
Uploaded: 2026-05-19T15:06:25-06:00
Channel: SCR

Google has launched Gemini Omni, a new artificial intelligence model that accepts text, images, audio, and video as input and produces video output. The company describes it as natively multimodal, meaning a single model handles all content types rather than passing tasks between separate systems. The first release in the family, Gemini Omni Flash, is now available to subscribers of Google’s AI Plus, Pro, and Ultra plans through the Gemini app and Google Flow. It is also available at no cost on YouTube Shorts.

One of Omni’s central features is conversational video editing. Users can issue instructions in plain language, and each instruction builds on the previous one. Characters, objects, and scene details remain consistent across edits. Google says the model can change specific elements in a video, reimagine actions, add characters, and transform environments without losing the continuity of the original clip.

Google also claims the model has an improved understanding of physical principles such as gravity, fluid dynamics, and kinetic energy, which affects how objects and environments behave in generated footage. The company says Omni can also draw on its broader knowledge base to produce educational explainer videos from short prompts.

Beyond editing, users can feed the model a combination of reference materials, such as a photograph, a short video clip, or an audio file, and Omni will blend them into a single output. Google says audio input is limited to voice references at launch, with other audio input types to follow.

A personal avatar feature lets users record a short video to authorize the system to generate video content in their likeness and voice.

Every video created with Omni carries a SynthID digital watermark, a technology developed by Google to mark AI-generated content. Google says users can verify whether a video was made with Omni through the Gemini app, Chrome, and Google Search. The company is also expanding support for C2PA Content Credentials, an industry standard for labeling how media was created or edited.

VentureBeat notes that Gemini Omni competes directly with tools from companies including Synthesia, ByteDance, and Kuaishou. The publication also notes that one early tester found the model’s content restrictions strict, which could limit some use cases.

For businesses, VentureBeat points out that the model is not yet available through an application programming interface (API), which most enterprises require for integration into their own systems. Google has said API access through its Vertex AI platform will arrive in the coming weeks.

Google did not publish performance benchmarks alongside the launch.

Sources: Google Blog, VentureBeat
