Google has opened its Gemini Omni Flash model to developers and enterprise customers via API, bringing conversational video editing to professional workflows for the first time. Sam Witteveen reports for VentureBeat that the model, which debuted at Google I/O 2026, lets users generate and refine video clips through plain-language instructions rather than regenerating each version from scratch with a new prompt.
The central capability is stateful, multi-turn editing. Each instruction builds on the previous one, so a user can adjust lighting, reframe a shot, or change an on-screen logo without losing the parts of a clip that already worked. Google calls the underlying technology the Interactions API.
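To make the idea of stateful editing concrete, here is a toy sketch in Python. It does not call the Interactions API or any Google service; `ClipState`, `EditSession`, and the attribute names are hypothetical and exist only to show how each instruction changes one property while the rest of the clip carries over.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class ClipState:
    """Hypothetical stand-in for the properties of a clip (not a real API type)."""
    lighting: str = "daylight"
    framing: str = "wide"
    logo: str = "none"


@dataclass
class EditSession:
    """Keeps the current clip state between turns, so edits accumulate."""
    state: ClipState = field(default_factory=ClipState)
    history: list[str] = field(default_factory=list)

    def apply(self, instruction: str, **changes: str) -> ClipState:
        # Only the named properties change; everything else is preserved.
        self.state = replace(self.state, **changes)
        self.history.append(instruction)
        return self.state


session = EditSession()
session.apply("Make it dusk", lighting="dusk")
session.apply("Reframe as a close-up", framing="close-up")
final = session.apply("Swap in the new logo", logo="new-logo")
print(final)
```

After three turns the clip carries all three changes. In a stateless workflow, each prompt would start again from the defaults and the earlier adjustments would be lost.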
One model replaces a five-tool pipeline
Many teams currently stitch together separate tools for scripting, image generation, video generation, lip-sync, and voice. Omni Flash handles all of these in a single model that accepts text, images, and short video clips as input and returns a finished clip with audio. This reduces the number of vendors, contracts, and data paths a team needs to manage.
Additional features relevant to content teams include:
- Reference image support: feed the model a product photo or brand logo and it incorporates the actual object rather than a generic substitute
- Physics-aware rendering: adding rain to a scene also generates reflections of people and objects in wet pavement
- Text and logo insertion: on-screen signs can be rewritten or translated, though Google notes results are not always consistent across frames
Pricing, limits, and guardrails
Omni Flash costs $0.10 per second of generated video, so a ten-second clip costs one dollar. Output is capped at 720p, and clips run between three and ten seconds. There is no 1080p or 4K option at this time.
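For budgeting, the per-second rate and the clip length limits translate into a simple estimate. The sketch below is a rough calculator, not an official billing formula. It assumes cost depends only on seconds of generated video, that every generated or regenerated clip is billed in full, and that durations are whole seconds. The function names are hypothetical.

```python
from decimal import Decimal

PRICE_PER_SECOND = Decimal("0.10")  # USD, rate reported in the article
MIN_SECONDS = 3                     # shortest clip length reported
MAX_SECONDS = 10                    # longest clip length reported


def clip_cost(seconds: int) -> Decimal:
    """Estimated cost in USD of one generated clip of the given length."""
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(
            f"clip length must be {MIN_SECONDS}-{MAX_SECONDS} seconds, got {seconds}"
        )
    return PRICE_PER_SECOND * seconds


def batch_cost(durations: list[int]) -> Decimal:
    """Estimated total cost in USD for several generated clips."""
    return sum((clip_cost(s) for s in durations), Decimal("0"))


print(clip_cost(10))            # one ten-second clip
print(batch_cost([3, 5, 10]))   # three clips of different lengths
```

`Decimal` is used instead of floats so that sums of ten-cent increments stay exact. If edit turns that regenerate a clip are billed, each turn would count as another entry in the list.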
Every clip carries Google’s SynthID watermark and C2PA content credentials. The model will not lip-sync a still photo of a real person to an audio clip, an explicit restriction aimed at limiting deepfakes. It will, however, translate existing recorded speech into another language, which Google positions as a localization tool for training content.
On LMArena’s Text-to-Video Arena leaderboard, Omni Flash currently ranks first with a score of 1527. Google acknowledges in its own model card that character consistency across edits and accurate text rendering remain open problems.
Sources
- Google’s Gemini Omni Flash hits the API, turning enterprise video production into a conversation – VentureBeat
- Start building with Nano Banana 2 Lite and Gemini Omni Flash – The Keyword (Google Blog)