Google has built computer use directly into Gemini 3.5 Flash, its latest AI model. Writing on The Keyword, Google’s official blog, Mateo Quiros says the capability was previously available only as a standalone model. It is now part of the main Flash model, which makes it accessible to a much broader range of developers and businesses.
Computer use means the AI can see what is on a screen, reason about it, and take actions across browsers, mobile apps, and desktop software. This enables the model to perform tasks that previously required human hands on a keyboard and mouse.
What this means in practice
Google highlights two practical examples: the model can analyze an app and return a categorized list of its features, and it can audit documentation for accessibility issues. More broadly, the company points to uses such as continuous software testing and automating knowledge work across professional applications.
Developers can access the capability through the Gemini API and the Gemini Enterprise Agent Platform.
Safety measures for live environments
Google acknowledges that AI agents operating on live systems carry risks. A key concern is prompt injection, where malicious content in the environment tries to hijack the agent’s actions. To counter this, Google has applied targeted adversarial training to the model. Two optional safeguards are also available to enterprise customers:
- Requiring explicit user confirmation before sensitive or irreversible actions are taken
- Automatically stopping a task if an indirect prompt injection is detected
Google recommends combining these tools with secure sandboxing, human oversight, and strict access controls. The company describes this layered approach as “defense-in-depth.”
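To make the two safeguards concrete, here is a minimal Python sketch of how an application might wrap an agent's planned actions. It does not use the Gemini API or reflect Google's implementation. Every name in it (`Action`, `run_guarded`, `observe`, `looks_injected`, `confirm`) is hypothetical, and the injection check is a stand-in supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Action:
    """A hypothetical UI action proposed by an agent."""
    name: str                 # e.g. "click", "type", "delete"
    target: str               # the UI element or file the action applies to
    sensitive: bool = False   # flagged by the caller as sensitive or irreversible


def run_guarded(
    actions: Iterable[Action],
    observe: Callable[[Action], str],
    looks_injected: Callable[[str], bool],
    confirm: Callable[[Action], bool],
) -> List[str]:
    """Walk through planned actions with two guardrails applied.

    observe        -- returns the on-screen text seen before an action (assumed)
    looks_injected -- caller-supplied check for suspected prompt injection
    confirm        -- asks the user to approve a sensitive action
    """
    log: List[str] = []
    for action in actions:
        screen_text = observe(action)

        # Guardrail 1: stop the whole task if injected content is suspected.
        if looks_injected(screen_text):
            log.append(f"stopped: suspected injection before {action.name}")
            break

        # Guardrail 2: sensitive actions need explicit user confirmation.
        if action.sensitive and not confirm(action):
            log.append(f"skipped: {action.name} on {action.target}")
            continue

        # A real agent would perform the action here; this sketch only logs it.
        log.append(f"executed: {action.name} on {action.target}")
    return log


if __name__ == "__main__":
    plan = [
        Action("click", "Settings"),
        Action("delete", "report.docx", sensitive=True),
    ]
    for line in run_guarded(
        plan,
        observe=lambda a: "ordinary page content",
        looks_injected=lambda text: "ignore previous instructions" in text.lower(),
        confirm=lambda a: False,  # simulate a user who declines
    ):
        print(line)
```

In this sketch a declined confirmation skips only that action, while a suspected injection halts the whole task. Neither check replaces the sandboxing, human oversight, and access controls that Google recommends alongside them.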