Google DeepMind has launched a specialized AI model that lets software agents interact with graphical user interfaces. In an official post, the company reports that the Gemini 2.5 Computer Use model enables AI to perform tasks on websites and mobile apps by clicking, typing, and scrolling, much as a human would.
The model works by analyzing a user’s request together with a screenshot of the current application window. It then determines the next action, such as filling out a form or selecting an item from a dropdown menu. Once that action has been executed, the model receives a fresh screenshot and repeats the loop until the task is complete. According to the company, this allows agents to navigate complex web pages, use interactive elements, and operate behind logins.
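The interaction pattern described here is an observe-act loop. The sketch below illustrates it in Python; the Action structure and the function names (capture_screenshot, propose_next_action, execute_action) are hypothetical placeholders standing in for the real model call and browser automation, not the published API.

```python
from dataclasses import dataclass, field

@dataclass
class Action:
    kind: str           # e.g. "click", "type", "scroll" (illustrative values)
    target: str         # description or coordinates of the UI element
    text: str = ""      # text to type, if any
    done: bool = False  # set when the model signals task completion

def capture_screenshot() -> bytes:
    """Grab the current application window as an image (placeholder stub)."""
    ...

def propose_next_action(request: str, screenshot: bytes, history: list[Action]) -> Action:
    """Send the user request, screenshot, and prior actions to the model and
    parse its proposed next UI action (placeholder for the real model call)."""
    ...

def execute_action(action: Action) -> None:
    """Perform the click/type/scroll in the browser or app (placeholder stub)."""
    ...

def run_agent(request: str, max_steps: int = 50) -> None:
    """Repeat: screenshot -> propose action -> execute, until the task is done."""
    history: list[Action] = []
    for _ in range(max_steps):
        screenshot = capture_screenshot()
        action = propose_next_action(request, screenshot, history)
        if action.done:
            break                  # the model reports the task is complete
        execute_action(action)     # act, then loop with a fresh screenshot
        history.append(action)
```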
Google states that the model outperforms leading alternatives on several web and mobile control benchmarks while operating with lower latency. Early testers have used the system for workflow automation, personal assistants, and software testing. To address safety concerns, the model includes built-in guardrails and a separate service that assesses each proposed action before execution. Developers can also require user confirmation for potentially high-risk actions, such as making a purchase. The model is now available in a public preview for developers.
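A confirmation requirement for high-risk actions could be layered onto the same loop. The snippet below extends the sketch above with an illustrative gate; the is_high_risk check and the set of risky action kinds are assumptions for illustration, not part of the announced API.

```python
# Action kinds a developer might treat as high risk (illustrative, not from the source).
HIGH_RISK_KINDS = {"purchase", "submit_payment", "delete_account"}

def is_high_risk(action: Action) -> bool:
    """Flag actions that should not run without explicit user approval."""
    return action.kind in HIGH_RISK_KINDS

def execute_with_confirmation(action: Action) -> None:
    """Ask the user before executing a high-risk action; otherwise run it directly."""
    if is_high_risk(action):
        answer = input(f"Allow the agent to perform '{action.kind}'? [y/N] ")
        if answer.strip().lower() != "y":
            return  # skip the action unless the user explicitly approves
    execute_action(action)
```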