OpenAI launches Operator, an AI agent that automates web-based tasks

OpenAI has introduced Operator, an AI-powered agent capable of performing web-based tasks autonomously through its own browser interface. The tool, currently available as a research preview to ChatGPT Pro subscribers in the United States, represents the company’s first venture into AI agents that can interact directly with computer interfaces.

The system is powered by a new Computer-Using Agent (CUA) model that combines GPT-4o’s vision capabilities with reinforcement learning to navigate websites and perform tasks like booking restaurant reservations, ordering groceries, and purchasing tickets. Operator works by taking screenshots of web pages and interpreting the graphical user interface elements such as buttons, text fields, and menus, allowing it to interact with websites without requiring specialized API integrations.
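The screenshot-driven approach described above can be pictured as a simple observe, decide, act loop. The sketch below is only an illustration under stated assumptions, not OpenAI's implementation: the names `Action`, `run_agent`, and `demo` are hypothetical, the "screenshot" is a plain string, and a scripted function stands in for the CUA model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str          # "type", "click", or "done" (hypothetical action set)
    target: str = ""   # label of the on-screen element to act on
    text: str = ""     # text to enter, used only by "type" actions


def run_agent(
    take_screenshot: Callable[[], str],
    choose_action: Callable[[str, List[Action]], Action],
    perform: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat: look at the screen, pick one action, carry it out."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()               # observe
        action = choose_action(screenshot, history)  # decide (stand-in for the model)
        history.append(action)
        if action.kind == "done":
            break
        perform(action)                              # act on the page
    return history


def demo() -> List[Action]:
    # A toy "web page" held in a dict; a real agent would see pixels instead.
    page = {"search_box": "", "submitted": False}

    def take_screenshot() -> str:
        return f"search_box={page['search_box']!r} submitted={page['submitted']}"

    def choose_action(screenshot: str, history: List[Action]) -> Action:
        # Scripted policy standing in for the vision model (an assumption).
        if "search_box=''" in screenshot:
            return Action("type", "search_box", "table for two")
        if "submitted=False" in screenshot:
            return Action("click", "submit_button")
        return Action("done")

    def perform(action: Action) -> None:
        if action.kind == "type":
            page[action.target] = action.text
        elif action.kind == "click" and action.target == "submit_button":
            page["submitted"] = True

    return run_agent(take_screenshot, choose_action, perform)


if __name__ == "__main__":
    for step in demo():
        print(step)
```

Because the loop only needs a way to see the screen and a way to act on it, no site-specific API is involved, which is the property the article highlights.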

OpenAI has implemented several safety measures in Operator’s design. The system hands control to the user for sensitive operations such as entering payment information or login credentials, and it does not collect or screenshot what the user enters during those steps. Additionally, Operator asks for user confirmation before finalizing significant actions like submitting orders or sending emails, and includes safeguards against potential misuse through prompt injections or malicious websites.
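One way to picture these safeguards is as a gate that every proposed action passes through before it runs. The sketch below is a hypothetical illustration, not Operator's actual logic: the `Decision` enum, the `gate` function, and the action names are all assumptions chosen to mirror the examples in the paragraph above.

```python
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"
    CONFIRM = "ask user to confirm"
    TAKEOVER = "hand control to user"


# Hypothetical action labels mirroring the article's examples.
SENSITIVE = {"enter_payment_info", "enter_login_credentials"}
SIGNIFICANT = {"submit_order", "send_email"}


def gate(action_name: str) -> Decision:
    """Decide how much user involvement an action needs before it runs."""
    if action_name in SENSITIVE:
        return Decision.TAKEOVER   # the user performs the step themselves
    if action_name in SIGNIFICANT:
        return Decision.CONFIRM    # the agent pauses for explicit approval
    return Decision.PROCEED        # routine browsing continues unattended


if __name__ == "__main__":
    for name in ("scroll_page", "submit_order", "enter_payment_info"):
        print(name, "->", gate(name).value)
```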

The company has established partnerships with several businesses including DoorDash, Instacart, OpenTable, Priceline, StubHub, and Uber to ensure the system operates within established terms of service. These collaborations aim to facilitate proper integration while respecting business norms and user privacy requirements.

Current limitations of the system include difficulties with complex interfaces such as slideshow creation and calendar management. The tool also faces restrictions on certain websites that block AI agents, and OpenAI has implemented rate limits on daily usage and task execution. Users can access Operator through operator.chatgpt.com, where tasks are performed in a remote browser running on OpenAI’s servers.

OpenAI states that the system outperforms comparable tools from competitors on industry benchmarks, including an 87% success rate on WebVoyager, which tests navigation of live websites. The company plans to expand access to Plus, Team, and Enterprise users in the future, and intends to integrate Operator’s capabilities directly into ChatGPT once safety and usability at scale are confirmed.

Sources: OpenAI, VentureBeat, Every, TechCrunch, Technology Review, The Verge
