OpenAI releases GPT-5.5 with improved agentic capabilities

OpenAI has announced GPT-5.5, its latest large language model, positioned as a significant step forward in handling complex, multi-step tasks on a computer. The model is rolling out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex.

According to OpenAI, GPT-5.5 is designed to handle “agentic” work: tasks where an AI model operates with a degree of autonomy, planning steps, using tools, checking its own output, and continuing until a task is complete. OpenAI says users can hand the model a vague, complicated task and expect it to work through the ambiguity rather than requiring constant guidance.

The model builds on GPT-5.4, which OpenAI released roughly seven weeks earlier. OpenAI says GPT-5.5 matches GPT-5.4’s response speed while performing at a higher level, and that it completes the same coding tasks using fewer tokens (the units of text that AI models process and generate).

OpenAI highlights several areas where it says the model shows notable improvement:

Agentic coding: Writing, debugging, and refactoring code across large software projects
Knowledge work: Generating documents, spreadsheets, and presentations; conducting research
Computer use: Navigating software interfaces, clicking, typing, and moving between applications
Scientific research: Analyzing datasets and supporting multi-step research workflows

In benchmark tests, GPT-5.5 scored 82.7 percent on Terminal-Bench 2.0, which measures complex command-line workflows, and 78.7 percent on OSWorld-Verified, which tests whether a model can independently operate computer environments. On GDPval, a benchmark that evaluates performance across 44 occupations, it scored 84.9 percent.

OpenAI also reports that an internal version of GPT-5.5 contributed to a new mathematical proof related to Ramsey numbers, a topic in combinatorics. The proof was later verified using the formal proof assistant Lean. OpenAI describes this as an example of the model contributing a novel mathematical argument, not just explaining existing ones.

OpenAI president Greg Brockman confirmed in a press briefing, reported by The Deep View, that GPT-5.5 is the model intended to power the company’s planned “superapp”: a broader consumer product that would extend Codex beyond developers to general users automating everyday tasks. Brockman said OpenAI would continue rolling out aspects of that product incrementally.

On safety, OpenAI says GPT-5.5 went through its full preparedness and safety evaluation process, including targeted testing for advanced biology and cybersecurity capabilities. The company classifies the model’s biological and cybersecurity capabilities as “High” under its Preparedness Framework, one level below “Critical.” OpenAI says it is deploying stricter content classifiers for potentially risky cybersecurity requests and expanding a program called Trusted Access for Cyber, which gives verified security professionals access to the model’s capabilities with fewer restrictions.

OpenAI also says it used GPT-5.5 and its Codex tool in developing the model’s own infrastructure. The company says Codex analyzed production traffic patterns and wrote custom algorithms that improved token generation speeds by more than 20 percent.

The model is available to API developers at $5 per million input tokens and $30 per million output tokens. A more capable version, GPT-5.5 Pro, is available to Pro, Business, and Enterprise ChatGPT users and will be priced at $30 per million input tokens and $180 per million output tokens in the API. OpenAI notes that while GPT-5.5 costs more than GPT-5.4, it is designed to complete tasks with fewer tokens, which it says offsets the higher price for most users.
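To make the pricing concrete, the sketch below works through the arithmetic for a single task. Only the per-million-token prices come from the figures above; the dictionary keys are hypothetical labels rather than confirmed API model identifiers, and the token counts and the price multiplier in the usage note are invented for illustration.

```python
# Per-million-token API prices in USD, as quoted in the article.
# The keys are illustrative labels, not confirmed API model identifiers.
PRICES = {
    "gpt-5.5": {"input": 5.00, "output": 30.00},
    "gpt-5.5-pro": {"input": 30.00, "output": 180.00},
}


def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one task from its input and output token counts."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def break_even_token_fraction(price_multiplier: float) -> float:
    """Fraction of the old token count at which a pricier model costs the same.

    price_multiplier is the new per-token price divided by the old one.
    It is a hypothetical input: the article does not state GPT-5.4's prices.
    """
    if price_multiplier <= 0:
        raise ValueError("price_multiplier must be positive")
    return 1.0 / price_multiplier


if __name__ == "__main__":
    # Hypothetical task: 200,000 input tokens and 20,000 output tokens.
    for model in PRICES:
        print(f"{model}: ${task_cost(model, 200_000, 20_000):.2f}")
    # If a model cost twice as much per token, it would need to use
    # half as many tokens to break even.
    print(break_even_token_fraction(2.0))
```

For the hypothetical task above, the same token counts cost six times as much on GPT-5.5 Pro as on GPT-5.5, because both its input and output prices are six times higher. The break-even helper shows the general shape of OpenAI's offset argument: a higher per-token price is cancelled out only if the token count falls by at least the same factor.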

Sources: OpenAI, The Verge, The Deep View
