OpenAI has released a new family of AI models called GPT-4.1, focusing on improved coding capabilities, better instruction following, and expanded context windows while simultaneously reducing prices. The new lineup includes three models: GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano, all available immediately through OpenAI’s API but not yet in ChatGPT.
Enhanced capabilities with focus on coding
According to OpenAI’s announcement, the new models excel particularly at coding tasks. On the SWE-bench Verified benchmark, which measures software engineering skills, GPT-4.1 achieved a score of 54.6%, a significant improvement over GPT-4o’s 33.2%. The company claims the models are better at frontend coding, make fewer unnecessary edits, and follow formatting instructions more reliably.
Beyond coding, the models demonstrate improved instruction following. On Scale’s MultiChallenge benchmark, GPT-4.1 scored 38.3%, outperforming GPT-4o by 10.5 percentage points. This improvement is intended to make the models more reliable on complex, multi-step tasks.
Million-token context window
All three new models can process up to one million tokens of context, equivalent to approximately 750,000 words or eight copies of the entire React codebase. This represents a substantial increase from the previous 128,000 token limit in GPT-4o models.
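The token-to-word figures above imply a ratio of roughly 0.75 words per token. As a rough illustration (the constants and function names here are illustrative, and real token counts depend on the tokenizer, not a fixed ratio), one could estimate whether a document fits in the new window like this:

```python
# Rough context-window check using the article's estimate of
# ~750,000 words per 1,000,000 tokens (about 0.75 words per token).
# This is a heuristic only; actual token counts vary by tokenizer.

CONTEXT_WINDOW_TOKENS = 1_000_000
WORDS_PER_TOKEN = 0.75  # implied by the article's figures

def estimated_tokens(word_count: int) -> int:
    """Estimate tokens needed for a document of `word_count` words."""
    return round(word_count / WORDS_PER_TOKEN)

def fits_in_context(word_count: int) -> bool:
    """Check whether the document likely fits in the 1M-token window."""
    return estimated_tokens(word_count) <= CONTEXT_WINDOW_TOKENS

print(estimated_tokens(750_000))  # 1000000
print(fits_in_context(500_000))   # True
```

By the same heuristic, the previous 128,000-token limit corresponds to roughly 96,000 words.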
OpenAI claims this expanded context allows the models to better handle large codebases and lengthy documents, and even to analyze extended videos. In benchmark tests, GPT-4.1 achieved a state-of-the-art result of 72.0% on Video-MME for long videos without subtitles.
Aggressive pricing strategy
Perhaps most significant for developers and businesses is OpenAI’s new pricing structure:
- GPT-4.1: $2.00 per million input tokens, $8.00 per million output tokens
- GPT-4.1 mini: $0.40 per million input tokens, $1.60 per million output tokens
- GPT-4.1 nano: $0.10 per million input tokens, $0.40 per million output tokens
This represents a 26% reduction compared to GPT-4o for typical usage patterns. Additionally, OpenAI has increased its prompt caching discount to 75% (up from 50% previously) and offers long context requests at no additional cost beyond standard per-token rates.
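To make the rates above concrete, here is a back-of-the-envelope cost estimator built from the listed per-million-token prices and the 75% cached-input discount. The function name and request shape are illustrative, not part of any OpenAI SDK:

```python
# Per-million-token prices from the announcement:
# (input $/1M tokens, output $/1M tokens)
PRICES = {
    "gpt-4.1":      (2.00, 8.00),
    "gpt-4.1-mini": (0.40, 1.60),
    "gpt-4.1-nano": (0.10, 0.40),
}
CACHED_INPUT_DISCOUNT = 0.75  # 75% off cached input tokens

def request_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Estimate the dollar cost of one request, with optional caching."""
    in_price, out_price = PRICES[model]
    uncached = input_tokens - cached_tokens
    cost = (
        uncached * in_price / 1e6
        + cached_tokens * in_price * (1 - CACHED_INPUT_DISCOUNT) / 1e6
        + output_tokens * out_price / 1e6
    )
    return round(cost, 6)

# 100k input tokens (half of them cached) plus 10k output tokens:
print(request_cost("gpt-4.1", 100_000, 10_000, cached_tokens=50_000))  # 0.205
```

Under these assumptions, the same request on GPT-4.1 nano would cost about a twentieth as much, which is the trade-off the three-tier lineup is built around.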
This aggressive pricing move puts pressure on competitors like Anthropic, Google, and xAI. For comparison, Anthropic’s Claude 3.7 Sonnet costs $3.00 per million input tokens and $15.00 per million output tokens, making GPT-4.1 significantly more affordable.
Real-world applications
Several enterprise customers who tested the models prior to launch reported substantial improvements. Thomson Reuters saw a 17% improvement in multi-document review accuracy when using GPT-4.1 with its legal AI assistant. Financial firm Carlyle reported 50% better performance on extracting financial data from dense documents.
Windsurf, a coding tool provider, found that GPT-4.1 reduced unnecessary file reading by 40% compared to other leading models and modified unnecessary files 70% less often.
Availability and future plans
The new models are available immediately through OpenAI’s API. The company plans to gradually incorporate features from GPT-4.1 into ChatGPT over time.
OpenAI positions this release as focusing on practical utility rather than simply chasing benchmark scores, suggesting a strategic shift toward more accessible and efficient AI models for business applications. The emphasis on reduced costs and improved performance in real-world scenarios indicates OpenAI’s commitment to making its technology more widely accessible to developers and enterprises alike.
Sources: OpenAI, TechCrunch, VentureBeat