OpenAI has just launched GPT-5.5, a major new model release that the company describes as its most capable system yet for real-world work. The model is designed to move beyond traditional chatbot interactions and into sustained execution of complex, multi-step tasks across software development, research, business operations, and data analysis.
The release reflects a broader shift inside OpenAI toward building systems that act less like assistants and more like collaborators. “GPT-5.5 is built for real work,” the company said, emphasizing its ability to plan, execute, and refine tasks across long time horizons while maintaining coherence and accuracy.
At its core, GPT-5.5 is optimized for coding, computer use, knowledge work, and scientific reasoning, areas where the company says previous models still required significant human supervision. The goal, according to OpenAI, is to close the gap between what frontier models can theoretically do and what they can reliably deliver in practice.
A Leap in Coding, Reasoning, and Execution
GPT-5.5 shows measurable gains across major industry benchmarks. On Terminal-Bench 2.0, which evaluates command-line workflows requiring tool use and planning, the model achieves 82.7 percent, up from 75.1 percent for GPT-5.4, a gain of 7.6 percentage points. On SWE-Bench Pro, a widely used benchmark for real-world software engineering, it reaches 58.6 percent, again improving on its predecessor.
These improvements translate into tangible gains for developers. OpenAI says GPT-5.5 is better at understanding the structure of large codebases, identifying root causes of failures, and implementing fixes that work across multiple files and systems. Early testers described the model as more reliable in “end-to-end engineering tasks,” where success depends on coordinating multiple steps rather than producing isolated snippets.
One tester noted that GPT-5.5 “feels like it understands the system, not just the code,” highlighting a shift toward deeper reasoning and contextual awareness.
The model also advances OpenAI’s broader push toward agentic workflows, where AI systems can independently complete tasks across tools. On OSWorld-Verified, a benchmark that measures real-world computer use, GPT-5.5 scores 78.7 percent, demonstrating its ability to operate software environments with minimal human intervention.
From Productivity Tool to Economic Engine
The company says the biggest impact of GPT-5.5 may be in knowledge work, where it can generate presentations, build spreadsheets, analyze data, and produce structured outputs at scale. On GDPval, OpenAI’s benchmark covering 44 occupations, GPT-5.5 reaches 84.9 percent, outperforming GPT-5.4 and approaching expert-level performance across a wide range of tasks.
“GPT-5.5 is better at producing real work products, not just answers,” OpenAI said. “It can generate deliverables that are closer to what a professional would produce.”
The model is also more efficient. OpenAI says GPT-5.5 achieves higher quality results with fewer tokens, reducing the number of iterations needed to complete a task. This efficiency lowers the cost of reaching a given level of output quality, even as the model itself becomes more advanced.
Inside OpenAI, the shift is already visible. The company reports that more than 85 percent of employees now use Codex weekly, applying AI to tasks across engineering, finance, marketing, and communications. In one example, teams used GPT-5.5 to analyze data on speaking requests and generate structured reports, saving several hours per week per employee.
“This is where AI becomes infrastructure,” OpenAI said, describing the model as a system that supports entire workflows rather than isolated tasks.
Advancing Scientific Discovery
Beyond enterprise use, GPT-5.5 is also pushing into scientific research. On GeneBench, a benchmark focused on genetics and quantitative biology, the model shows significant improvement over previous versions. OpenAI says it is better at exploring hypotheses, interpreting ambiguous results, and iterating across complex research workflows.
In one internal experiment, a customized version of GPT-5.5 contributed to discovering a new proof related to Ramsey numbers, a core concept in combinatorics. The result was later verified by researchers, illustrating how AI can assist in advancing mathematical knowledge under human supervision.
“We’re beginning to see AI meaningfully accelerate science,” the company said, while noting that human oversight remains essential.
Safety, Security, and Deployment
OpenAI also highlighted improvements in factual reliability and safety. GPT-5.5 reduces error rates compared to GPT-5.4 and includes stricter safeguards for high-risk domains, particularly cybersecurity. The company says it is expanding controlled access to cyber-related capabilities through its Trusted Access for Cyber program while maintaining tighter usage controls.
“Security and alignment are core to how we deploy these systems,” OpenAI said, adding that stronger guardrails are necessary as models gain more autonomy.
GPT-5.5 is now rolling out to ChatGPT Plus, Pro, Business, and Enterprise users, with GPT-5.5 Pro available for higher-complexity tasks. API access is expected to follow, with pricing set at $5 per million input tokens and $30 per million output tokens, while Pro usage carries higher rates.
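To put the quoted API rates in concrete terms, here is a minimal Python sketch of the arithmetic. Only the $5 and $30 per-million-token figures come from the announcement; the function name and the request size are hypothetical, chosen for illustration, and the sketch ignores Pro pricing, for which no figures were given.

```python
# Standard rates quoted in the article, in USD per one million tokens.
INPUT_USD_PER_MILLION = 5.0
OUTPUT_USD_PER_MILLION = 30.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the quoted standard rates.

    Hypothetical helper for illustration; not part of any OpenAI SDK.
    """
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# Hypothetical request: 20,000 input tokens and 4,000 output tokens.
example_cost = estimate_cost_usd(20_000, 4_000)
print(f"${example_cost:.2f}")  # prints $0.22
```

Because output tokens are priced at six times the input rate, a model that reaches the same result with fewer generated tokens lowers the bill more than the headline rates alone suggest.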
The model was trained using infrastructure developed in collaboration with Microsoft and NVIDIA, leveraging Azure data centers and GPU systems including H100, H200, and next-generation architectures.
Toward AI That Can Work End-to-End
For OpenAI, GPT-5.5 represents more than another incremental release. It signals a transition toward AI systems capable of carrying real work from start to finish.
“Everything is controlled by code,” the company said. “The better an agent is at reasoning about and producing code, the more capable it becomes across all forms of work.”
With GPT-5.5, OpenAI is betting that the future of AI will be defined not just by intelligence but by execution: systems that can plan, act, and deliver outcomes at scale.