OpenAI Releases Agentic Codex Model Capable of 24+ Hour Autonomous Work

The new model is now available to users subscribed to ChatGPT Plus, Pro, Business, Edu, and Enterprise plans.

OpenAI has launched GPT-5.1-Codex-Max, an advanced agentic coding model built for long-running software development tasks, and made it available across all Codex platforms.

According to OpenAI, the model is built on an improved reasoning foundation and trained on a wide range of agentic tasks spanning software engineering, math, research, and more. It is also the company’s first system designed to operate across multiple context windows using a technique called compaction, allowing it to stay coherent over millions of tokens within a single task.
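OpenAI has not published how compaction works internally, so the following is only a toy illustration of the general idea: when a running history outgrows a token budget, older entries are folded into a short summary while the most recent ones are kept verbatim. The class `CompactingHistory`, the word-count tokenizer, and the truncating `summarize` helper are all hypothetical stand-ins, not part of any OpenAI API.

```python
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def summarize(messages: list[str]) -> str:
    # Placeholder summarizer (assumption): keep the first five words of each message.
    # A real system would presumably have the model write the summary itself.
    heads = [" ".join(m.split()[:5]) for m in messages]
    return "SUMMARY: " + " | ".join(heads)


@dataclass
class CompactingHistory:
    budget: int            # maximum tokens we want in the working context
    keep_recent: int = 2   # most recent messages that are never summarized
    messages: list[str] = field(default_factory=list)

    def total_tokens(self) -> int:
        return sum(count_tokens(m) for m in self.messages)

    def append(self, message: str) -> None:
        self.messages.append(message)
        # Compact only when over budget and there is something old to fold away.
        if self.total_tokens() > self.budget and len(self.messages) > self.keep_recent:
            self.compact()

    def compact(self) -> None:
        split = len(self.messages) - self.keep_recent
        old, recent = self.messages[:split], self.messages[split:]
        # Replace everything older than the recent window with one summary entry.
        self.messages = [summarize(old)] + recent


history = CompactingHistory(budget=20, keep_recent=2)
steps = [f"step {i} ran the tests and fixed one failure" for i in range(5)]
for step in steps:
    history.append(step)

for entry in history.messages:
    print(entry)
```

After the loop, the history holds one summary entry followed by the two most recent steps, which is smaller than the five raw messages. The toy does not guarantee the result fits the budget; it only shows the shape of the technique.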

OpenAI stated that the model can operate independently for extended periods, noting that internal tests showed Codex-Max continuously iterating on its code, resolving test failures, and ultimately completing tasks that ran for more than 24 hours.

The new model is available to users on ChatGPT Plus, Pro, Business, Edu, and Enterprise plans. Developers using the Codex CLI with an API key will gain access once API support launches. GPT-5.1-Codex-Max will also replace GPT-5.1-Codex as the default model across all Codex interfaces.

OpenAI noted that 95% of its internal engineering team uses Codex every week, and that engineers have been “shipping roughly 70% more pull requests since adopting Codex.”

Higher Accuracy and Improved Token Efficiency

GPT-5.1-Codex-Max delivers significant improvements in real-world and benchmark coding tests. On SWE-Lancer, it achieved 79.9% accuracy, up from 66.3% with GPT-5.1-Codex. On SWE-bench Verified, the model reached higher accuracy than GPT-5.1-Codex at the same reasoning effort while using 30% fewer thinking tokens.

OpenAI said these efficiency improvements directly reduce costs for developers. In one test, the model produced a complete browser-based CartPole reinforcement learning sandbox using 27,000 thinking tokens, compared with 37,000 tokens required by the previous Codex model.
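For a sense of scale, the CartPole figures can be turned into a percentage with simple arithmetic. The helper name `percent_reduction` is ours, and the calculation covers only this single reported example, not the model's behavior in general.

```python
def percent_reduction(before: int, after: int) -> float:
    # Share of the original token count that was saved, as a percentage.
    return (before - after) / before * 100


# Thinking-token counts reported for the CartPole sandbox example.
cartpole_saving = percent_reduction(before=37_000, after=27_000)
print(f"{cartpole_saving:.1f}% fewer thinking tokens")  # 27.0% fewer thinking tokens
```

That works out to roughly 27% fewer thinking tokens for this task, close to the 30% reduction reported on SWE-bench Verified.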

The company is also rolling out a new extra-high reasoning mode for tasks that aren’t sensitive to latency, enabling the model to take more time to think before generating a response.

Long-Horizon Performance and Windows Compatibility

Thanks to compaction, GPT-5.1-Codex-Max can now manage long-horizon tasks such as complex refactoring, multi-hour debugging, and extended agent loops that previously broke due to context limits. It is also the first Codex model trained to run within Windows environments. In addition, OpenAI said it trained the model on tasks intended to make it a better collaborator in the Codex CLI.

Safeguards and Cybersecurity

OpenAI stated that GPT-5.1-Codex-Max does not meet the “High” cybersecurity capability tier in its Preparedness Framework. However, it is still the most advanced and capable cybersecurity model the company has released to date.

OpenAI said it is developing additional safeguards as agentic capabilities advance and noted that it has already intercepted attempts to misuse its models in cyber operations.

Codex operates in a restricted sandbox environment by default, offering only limited file access and no network connectivity unless explicitly allowed. OpenAI advises maintaining these restrictions to reduce exposure to prompt-injection and related security risks.

OpenAI emphasized that Codex is meant to complement human reviewers, not replace them, and urged developers to carefully review all AI-generated changes before deploying them.