Codex is building Codex: OpenAI's coding agent now writes much of its own code

The vast majority of Codex is now built by Codex itself, OpenAI says. Reflecting the growing popularity of AI coding tools among software developers, the agent now touches nearly every stage of the company's development process, including the ongoing improvement of AI coding tools themselves. "I think the vast majority of Codex is built by Codex, so it's almost entirely just being used to improve itself," said Alexander Embiricos, product lead for Codex at OpenAI, during a conversation on Tuesday.

Launched in May 2025 as a research preview, Codex is a cloud-based software engineering agent that can write features, fix bugs, and propose pull requests. It runs in sandboxed environments linked to a user's code repository and can work on multiple tasks simultaneously. OpenAI offers Codex through ChatGPT's web interface, a command-line interface (CLI), and extensions for VS Code, Cursor, and Windsurf.

The name "Codex" comes from an earlier OpenAI model from 2021, based on GPT-3, which originally powered GitHub Copilot's tab-completion feature. Embiricos said the name is rumored among staff to be short for "code execution." With the new agent, OpenAI wanted to connect back to that foundational moment, even though the original model was developed in part by people who have since left the company. "For many people, that model powering GitHub Copilot was the first 'wow' moment for AI," Embiricos noted. "It demonstrated AI's potential to understand a user's context."

"We're trying to do and accelerate you in doing that," Embiricos said. The current command-line version of Codex bears a clear resemblance to Claude Code, Anthropic's agentic coding tool, which debuted in February 2025. Embiricos deflected direct questions about Claude Code's influence on Codex's design but acknowledged the competitive landscape. "It's a fun market to work in because there's lots of great ideas being thrown around," he remarked. He noted that OpenAI had been building web-based Codex features internally before it released the CLI version, which came after Anthropic's tool launched.

OpenAI's customers have embraced the command-line version: Codex usage among external developers grew 20-fold after the company shipped the interactive CLI extension alongside GPT-5 in August 2025. OpenAI followed on September 15, 2025 with GPT-5-Codex, a specialized version of GPT-5 optimized for agentic coding, further accelerating adoption.

Internal adoption is equally strong; Embiricos said the vast majority of OpenAI's engineers now use Codex regularly. The company runs the same open-source version of the CLI that external developers can download, contribute to, and modify; the version used internally is the open-source repository itself, not a separate internal fork. This recursive development extends beyond basic code generation, with Embiricos describing scenarios where Codex monitors its own training runs.

OpenAI draws on its own training runs and user feedback to determine Codex's future development priorities. "We have designated areas where we task Codex with analyzing the feedback and subsequently deciding on our next steps," a spokesperson explained. Codex is actively involved in generating research materials used during its own training, and the team is experimenting with having Codex monitor those training processes. OpenAI employees can also hand tasks to Codex through project management tools like Linear, assigning it work much as they would assign tasks to a human colleague.

This recursive approach, using tools to build better tools, has deep roots in the history of computing. Engineers in the 1960s designed the first integrated circuits by hand on vellum and paper, then fabricated physical chips from those drawings. Those chips powered the computers that ran early electronic design automation (EDA) software, which in turn enabled engineers to design circuits far too complex for any human to draft manually. Modern processors contain billions of transistors arranged in patterns that are only possible through software. OpenAI's use of Codex to build Codex follows the same pattern, with each generation of the tool creating capabilities that fuel the next. The result is a semi-autonomous feedback loop: Codex generates code under human direction, that code is incorporated into Codex, and the next version of Codex produces different code based on the expanded dataset.

As a case study, Embiricos pointed to the Sora Android app, which four engineers developed entirely from the ground up.

"We spent 28 days building it and shipped it to the app store," he said. The engineers already had the iOS app and server-side components in place, so their work centered on the Android client. They used Codex to help with architectural planning, to generate detailed sub-plans for individual components, and ultimately to implement those components.

Still, despite OpenAI's accounts of Codex's internal success, independent research has shown mixed results for AI coding productivity. A METR study published in July found that experienced open source developers were actually 19 percent slower when using AI tools on complex, mature codebases, though the researchers conceded that AI may perform better on simpler projects.

Ed Bayes, a designer on the Codex team, described how the tool has transformed his workflow. Codex now integrates with project management tools like Linear and communication platforms like Slack, allowing team members to assign coding tasks directly to the AI agent. "You can add Codex and assign issues to it now," Bayes told Ars. "Codex is literally a teammate in your workspace." Users can tag Codex in a Slack channel and ask it to fix an issue; the agent will then generate a pull request that can be reviewed and iterated on in the same channel. "It's essentially approximating this kind of coworker and showing up wherever you work," Bayes concluded.

As a designer working on Codex's interfaces, Bayes said the tool has allowed him to contribute code directly, eliminating the previous need to hand specifications off to engineers. "It essentially gives you more leverage," he explained, "enabling you to work across the stack and essentially do more." Designers at OpenAI now prototype features by building them directly, using Codex to handle the implementation details.

OpenAI's strategy treats Codex as what Bayes described as "a junior developer," a role the company hopes will grow into that of a more senior developer over time. "If you were onboarding a junior developer, how would you onboard them? You give them a Slack account, you give them a Linear account," Bayes noted. "It's not just this tool that you go to in the terminal, but it's something that comes to you as well and sits within your team."

Given this integrated approach, a key question arises: what role will humans play? When asked, Embiricos distinguished between "vibe coding," where developers accept AI-generated code without close review, and "vibe engineering," a term coined by AI researcher Simon Willison, where humans remain actively involved. "We see a lot more vibe engineering in our code base," he said. "You ask Codex to work on that, maybe you even ask for a plan first. Go back and forth, iterate on the plan, and then you're in the loop with the model and carefully reviewing its code." He added that vibe coding still has a place for prototypes and throwaway tools.

Over the past year, "monolithic" large language models (LLMs) such as GPT-4.5 have largely stalled on frontier benchmarks, prompting AI companies to shift their focus toward simulated reasoning models and agentic systems built from multiple AI models operating in parallel. Asked whether agents like Codex represent the best path for extracting more utility from existing LLM technology, Embiricos emphatically dismissed concerns about an AI capability plateau. "I think we're very far from plateauing," he said. "If you look at the velocity on the research team here, we've been shipping models almost every week or every other week." He cited GPT-5-Codex, which reportedly completes tasks 30 percent faster than its predecessor at the same level of intelligence, and said that during testing the company has seen the model work independently for 24 hours on complex tasks.

OpenAI faces growing competition in the AI coding market. Anthropic's Claude Code and Google's Gemini CLI offer comparable terminal-based agentic coding experiences, and this week Mistral AI released Devstral 2 alongside a CLI tool called Mistral Vibe. Startups like Cursor have built dedicated IDEs centered on AI coding, reportedly generating $300 million in annualized revenue. Given the well-documented tendency of AI models to confabulate when used as factual resources, could coding have become the killer application for LLMs?

"We have absolutely noticed that coding represents a clear and relatively safe business application for today's AI models, particularly when compared to utilizing AI language models for writing or as emotional companions," said Embiricos. "Agents are demonstrating the ability to achieve significant proficiency in coding, and there's substantial economic value associated with this." The team believes focusing on Codex is highly mission-aligned, since it lets them deliver considerable value to developers. "Developers build solutions for others, so we're effectively scaling through them," Embiricos added.

Asked about concerns that tools like Codex could threaten software development jobs, Bayes acknowledged the worry but noted that OpenAI has not reduced headcount as a result of the tool, emphasizing that "there's always a human in the loop because the human can actually read the code." Neither Bayes nor Embiricos foresees a future where Codex operates autonomously without human oversight; both view it as an amplifier of human potential rather than a replacement.

The practical implications of agents like Codex extend beyond OpenAI's operations. Embiricos said the company's long-term vision includes making coding agents accessible to people with no prior programming experience: "Not all of humanity will open an IDE or even understand what a terminal is – a more general agent," he explained.

This article was updated on December 12, 2025 at 6:50 PM to include findings from the METR study.