Artificial intelligence is transforming software development, automating the more labor-intensive parts of programming through networks of agents and subagents. But as developers explore new interfaces and new ways for humans and AI to collaborate, even the leading AI labs are finding it hard to keep pace.
The prevailing trend is toward agent-driven software development, in which AI agents autonomously take on coding tasks, epitomized by applications like Claude Code and Cowork. Meanwhile, OpenAI has been steadily building out its Codex tool, which launched as a command-line interface last April and expanded into a web application a month later.
On Monday, OpenAI released a new macOS application for Codex that incorporates many of the agent-driven practices that have gained traction over the past year. The app is designed to run multiple agents in parallel, and it integrates agent capabilities and cutting-edge workflows. The release follows closely on the introduction of GPT-5.2-Codex, OpenAI’s most sophisticated coding model, which the company expects will draw users away from Claude Code.
“For tackling complex and sophisticated tasks, GPT-5.2 is undoubtedly the most advanced model available,” stated CEO Sam Altman during a press call. “However, its usability has been a challenge. By translating that high level of capability into a more adaptable interface, we believe we will significantly enhance user experience.”
While Altman’s confidence in GPT-5.2 is warranted, the coding benchmarks tell a more nuanced story. GPT-5.2 currently ranks first on TerminalBench, a test of how well AI handles command-line programming tasks, but Gemini 3 and Claude Opus agents have posted scores that are only slightly lower and fall within the benchmark's margin of error. Results from SWE-bench, which evaluates a model's ability to fix real-world software bugs, show no decisive advantage for GPT-5.2. Agent-driven use is also hard to benchmark effectively, and user experiences with state-of-the-art models can vary considerably.
The Codex app comes with a range of new features that OpenAI claims will match or surpass those of the various Claude applications. Users can set up automations that run in the background on a schedule, with the results placed in a queue for review when the user returns. Users can also choose among agent personas, ranging from pragmatic to empathetic, to suit their preferred working style.
For OpenAI, the main selling point is the sheer speed of development that AI enables. “You can start from scratch and produce a sophisticated software solution in just a few hours,” Altman remarked. “The pace at which I can input new ideas delineates the extent of what can be created.”