The AI Coding Stack Is Fragmenting Into Layers, Not Winners
The most misleading question in AI coding right now is still the one vendors most want you to ask: which tool wins? Cursor, Claude Code, Codex, Copilot CLI, Gemini CLI, take your pick. It makes for clean benchmark posts and cleaner pricing pages. It is also increasingly the wrong mental model.
The better way to read this market is as a stack that is splitting into layers. One layer owns the editor surface and visual orchestration. Another owns long-running execution in the terminal. Another owns cloud delegation, review, or policy. The interesting part is not that vendors are copying each other. It is that power users have already stopped waiting for a single winner and are quietly assembling workflows from multiple tools, because no one product is best at every part of the job.
That is the core argument in The New Stack’s latest analysis, and it matches the direction of the official product moves we have seen over the last few weeks. GitHub pushed Copilot CLI to general availability in February, then added bring-your-own-key and local-model support on April 7, which makes the tool look less like a fixed SaaS endpoint and more like an agent shell teams can route through their own infra. OpenAI, meanwhile, has been broadening Codex beyond a single interface, with pricing and product docs that now span the web app, CLI, IDE extension, iOS, GitHub code review, and Slack integrations. Those are not the moves of companies building one chat box. They are the moves of companies trying to own different layers of a developer workflow.
The category is maturing past benchmark theater
Benchmarks still matter, but not in the way the market pretends. A higher score on a coding eval is useful information. It is not the whole buying decision once the actual work involves repo discovery, permission boundaries, branch management, review, cost control, and the boring but decisive question of whether a team can live with the product every day.
That is why the current product split feels more structural than cosmetic. Cursor still has the clearest story for people who want a visible, editor-native experience and fast iteration inside a GUI. Claude Code has earned its reputation for going deep on long tasks in a terminal-first workflow. Codex remains the lower-friction OpenAI entry point because it shows up across more official surfaces and because OpenAI keeps packaging it as something teams can adopt without redesigning their entire development setup.
None of those strengths cancel the others out. They describe different jobs.
The market is not converging on one best tool. It is converging on a division of labor.
GitHub’s BYOK move gave away the plot
If you want the cleanest proof that this stack interpretation is right, look at GitHub’s April 7 Copilot CLI update. The company added support for Azure OpenAI, Anthropic, OpenAI-compatible endpoints, and fully local options like Ollama, vLLM, and Foundry Local. It also made offline mode possible with COPILOT_OFFLINE=true and removed the requirement to log in with GitHub when you are using your own provider.
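For the concrete mechanics, the only knob named in the changelog is the COPILOT_OFFLINE environment variable; how you wire up a specific local provider varies by setup and is not shown here. A minimal sketch:

```shell
# Enable offline mode for Copilot CLI, per the April 7 changelog.
# Provider-specific wiring (Ollama, vLLM, Foundry Local endpoints)
# depends on your environment and is intentionally omitted.
export COPILOT_OFFLINE=true
echo "offline mode: ${COPILOT_OFFLINE}"
```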
That is a big philosophical shift disguised as a changelog entry. GitHub is effectively saying the product value is no longer only “use our hosted model routing.” It is also “use our workflow, permissions, and terminal UX even if you want someone else’s model.” In infrastructure language, Copilot CLI is becoming a control plane.
That matters because it changes what competition looks like. If the shell around the model gets sticky enough, model allegiance becomes less absolute. Teams may end up standardizing on one workflow surface while swapping models underneath based on cost, capability, or compliance. That is not a theoretical edge case anymore. Vendors are shipping exactly toward it.
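The "sticky shell, swappable model" pattern is easy to see in miniature. The sketch below is purely illustrative: the class and method names are invented for this example, not any vendor's API. The point is that the workflow surface owns permissions, UX, and history, while the model behind it is an interchangeable backend.

```python
from typing import Protocol


class ModelBackend(Protocol):
    """Anything that can turn a prompt into a completion."""
    def complete(self, prompt: str) -> str: ...


class HostedBackend:
    """Stand-in for a vendor-hosted model endpoint."""
    def complete(self, prompt: str) -> str:
        return f"[hosted] {prompt}"


class LocalBackend:
    """Stand-in for a locally served model (e.g. via Ollama)."""
    def complete(self, prompt: str) -> str:
        return f"[local] {prompt}"


class AgentShell:
    """The sticky layer: workflow, approvals, and history live here.
    The model underneath is swappable per cost, capability, or compliance."""
    def __init__(self, backend: ModelBackend) -> None:
        self.backend = backend

    def run(self, task: str) -> str:
        return self.backend.complete(task)


shell = AgentShell(HostedBackend())
print(shell.run("summarize this diff"))
shell.backend = LocalBackend()  # swap the model, keep the workflow
print(shell.run("summarize this diff"))
```

Once teams standardize on the shell, switching the backend is a one-line change rather than a migration, which is exactly why the shell is where the leverage sits.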
Codex is quietly betting on surface area
OpenAI’s answer looks different, but the logic rhymes. The Codex pricing page now frames the product across a surprisingly broad set of surfaces: web, CLI, IDE extension, iOS, GitHub code review, and Slack. The model lineup also reveals product segmentation. GPT-5.4 is positioned for heavier local work, GPT-5.4-mini stretches usage for routine tasks, and GPT-5.3-Codex handles cloud tasks and code review. Those are not just model choices. They are workload lanes.
The practical implication is that OpenAI is not asking users to think about Codex as “the terminal agent” or “the code review feature.” It wants Codex to be the engineering layer that shows up wherever the work happens. That is a strong strategy if you believe the next phase of adoption is less about enthusiasts choosing a favorite CLI and more about teams normalizing agent help across several touchpoints.
It also gives Codex a different kind of advantage. Not necessarily the deepest autonomy on every task, and not necessarily the best editor experience, but a smoother handoff between casual use and serious use. A developer can start in a browser, move to the CLI, use GitHub review hooks, and push some workflows into automation without learning an entirely different product family each time.
Low friction is not sexy, but it ships.
What engineers should actually do with this
If you are evaluating AI coding tools for yourself or a team, stop asking which product is universally best. Ask which layer you want each product to own.
First, separate planning from execution. Some tools are better at helping you inspect a repo, sketch a plan, and keep context visible. Others are better at taking a scoped task and grinding through it for half an hour without getting distracted. Treat those as different capabilities, because they are.
Second, audit your trust boundaries. A cloud agent that can open branches and run checks is useful, but it also changes your threat model. A local agent with offline support may be slower to adopt in some teams, but it can be the difference between “interesting demo” and “approved for production use.” BYOK and local-model support are not bonus features anymore. They are procurement features.
Third, model your spend before people get attached. OpenAI’s current Codex pricing makes it very clear that usage depends on model choice, cloud tasks, code review volume, and whether you lean on faster modes. GitHub’s own moves point the same way: once these tools become real workflow engines instead of autocomplete, they start consuming budget like infrastructure. If you do not decide in advance which tasks deserve a premium model, the finance conversation will decide for you later.
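A back-of-envelope spend model is enough to start that conversation. Every price and volume below is an illustrative assumption, not OpenAI's or GitHub's actual rate card; swap in your own numbers.

```python
# Hypothetical per-1K-token prices in USD -- placeholders, not real rates.
PRICE_PER_1K_TOKENS = {"premium": 0.015, "mini": 0.002}

# (model tier, tasks per month, avg tokens per task) -- assumed workload mix.
workloads = [
    ("premium", 400, 30_000),   # long agentic runs, cloud tasks, code review
    ("mini", 5_000, 2_000),     # routine completions, small edits
]

total = 0.0
for tier, tasks, tokens in workloads:
    cost = tasks * tokens / 1_000 * PRICE_PER_1K_TOKENS[tier]
    print(f"{tier:8} ~${cost:,.0f}/month")
    total += cost
print(f"{'total':8} ~${total:,.0f}/month")
```

Even with made-up numbers, the structure of the exercise is the point: deciding which task lanes get the premium tier before usage grows, not after.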
Finally, optimize for legibility. The best tool for your team may not be the one with the highest benchmark score. It may be the one that makes state visible, keeps approvals sane, produces artifacts other humans can review, and fails in understandable ways. Developers forgive limits. They do not forgive mystery.
The winner may be the company that admits no one wants a monolith
The industry still talks like AI coding will collapse into one assistant that plans, writes, reviews, debugs, deploys, and maybe makes coffee. More likely, it looks like every other mature software category. The stack gets layered. The interfaces specialize. Vendors expand sideways. Teams standardize where they need consistency and mix tools where they need leverage.
That is not a sign the category is failing. It is a sign it is getting real.
The New Stack piece gets the headline right: this market is fragmenting into layers, not winners. The next useful comparison posts will not just ask whether Cursor beats Claude Code or whether Codex beats Copilot. They will ask which product owns planning, which owns execution, which owns review, and which one your security team will actually tolerate.
That is a much less marketable story than “one tool to replace them all.” It also happens to be the story builders can use.
Sources: The New Stack, OpenAI Codex Pricing, GitHub Copilot CLI GA, GitHub Copilot CLI BYOK and local models