Codex Pricing Turns Agentic Coding Into a Token-Budgeting Problem

Anatoliy Kolodkin

14 May 2026 • 5 min read

The most useful thing OpenAI did with its updated Codex pricing page was make the fantasy expensive in public.

For the last year, agentic coding has been sold with a suspiciously smooth story: delegate the task, let the model work, get a pull request back. The pricing mechanics were always underneath that demo, but they were easy to ignore because the unit of consumption felt like “a prompt.” Codex’s current pricing surface makes the real unit harder to miss. The cost is the whole working set: repo context, tool output, generated code, cached input, MCP schemas, screenshots, traces, browser state, cloud tasks, code review, image generation, and whatever your team stuffed into AGENTS.md six months ago.

Agentic coding is now a token-budgeting problem. That is healthy. Engineering teams make better decisions when the meter is visible.

The five-hour window is a product-management opinion

OpenAI’s page lays out plan access, local-message ranges, cloud-task limits, code-review limits, model availability, API-key tradeoffs, and credit rates. Plus includes Codex on web, CLI, IDE extension, and iOS; cloud integrations such as automatic code review and Slack; recent models including GPT-5.5, GPT-5.4, and GPT-5.3-Codex; and GPT-5.4-mini for higher-usage routine messages. Pro adds multipliers on top of Plus. OpenAI says the $100 Pro tier gets double its standard Codex usage through May 31, 2026 (10x Plus instead of the usual 5x), while the $200 Pro tier keeps 20x Plus on an ongoing basis and temporarily gets 25x Plus on five-hour Codex limits through May 31.

The shape of those five-hour windows is the first real signal. GPT-5.5 gets 15–80 local messages and no cloud tasks or code reviews. GPT-5.4 gets 20–100 local messages. GPT-5.4-mini gets 60–350 local messages. GPT-5.3-Codex gets 30–150 local messages, 10–60 cloud tasks, and 20–50 code reviews. Local messages and cloud tasks share the same window.

That is not just accounting. It is a product-management opinion about which model belongs where. GPT-5.5 may be the high-judgment model, and OpenAI says it uses “significantly fewer tokens” than GPT-5.4 for comparable results, but it is not the cloud-task workhorse in the published limits. GPT-5.3-Codex is the one explicitly attached to code review and cloud execution. GPT-5.4-mini is the volume lane. The pricing page is quietly telling teams to route work instead of defaulting everything to the biggest model in the menu.

Output tokens are where lazy routing gets punished

The rate card makes the routing issue concrete. GPT-5.5 is listed at 125 credits per 1M input tokens, 12.50 credits per 1M cached input tokens, and 750 credits per 1M output tokens. In the same order (input / cached input / output), GPT-5.4 is 62.50 / 6.250 / 375, GPT-5.4-mini is 18.75 / 1.875 / 113, and GPT-5.3-Codex is 43.75 / 4.375 / 350. Code review uses GPT-5.3-Codex.

The output-token spread is the number engineering managers should stare at. Coding agents produce a lot of output: patches, explanations, plans, test logs, reasoning summaries, review comments, alternate approaches, migration scripts, and “I tried this and it failed” reports. If a team runs mechanical edits through the most expensive output lane, the bill is not a mystery. It is a choice.
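To see how much of the bill the output lane carries, here is a minimal sketch that applies the rate-card figures quoted above to one workload. The rates come from the article; the workload token counts and the `credits` helper are made up for illustration, and real sessions will split input, cached input, and output very differently.

```python
# Credits per 1M tokens, copied from the rate-card figures quoted in the text.
RATES = {
    "GPT-5.5":       {"input": 125.00, "cached": 12.500, "output": 750.0},
    "GPT-5.4":       {"input": 62.50,  "cached": 6.250,  "output": 375.0},
    "GPT-5.4-mini":  {"input": 18.75,  "cached": 1.875,  "output": 113.0},
    "GPT-5.3-Codex": {"input": 43.75,  "cached": 4.375,  "output": 350.0},
}


def credits(model, input_tokens, cached_tokens, output_tokens):
    """Return the credit cost of one workload on one model."""
    rate = RATES[model]
    total = (
        input_tokens * rate["input"]
        + cached_tokens * rate["cached"]
        + output_tokens * rate["output"]
    )
    return total / 1_000_000


# Hypothetical workload (an assumption, not a measured session):
# 200k fresh input tokens, 800k cached input tokens, 100k output tokens.
WORKLOAD = {"input_tokens": 200_000, "cached_tokens": 800_000, "output_tokens": 100_000}

for model in RATES:
    total = credits(model, **WORKLOAD)
    output_only = credits(model, 0, 0, WORKLOAD["output_tokens"])
    print(f"{model:14s} total={total:7.2f} credits, output share={output_only / total:.0%}")
```

Under that made-up workload, GPT-5.5 comes to 110 credits and GPT-5.4-mini to 16.55. On GPT-5.5, output tokens account for 75 of the 110 credits even though they are under a tenth of the tokens processed.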

The sane policy looks like compute scheduling. Use the strongest model when judgment, ambiguity, architecture, or security review matters. Use smaller models for routine edits, formatting, obvious tests, dependency bumps, doc cleanup, and narrow migrations. Use cloud tasks when the work benefits from asynchronous execution, not because the button is there. Keep code review on the model designed and priced for that job. Senior engineers already do this with CI runners, database instances, and observability retention. Agents deserve the same discipline.
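The policy above can be written down as a small routing function. This is a sketch of one possible team convention, not anything OpenAI publishes: the `Task` fields, the task-kind labels, and the choice of GPT-5.4 as the fallback are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    kind: str                     # hypothetical label, e.g. "dependency-bump"
    needs_judgment: bool = False  # ambiguity, architecture, security review
    is_code_review: bool = False


# Hypothetical labels for the routine work the text lists.
ROUTINE_KINDS = {
    "routine-edit",
    "formatting",
    "obvious-test",
    "dependency-bump",
    "doc-cleanup",
    "narrow-migration",
}


def pick_model(task):
    """Route a task to a model lane; the fallback lane is an assumption."""
    if task.is_code_review:
        return "GPT-5.3-Codex"  # the text says code review uses this model
    if task.needs_judgment:
        return "GPT-5.5"        # strongest model for high-judgment work
    if task.kind in ROUTINE_KINDS:
        return "GPT-5.4-mini"   # volume lane for mechanical edits
    return "GPT-5.4"            # assumed middle lane for everything else


print(pick_model(Task(kind="dependency-bump")))
print(pick_model(Task(kind="architecture", needs_judgment=True)))
```

The point of writing it down is less the code than the review: a routing table in the playbook can be argued about, while a habit of picking the biggest model cannot.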

OpenAI estimates average Codex usage at roughly $100–$200 per developer per month, with large variance depending on model choice, number of instances, automation, and fast mode. That range is believable because agent cost scales with behavior, not headcount alone. One developer running narrow local edits can be cheap. One developer launching parallel cloud tasks against large repos with verbose instructions and broad tool context can burn like a tiny platform team.

MCP clutter is now a cost smell and a security smell

OpenAI’s usage-saving advice is unusually practical: control prompt size, reduce AGENTS.md size, limit MCP servers because every MCP server adds context, and switch to smaller models for routine tasks. The MCP point deserves more attention than it will get.

Every MCP server adds tool descriptions, schemas, permissions, and often implicit assumptions about what the agent can do. That increases cost because the agent has more context to carry. It can reduce quality because the model has more irrelevant affordances to consider. It also increases attack surface because every tool boundary is a place where prompt injection, overbroad authorization, or stale configuration can matter. If your default agent config includes five servers it rarely uses, you are paying rent on clutter and leaving extra doors unlocked.

The fix is not complicated. Maintain a small default MCP set. Enable specialized servers per task or per repository. Remove abandoned servers from shared configs. Pin versions where possible. Document what each server is allowed to touch. If the agent needs browser automation for a front-end performance task, enable that. If it is updating a README, it probably does not need your issue tracker, database inspector, browser profile, and cloud console in context.
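One way to make the "small default, opt in per task" rule reviewable is to keep it as data. The sketch below is illustrative only: the server names, task types, and helper functions are hypothetical, and it does not read or write any real Codex or MCP configuration format.

```python
# Hypothetical server names; substitute whatever your team actually runs.
DEFAULT_SERVERS = frozenset({"filesystem"})

# Extra servers enabled only for specific task types (assumed labels).
TASK_SERVERS = {
    "frontend-performance": {"browser-automation"},
    "readme-update": set(),
}


def servers_for(task_type):
    """Return the sorted server list for a task: defaults plus opt-ins."""
    extras = TASK_SERVERS.get(task_type, set())
    return sorted(DEFAULT_SERVERS | extras)


def stale_servers(configured, recently_used):
    """Return configured servers that no recent task actually used."""
    return sorted(set(configured) - set(recently_used))


print(servers_for("readme-update"))
print(servers_for("frontend-performance"))
print(stale_servers({"filesystem", "issue-tracker", "db-inspector"}, {"filesystem"}))
```

Where the "recently used" set comes from is left open here; it could be agent logs or a manual review, depending on what your tooling records.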

AGENTS.md needs the same cleanup. Persistent instructions are now both documentation and FinOps infrastructure. A 32 KiB instruction blob full of stale build commands, duplicated style guidance, and “always run the full suite” folklore is not harmless. It costs money every time it enters context, and it teaches the agent to do expensive or wrong things. Keep global guidance short, move local rules into nested directories, and audit instructions when agent bills surprise you.
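An audit can start with something as blunt as a size check. The sketch below walks a repository for AGENTS.md files and flags the large ones. The 8 KiB threshold and the four-bytes-per-token ratio are assumptions chosen for illustration: real token counts depend on the tokenizer and the content, so treat the estimate as an order of magnitude.

```python
from pathlib import Path

BYTES_PER_TOKEN = 4  # rough heuristic for English text; an assumption


def estimate_tokens(num_bytes):
    """Very rough token estimate from a byte count."""
    return num_bytes // BYTES_PER_TOKEN


def collect_sizes(root):
    """Map every AGENTS.md under root to its size in bytes."""
    return {str(p): p.stat().st_size for p in sorted(Path(root).rglob("AGENTS.md"))}


def oversized(sizes, max_bytes=8 * 1024):
    """Return (path, bytes, estimated tokens) for files above the threshold."""
    return [
        (path, size, estimate_tokens(size))
        for path, size in sorted(sizes.items())
        if size > max_bytes
    ]


# The 32 KiB blob from the text, next to a small nested file (made-up sizes).
example = {"AGENTS.md": 32 * 1024, "web/AGENTS.md": 1200}
for path, size, tokens in oversized(example):
    print(f"{path}: {size} bytes, roughly {tokens} tokens each time it is loaded")
```

In a real repository, `oversized(collect_sizes("."))` would produce the same report from the files on disk.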

The API-key path adds another tradeoff. OpenAI says API-key usage works for Codex in CLI, SDK, or IDE extension, but it does not include cloud-based features such as GitHub code review or Slack integration and may get delayed access to newer models such as GPT-5.3-Codex and GPT-5.3-Codex-Spark. That is a reasonable split: API keys are flexible, but the managed surface gets the product integration. Teams should choose intentionally instead of assuming “API” means “same thing, more enterprise.”

The editorial takeaway is not that Codex is too expensive. The takeaway is that coding agents are no longer too abstract to manage. Put model routing in your internal playbook. Give agents smaller default contexts. Track usage by repo and workflow, not just by person. Treat MCP servers as scoped dependencies. Review AGENTS.md like CI configuration. The teams that do this will get more useful automation per dollar. The teams that treat agents like infinite interns will get infinite-intern invoices.
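Tracking by repo and workflow rather than by person is a small aggregation once usage records exist. The record shape below (`repo`, `workflow`, `credits`) is hypothetical; it assumes you can export or log something like it, which the source does not describe.

```python
from collections import defaultdict


def credits_by_repo_and_workflow(records):
    """Sum credits per (repo, workflow) pair, largest spender first."""
    totals = defaultdict(float)
    for rec in records:
        totals[(rec["repo"], rec["workflow"])] += rec["credits"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up records for illustration only.
records = [
    {"repo": "payments", "workflow": "cloud-task", "credits": 12.5},
    {"repo": "payments", "workflow": "cloud-task", "credits": 7.5},
    {"repo": "docs", "workflow": "local-edit", "credits": 1.5},
]

for (repo, workflow), total in credits_by_repo_and_workflow(records):
    print(f"{repo:10s} {workflow:12s} {total:6.2f} credits")
```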

Sources: OpenAI Developers, OpenAI Codex rate card, OpenAI AGENTS.md guide, OpenAI Codex security
