
Codex Pricing Just Became Token Accounting — Which Means Agent Cost Is Now an Engineering Problem

Anatoliy Kolodkin

13 May 2026 • 5 min read

OpenAI just made Codex pricing less mysterious and more annoying in exactly the way serious engineering teams should want. The new Codex rate card moves the conversation away from fuzzy “messages,” “tasks,” and “included usage” toward credits per million input tokens, cached input tokens, and output tokens. That is more honest. It is also a quiet admission that agentic coding cost is now an architecture problem, not a subscription-plan footnote.

The headline number OpenAI gives elsewhere is that Codex averages roughly $100–$200 per developer per month, with large variance by workload. The rate card explains why the variance matters. GPT-5.5 is listed at 125 credits per 1M input tokens, 12.50 credits per 1M cached input tokens, and 750 credits per 1M output tokens. GPT-5.4 comes in at 62.50 / 6.250 / 375. GPT-5.4-mini is 18.75 / 1.875 / 113. GPT-5.3-Codex and GPT-5.2 both show 43.75 input, 4.375 cached input, and 350 output credits per million tokens.
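To see how those three columns combine, here is a minimal credit estimator in Python. It only transcribes the per-million-token rates quoted above into a dictionary. The lowercase keys are labels chosen for this sketch, not necessarily the identifiers any API expects, and the token counts in the usage example are hypothetical.

```python
from dataclasses import dataclass

# Credits per 1M tokens, transcribed from the rate card figures quoted above.
# Keys are illustrative labels; rates may change, so treat this as a snapshot.
RATE_CARD = {
    "gpt-5.5": {"input": 125.0, "cached_input": 12.50, "output": 750.0},
    "gpt-5.4": {"input": 62.50, "cached_input": 6.250, "output": 375.0},
    "gpt-5.4-mini": {"input": 18.75, "cached_input": 1.875, "output": 113.0},
    "gpt-5.3-codex": {"input": 43.75, "cached_input": 4.375, "output": 350.0},
    "gpt-5.2": {"input": 43.75, "cached_input": 4.375, "output": 350.0},
}


@dataclass
class Usage:
    fresh_input: int   # input tokens billed at the full input rate
    cached_input: int  # input tokens billed at the cached-input rate
    output: int        # generated tokens


def credits(model: str, usage: Usage) -> float:
    """Estimate credits for one run by applying each column of the rate card."""
    rates = RATE_CARD[model]
    total = (
        usage.fresh_input * rates["input"]
        + usage.cached_input * rates["cached_input"]
        + usage.output * rates["output"]
    )
    return total / 1_000_000


# Hypothetical run: 200k fresh input, 800k cached input, 30k output tokens.
run = Usage(fresh_input=200_000, cached_input=800_000, output=30_000)
for model in RATE_CARD:
    print(f"{model}: {credits(model, run):.2f} credits")
```

For this made-up run the sketch reports 57.50 credits on GPT-5.5 and 8.64 on GPT-5.4-mini, almost a 7x spread for an identical token profile.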

That table is not just billing metadata. It is a map of where engineering discipline will matter. If your agent repeatedly drags a large repository, verbose instructions, test logs, screenshots, tool schemas, and irrelevant docs into context, you pay for that. If it produces long diffs, long rationales, and long retry loops, you pay for that too. If your workflow is cache-friendly — stable repo instructions, repeated context, scoped tasks, fewer noisy tool surfaces — the cached-input column becomes real money.

The cached-input column is the tells-on-you column

Caching is the most important line in the table because it turns prompt hygiene into a measurable engineering variable. GPT-5.5 cached input is priced at one-tenth of fresh input, and every other model in the table follows the same ratio. OpenAI is effectively saying: if you make the model reread the same durable context efficiently, the system can reward you; if every task is a bespoke blob of unstructured repo soup, that is on you.
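A quick way to see why the cached column matters: the effective input price is a weighted average of the fresh and cached rates. The sketch below assumes the cache hit rate can be summarized as a single fraction of all input tokens, which simplifies how prompt caching behaves in practice; `blended_input_rate` is a hypothetical helper, not an OpenAI function.

```python
def blended_input_rate(fresh_rate: float, cached_rate: float, hit_rate: float) -> float:
    """Credits per 1M input tokens when a fraction `hit_rate` of input is cached."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return (1.0 - hit_rate) * fresh_rate + hit_rate * cached_rate


# GPT-5.5 input rates from the rate card: 125 fresh, 12.50 cached.
for hit_rate in (0.0, 0.5, 0.8, 0.9):
    rate = blended_input_rate(125.0, 12.50, hit_rate)
    print(f"cache hit rate {hit_rate:.0%}: {rate:.2f} credits per 1M input tokens")
```

Under that simplification, a 90% hit rate brings GPT-5.5 input from 125 down to about 23.75 credits per million, under a fifth of the fresh price.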

This is where coding-agent teams need to get less romantic. Huge root-level AGENTS.md files, always-on MCP servers, sprawling tool permissions, and prompts that ask the agent to “look around” without constraints are now cost smells. They were already reliability smells. A human can skim past a noisy document. An agent tends to tokenize it, reason over it, and sometimes obey the wrong part. The pricing model simply makes the waste visible enough for finance to notice.

Teams should start treating context like they treat query plans. Keep global instructions short. Put detailed guidance close to the directories where it applies. Disable tool servers that are not needed for the task. Prefer explicit file lists, issue links, test commands, and acceptance criteria over open-ended exploration. If an agent needs to solve the same class of problem repeatedly, turn that workflow into a skill, template, or cached instruction path instead of reprompting from scratch every time.
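One way to make that discipline enforceable is to lint task briefs before an agent runs. Everything here is hypothetical: `TaskBrief`, the 500-token budget, and the four-characters-per-token estimate are assumptions for illustration (real tokenizers count differently), not part of any Codex API.

```python
from dataclasses import dataclass, field

CHARS_PER_TOKEN = 4  # rough heuristic only; real tokenizers differ


def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN


@dataclass
class TaskBrief:
    """Hypothetical task brief favoring explicit inputs over open-ended exploration."""
    goal: str
    files: list[str] = field(default_factory=list)
    test_command: str = ""
    acceptance_criteria: list[str] = field(default_factory=list)
    global_instructions: str = ""


def lint_brief(brief: TaskBrief, max_global_tokens: int = 500) -> list[str]:
    """Return warnings for the context smells described in the text."""
    warnings = []
    tokens = estimate_tokens(brief.global_instructions)
    if tokens > max_global_tokens:
        warnings.append(f"global instructions ~{tokens} tokens (budget {max_global_tokens})")
    if not brief.files:
        warnings.append("no explicit file list; the agent will have to explore")
    if not brief.test_command:
        warnings.append("no test command")
    if not brief.acceptance_criteria:
        warnings.append("no acceptance criteria")
    return warnings


brief = TaskBrief(
    goal="Fix off-by-one in pagination",
    files=["api/pagination.py", "tests/test_pagination.py"],
    test_command="pytest tests/test_pagination.py",
    acceptance_criteria=["last page returns remaining items"],
    global_instructions="x" * 4_000,  # stand-in for an oversized root instruction file
)
print(lint_brief(brief))
```

The point is less the heuristic than the habit: a brief that names files, tests, and acceptance criteria gives the agent less reason to wander, and an oversized global file gets flagged before it is billed on every task.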

Output-heavy agents are where the bill hides

The rate card also undercuts the comforting myth that “input context” is the whole cost story. Output tokens cost roughly six to eight times as much as fresh input across the listed models. GPT-5.5 output is 750 credits per 1M tokens, six times its fresh-input rate. GPT-5.4 output is 375, also six times its input rate. GPT-5.3-Codex output is 350, eight times its input rate. That matters because coding agents are output machines: patches, explanations, plans, review comments, test logs, generated files, migration scripts, and retry narratives.
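The asymmetry is easy to underestimate. This short calculation, which treats all input as fresh for simplicity, shows how large output's share of a run's credits becomes at GPT-5.5's rates even when output is a small fraction of the tokens. The token counts are hypothetical.

```python
def output_cost_share(input_tokens: int, output_tokens: int,
                      input_rate: float, output_rate: float) -> float:
    """Fraction of a run's credits spent on output (all input treated as fresh)."""
    input_cost = input_tokens * input_rate
    output_cost = output_tokens * output_rate
    return output_cost / (input_cost + output_cost)


# GPT-5.5 rates: 125 credits per 1M fresh input, 750 per 1M output.
share = output_cost_share(100_000, 20_000, 125.0, 750.0)
print(f"output is {20_000 / 120_000:.0%} of tokens but {share:.0%} of credits")
```

Here output is 17% of the tokens but 55% of the credits. At a 6x price ratio, output spend matches input spend once output reaches one-sixth of the input token count, and caching the input only tilts the balance further toward output.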

A background refactor that generates three candidate implementations, comments every file, writes a novel-length final summary, fails tests twice, then prints the entire failing log into context is not the same economic object as a one-turn local edit. The old “credits per message” framing made those look artificially comparable. Token accounting makes them obviously different.
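To put rough numbers on that contrast, the sketch below prices both runs at GPT-5.5 rates, turn by turn. Every token count is invented for illustration, and it assumes each retry rereads earlier context from cache while newly pasted test logs and instructions are billed as fresh input.

```python
# GPT-5.5 credits per 1M tokens, from the rate card.
RATES = {"input": 125.0, "cached_input": 12.50, "output": 750.0}


def turn_credits(fresh: int, cached: int, output: int) -> float:
    """Credits for one agent turn."""
    return (fresh * RATES["input"]
            + cached * RATES["cached_input"]
            + output * RATES["output"]) / 1_000_000


def run_credits(turns: list[tuple[int, int, int]]) -> float:
    """Sum credits over (fresh_input, cached_input, output) turns."""
    return sum(turn_credits(*turn) for turn in turns)


# Hypothetical token counts per turn: (fresh input, cached input, output).
one_turn_edit = [(8_000, 0, 1_500)]
background_refactor = [
    (40_000, 0, 30_000),      # plan plus three candidate implementations
    (25_000, 40_000, 8_000),  # tests fail; full failing log pasted into context
    (25_000, 65_000, 8_000),  # second failure, another full log
    (5_000, 90_000, 6_000),   # final patch and a long summary
]

small = run_credits(one_turn_edit)
large = run_credits(background_refactor)
print(f"{small:.1f} vs {large:.1f} credits ({large / small:.0f}x)")
```

With these made-up numbers the refactor costs about 25 times the local edit, and nearly three quarters of its credits go to output tokens rather than to context.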

The model ladder now needs policy. GPT-5.5 may be the quality default, and OpenAI says it can use fewer tokens than GPT-5.4 for comparable results, but the rate card still makes output-heavy GPT-5.5 work expensive in credit terms. GPT-5.4-mini is the obvious candidate for mechanical edits and routine local messages. GPT-5.3-Codex is OpenAI’s cloud-task and code-review workhorse. GPT-5.3-Codex-Spark remains a research preview with non-final rates, which means it should not be the basis for budget promises yet.

The right move is not “always use the cheapest model.” Cheap bad output is expensive once a senior engineer has to unwind it. The right move is routing: frontier model for ambiguous architecture and hard bugs; cheaper model for mechanical changes and local chores; Codex-specialized model for cloud tasks and review; fast mode only when latency is worth the burn. That routing should be documented, measured, and enforced through defaults where possible.
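Routing policy like that can live in code rather than on a wiki page. The sketch encodes the defaults suggested above; the task classes, model labels, and the fast-mode allowlist are hypothetical team conventions, not Codex configuration options.

```python
from dataclasses import dataclass
from enum import Enum


class TaskClass(Enum):
    ARCHITECTURE = "architecture"
    HARD_BUG = "hard_bug"
    MECHANICAL = "mechanical"
    LOCAL_CHORE = "local_chore"
    CLOUD_TASK = "cloud_task"
    CODE_REVIEW = "code_review"


# Hypothetical default routing, following the policy described above.
DEFAULT_MODEL = {
    TaskClass.ARCHITECTURE: "gpt-5.5",
    TaskClass.HARD_BUG: "gpt-5.5",
    TaskClass.MECHANICAL: "gpt-5.4-mini",
    TaskClass.LOCAL_CHORE: "gpt-5.4-mini",
    TaskClass.CLOUD_TASK: "gpt-5.3-codex",
    TaskClass.CODE_REVIEW: "gpt-5.3-codex",
}

# Example allowlist: fast mode stays off unless a task class is listed here.
FAST_MODE_ALLOWED = frozenset({TaskClass.HARD_BUG})


@dataclass(frozen=True)
class Route:
    model: str
    fast_mode: bool


def route(task: TaskClass, latency_critical: bool = False) -> Route:
    """Pick the default model; enable fast mode only when justified and allowlisted."""
    fast = latency_critical and task in FAST_MODE_ALLOWED
    return Route(model=DEFAULT_MODEL[task], fast_mode=fast)


print(route(TaskClass.MECHANICAL))
print(route(TaskClass.HARD_BUG, latency_critical=True))
```

Keeping the table in one place makes it reviewable and measurable: if mechanical tasks routed to the mini model start bouncing back in review, the table is what changes.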

Usage windows are not a budget strategy

OpenAI’s Codex pricing docs still describe five-hour usage windows: GPT-5.5 supports roughly 15–80 local messages per five hours, GPT-5.4 supports 20–100, GPT-5.4-mini supports 60–350, and GPT-5.3-Codex supports 30–150 local messages, 10–60 cloud tasks, and 20–50 code reviews. Those ranges are useful for setting expectations, but they are not governance. They are guardrails around a system whose real consumption depends on context size, output volume, cache hit rate, images, model choice, parallelism, and retry behavior.

The Pro-tier details make the transition sharper. Plus and Pro users can buy additional credits after included limits, eligible users can enable auto top-up, credits are valid for 12 months, and credits are non-refundable except where required by law. Pro $100 currently has a Codex usage promo through May 31, 2026: 10x Plus instead of the standard 5x. Pro $200 keeps 20x Plus ongoing and temporarily honors 25x Plus five-hour Codex limits through May 31. That is product packaging, not an operating model.

The community anxiety around GitHub Copilot’s billing preview is relevant even though it is not OpenAI pricing. Heavy users in GitHub’s discussion reported projected spend estimates in the thousands — one cited $2,784 — and pushed back on whether token math, caching, and limits were understandable enough to manage. The shared lesson is obvious: once coding agents move from subscription vibes to usage accounting, teams need budgets and dashboards before the experiment becomes a finance surprise.

For engineering managers, the playbook is straightforward. Run a representative week of tasks across GPT-5.5, GPT-5.4, GPT-5.4-mini, and GPT-5.3-Codex. Track tokens, credits, elapsed time, human review time, test pass rate, rework rate, and whether the agent stopped at the right point. Do not optimize purely for token cost; optimize for cost per accepted change. A cheap agent run that creates a misleading patch and burns an hour of review is not cheap.
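Cost per accepted change is simple to compute once runs are logged. In this sketch, `price_per_credit` and `reviewer_cost_per_hour` are placeholders each team would fill in from its own contract and staffing costs (neither comes from OpenAI's docs), and the run data is invented to show how a cheaper model can lose on this metric.

```python
from dataclasses import dataclass


@dataclass
class Run:
    credits: float         # agent credits consumed by the run
    review_minutes: float  # human time spent reviewing or unwinding it
    accepted: bool         # did the change merge?


def cost_per_accepted_change(runs: list[Run], price_per_credit: float,
                             reviewer_cost_per_hour: float) -> float:
    """Agent spend plus review time, divided by the number of accepted changes."""
    accepted = sum(1 for run in runs if run.accepted)
    if accepted == 0:
        return float("inf")
    agent_cost = sum(run.credits for run in runs) * price_per_credit
    review_cost = sum(run.review_minutes for run in runs) * reviewer_cost_per_hour / 60
    return (agent_cost + review_cost) / accepted


# Invented data in arbitrary cost units: cheap runs need more review, one is rejected.
cheap = [Run(5, 60, False), Run(5, 20, True)]
frontier = [Run(40, 10, True), Run(40, 10, True)]
for name, runs in (("cheap", cheap), ("frontier", frontier)):
    print(name, cost_per_accepted_change(runs, price_per_credit=1.0, reviewer_cost_per_hour=60.0))
```

With these made-up inputs the cheap runs use one-eighth of the credits yet cost 90 units per accepted change, against 50 for the frontier runs.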

Also put policy around automation. If Codex can be invoked from web, CLI, IDEs, Slack integrations, cloud tasks, code review, browser flows, and image-generation paths, usage cannot be governed by “please be reasonable.” Set per-user and per-repo alerts. Require owners for background agents. Disable fast mode by default unless a task class justifies it. Keep auto top-up in admin hands. Review long-running automations the same way you review cloud spend: with thresholds, owners, and boring recurring reports.
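Those thresholds and ownership rules can start as a small report over whatever usage export is available. The sketch assumes per-run records carrying user, repo, credits, and an owner field for background agents; that record shape and the limits are hypothetical, not an OpenAI export format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageRecord:
    user: str
    repo: str
    credits: float
    background: bool = False     # run by an automation rather than a person
    owner: Optional[str] = None  # accountable human for background agents


def spend_report(records: list[UsageRecord], user_limit: float, repo_limit: float) -> list[str]:
    """Flag unowned background agents, then users and repos over their thresholds."""
    by_user: dict[str, float] = defaultdict(float)
    by_repo: dict[str, float] = defaultdict(float)
    alerts = []
    for rec in records:
        by_user[rec.user] += rec.credits
        by_repo[rec.repo] += rec.credits
        if rec.background and rec.owner is None:
            alerts.append(f"background run by {rec.user} in {rec.repo} has no owner")
    alerts += [f"user {u}: {c:.0f} credits > {user_limit:.0f}"
               for u, c in sorted(by_user.items()) if c > user_limit]
    alerts += [f"repo {r}: {c:.0f} credits > {repo_limit:.0f}"
               for r, c in sorted(by_repo.items()) if c > repo_limit]
    return alerts


records = [
    UsageRecord("alice", "api", 300),
    UsageRecord("alice", "web", 250),
    UsageRecord("bob", "api", 100),
    UsageRecord("nightly-bot", "api", 50, background=True),
]
for line in spend_report(records, user_limit=500, repo_limit=400):
    print(line)
```

Run on a schedule, a report like this is the boring recurring artifact the paragraph above asks for: it names an owner gap and two threshold breaches without anyone having to remember to look.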

The deeper shift is cultural. Agent cost used to feel like procurement: pick a plan, assign seats, move on. Token-based Codex pricing makes it feel more like performance engineering. Context hygiene, cache efficiency, model routing, approval policy, and workflow shape all affect the bill. That is not bad news. It means engineering teams can actually improve the economics instead of arguing about whether the subscription is “worth it” in the abstract.

My take: this is the right kind of uncomfortable transparency. The future of agentic coding will not be won by the team that buys the most credits. It will be won by the team that designs agent workflows like production systems: observable, bounded, cache-aware, model-routed, and dull enough to survive contact with billing.

Sources: OpenAI Help Center — Codex rate card, OpenAI Developers — Codex pricing, OpenAI Help Center — flexible ChatGPT credits, OpenAI Help Center — ChatGPT Pro tiers, GitHub Community — Copilot Billing Preview
