OpenAI's GPT-5.5 Rate Card Makes the New Model Hierarchy Obvious: Frontier for Hard Work, Mini for Subagents

OpenAI's GPT-5.5 Rate Card Makes the New Model Hierarchy Obvious: Frontier for Hard Work, Mini for Subagents

OpenAI’s pricing page is doing more product strategy than most launch posts.

The current GPT-5-era API lineup makes the hierarchy hard to miss: GPT-5.5 for complex reasoning and professional coding work, GPT-5.4 for a cheaper middle tier, and GPT-5.4 mini for lower-latency coding, computer-use, and subagent workloads. That sounds like a normal model menu until you put the prices next to an agent architecture. GPT-5.5 costs $5 per million input tokens and $30 per million output tokens. GPT-5.4 mini costs $0.75 per million input tokens and $4.50 per million output tokens. The output-token spread is 6.67×.

That is not procurement trivia. That is a routing policy.

The frontier model is no longer the default model

OpenAI’s model docs describe GPT-5.5 as “a new class of intelligence for coding and professional work” and the company’s newest frontier model for the most complex professional tasks. It has the shape you would expect from a flagship agent model: a 1,050,000-token context window, 128,000 max output tokens, reasoning effort controls from none through xhigh, text and image input, streaming, function calling, structured outputs, web search, file search, image generation, code interpreter, hosted shell, apply patch, skills, computer use, MCP, and tool search.

That tool list is the tell. GPT-5.5 is not being positioned as a nicer chatbot. It is a control-plane model for work: plan, inspect, patch, execute, search, call tools, and operate inside agent frameworks. The same vocabulary now shows up across Codex, Claude Code, OpenCode, Cursor-style agents, and local stacks built around MCP. Model capability and agent harness design are collapsing into one product surface.

But a frontier model with a million-token context window is also a very efficient way to set money on fire if every subtask goes through it. A coding agent that sends routine grep synthesis, narrow edits, log triage, classification, dependency lookup, and test-output summarization to GPT-5.5 is not being “premium.” It is failing to route. GPT-5.4 mini is priced like the throughput layer because that is where it belongs: scoped subagents, cheap summaries, bounded code edits, pre-review checks, file-search interpretation, and other work where the harness can make the task small enough that judgment is less important than volume.

The long-context surcharge is the part teams will learn the hard way

The million-token context window is impressive. It is also not free memory. OpenAI’s pricing docs say prompts above 272K input tokens are priced at 2× input and 1.5× output for the full session across standard, batch, and flex. That threshold should change how teams design repo-scale agents. If your strategy is “stuff the repository into context because the window exists,” the pricing page is already warning you how that ends.

Long context is useful when the task genuinely needs it: architecture review across a large codebase, security analysis with many dependent files, migration planning where historical context matters, or final synthesis after a long investigation. It is wasteful when used as a substitute for retrieval, file maps, summaries, dependency graphs, and targeted reads. The right pattern is not “always use the biggest context.” It is “keep the working set small until the task proves it needs more.”

The cached-token discounts reinforce the same lesson. GPT-5.5 cached input is listed at $0.50 per million tokens versus $5 for normal input. GPT-5.4 mini cached input is $0.075 versus $0.75. Caching is a 10× discount, but only if the agent preserves stable prefixes and avoids invalidating context on every turn. Many agent frameworks accidentally destroy cacheability by spraying tool output, timestamps, reordered system context, or giant mutable scratchpads into the prompt. That turns a pricing feature into a theoretical feature.

Build the model hierarchy before usage builds it for you

The practical move is to write a routing policy now. Use GPT-5.5 for high-uncertainty planning, architecture decisions, security-sensitive review, ambiguous debugging, and final synthesis where the cost of a wrong answer is high. Use GPT-5.4 or GPT-5.4 mini for narrow execution: search, summarize, classify, apply small patches, explain known test failures, generate boilerplate, and run bounded subagent loops. Treat model selection as part of the agent design, not as a user preference buried in a dropdown.

This matters because agent systems amplify cost mistakes. A chat user may make one expensive call. An agent can make 200 calls, fork 20 subagents, re-read the same files, cross the long-context threshold, and then retry after a tool failure. The difference between a sensible hierarchy and “frontier model everywhere” is not a rounding error. It is the difference between a useful internal platform and a budget incident with a nice UI.

Teams should measure the routing layer the same way they measure latency and correctness. Track which model handled each step, input and output tokens, cached-token ratio, threshold crossings above 272K tokens, tool-call retries, task success rate, human intervention rate, and cost per completed task. Do not optimize for cheapest call. Optimize for cheapest successful workflow. A mini model that fails three times before escalating may be more expensive than sending the hard part to GPT-5.5 immediately.

There is also a governance point hiding in the rate card. Data residency and regional processing carry a 10% uplift for GPT-5.5. Long-context sessions are priced differently. Realtime audio has its own token economics, with GPT-Realtime-2 listed at $32 per million audio input tokens and $64 per million audio output tokens, plus minute-priced translate and whisper offerings. Once agents span code, voice, browser, shell, and internal files, cost policy becomes security policy’s boring sibling. You need to know where data goes, which model sees it, what it costs, and who approved the escalation.

The old model-selection question was “which model is best?” The new question is “which model should touch this step of this workflow under this budget, latency target, and risk profile?” OpenAI’s GPT-5.5 pricing makes that shift explicit. Frontier models are for judgment. Mini models are for throughput. Cache discipline is not optional. The teams that internalize that will ship agent systems that survive contact with real usage. The teams that do not will discover that autonomy scales faster than finance approvals.

Sources: OpenAI API pricing, OpenAI GPT-5.5 model docs, OpenAI model selection docs, OpenAI all-models docs