OpenRouter Context Overflow Shows Why Agent Routing Needs Token Budgets
A large context window is not a budget plan. OpenClaw issue #86880 shows what happens when an agent runtime appears to treat a provider’s maximum context length as permission to reserve nearly the entire window for output tokens: the request overflows before the model has a fair chance to do anything useful.
The numbers are the story. The reporter says OpenClaw 2026.05.22 on Windows 11, using OpenRouter-backed models including moonshotai/kimi-k2.6 and qwen/qwen3-coder:free, generated a request of roughly 277,747 total tokens against a model with a 262,144-token context window. That total included 7,646 text-input tokens, 7,959 tool-input tokens, and 262,142 output tokens. The output reservation alone was two tokens short of the entire advertised context window, leaving no meaningful room for prompt, tools, system instructions, session history, or the boring overhead providers do not show in marketing copy.
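The arithmetic is short enough to check by hand. The sketch below uses only the figures reported in the issue; the variable names are illustrative, and it assumes the reported counts are the complete request.

```python
# Figures as reported in issue #86880; the names are illustrative.
CONTEXT_WINDOW = 262_144
TEXT_INPUT = 7_646
TOOL_INPUT = 7_959
OUTPUT_RESERVED = 262_142

total_requested = TEXT_INPUT + TOOL_INPUT + OUTPUT_RESERVED
overflow = total_requested - CONTEXT_WINDOW
room_left_for_input = CONTEXT_WINDOW - OUTPUT_RESERVED
# Largest output reservation that would have fit, before any safety margin.
max_output_that_fits = CONTEXT_WINDOW - (TEXT_INPUT + TOOL_INPUT)

print(f"total requested:      {total_requested:,}")       # 277,747
print(f"over the window by:   {overflow:,}")              # 15,603
print(f"room left for input:  {room_left_for_input:,}")   # 2
print(f"max output that fits: {max_output_that_fits:,}")  # 246,539
```

With 262,142 tokens reserved for output, any input longer than two tokens overflows the window.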
The resulting failure is predictable: OpenRouter rejects the request for exceeding the model’s context limit. OpenClaw logs “embedded run auto-compaction incomplete: reason=overflow aborted=false willRetry=false” and “[context-overflow-diag] provider=openrouter/moonshotai/kimi-k2.6 messages=12 compactionAttempts=3”. The reporter says /reset and /new do not reliably recover the session, while OpenAI/ChatGPT routes reportedly work.
Provider metadata is an input, not a policy
This is easy to misclassify as an OpenRouter quirk. It is more usefully understood as a contract bug in model routing. A provider can report a model context limit. The orchestrator still has to decide how that limit is divided across prompt text, tool definitions, tool results, prior messages, system and developer instructions, response headroom, and safety margin. “262K context” is not equivalent to “ask for 262K output tokens.”
That distinction matters more with routing layers such as OpenRouter because they expose many models with different aliases, conventions, limits, and provider behaviors behind one integration surface. Users choose those routes precisely to access models like Qwen and Kimi without committing to one first-party provider. Flexibility is the product. Conservative normalization is the tax.
A robust agent runtime should treat token budgeting as an engineering constraint, not as a number copied from provider metadata. At minimum, it needs a max-output cap, a prompt/tool/history estimate, a safety buffer, and provider-specific clamps for routes known to behave differently. For coding agents, the budget also needs to reserve room for tool schemas and tool results, which can be large and easy to forget. The 7,959 tool-input tokens in this report are not noise. They are part of the actual request.
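Those four ingredients fit in a few lines. This is a minimal sketch, not OpenClaw's code: the function name, the clamp table, the default cap, and the safety buffer are all assumptions chosen for illustration.

```python
# Hypothetical per-route output ceilings. Real values would come from
# testing each route, not from this sketch.
ROUTE_OUTPUT_CLAMPS = {
    "openrouter/moonshotai/kimi-k2.6": 32_768,
}
DEFAULT_MAX_OUTPUT = 16_384  # assumed default cap for routes without a clamp
SAFETY_BUFFER = 4_096        # assumed margin for overhead nobody counted


def plan_output_tokens(route, context_limit, prompt_tokens, tool_tokens, history_tokens):
    """Return an output reservation that fits alongside the estimated input."""
    used = prompt_tokens + tool_tokens + history_tokens + SAFETY_BUFFER
    available = context_limit - used
    if available <= 0:
        raise ValueError(f"input estimate ({used:,}) already fills the window ({context_limit:,})")
    cap = ROUTE_OUTPUT_CLAMPS.get(route, DEFAULT_MAX_OUTPUT)
    # The provider limit bounds the answer; it never becomes the answer.
    return min(available, cap)


# Input figures from the issue, with history assumed to be zero for simplicity.
print(plan_output_tokens("openrouter/moonshotai/kimi-k2.6", 262_144, 7_646, 7_959, 0))  # 32768
```

The design choice that matters is the final `min`: the context limit is one input among several, and the cap wins whenever the window is larger than the runtime has any reason to spend.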
The compaction interaction makes the bug more interesting. Once the runtime detects overflow and tries compaction three times, the recovery path becomes part of the failure. If compaction does not change the invalid output reservation, the next request can still violate the same arithmetic. That is the same family of problem as OpenClaw’s new compaction circuit-breaker work in PR #86900: long-lived agent systems need budget-aware recovery, not repeated attempts that preserve the bad premise.
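To see why compaction alone cannot rescue a bad reservation, consider a toy recovery loop. This is a sketch under stated assumptions, not a description of OpenClaw's compaction or of PR #86900: `recover_from_overflow`, `halve`, and the three-attempt limit are hypothetical, and compaction is modeled as simply halving the input.

```python
def recover_from_overflow(context_limit, input_tokens, output_reserved, *,
                          max_output, compact, max_attempts=3):
    """Return (input_tokens, output_reserved, attempts) for a request that fits."""
    # Fix the premise first: clamp the output reservation before touching history.
    output_reserved = min(output_reserved, max_output)
    attempts = 0
    while input_tokens + output_reserved > context_limit:
        if attempts >= max_attempts:
            raise RuntimeError("overflow persists after compaction attempts")
        compacted = compact(input_tokens)
        if compacted >= input_tokens:
            # Circuit breaker: stop retrying when compaction makes no progress.
            raise RuntimeError("compaction made no progress")
        input_tokens = compacted
        attempts += 1
    return input_tokens, output_reserved, attempts


def halve(tokens):
    """Toy compaction: pretend each pass halves the input."""
    return tokens // 2


# With a sane output cap, the issue's input fits without any compaction.
print(recover_from_overflow(262_144, 15_605, 262_142, max_output=32_768, compact=halve))
# (15605, 32768, 0)

# Without the cap, three rounds of halving still cannot fit the request.
try:
    recover_from_overflow(262_144, 15_605, 262_142, max_output=262_142, compact=halve)
except RuntimeError as err:
    print(err)  # overflow persists after compaction attempts
```

In the uncapped case the input would have to shrink to two tokens before the request fit, which no realistic compaction pass achieves.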
The fake fix is a warning label for agent-maintained repos
There is a second story hiding in the issue timeline. PR #86903 appeared minutes later claiming to fix #86880, but the diff reportedly adds only trailing newlines across Android/iOS files: eight files, eight insertions, no provider or token-budget behavior. It is labeled triage: needs-real-behavior-proof. Good.
That label is not bureaucracy. It is an immune response. Agentic maintenance is now fast enough to produce plausible-looking PRs before a human has finished reading the bug. Syntax checks can pass. CI can be green. The patch can still fix nothing. Repositories that accept AI-assisted fixes need proof gates that ask whether the changed behavior touches the reported failure mode. “Needs real behavior proof” should become a standard label in every agent-heavy project.
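A first-pass proof gate can be mechanical. The heuristic below is a rough sketch, not a tool any of these repositories is known to run: it reads a unified diff and reports whether the added and removed lines differ in anything other than whitespace. The function name and the sample file names are hypothetical, and a diff that only reorders lines would also be flagged, so it is a triage signal rather than a verdict.

```python
from collections import Counter


def is_whitespace_only_diff(diff_text):
    """Return True when a unified diff adds or removes nothing but whitespace."""
    added, removed = Counter(), Counter()
    for line in diff_text.splitlines():
        if line.startswith(("+++ ", "--- ")):
            continue  # file headers, not content
        content = line[1:].strip()
        if not content:
            continue  # blank or whitespace-only change
        if line.startswith("+"):
            added[content] += 1
        elif line.startswith("-"):
            removed[content] += 1
    # If every removed line reappears as an added line, only whitespace moved.
    return added == removed


newline_only = "--- a/App.kt\n+++ b/App.kt\n@@ -10,2 +10,3 @@\n fun main() {\n }\n+\n"
real_change = (
    "--- a/budget.py\n+++ b/budget.py\n@@ -1,1 +1,1 @@\n"
    "-max_tokens = context_limit\n"
    "+max_tokens = min(context_limit - used, cap)\n"
)

print(is_whitespace_only_diff(newline_only))  # True
print(is_whitespace_only_diff(real_change))   # False
```

A check like this cannot prove that a patch fixes the bug, but it can cheaply flag patches that cannot possibly fix it.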
For practitioners using OpenClaw with OpenRouter, the action items are direct. Set explicit conservative output caps where configuration allows. Watch logs for output-token reservations suspiciously close to the model’s full context length. If you see a 262K model requesting 262K output, do not blame the session transcript first; blame the budget math. If /reset and /new fail to recover, suspect how the output reservation is sized for that provider route rather than accumulated history alone.
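“Suspiciously close” can be turned into a number. In this small sketch the helper name and the 50% threshold are assumptions, not settings OpenClaw is known to expose.

```python
def reservation_looks_wrong(output_reserved, context_limit, max_fraction=0.5):
    """Flag an output reservation that claims more than max_fraction of the window."""
    return output_reserved > context_limit * max_fraction


print(reservation_looks_wrong(262_142, 262_144))  # True: the reservation from the issue
print(reservation_looks_wrong(32_768, 262_144))   # False: one eighth of the window
```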
For maintainers, the regression test should be mechanical: build a request for an OpenRouter model with a 262,144-token advertised window, include non-trivial text input, tool input, and session history, then assert the output cap leaves meaningful headroom. The runtime should also log the budget calculation in a way humans can inspect: model limit, prompt estimate, tool estimate, history estimate, reserved output, safety margin, and final clamp. If the math is invisible, users will debug by superstition.
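A regression test along those lines might look like the following sketch. Everything in it is hypothetical: `BudgetBreakdown`, `build_budget`, the `[token-budget]` log prefix, the 20,000-token history figure, and the default margin and cap are placeholders standing in for whatever the real runtime uses.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BudgetBreakdown:
    model_limit: int
    prompt_estimate: int
    tool_estimate: int
    history_estimate: int
    reserved_output: int
    safety_margin: int
    final_clamp: int

    def log_line(self):
        """One inspectable line with every term of the budget calculation."""
        fields = " ".join(f"{key}={value}" for key, value in asdict(self).items())
        return f"[token-budget] {fields}"


def build_budget(model_limit, prompt, tools, history, safety_margin=4_096, output_cap=32_768):
    headroom = model_limit - (prompt + tools + history + safety_margin)
    reserved = max(0, min(headroom, output_cap))
    return BudgetBreakdown(model_limit, prompt, tools, history, reserved, safety_margin, output_cap)


def test_output_cap_leaves_headroom():
    # Input sizes from the issue; the history size is an assumed figure.
    budget = build_budget(262_144, prompt=7_646, tools=7_959, history=20_000)
    total = (budget.prompt_estimate + budget.tool_estimate + budget.history_estimate
             + budget.safety_margin + budget.reserved_output)
    assert total <= budget.model_limit
    assert budget.reserved_output <= budget.final_clamp
    assert budget.reserved_output < budget.model_limit // 2


test_output_cap_leaves_headroom()
print(build_budget(262_144, prompt=7_646, tools=7_959, history=20_000).log_line())
```

The assertions encode the contract, and the log line shows each term, so a failure can be traced to a specific number.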
The wider lesson applies to every coding-agent comparison and “bring your own model” stack. Model routing is not just choosing the cheapest or smartest backend. It is translating provider capabilities into safe runtime contracts. A giant context number is useful only when the orchestrator spends it deliberately. Otherwise, the model never gets to be smart because the request dies at the door.
OpenClaw’s fix should be boring and strict: reserve tokens like an engineer. Big context windows are inventory. Budgets decide whether the shipment fits.
Sources: GitHub issue #86880, PR #86903, issue #58838, issue #86592