
OpenClaw’s Stale Codex Routing Bug Is a Model Fallback Problem Disguised as Auth Plumbing

Anatoliy Kolodkin

17 May 2026 • 4 min read

Model fallback is supposed to make agent systems more reliable. In OpenClaw issue #83349, it did the opposite: it made a runtime boundary failure look like a successful recovery. That distinction matters. When a user asks for “some answer,” fallback is a convenience. When an operator pins a bot to a specific model, provider, harness, auth profile, and approval mode, fallback becomes part of the security contract.

The issue was filed May 18 at 00:42 UTC after an upgrade from 2026.5.12 to 2026.5.16-beta.5. The affected Telegram direct session was pinned to openai/gpt-5.5 through the Codex harness, with agentRuntime.id = "codex", high thinking, and an explicit expectation of no fallback. After the live session hit compaction or context pressure, Gateway began throwing the error: Requested agent harness "codex" is not registered. The model-fallback loop then treated that failure as an ordinary candidate miss and moved on to Claude Sonnet.

The reported logs tell the story in two lines:

decision=candidate_failed requested=openai/gpt-5.5 candidate=openai/gpt-5.5 ... next=anthropic/claude-sonnet-4-20250514 detail=Requested agent harness "codex" is not registered

decision=candidate_succeeded requested=openai/gpt-5.5 candidate=anthropic/claude-sonnet-4-20250514

From the system’s perspective, the bot recovered. From the operator’s perspective, the runtime silently violated the selected execution contract.

Codex was not just a model choice

This is the part model leaderboards usually miss. In an agent platform, “Codex” may mean more than model weights. It can imply a harness, tool-call semantics, OAuth state, approval behavior, context handling, native tool trajectories, sandbox assumptions, and integration-specific routing. Swapping from a Codex harness path to a Claude Sonnet candidate is not like substituting one text-completion endpoint for another during a transient outage. It can change how tools are invoked, what logs are produced, which auth profile is used, and which operational safeguards apply.

That is why the missing harness error should be typed as fail-closed. If the operator explicitly requested the Codex harness and that harness is not registered, the correct user-facing behavior is a loud failure with repair instructions. Install the plugin. Restore the config. Fix the active session metadata. Restart Gateway if required. What should not happen is a polite continuation from a different runtime family.

The manual recovery described in the issue is revealing. The operator had to reinstall the official Codex plugin with openclaw plugins install clawhub:@openclaw/codex, restore active config to the intended OpenAI/Codex route, repair live session metadata, and restart Gateway. After repair, the same Telegram session completed successfully on provider=openai, model=gpt-5.5, agentHarnessId=codex, and fallbackUsed=false. That is the contract the operator expected all along.

ClawSweeper’s source analysis reportedly confirmed the failure shape: strict Codex harness selection throws when codex is missing, and the outer model-fallback loop can continue on unrecognized errors when another candidate exists. That is a classic abstraction leak. The inner layer knows this is a required harness failure. The outer layer only sees “candidate failed; try next.”
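The leak can be made concrete with a small sketch. This is illustrative Python, not OpenClaw's source: the names CandidateError, RequiredHarnessMissing, Candidate, and run_with_fallback are hypothetical. It shows an outer loop that still retries on a retryable failure class but re-raises as soon as the inner layer reports that a required harness is missing.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


class CandidateError(Exception):
    """Hypothetical: a failure that may be retried on the next candidate."""


class RequiredHarnessMissing(Exception):
    """Hypothetical: the explicitly requested harness is not registered."""


@dataclass(frozen=True)
class Candidate:
    provider: str
    model: str
    harness: Optional[str] = None  # None: no specific harness is required


def run_with_fallback(
    candidates: Sequence[Candidate],
    run: Callable[[Candidate], str],
) -> Tuple[Candidate, str]:
    """Try candidates in order, but never fall back past a missing harness."""
    last_error: Optional[Exception] = None
    for candidate in candidates:
        try:
            return candidate, run(candidate)
        except RequiredHarnessMissing:
            # Fail closed: the inner layer knows this is a contract violation,
            # so the outer loop must not reinterpret it as a candidate miss.
            raise
        except CandidateError as exc:
            # Retryable class (for example a rate limit): try the next one.
            last_error = exc
    raise RuntimeError("all candidates failed") from last_error
```

Because the loop catches only CandidateError, any exception it does not recognize also propagates instead of triggering a silent switch. That inverts the reported behavior, where unrecognized errors were allowed to continue to the next candidate.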

Fallback needs typed failure classes

The fix is not to delete fallback. Fallback is useful. Rate limits happen. Provider outages happen. Quota exhaustion happens. Transient UNAVAILABLE responses happen. A runtime that can retry a compatible backend before emitting output can make user-facing agents much more reliable.

But fallback policy needs a taxonomy. Rate limit: maybe fallback, if operator policy allows it. Billing exhausted: maybe fallback, but probably notify. Provider unavailable before any output: possibly fallback. Missing explicitly selected harness: fail closed. Stale auth profile: repair or fail closed. Runtime policy mismatch: fail closed. Approval-mode mismatch: fail closed. Sandbox-policy mismatch: fail closed. Tool-schema incompatibility: fail closed unless the substitute runtime is proven compatible.
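One way to encode that taxonomy is a single decision function over typed failure classes. The sketch below is hypothetical Python, not an OpenClaw API: the enum members, the decide function, and its parameters are invented for illustration. Treating unknown failures as fail-closed is an assumption of this sketch, chosen because the reported bug came from continuing on unrecognized errors.

```python
from enum import Enum, auto


class Failure(Enum):
    RATE_LIMIT = auto()
    BILLING_EXHAUSTED = auto()
    PROVIDER_UNAVAILABLE = auto()
    HARNESS_MISSING = auto()
    STALE_AUTH_PROFILE = auto()
    RUNTIME_POLICY_MISMATCH = auto()
    APPROVAL_MODE_MISMATCH = auto()
    SANDBOX_POLICY_MISMATCH = auto()
    TOOL_SCHEMA_INCOMPATIBLE = auto()
    UNKNOWN = auto()


class Action(Enum):
    FALLBACK = auto()
    FALLBACK_AND_NOTIFY = auto()
    REPAIR_OR_FAIL_CLOSED = auto()
    FAIL_CLOSED = auto()


def decide(
    failure: Failure,
    *,
    fallback_allowed: bool,
    output_started: bool = False,
    substitute_proven_compatible: bool = False,
) -> Action:
    """Map a typed failure to an action; anything unlisted fails closed."""
    if failure is Failure.STALE_AUTH_PROFILE:
        return Action.REPAIR_OR_FAIL_CLOSED
    if not fallback_allowed:
        # A pinned session never substitutes, whatever the failure class.
        return Action.FAIL_CLOSED
    if failure is Failure.RATE_LIMIT:
        return Action.FALLBACK
    if failure is Failure.BILLING_EXHAUSTED:
        return Action.FALLBACK_AND_NOTIFY
    if failure is Failure.PROVIDER_UNAVAILABLE:
        # Only safe before any output has been emitted.
        return Action.FAIL_CLOSED if output_started else Action.FALLBACK
    if failure is Failure.TOOL_SCHEMA_INCOMPATIBLE:
        if substitute_proven_compatible:
            return Action.FALLBACK
        return Action.FAIL_CLOSED
    # Harness, runtime-policy, approval-mode, and sandbox mismatches, plus
    # UNKNOWN (an assumption of this sketch), all fail closed.
    return Action.FAIL_CLOSED
```

The design choice that matters is the default at the bottom: fallback is an allowlist of known-safe failure classes, so a new or unclassified error cannot steer routing by accident.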

That taxonomy is not academic. The release train around beta.5 also includes PR #83345, which tackles a related stale-auth routing problem where PI OpenAI runs could switch to openai-codex based only on stale auth.order.openai entries or stored Codex backup profiles. The PR direction is sane: do not reroute PI OpenAI runs to Codex unless there is a valid eligible Codex OAuth profile; still allow valid Codex OAuth profiles to route through Codex. In other words, routing should depend on current, structured auth eligibility, not ghosts in an old auth order.
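The routing rule in that PR description can be sketched as a check against current profile state only. This is hypothetical Python, not the PR's code: the AuthProfile fields, the helper names, and the definition of "eligible" (OAuth, not a backup copy, not expired) are all assumptions, since the source does not spell out the eligibility criteria. Only the provider name openai-codex comes from the report.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class AuthProfile:
    profile_id: str
    provider: str                       # for example "openai" or "openai-codex"
    kind: str                           # assumed values: "oauth" or "api_key"
    expires_at: Optional[float] = None  # Unix seconds; None means no expiry
    is_backup: bool = False             # stored backup copy, not a live credential


def is_eligible_codex_oauth(profile: AuthProfile, now: float) -> bool:
    """Assumed eligibility: a live, unexpired Codex OAuth profile."""
    return (
        profile.provider == "openai-codex"
        and profile.kind == "oauth"
        and not profile.is_backup
        and (profile.expires_at is None or profile.expires_at > now)
    )


def resolve_openai_route(profiles: Iterable[AuthProfile], now: float) -> str:
    """Pick the provider for an OpenAI run from current profiles only."""
    if any(is_eligible_codex_oauth(p, now) for p in profiles):
        return "openai-codex"
    return "openai"
```

The old auth order is deliberately not an input here. An entry in auth.order.openai or a stored backup profile can say where credentials used to live, but in this sketch only a currently valid profile can change the route.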

Put those two stories together and the pattern is clear. Agent runtimes are accumulating state: provider auth, harness registration, session metadata, model fallbacks, plugin installs, runtime defaults, channel bindings, compaction history, and upgrade migrations. Any one of those can go stale. If stale state is allowed to steer routing silently, the platform becomes hard to trust precisely when it appears resilient.

The operator checklist is boring and necessary

Teams running OpenClaw channel bots should treat upgrades as routing events, not just package updates. After upgrading, verify that required plugins and harnesses are registered. Confirm active session metadata still points at the intended provider, model, and harness. Inspect fallback arrays for sessions that are supposed to be pinned. Check auth-order entries and backup profiles for stale Codex or OpenAI state. Search logs for Requested agent harness "codex" is not registered, then look for nearby model-fallback/decision entries. That pair is the smell.
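The log search at the end of that checklist can be scripted. The sketch below is a hypothetical helper, not an OpenClaw tool. It assumes the log is available as plain text lines and that fallback decisions carry decision=, requested=, and candidate= fields as in the excerpts quoted in the issue; the real log layout may differ, and the window size is arbitrary.

```python
import re
from typing import List, Sequence, Tuple

HARNESS_ERROR = 'Requested agent harness "codex" is not registered'
DECISION = re.compile(r"\bdecision=(\S+)")
REQUESTED = re.compile(r"\brequested=(\S+)")
CANDIDATE = re.compile(r"\bcandidate=(\S+)")


def is_silent_substitution(line: str) -> bool:
    """True if a fallback succeeded on a candidate other than the requested one."""
    decision = DECISION.search(line)
    requested = REQUESTED.search(line)
    candidate = CANDIDATE.search(line)
    if not (decision and requested and candidate):
        return False
    return (
        decision.group(1) == "candidate_succeeded"
        and requested.group(1) != candidate.group(1)
    )


def find_smell(lines: Sequence[str], window: int = 20) -> List[Tuple[int, int]]:
    """Return (error_index, success_index) pairs, using 0-based line indices.

    A pair is reported when a harness-not-registered line is followed, within
    `window` lines, by a successful fallback to a different candidate.
    """
    hits: List[Tuple[int, int]] = []
    for i, line in enumerate(lines):
        if HARNESS_ERROR not in line:
            continue
        for j in range(i, min(i + window + 1, len(lines))):
            if is_silent_substitution(lines[j]):
                hits.append((i, j))
                break
    return hits
```

A non-empty result for a session that is supposed to be pinned is worth investigating even if the bot appeared to answer normally.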

It is also worth testing compaction and long-session resume paths, because this issue surfaced after the live Telegram session hit context pressure. Many agent bugs do not appear in the first three turns. They appear after the system rotates threads, compacts history, resumes a stored runtime, or reconstructs state after a restart. If your acceptance test is “ask the bot one question after upgrade,” you are testing the lobby, not the building.

For builders comparing OpenClaw to Claude Code, Codex, Cursor, Aider, or custom orchestration stacks, this is the operational layer that deserves more attention. The interesting question is not only which agent writes the better patch. It is which platform preserves runtime identity across plugin disappearance, auth migration, compaction, channel delivery, and fallback. A model can be brilliant and still be unsafe to substitute when the harness contract changes.

The editorial take is simple: fallback policy is security policy now. In a text-only chatbot, “try another model” is usually harmless. In an agent runtime, it can cross tool, auth, logging, and approval boundaries. OpenClaw’s stale Codex routing bug is not embarrassing because something failed. Systems fail. It is embarrassing because the failure mode looked like success.

Sources: OpenClaw issue #83349, PR #83345, OpenClaw v2026.5.16-beta.5 release, issue #83093
