OpenClaw's Codex OAuth Repair Loop Shows Why Agent Config Migration Needs Real Rollback Semantics

OpenClaw's Codex OAuth Repair Loop Shows Why Agent Config Migration Needs Real Rollback Semantics

The most dangerous repair tool is the one users trust when they are already confused. That is why OpenClaw’s latest Codex OAuth routing fix matters more than the narrow provider-name diff suggests.

PR #79569 targets a sharp edge in openclaw doctor --fix: hosts that only had usable Codex OAuth routing were still seeing working openai-codex/gpt-5.5 routes rewritten into openai/gpt-5.5 plus agentRuntime.id="codex". On paper, that looks like normalization. In production, it can turn a working agent lane into a route that points at the wrong auth path.

The linked issue, #79461, says seven active agent lanes on openai-codex/gpt-5.5 were rewritten by openclaw doctor --fix --yes on OpenClaw 2026.5.7. The environment had the Codex plugin enabled, openai-codex:<account> OAuth as the only usable GPT-5.5 provider, no direct OpenAI API-key provider, and desired routes with fallbacks=[] and agentRuntime=null. In that setup, preserving the auth profile is not enough. The provider/model reference itself is part of the working route.

Provider IDs are operational facts, not cosmetic strings

The category mistake is treating openai/gpt-5.5 and openai-codex/gpt-5.5 as interchangeable because they look like aliases for a similar model family. They are not interchangeable if one path is backed by a direct OpenAI API key and the other by Codex OAuth. The model name is only one field in a larger execution contract: provider ID, auth profile, runtime harness, plugin readiness, fallback chain, approval behavior, and billing surface all travel together.

That matters because agent configuration is executable infrastructure. A bad model route is not like a typo in a preferences file. It determines which backend receives prompts, which credentials are used, which approval layer is active, and whether an agent can run at all. If a repair command rewrites that route without proving the target is usable, it has performed a deploy.

PR #79569 appears to move in the right direction. The proof in the PR says openclaw doctor --fix --non-interactive preserved openai-codex/gpt-5.5 in defaults, per-agent config, and model maps, and did not create models.providers.openai when OPENAI_API_KEY was unset. A follow-up live command returned OC79461_FIX_OK with winnerProvider: openai-codex, winnerModel: gpt-5.5, agentHarnessId: codex, authMode: auth-profile, and fallbackUsed: false. That is the right proof shape because it validates the full execution chain, not just the JSON rewrite.

Doctor commands need rollback semantics

The deeper issue is not whether one route rewrite was wrong. It is that “doctor” commands occupy a privileged UX slot. Users run them when the system is broken, when documentation is ambiguous, or after an upgrade changed behavior. That means a doctor command should be conservative by default. It should preserve known-working intent, explain proposed changes before applying them, and make rollback cheap.

OpenClaw has been moving quickly through the GPT-5.5 / Codex parity and OAuth repair line. Version 2026.5.7 already tried to preserve working openai-codex/* routes during doctor repair and recover 2026.5.5-rewritten openai/* GPT-5 routes when only Codex OAuth auth was available. The fact that a follow-up PR is still needed is not scandalous. It is normal migration pain. But it should harden the product’s stance: config repair should be tested against live failure reports, not just schema ideals.

For operators, the practical move is boring and worth doing. Snapshot openclaw.json before running doctor --fix on any Codex-OAuth-only host. Prefer dry-run output when available. After repair, verify the actual winning route, not just that config validation passes: winnerProvider, winnerModel, authMode, agentHarnessId, and whether a fallback was used. If the only usable provider is openai-codex, a new openai provider block should be treated as suspicious unless you deliberately added direct OpenAI credentials.

For maintainers of agent platforms, the lesson generalizes. Migration tools should distinguish between obsolete config, ambiguous config, and working-but-noncanonical config. Only the first class should be rewritten automatically. Ambiguous config deserves a warning. Working-but-noncanonical config should usually be preserved unless the tool can prove the new form is equivalent in the user’s environment.

This is especially important as agent runtimes adopt ACP harnesses, provider plugins, OAuth-backed routes, local models, and subscription entitlements. The old mental model — “model ID maps to endpoint” — is too small. Modern agent routing is a graph of identity, credentials, policy, approvals, and transport. A migration that changes one edge can break the graph even if the resulting JSON looks cleaner.

The editorial take: Codex route repair is not just a provider bug. It is a reminder that agent config deserves the same respect as infrastructure code. A helpful migration that rewrites auth paths can break production as effectively as a bad deploy. If the repair tool cannot prove the destination route works, it should leave the working route alone and say why.

Sources: OpenClaw PR #79569, OpenClaw issue #79461, OpenClaw Codex OAuth routing docs, OpenClaw v2026.5.7 release