OpenClaw's Codex Payload Patch Is a Reminder That Agent Adapters Must Not Send Phantom Capabilities

OpenClaw's Codex Payload Patch Is a Reminder That Agent Adapters Must Not Send Phantom Capabilities

The most revealing agent-runtime bugs are often the ones that fit in a tiny diff. OpenClaw PR #88175 moves two Codex Responses fields inside an if statement. That is the whole source change: do not send tool_choice and parallel_tool_calls when the request has no tools.

Small patch. Correct patch. And a useful reminder that provider adapters need to be brutally literal. They should describe the capabilities of this turn, not the capabilities the runtime commonly supports, not the capabilities a neighboring transport path supports, and definitely not the capabilities the developer assumed would be present.

Before the patch, OpenClaw’s legacy openai-codex-responses transport unconditionally set tool_choice: "auto" and parallel_tool_calls: true in the base request body. If the turn had tools, that shape was reasonable: convert the OpenClaw tools into Responses-compatible tool definitions, let the model choose whether to call them, and permit parallel tool calls. But on a no-tool turn, those fields become phantom capabilities. The payload says tool control is available while providing no tools to control.

After the patch, the request body is cleaner. tools, tool_choice, and parallel_tool_calls are all added only when context.tools exists and has length greater than zero. The regression tests assert both sides of the contract: no-tool requests must omit all three fields, and tool-enabled requests must keep one converted tool plus tool_choice: "auto" and parallel_tool_calls: true.

Schema hygiene is runtime reliability

It is tempting to treat this as provider-schema trivia. That is a mistake. Agent systems are mostly contracts: between the planner and the tool layer, between the orchestrator and the model provider, between replay and state, between retries and side effects. If one side of the system lies, even in a small way, debugging gets expensive fast.

A strict backend can reject a no-tool payload that includes tool-control fields. A permissive backend might accept it and do something reasonable. A future backend could change validation behavior. A compatibility shim might treat the inconsistency differently from the native provider path. That variability is exactly the problem. Operators see “no tools were available,” but the provider saw tool-control hints. The failure mode becomes dependent on backend strictness, transport routing, model version, or a minor API rollout. That is how small adapter mistakes turn into intermittent runtime incidents.

PR #88175 changes two files with 62 additions and 2 deletions: the provider implementation and its test file. It also cites the same class of bug in the Hermes agent ecosystem through NousResearch’s hermes-agent PR #32777 and ryanleeai/hermes-agent PR #16. That cross-project echo matters. This is not just one OpenClaw slip. It is an adapter-pattern trap: once a runtime usually runs with tools, tool-only controls tend to leak into request shapes where the tool list has been removed.

The bug class is especially relevant for coding-agent comparisons. Teams evaluating Claude Code, Codex, Copilot CLI, Cursor, OpenClaw, and local agents usually focus on model quality, task success, and editor ergonomics. They should also inspect adapter correctness. Does the stack send accurate tool schemas? Does it test zero-tool, one-tool, and many-tool turns? Does replay preserve the original provider contract? Does streaming use the same validation path as non-streaming? Does a deprecated transport still have contract tests, or is it just hanging around until a user falls through it?

Phantom capabilities are worse than missing capabilities

A missing capability usually fails loudly. The model asks for a tool that is not available, or the runtime refuses the call. A phantom capability is worse because the system advertises something ambiguous. The provider sees knobs for tools. The model may shape its answer around a tool-capable environment. The runtime may believe this is a plain no-tool turn. Everyone is slightly wrong, and only the person debugging the failure has to reconcile it.

That matters more as agent runtimes add multiple provider paths. OpenClaw’s PR notes that the provider-owned native path already avoids the main no-tools shape, but the legacy built-in Codex Responses transport still set the tool controls unconditionally. Mature systems accumulate these edge paths. The new path is clean, the old path remains for compatibility, and eventually an unusual model, strict API response, or fallback route exposes the stale assumption. Compatibility code is still production code if users can hit it.

There is a cost-governance angle too. Tool availability affects more than function calling. It changes prompt shape, reasoning behavior, provider routing, retry semantics, and sometimes billing. A no-tool turn should be as small and direct as possible. Sending unused tool controls adds ambiguity without adding capability. At scale, “harmless” payload bloat becomes harder incident response: more fields to inspect, more conditional branches to reason about, and more ways to misattribute failure to the model instead of the adapter.

For builders, the practical checklist is straightforward. Capture the exact outbound provider payload before it leaves your process. Add contract tests for no tools, one tool, many tools, empty schemas, strict schemas, streaming, retries, and replay. Treat “the provider accepted it” as a weak signal, not a proof of correctness. Accepted-but-wrong payloads are still bugs because they encode false assumptions into the system boundary.

Also, do not retire tests before retiring code. If a legacy transport exists because some users still rely on it, it deserves the same boundary tests as the preferred path. The riskiest code in agent platforms is often not the shiny new provider integration. It is the compatibility lane everyone assumes nobody uses until failover routes traffic through it at 2 a.m.

The editorial take is simple: agent adapters must be literalists. If there are no tools, send no tool controls. If parallel calls are unsupported, do not hint that they are. If a model’s schema differs from yesterday’s model, encode that difference instead of hoping the provider is forgiving. Phantom capabilities are how tiny schema mistakes become runtime failures.

Sources: GitHub PR #88175, Hermes related PR #32777, Hermes related PR #16