OpenClaw’s Codex Runtime Placement Fix Is the Sandbox Story Coding-Agent Comparisons Usually Miss

Anatoliy Kolodkin

19 May 2026 • 4 min read

The least useful way to compare coding agents is to ask which model writes better code in a benchmark. The more useful question is uglier: where does the code actually execute when the agent decides to run a shell command? PR #84377 matters because it sits exactly at that boundary. It tries to stop implicit OpenAI-to-Codex routing from changing OpenClaw’s requested execution placement underneath the operator.

The pull request addresses a runtime-boundary problem with direct security implications. If a session requested OpenClaw-managed execution on a node or inside a sandbox, implicit Codex selection should not quietly move shell or code execution into a different surface. The patch keeps implicit OpenAI/Codex routes on Pi when effective exec placement is node or sandbox, while preserving explicit agentRuntime.id = "codex" pins. That distinction is the whole story: explicit operator intent is one thing; a provider alias silently changing execution semantics is another.

The PR was opened on May 20 at 00:55 UTC and was still waiting on stronger proof during the research window. It carried P1 labels and a gold-shrimp rating, but also status: needs proof. That is the right posture. The source-level tests and resolver proof are promising; the final bar should include a live patched Gateway session proving a command lands where the policy says it should land. Runtime placement is not the kind of claim you validate with vibes.

The sandbox bug buyers never see in demos

The related issue #83796 describes the dangerous version of the bug class. With a Codex runtime and an OpenClaw Docker sandbox, Codex-native shell or code execution could run in the gateway container rather than the per-agent sandbox. The repro ran a canary command (hostname; pwd; id; printf codex-native > /tmp/openclaw-codex-native-canary) and then checked where the file appeared. The reported result was the wrong way around for a sandboxed setup: the Codex-native canary appeared in the gateway container and was absent from the sandbox, while the Pi-runtime canary appeared in the sandbox and was absent from the gateway.
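The canary idea is easy to reuse outside a one-off shell line. The following is a minimal sketch of the same probe as a script, not the repro from the issue: the function name run_canary and its parameters are illustrative assumptions. It writes a labelled marker file and reports the hostname, working directory, and user id of the process that actually ran it.

```python
import os
import socket
from pathlib import Path


def run_canary(marker_path: str, label: str) -> dict:
    """Write `label` to `marker_path` and describe the process that wrote it.

    Hypothetical helper for illustration; it mirrors the shell canary
    `hostname; pwd; id; printf <label> > <path>`.
    """
    # Equivalent of: printf <label> > <marker_path>
    Path(marker_path).write_text(label)
    return {
        "hostname": socket.gethostname(),  # equivalent of: hostname
        "cwd": os.getcwd(),                # equivalent of: pwd
        # os.getuid is POSIX-only, so report None where it is unavailable.
        "uid": os.getuid() if hasattr(os, "getuid") else None,
        "marker_path": marker_path,
    }
```

Asking the agent to run this with a different label per runtime, then looking for the marker file in each container, gives the same evidence the issue relied on: the file shows where execution landed, and the returned fields show which identity owned it.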

That is not merely a configuration papercut. It changes the blast radius of every tool call. A user can believe they configured a contained agent while the runtime’s implicit harness selection routes execution through a different container or host boundary. The model did not escape. The orchestration layer changed the floor underneath it.

This is exactly the kind of issue missing from “Claude Code vs Codex vs Cursor” comparison posts. Most comparisons focus on reasoning quality, IDE integration, latency, price, and whether the agent can edit a repo without making a mess. Those are useful criteria. They are not sufficient. A coding agent that writes slightly better code but executes in the wrong trust boundary is a worse production tool than a less clever agent with a predictable runtime contract.

PR #84377’s design is deliberately narrow. It does not claim to move Codex app-server execution into OpenClaw’s sandbox or node environment. It prevents implicit Codex runtime selection from overriding OpenClaw-managed exec placement. If an operator explicitly pins Codex as the runtime, the system preserves that choice. If the system implicitly routes an OpenAI or openai-codex/* model to Codex while the effective execution placement is node or sandbox, the patch keeps the route on Pi instead.

The proof supplied by the author ran resolveAgentHarnessPolicy against patched source with no test mocks. The output showed openai/gpt-5.5 with global tools.exec.host=node resolving to Pi, openai-codex/gpt-5.5 with per-run execHost=node resolving to Pi, and openai/gpt-5.5 with per-run execHost=sandbox also resolving to Pi. Without managed exec placement, openai/gpt-5.5 still resolved to Codex implicitly. That is the right compatibility shape: preserve the existing convenience only when it does not contradict an execution-placement policy.
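The rule those four results describe can be written down in a few lines. This is a sketch of the decision as the article states it, not OpenClaw's resolveAgentHarnessPolicy: the function name resolve_runtime, its parameters, and the prefix list are assumptions. It also collapses the global and per-run exec settings into one "effective" value and assumes Pi is the default for every other model.

```python
from typing import Optional

# Exec placements that OpenClaw manages itself, per the PR description.
MANAGED_EXEC_HOSTS = {"node", "sandbox"}

# Model prefixes assumed to select Codex implicitly (illustrative only).
IMPLICIT_CODEX_PREFIXES = ("openai/", "openai-codex/")


def resolve_runtime(
    model: str,
    pinned_runtime: Optional[str] = None,
    exec_host: Optional[str] = None,
) -> str:
    """Return the runtime for a run, given an optional explicit pin and the
    effective exec placement (global or per-run, already merged)."""
    # An explicit operator pin always wins, even under managed placement.
    if pinned_runtime is not None:
        return pinned_runtime

    implicit_codex = model.startswith(IMPLICIT_CODEX_PREFIXES)

    # Implicit Codex routing must not override managed exec placement.
    if implicit_codex and exec_host in MANAGED_EXEC_HOSTS:
        return "pi"

    # Without managed placement, the implicit convenience is preserved.
    if implicit_codex:
        return "codex"

    # Assumption: every other model stays on Pi.
    return "pi"
```

The ordering is the point of the design. The explicit pin is checked first, so operator intent is never second-guessed, and the placement check sits ahead of the implicit default, so an alias alone cannot move execution.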

Runtime selection should be audit metadata, not archaeology

The patch has to propagate session and config exec placement through a lot of places: main agent runs, status metadata, plugin loading, compaction, model fallback, side-question flows, cron isolated runs, and gateway session rows. That breadth is the warning. Runtime placement is not a local flag. Once an agent platform supports cron, subagents, plugins, fallback models, compaction, and multiple harnesses, “where does this run?” becomes distributed state.

Operators should not need to reconstruct that state after an incident by reading source, session rows, and provider-specific logs. The platform should put runtime selection, runtime source, exec host, sandbox policy, workspace access, and provider alias resolution into user-visible session metadata. If a canary lands in the gateway container, the operator should be able to answer why in one screen.
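One way to picture that metadata is a single record written at the moment the runtime is chosen. The sketch below is hypothetical: the RuntimeDecision type and its field names are assumptions drawn from the list above, not an existing OpenClaw schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class RuntimeDecision:
    """Hypothetical audit record for one run; all field names are illustrative."""
    requested_model: str             # what the operator or config asked for
    resolved_model: str              # what the provider alias resolved to
    runtime: str                     # for example "pi" or "codex"
    runtime_source: str              # "explicit-pin" or "implicit"
    exec_host: Optional[str]         # for example "node" or "sandbox"
    sandbox_mode: Optional[str]      # for example "all"
    workspace_access: Optional[str]  # for example "ro" or "rw"


def to_audit_line(decision: RuntimeDecision) -> str:
    """Serialize the decision as one stable JSON line for logs or session rows."""
    return json.dumps(asdict(decision), sort_keys=True)
```

Recording runtime_source next to runtime is the part that answers "why": it separates a Codex run the operator pinned from one an alias selected.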

The related issue #83737 sharpens the same point from the filesystem side. It describes a regression involving sandbox.mode: "all", workspaceAccess: "ro", and Docker :rw bind mounts, in which declared writable bind targets were not included in writableRoots. That is another policy-contract failure: the config says one thing, the effective sandbox allows another, and users discover the mismatch only when a tool hits "operation not permitted" or writes somewhere unexpected.
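The contract at stake can be stated as a small function: a read-only workspace should not remove declared :rw bind targets from the writable set. The sketch below is an assumption-laden illustration, not OpenClaw's implementation. The function name effective_writable_roots is invented, binds are assumed to be Docker-style "host:container[:options]" strings with POSIX paths, and only binds that explicitly carry rw are counted.

```python
from typing import List


def effective_writable_roots(
    workspace: str,
    workspace_access: str,
    binds: List[str],
) -> List[str]:
    """Return the container paths a tool should be allowed to write to.

    Hypothetical helper: the workspace is writable only when access is "rw",
    and each bind that explicitly declares rw adds its container target.
    """
    roots: List[str] = []
    if workspace_access == "rw":
        roots.append(workspace)

    for bind in binds:
        parts = bind.split(":")
        if len(parts) < 3:
            continue  # no options declared, so not an explicit :rw bind
        target = parts[1]
        options = parts[2].split(",")
        if "rw" in options and target not in roots:
            roots.append(target)
    return roots
```

With workspaceAccess set to "ro" and one :rw bind, the expected result is a list containing only that bind target. A result that omits it is the mismatch the issue describes.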

For teams running coding agents, the practical checklist is simple. Audit model aliases and runtime pins. If openai-codex/gpt-5.5 or an OpenAI model can implicitly select Codex, verify whether your exec placement still means what you think it means. Run canary writes from each runtime and check every surface: gateway container, node host, sandbox container, mounted writable path, and expected denied path. Test read-only workspace settings and declared writable binds as behavior, not documentation. Log the final runtime decision somewhere humans can see it.
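The canary step of that checklist amounts to comparing an expected placement matrix against what was observed. The following is a minimal sketch under stated assumptions: the function find_mismatches and the (runtime, surface) keys are illustrative, and the observations have to be collected by hand or by a separate probe.

```python
from typing import Dict, List, Tuple

# Key: (runtime, surface). Value: whether the canary file should or did appear.
Placement = Dict[Tuple[str, str], bool]


def find_mismatches(expected: Placement, observed: Placement) -> List[str]:
    """List every runtime/surface pair where the canary result breaks policy."""
    problems: List[str] = []
    for key, should_exist in sorted(expected.items()):
        runtime, surface = key
        seen = observed.get(key)
        if seen is None:
            # An untested pair is reported, not silently passed.
            problems.append(f"{runtime}/{surface}: not tested")
        elif seen != should_exist:
            want = "present" if should_exist else "absent"
            got = "present" if seen else "absent"
            problems.append(f"{runtime}/{surface}: expected {want}, found {got}")
    return problems
```

Fed the results reported in issue #83796 (Codex-native canary in the gateway and not the sandbox, Pi canary in the sandbox and not the gateway) against a policy that expects both runtimes to execute in the sandbox, this flags the two Codex entries and nothing for Pi.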

The broader lesson is that provider runtimes are not transparent implementation details. Codex may be the right runtime for many workloads. Pi may be the right runtime for others. The security bug appears when a friendly alias or fallback path changes that decision without operator intent. Coding-agent buyers should ask where shell execution actually happens, what identity owns it, what filesystem it sees, and whether the answer changes when a model alias flips. If the answer is “we think the sandbox still applies,” that is not governance. That is hope with a config file.

Sources: OpenClaw PR #84377, OpenClaw issue #83796, OpenClaw issue #83737, OpenClaw PR #84367
