Codex-in-Docker Still Trips Over bwrap Because Sandbox Layers Need a Contract

Codex versus Claude is usually framed as a model-quality argument: who writes better code, who follows instructions better, who burns fewer tokens getting there. That framing is too narrow. The more operational question is whether the agent can reliably reach a shell, run the command it planned, and report the result without the runtime tripping over its own safety machinery. OpenClaw issue #83599 is a useful reminder that the “best coding agent” decision is partly a Linux namespaces decision wearing a product comparison hoodie.

The report describes Codex-backed OpenClaw sessions failing before command execution with a low-level Bubblewrap error: bwrap: loopback: Failed RTM_NEWADDR: Operation not permitted. The environment is not exotic enough for maintainers to dismiss: OpenClaw 2026.5.12 (f066dd2), Ubuntu 24.04 on Linux 6.8, Node 22.22.0, OpenAI’s Codex app-server, and OpenClaw session execution set to docker/all. The session’s Codex config was also explicit: mode: "yolo", approvalPolicy: "never", sandbox: "danger-full-access", and approvalsReviewer: "user". In other words, the operator thought they had already said: OpenClaw is providing the outer execution boundary; do not add another one.

That is not what happened. The issue points to OpenClaw logic that appears to narrow Codex’s requested danger-full-access back to workspace-write when OpenClaw’s own sandbox is enabled: if (sandbox?.enabled && appServer.sandbox === "danger-full-access") { appServer.sandbox = "workspace-write" }. Once Codex sees workspace-write, its native sandbox path can invoke bundled bwrap. On this host, direct probes against Codex’s Bubblewrap also fail, including bwrap: setting up uid map: Permission denied and the same loopback error under --unshare-net. The result is brutally simple: the agent cannot run pwd.
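
The narrowing described in the issue can be sketched as a small pure function. This is a minimal illustration reconstructed from the snippet quoted above, not OpenClaw's actual source: the type names and the narrowCodexSandbox function are hypothetical, and only the condition and the two mode strings come from the report.

```typescript
// Hypothetical types for illustration; only the condition and the two
// mode strings are taken from the snippet quoted in the issue.
type CodexSandboxMode = "workspace-write" | "danger-full-access";

interface OuterSandbox {
  enabled: boolean;
}

interface AppServerConfig {
  sandbox: CodexSandboxMode;
}

// Mirrors the quoted logic: when the outer sandbox is enabled, a requested
// "danger-full-access" is narrowed to "workspace-write". Returns a new
// object instead of mutating the input so the request stays inspectable.
function narrowCodexSandbox(
  sandbox: OuterSandbox | undefined,
  appServer: AppServerConfig,
): AppServerConfig {
  if (sandbox?.enabled && appServer.sandbox === "danger-full-access") {
    return { ...appServer, sandbox: "workspace-write" };
  }
  return appServer;
}

const requested: AppServerConfig = { sandbox: "danger-full-access" };
const effective = narrowCodexSandbox({ enabled: true }, requested);
console.log(`requested=${requested.sandbox} effective=${effective.sandbox}`);
```

In this sketch the operator's config still reads danger-full-access, while the value Codex receives is workspace-write. That gap between requested and effective mode is what the report turns on.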

The outer sandbox and the inner sandbox need a contract

This is not a “just enable unprivileged user namespaces” footnote. Sometimes that will be the right host-level remediation; sometimes it will be forbidden by security policy; sometimes the workload is already inside a container specifically so the host does not need to expose more namespace flexibility. The product problem is that two sandbox authorities are making independent assumptions. OpenClaw wants the platform sandbox to be the boundary. Codex wants its own execution mode to preserve workspace-write semantics. Both instincts are defensible. Combined without a clear contract, they create a nested-sandbox dependency that fails on locked-down systems.

That distinction matters because agent platforms are increasingly sold as portability layers. Operators expect to swap Claude CLI, Codex app-server, ACP backends, local models, and remote providers without relearning every backend’s kernel assumptions. Reality is messier. A coding agent is not just a model endpoint; it is an execution harness, a filesystem policy, an approval model, a process launcher, and sometimes a sandbox implementation with opinions about Linux capabilities. If those layers disagree, the failure shows up as a model that “doesn’t work,” even when the model never had a chance to respond.

The related context makes the issue sharper. OpenClaw recently addressed Codex app-server sandbox egress in #83347, and v2026.5.16-beta.7 includes another Codex guardrail: explicitly requested Codex harnesses now fail closed if unregistered instead of silently falling back, tied to #83349. Those are good platform instincts. But #83599 shows the remaining gap: fail-closed routing and egress fixes do not answer who owns sandbox semantics when OpenClaw and Codex are both trying to be careful.

Practitioners should test the boring path first

The immediate advice for teams evaluating Codex inside OpenClaw Docker sandboxing is not glamorous: before moving a real repo, start a session and run trivial commands. Run pwd. Run ls. Run a harmless test command that exercises the exact session path your automation will use. Then record whether the host supports unprivileged user namespaces, whether Codex’s bundled bwrap can create the namespaces it wants, and whether OpenClaw’s configured sandbox mode is being translated before it reaches Codex.
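
One way to make that record-keeping repeatable is to classify the outcome of the trivial command. The following is a rough sketch under stated assumptions: classifyPreflight and its verdict names are hypothetical, the exit code and stderr are assumed to be captured by whatever harness runs the session, and the two matched substrings are the error messages quoted in the issue. Mapping them to "user namespace" and "network namespace" causes is an interpretation, not something the report states in those words.

```typescript
// Hypothetical preflight helper: classifies the result of a trivial command
// (for example `pwd`) run through the exact session path under evaluation.
type PreflightVerdict =
  | "ok"
  | "userns-blocked" // bwrap could not set up its uid map
  | "netns-blocked" // bwrap could not configure loopback under --unshare-net
  | "unknown-failure";

function classifyPreflight(exitCode: number, stderr: string): PreflightVerdict {
  if (exitCode === 0) return "ok";
  // Both substrings come from the error output quoted in the issue.
  if (stderr.includes("setting up uid map: Permission denied")) {
    return "userns-blocked";
  }
  if (stderr.includes("loopback: Failed RTM_NEWADDR")) {
    return "netns-blocked";
  }
  return "unknown-failure";
}

const verdict = classifyPreflight(
  1,
  "bwrap: loopback: Failed RTM_NEWADDR: Operation not permitted",
);
console.log(verdict);
```

Recording a verdict like this per host and per backend turns "the agent doesn't work" into a specific, comparable line on the scorecard.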

That sounds too basic until you remember what these systems are used for. A background coding agent may be asked to triage a bug, edit files, run tests, open a pull request, and summarize the diff. Every step depends on shell availability. If shell startup fails before the first command, the rest of the evaluation is theatre. Teams comparing Codex, Claude Code, Cursor, OpenCode, or any ACP-backed harness should include runtime startup semantics in the scorecard, not just benchmark tasks and subjective code quality.

There is also a governance lesson here. Security teams often prefer layered controls, and usually they are right. Defense in depth beats one heroic boundary. But agent sandboxes are stateful execution systems, not static firewall rules. Layering two sandbox implementations can produce a smaller attack surface, or it can produce undefined behavior, duplicated policy, misleading configuration, and hard-to-debug failures. The fix is not necessarily “remove one sandbox.” The fix is to make authority explicit: either OpenClaw declares the outer Docker sandbox authoritative and passes Codex a policy that avoids native Bubblewrap, or OpenClaw documents that Codex app-server requires host namespace support even when running inside OpenClaw’s sandbox.
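
The two options can be sketched as an explicit contract in which the authoritative layer is named rather than inferred from a mode string. Everything here is hypothetical: the SandboxContract shape, the resolvePolicy function, and the mayNeedHostNamespaces flag are illustrative names, not OpenClaw or Codex APIs. The only behavior assumed from the source is that workspace-write can route through Codex's bundled bwrap.

```typescript
// Hypothetical contract: the owner of the execution boundary is declared
// explicitly instead of being derived from the sandbox mode string.
type SandboxMode = "workspace-write" | "danger-full-access";
type BoundaryOwner = "outer" | "inner";

interface SandboxContract {
  owner: BoundaryOwner; // which layer is authoritative
  requested: SandboxMode; // what the operator configured for Codex
}

interface ResolvedPolicy {
  codexSandbox: SandboxMode;
  mayNeedHostNamespaces: boolean; // true when Codex may invoke its own bwrap
  reason: string;
}

function resolvePolicy(contract: SandboxContract): ResolvedPolicy {
  if (contract.owner === "outer") {
    // Outer Docker sandbox is authoritative: pass the request through
    // unchanged and surface whether it can still trigger native sandboxing.
    return {
      codexSandbox: contract.requested,
      mayNeedHostNamespaces: contract.requested === "workspace-write",
      reason: "outer sandbox is authoritative; request passed through unchanged",
    };
  }
  // Inner sandbox is authoritative: Codex enforces workspace-write itself,
  // so the host requirement is stated up front instead of found at runtime.
  return {
    codexSandbox: "workspace-write",
    mayNeedHostNamespaces: true,
    reason: "Codex native sandbox is authoritative; host must allow its namespaces",
  };
}

const policy = resolvePolicy({ owner: "outer", requested: "danger-full-access" });
console.log(policy.codexSandbox, policy.mayNeedHostNamespaces);
```

Either branch is defensible. What the sketch removes is the silent case, where the operator asks for one mode, the platform sends another, and nobody is told that host namespace support just became a requirement.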

Maintainers also need to be careful with the word “danger.” A setting called danger-full-access looks scary, and narrowing it under an outer sandbox may seem prudent. But names are context-dependent. Inside a platform-managed Docker execution boundary, “full access” may mean “full access within the already-contained workspace,” not “please escape the runtime and touch the host.” If the platform silently remaps it to workspace-write, it may accidentally invoke a backend sandbox that is less compatible with the chosen deployment model. Good defaults need to preserve the operator’s security intent, not just the most conservative-looking string.

My take: this is exactly the class of bug that separates agent demos from agent infrastructure. Model selection gets the conference talks; execution contracts get the pager. The Codex-vs-Claude decision is not only about reasoning quality or price. It is about whether the harness, sandbox, host, and orchestrator agree on who is allowed to do what. If they do not, your agent stack has already failed before the model gets to be clever.

Sources: OpenClaw issue #83599, OpenClaw issue #83347, OpenClaw v2026.5.16-beta.7, OpenClaw issue #83349