OpenClaw’s Claude CLI Sandbox Bug Is a Truth-in-Advertising Problem for Agent Security

A sandbox label is not decoration. In an agent runtime, it is a security claim the user relies on while approving tools, letting a model touch a browser, or deciding whether a coding agent can operate near real credentials. That is why OpenClaw issue #84942 matters: it reports a split-brain state where OpenClaw says a Telegram session is sandboxed, while the actual Claude CLI runner says sandboxing is off.

The contradiction is sharp. openclaw sandbox explain reports runtime: sandboxed, mode: all, scope: agent, and a workspace root under /home/claw/.openclaw/sandboxes. The real Gateway turn, however, reports provider: "claude-cli", runner: "cli", and sandbox: { "mode": "off", "sandboxed": false }. No openclaw-sbx-* containers appear. The sandbox browser bridge is unavailable. The browser tool then fails with a misleading “Enable sandbox browser” error even though sandbox browser is already enabled in config.
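A minimal sketch of how that contradiction could be detected mechanically. The field names and values are the ones reported in the issue, but the dict layout for the explain output and the helper name posture_mismatches are assumptions for illustration, not OpenClaw's actual schema or API.

```python
from typing import Any


def posture_mismatches(explain: dict[str, Any], run_report: dict[str, Any]) -> list[str]:
    """Return readable differences between claimed and effective sandbox posture."""
    # Claimed posture: what the control plane's explanation says.
    claimed_sandboxed = explain.get("runtime") == "sandboxed"
    claimed_mode = explain.get("mode")

    # Effective posture: what the runner reported for the actual turn.
    sandbox = run_report.get("sandbox") or {}
    effective_sandboxed = bool(sandbox.get("sandboxed", False))
    effective_mode = sandbox.get("mode")

    problems: list[str] = []
    if claimed_sandboxed != effective_sandboxed:
        problems.append(
            f"sandboxed: explain says {claimed_sandboxed}, run report says {effective_sandboxed}"
        )
    if claimed_mode != effective_mode:
        problems.append(
            f"mode: explain says {claimed_mode!r}, run report says {effective_mode!r}"
        )
    return problems


# Values as reported in the issue; the dict layout itself is assumed.
explain = {"runtime": "sandboxed", "mode": "all", "scope": "agent"}
run_report = {
    "provider": "claude-cli",
    "runner": "cli",
    "sandbox": {"mode": "off", "sandboxed": False},
}

for problem in posture_mismatches(explain, run_report):
    print(problem)
```

With the reported values, both the sandboxed flag and the mode disagree, which is the split-brain state in two lines of output.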

That is not just a bad error message. It is a truth-in-advertising problem for agent security.

The control plane and runner are telling different stories

The reported environment is specific: OpenClaw 2026.5.19 (a185ca2), npm/global package, Linux VPS, systemd user Gateway, anthropic/claude-opus-4-7, and a Claude CLI OAuth profile. The config is not ambiguous. It sets agents.defaults.sandbox.mode = "all", workspaceAccess = "rw", scope = "agent", Docker image openclaw-sandbox-browser:bookworm-slim, and browser settings with noVNC and auto-start enabled.

From the operator’s perspective, the sandbox is on. From the control plane’s explanation, the sandbox is on. From the runner’s execution report, the sandbox is off. Security UX breaks exactly there: not when a setting is unavailable, but when the platform presents a boundary that execution does not enforce.

The recent Claude CLI routing work gives this issue context. In PR #84374, Anthropic model refs selected with Claude CLI auth were routed through the Claude CLI runtime so shorthand refs such as anthropic/opus-4.7 would not fall back to embedded Anthropic billing. That is a useful product fix. Users who authenticate through Claude CLI want Claude CLI to be used. But moving execution into a different runner changes the runtime contract. If sandboxing does not travel with that runner, the platform has to say so.

Sandbox posture is security-critical UI

Agent users do not approve tools while reading architecture diagrams. They approve tools through posture summaries, policy explanations, and runtime prompts. If those surfaces say “sandboxed,” people behave differently. They may permit browser access. They may allow workspace writes. They may route a task through a model they would not otherwise trust near local state. In that sense, sandbox labels are part of the security boundary even though they are UI.

There are only two acceptable fixes. The better one is runtime support: Claude CLI execution should participate in the same sandbox lifecycle, including the sandbox browser bridge, when policy says mode: all. That means openclaw-sbx-* containers appear, workspace mounts are what the operator expects, and browser target="sandbox" resolves to a real bridge URL. The less satisfying but still correct fix is fail-closed truthfulness: if this runner cannot honor sandboxing, sandbox explain and the run report should say so before a model receives a turn.
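The fail-closed option can be sketched as a gate that runs before any turn is dispatched. Everything here is hypothetical: the capability table, the runner name "embedded", the exception class, and the function name are illustrative assumptions, not OpenClaw internals.

```python
class SandboxUnavailableError(RuntimeError):
    """Raised when policy requires a sandbox the selected runner cannot provide."""


# Hypothetical capability table. Which runners can actually enforce
# sandboxing is exactly the product decision the issue raises.
RUNNER_CAN_SANDBOX = {"embedded": True, "cli": False}


def require_sandbox(policy_mode: str, runner: str) -> None:
    """Refuse to start a turn if policy demands sandboxing the runner cannot honor."""
    if policy_mode == "off":
        return  # No boundary was promised, so nothing to enforce.
    # Unknown runners are treated as unable to sandbox (fail closed).
    if not RUNNER_CAN_SANDBOX.get(runner, False):
        raise SandboxUnavailableError(
            f"sandbox mode {policy_mode!r} is required, "
            f"but runner {runner!r} cannot be sandboxed"
        )


try:
    require_sandbox("all", "cli")
except SandboxUnavailableError as exc:
    print(f"refusing to start turn: {exc}")
```

The design choice that matters is the default: a runner missing from the table is rejected rather than assumed safe, so adding a new execution path cannot silently bypass the boundary.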

The unacceptable state is the one described in the issue: config, docs, and explain output claim a boundary; execution bypasses it; the browser tool produces an error that points at the wrong setting. That combination teaches users to debug the wrong layer while overestimating isolation.

Practitioners should treat this as a runner-specific risk, not a generic “OpenClaw sandbox is broken” claim. The issue is about Claude CLI OAuth execution. If your deployment uses that path, verify the effective runtime, not just the desired configuration. Run a trivial turn. Check whether openclaw-sbx-* containers are created. Test browser target="sandbox". Inspect the run report for sandboxed: true. If the actual runner is not sandboxed, either adjust tool permissions or route sensitive workflows through a runner that enforces the boundary.
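Two of those checks can be scripted. The sketch below assumes Docker is the container runtime and that sandbox containers carry the openclaw-sbx- name prefix mentioned in the issue. The run-report shape is the one quoted earlier, and the sample container name in the comments is invented for illustration.

```python
import subprocess

SANDBOX_PREFIX = "openclaw-sbx-"


def parse_sandbox_containers(docker_names_output: str) -> list[str]:
    """Keep only container names that start with the sandbox prefix."""
    names = (line.strip() for line in docker_names_output.splitlines())
    return [name for name in names if name.startswith(SANDBOX_PREFIX)]


def running_sandbox_containers() -> list[str]:
    """List running sandbox containers using the Docker CLI."""
    result = subprocess.run(
        ["docker", "ps", "--filter", f"name={SANDBOX_PREFIX}", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Docker's name filter matches substrings, so re-check the prefix here.
    return parse_sandbox_containers(result.stdout)


def effective_posture_ok(run_report: dict, containers: list[str]) -> bool:
    """True only if the run report says sandboxed AND a sandbox container exists."""
    sandbox = run_report.get("sandbox") or {}
    return sandbox.get("sandboxed") is True and len(containers) > 0


if __name__ == "__main__":
    # Run report values as quoted in the issue.
    report = {"runner": "cli", "sandbox": {"mode": "off", "sandboxed": False}}
    containers = running_sandbox_containers()
    print("sandbox containers:", containers or "none")
    print("effective posture ok:", effective_posture_ok(report, containers))
```

Requiring both signals is deliberate: a report that claims sandboxed: true with no container present is as suspicious as the reverse.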

Model governance now includes auth mechanism and runner semantics

This is also why “Claude versus Codex versus Copilot” comparisons keep missing the operational story. A Claude session through embedded Anthropic billing and a Claude session through CLI OAuth can have different runtime properties. Same model family. Different auth path. Different runner. Different sandbox compatibility. Different audit behavior. For enterprise coding-agent governance, that is not a footnote; it is the evaluation.

Security teams should update their review checklists accordingly. Do not ask only which model is used. Ask which runner executes it, where credentials live, whether sandbox posture is enforced or merely configured, whether browser bridges are inside the sandbox, and whether the audit log records the effective posture. The effective posture is the one that should drive approvals.
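One way to make that checklist concrete is a small review record whose approval logic keys off enforced posture, not configured posture. The class, its field names, and the example values are hypothetical and meant only to illustrate the questions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunnerReview:
    """Answers to the review questions for one model/runner/auth combination."""
    model: str
    runner: str
    auth: str
    credentials_location: str
    sandbox_configured: bool
    sandbox_enforced: bool
    browser_bridge_in_sandbox: bool
    audit_logs_effective_posture: bool


def approval_blockers(review: RunnerReview) -> list[str]:
    """Return reasons not to approve; an empty list means no blockers were found."""
    blockers: list[str] = []
    if review.sandbox_configured and not review.sandbox_enforced:
        blockers.append("sandbox is configured but not enforced by this runner")
    if not review.browser_bridge_in_sandbox:
        blockers.append("browser bridge is not inside the sandbox")
    if not review.audit_logs_effective_posture:
        blockers.append("audit log does not record the effective posture")
    return blockers


# Illustrative values loosely modeled on the reported environment.
review = RunnerReview(
    model="anthropic/claude-opus-4-7",
    runner="cli",
    auth="claude-cli-oauth",
    credentials_location="host",
    sandbox_configured=True,
    sandbox_enforced=False,
    browser_bridge_in_sandbox=False,
    audit_logs_effective_posture=True,
)

for blocker in approval_blockers(review):
    print("BLOCKER:", blocker)
```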

OpenClaw’s labels on #84942 are appropriately serious: P1, impact:security, impact:auth-provider, clawsweeper:needs-product-decision, and clawsweeper:needs-security-review. That is the right categorization because this is partly implementation and partly product policy. Should Claude CLI be allowed when sandbox mode is required? Should it warn and continue? Should it fail closed? Should it route differently? Those are product decisions because users will build trust around the answer.

The broader lesson is blunt: sandbox claims must be data-plane claims. A config file can express intent. A CLI explain command can describe planned policy. But the run report must say what actually happened. If those diverge, the platform should prefer honesty over comfort. “This runner cannot be sandboxed” is an annoying message. “Sandboxed” while running unsandboxed is a security bug.

OpenClaw is not alone here. Every agent platform that supports multiple model providers, local CLIs, remote APIs, OAuth profiles, browser tools, and containerized execution will hit this class of mismatch. The platforms that deserve production trust will not be the ones with the prettiest sandbox diagram. They will be the ones that make the effective runtime posture observable, enforceable, and impossible to confuse with a wish.

Sources: OpenClaw issue #84942, OpenClaw v2026.5.20-beta.1, OpenClaw PR #84374, related issue #84222