OpenClaw’s Sentinel-Token Sanitizer PR Shows Tool Boundaries Need Input Hygiene

OpenClaw’s Sentinel-Token Sanitizer PR Shows Tool Boundaries Need Input Hygiene

Tool-call hygiene is one of those runtime chores nobody notices until the agent fabricates an answer because a stray token broke the tool that would have produced evidence.

OpenClaw PR #82797, opened on May 17, is about exactly that failure mode. The patch proposes sanitizing leaked LLM sentinel tokens such as <|...|>, <|<|", and <|" when they appear at the head of bash.exec command arguments or web-fetch URL inputs. The live report came from a local Gemma 4 26B NVFP4 deployment where malformed tool inputs hit the shell or URL parser, the tool failed, and the agent then answered from no useful evidence.

That is the important framing. This is not just “local model emits weird tokens.” It is a runtime boundary problem. Model transport artifacts crossed into tool arguments. The tools failed for reasons unrelated to user intent. Then the orchestration layer let the agent continue into an unsupported answer. The user sees hallucination; the operator sees a model issue; the runtime quietly helped both by not separating garbage cleanup from security validation.

The PR is labeled P1, agents, and proof: supplied. It changes seven files with 180 additions and two deletions, adding sanitizer paths for web-fetch URL handling and bash command handling. The reported environment is not tiny: a DGX Spark GB10 ARM64 Ubuntu 24 host with 128 GB unified memory, Gemma 4 26B NVFP4 served by local vLLM at http://127.0.0.1:8005, and real Telegram-channel user sessions under an agent named gemma. Live logs after the patch showed multiple bash.sanitize_special_tokens events on 2026-05-17 KST, with examples like original_len=61 sanitized_len=56 and original_len=121 sanitized_len=116. That roughly five-byte trim matches a leading malformed sentinel prefix.

Local models make runtime hygiene non-optional

Hosted frontier models have made many teams complacent about tool-call cleanliness. Their function-call wrappers are usually good enough that malformed leading sentinels are rare. Local and smaller models change that. They may use different chat templates, stop-token behavior, quantization paths, or serving wrappers. They may leak special tokens into places the runtime assumed were already clean.

When the bad token lands in a natural-language response, it is ugly. When it lands at the beginning of a shell command or URL, it becomes operational. A valid file inspection command can turn into a shell failure. A valid URL can fail parsing before the fetch layer ever reaches useful policy checks. The agent then has to decide what to do with missing evidence. Some retry. Some ask for help. Some invent. The last one is where tool hygiene becomes trust hygiene.

The right answer is narrow, boundary-specific sanitization with telemetry. At the bash.exec tool boundary, a leading model sentinel artifact is almost certainly not user intent. Stripping a tightly defined malformed prefix before execution can be reasonable if the runtime logs it and leaves normal input untouched. That turns a model integration wart into a recoverable event rather than a fabricated response downstream.

Telemetry is not optional. If a model frequently needs sanitizer intervention, the operator should know. Count events by model, agent, channel, tool, prefix type, and outcome. A spike may indicate a bad chat template, wrong stop sequence, model regression, quantization issue, or prompt format mismatch. Sanitization should buy reliability, not hide integration debt.

Security guards should not become cleanup crews

The review feedback on PR #82797 is the most important part of the story. ClawSweeper accepted the bug shape for current main: bash exec commands pass through to execution, and web-fetch URLs are validated with new URL, so leading sentinel text can produce shell or URL failure. But it blocked merge pending stronger proof for the web-fetch URL path and flagged a P1 security concern: keep URL sanitizing out of the shared SSRF fetch guard.

That is exactly the right line. A shared SSRF guard should validate, not reinterpret. If invalid input reaches a low-level network security primitive, rejection is safer than cleanup. Sanitizing inside the guard risks teaching the policy layer to transform hostile-looking input into something fetchable. That kind of convenience has a way of becoming policy drift.

Sanitize near the model-facing tool boundary. Validate strictly near the network boundary. Those are different jobs. The web_fetch command can decide whether a leading sentinel prefix is a recoverable model artifact and log the event. The shared fetch guard should remain boring, strict, and predictable. In agent security, boring is a feature.

For practitioners running local models with tools, the action list is straightforward. Instrument malformed tool-call rates. Track invalid JSON arguments, URL parse failures, shell syntax failures, sanitizer trims, and post-failure hallucination indicators. Add tests that inject sentinel prefixes into tool arguments and verify the agent either sanitizes narrowly or fails loudly. Review prompt templates and stop tokens before blaming the model. And do not let a cleanup patch weaken SSRF, sandbox, or command-approval policy.

The editorial take: input hygiene belongs at tool boundaries, but security guards are not janitors. Clean up model garbage close to the model. Enforce policy close to the resource. Mix those layers and you get a system that is both more permissive and harder to reason about — the worst possible trade.

Sources: OpenClaw PR #82797, OpenClaw issue #82812, OpenClaw v2026.5.16-beta.3 release, OpenClaw PR #82777