
OpenClaw's New Runtime-Context Leak Is a Good Reminder That Hidden Metadata Is Only Hidden Until the Model Says It Out Loud

Anatoliy Kolodkin

01 May 2026 • 4 min read

There is a specific kind of bug that only shows up under pressure. Not a crash that leaves a stack trace. A regression that feels like a weird user experience problem until you trace it back and realize the runtime itself is saying something it should not be saying. That is what happened with OpenClaw v2026.4.25-beta.4, and the story is worth sitting with because it exposes something deeper than a bad prompt instruction.

Here is what a user running a local Qwen3.6-35B behind vLLM on Telegram saw on every single reply after upgrading to beta.4: the agent quoted back a block of text that started with "OpenClaw runtime context for the immediately preceding user message. This context is runtime-generated, not user-authored. Keep internal details private." The bot was echoing its own internal metadata into the conversation, and the human-readable instruction to keep it private was apparently doing the opposite of its job.

The mechanism is not mysterious. The beta adds a new openclaw.runtime-context custom message that gets injected into the model's conversation history before every reply turn. The message has display: false set, which correctly hides it from the UI. But hiding a message from the human in the chat window does not hide it from the model. The model still receives the full text — including the instruction to keep it private — in its context window. A model that perfectly follows that instruction would never repeat it. A model that is less rigorous about instruction-following, or that simply treats everything in context as fair game for the next output token, will say it out loud. That is exactly what Qwen3.6-35B did.
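To make the gap concrete, here is a minimal sketch of the two code paths involved. The message shape, function names, and user text are hypothetical; only the openclaw.runtime-context type name, the display: false flag, and the quoted instruction text come from the report. This is not OpenClaw's actual code.

```typescript
// Hypothetical message shape. Only `display` and the custom type name
// are taken from the article; the rest is illustrative.
interface ContextMessage {
  role: "user" | "assistant" | "custom";
  customType?: string;
  display: boolean;
  text: string;
}

// What the chat UI renders: messages with display: false are dropped.
function renderForUi(history: ContextMessage[]): string[] {
  return history.filter((m) => m.display).map((m) => m.text);
}

// What the model receives: the display flag plays no role here,
// so every message ends up in the context window.
function buildModelContext(history: ContextMessage[]): string {
  return history.map((m) => `[${m.role}] ${m.text}`).join("\n");
}

// Audit helper: messages hidden from the human but still sent to the model.
function hiddenButModelVisible(history: ContextMessage[]): ContextMessage[] {
  return history.filter((m) => !m.display);
}

const history: ContextMessage[] = [
  { role: "user", display: true, text: "What's the weather like?" },
  {
    role: "custom",
    customType: "openclaw.runtime-context",
    display: false,
    text:
      "OpenClaw runtime context for the immediately preceding user message. " +
      "This context is runtime-generated, not user-authored. " +
      "Keep internal details private.",
  },
];

const uiLines = renderForUi(history);          // one entry: the user message
const modelContext = buildModelContext(history); // contains both messages
const leakCandidates = hiddenButModelVisible(history); // the runtime-context message
```

Anything returned by a helper like hiddenButModelVisible is text the model can quote. The display flag only decides what the human sees.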

The trigger condition is transcriptPrompt !== effectivePrompt. This is true on every Telegram turn because the channel envelope data — metadata about which chat, who sent it, what thread — exists in the effective prompt but not in the saved transcript. So every inbound Telegram message causes the runtime to inject the context block, and every reply that comes back from a weaker model has a chance of quoting it. Roll back to v2026.4.24 and the leak goes away, because the mechanism did not exist in that version.
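A rough sketch of why that comparison fires on every Telegram turn. The comparison itself is the one named in the report. The envelope fields and the way the effective prompt is built are assumptions made for illustration.

```typescript
// Hypothetical envelope fields. The article only says the envelope carries
// metadata about which chat, who sent it, and what thread.
interface ChannelEnvelope {
  channel: string;
  chatId: string;
  sender: string;
}

// Assumed construction: the effective prompt is the user text plus a
// rendered envelope header. The saved transcript keeps only the user text.
function buildEffectivePrompt(userText: string, envelope?: ChannelEnvelope): string {
  if (!envelope) return userText;
  const header = `[${envelope.channel} chat=${envelope.chatId} from=${envelope.sender}]`;
  return `${header}\n${userText}`;
}

// The trigger condition as described in the report.
function shouldInjectRuntimeContext(transcriptPrompt: string, effectivePrompt: string): boolean {
  return transcriptPrompt !== effectivePrompt;
}

const transcriptPrompt = "hello";
const telegramPrompt = buildEffectivePrompt(transcriptPrompt, {
  channel: "telegram",
  chatId: "123",
  sender: "alice",
});
const barePrompt = buildEffectivePrompt(transcriptPrompt);

const injectsOnTelegram = shouldInjectRuntimeContext(transcriptPrompt, telegramPrompt); // true
const injectsWithoutEnvelope = shouldInjectRuntimeContext(transcriptPrompt, barePrompt); // false
```

Under these assumptions, any channel that attaches an envelope makes the two strings differ, so the injection runs on every turn rather than in an occasional edge case.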

The obvious fix is better prompting. Tell the model more firmly not to echo internal text. Make the instruction more specific, more emphatic, more carefully engineered to survive token-level variation. That is the quick path, and OpenClaw will probably try it. But the real problem is architectural, and it is worth naming clearly: "hidden" context in an agent framework is not architecturally private. It is text in the context window that the model can reason over, repeat, and act on.

This is not a new lesson. It shows up in every system where developers conflate "not rendered in the UI" with "not visible to the model." It shows up in prompt injection research constantly. It shows up every time someone puts an internal instruction in a system prompt and then discovers that frontier models, let alone smaller open-weight models, will occasionally surface or act on things that were supposed to stay under the hood. The pattern is so common that it has its own research vocabulary: in-band signaling, hidden channel assumptions, and the fragile hope that a model will treat a text string as a policy boundary rather than content.

What makes this case slightly more interesting than the usual prompt injection discussion is the specific architectural pattern. OpenClaw is adding runtime-context injection to make the agent aware of channel-level metadata it needs to route correctly, maintain session state, and handle multi-channel complexity. That is a legitimate goal. A persistent agent that handles Telegram DMs, Slack threads, and email responses needs to know which channel it is in and what the envelope metadata says. The runtime context mechanism is solving that problem. But solving it by injecting human-readable text that says "keep this private" is treating the model as a security boundary rather than a processing engine. Some models will cooperate. Many will not, especially at lower capability levels or when the text is embedded in a part of the context that the model treats as normal conversation history.

The issue was filed on April 26th, and it landed alongside two other same-day reports — #71761 and #71847 — both documenting metadata leaking into visible replies on vLLM and Nemotron paths. Different model providers, different failure modes, same root cause: the control plane is adding complexity faster than the rules about what lives in-band versus out-of-band are being updated. When the same underlying mechanism produces three separate visible bugs in one day across different model stacks, that is not bad luck. That is an architectural stress signal.

The irony is that v2026.4.25-beta.4 also delivered a real improvement. The same user reported that cold-path latency dropped from roughly 200 seconds to 71 seconds on the same local setup — a material win that the regression made irrelevant to anyone who hit it. An agent that is faster but occasionally quotes its own system instructions is not a better experience than a slower agent that keeps its internals to itself. You need both.

For builders running local or open-weight models through compatibility layers, the action item is immediate: audit what "hidden" messages your framework is putting in the context window. If it is human-readable text with instructions embedded in it, assume some model will eventually say it back. The correct architectural response is not better prompting. It is keeping machine-readable runtime metadata in a channel or header that the model cannot read as conversational content: separate from the transcript, outside the turn history, and processed by the runtime before the model prompt is assembled rather than injected into it.
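One way to picture that boundary is a sketch in which the envelope travels beside the message rather than inside the prompt. Every type and function name here is hypothetical, and the "model" is a deliberately worst-case stand-in that repeats its whole context. This is an illustration of the out-of-band idea under those assumptions, not OpenClaw's design.

```typescript
// Hypothetical types: the envelope is carried next to the text, not in it.
interface Envelope {
  channel: "telegram" | "slack" | "email";
  chatId: string;
}
interface InboundMessage {
  text: string;
  envelope: Envelope;
}
interface ChatTurn {
  role: "user" | "assistant";
  content: string;
}
interface OutboundReply {
  channel: Envelope["channel"];
  chatId: string;
  text: string;
}

// Prompt assembly sees only conversational text. The envelope never
// enters the context window, so there is nothing for the model to quote.
function assembleModelTurns(transcript: ChatTurn[], inbound: InboundMessage): ChatTurn[] {
  return [...transcript, { role: "user", content: inbound.text }];
}

// Worst-case stand-in for a model: it repeats its entire context verbatim.
function echoEverythingModel(turns: ChatTurn[]): string {
  return turns.map((t) => t.content).join(" | ");
}

// Routing is done by the runtime from the envelope, after the model replies.
function routeReply(inbound: InboundMessage, modelText: string): OutboundReply {
  return {
    channel: inbound.envelope.channel,
    chatId: inbound.envelope.chatId,
    text: modelText,
  };
}

const inboundExample: InboundMessage = {
  text: "Can you summarize yesterday's thread?",
  envelope: { channel: "telegram", chatId: "987654" },
};

const turnsExample = assembleModelTurns([], inboundExample);
const replyExample = routeReply(inboundExample, echoEverythingModel(turnsExample));
// replyExample is addressed to chat 987654, but its text cannot contain
// the chat id, because the id was never part of the model's input.
```

The sketch sidesteps the harder case where the model genuinely needs a piece of channel metadata to reason well. Anything deliberately placed in the prompt for that purpose should be treated as potentially user-visible.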

OpenClaw will patch this. The maintainers move fast on regressions that have clear reproduction steps and clean root causes. But the underlying assumption — that you can tell a model what not to repeat and it reliably will not — is one that the whole agent stack industry keeps making and keeps discovering is fragile. The right lesson is not "fix the prompt." It is "fix the boundary."

Sources: GitHub issue #72386, OpenClaw v2026.4.25-beta.4 release notes
