OpenClaw’s Telegram Beta Blocker Shows Why Policy Keys Must Never Become Memory Keys

OpenClaw’s Telegram duplicate-reply bug is easy to misread as channel weirdness. It is not. It is the more dangerous class of agent-runtime bug: two subsystems disagreed about who the user was, and the runtime briefly believed both of them.

The new beta-blocker issue, #84936, reports that direct Telegram messages in OpenClaw 2026.5.19 could derive a per-peer runtimePolicySessionKey and then accidentally use that same policy key as the Lossless/LCM context-engine sessionKey. That distinction sounds fussy until you see the production symptom: one inbound Telegram message assembled stale per-DM history, hit a context-window failure, sent a generic fallback error, then assembled the canonical agent:main:main history and sent a second normal reply.

One message in. Two visible sends out. One user wondering whether the assistant had lost its mind.

The bug is identity confusion, not Telegram flakiness

Agent systems now carry more identity surfaces than most web apps did a decade ago. There is the user-facing conversation identity, the policy identity, the channel route, the sandbox scope, the tool permission context, the memory key, the provider lane, and often a durable transcript identity beneath all of it. They may be represented as strings. They should not be treated as interchangeable strings.

In the reported case, the expected context identity was sessionKey=agent:main:main. Instead, the runtime policy resolver derived a per-peer key shaped like agent:main:telegram:default:direct:<peer-id>. That key is useful for policy. It can let Telegram DMs receive different tool allowlists, sandbox settings, provider routing, or channel-specific behavior. But it is the wrong key for memory continuity if the user expects the main agent lane.
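A minimal TypeScript sketch of the two key shapes, under stated assumptions: only the resulting string formats come from the issue report. The builder functions, the parameter names, and the peer id 12345 are hypothetical, and the meaning assigned to each segment (for example, that "default" is an account name) is a guess.

```typescript
// Hypothetical builders. Only the resulting string shapes come from the issue
// report; function names, parameter names, and segment meanings are assumptions.
function buildContextSessionKey(agentId: string, lane: string): string {
  return `agent:${agentId}:${lane}`;
}

function buildTelegramDirectPolicyKey(
  agentId: string,
  account: string,
  peerId: string,
): string {
  return `agent:${agentId}:telegram:${account}:direct:${peerId}`;
}

// The same inbound DM legitimately produces two different identities.
const contextKey = buildContextSessionKey("main", "main");
const policyKey = buildTelegramDirectPolicyKey("main", "default", "12345");

console.log(contextKey); // agent:main:main
console.log(policyKey); // agent:main:telegram:default:direct:12345
```

Both values are plain strings, which is why nothing stops one from being passed where the other is expected.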

The logs make the failure concrete. The stale per-DM LCM conversation 1876 assembled 416 context items, 17 summary items, 124 input messages, and 417 output messages. OpenClaw estimated 227,240 tokens and projected an 817,231-character prompt. The model failed with the blunt runtime error: “Codex ran out of room in the model's context window. Start a new thread or clear earlier history before retrying.”

Then the canonical lane assembled. Conversation 1872 used 165 context items, 2 summary items, 125 input messages, and 137 output messages. Estimated tokens dropped to 61,135 and projected prompt size to 166,128 characters. That second assembly produced the normal reply. The failure was not that Telegram duplicated an event. It was that the runtime briefly gave the context engine a stale identity and paid the price in token burn, latency, and user-visible confusion.

Policy keys are not memory keys

The same-day fix PR, #84954, is small in footprint and large in meaning: 2 files, 47 additions, 13 deletions, and a regression test proving that context-engine history remains bound to the run session when the sandbox key differs. The fix separates contextSessionKey from runtimePolicySessionKey. That is exactly the boundary agent runtimes need to draw.

Policy can and should vary by channel, peer, route, and sandbox. Memory should follow the product contract the user experiences. If a direct message is supposed to be a view into the main assistant, the context engine must assemble the main assistant’s memory. If a direct message is supposed to be isolated, that should be explicit configuration, not an accidental consequence of whatever key was most convenient in the resolver stack.

This matters beyond OpenClaw because every serious agent platform is drifting toward the same architecture. Tools need scoping. Sandboxes need scoping. Message channels need scoping. Enterprise deployments need audit trails per user or team. Memory systems need durable continuity. Once those keys coexist, the platform either makes the boundaries typed and testable or it eventually ships a bug where policy identity starts selecting memory.

The fix scope listed in the PR is the right checklist: Codex context-engine calls, workspace bootstrap context, prompt reporting, compaction and maintenance, transcript mirroring, and finalization now use the canonical context session key. Sandbox resolution, runtime policy scoping, Telegram allowlisting, model selection, and provider/network behavior remain unchanged. In other words: do not flatten the system; route each subsystem the identity it actually owns.
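The routing idea can be sketched in TypeScript as follows. This is an illustration under stated assumptions, not OpenClaw's code: the field names contextSessionKey and runtimePolicySessionKey are the ones the fix is described as separating, while RunIdentity, assembleHistory, resolveSandbox, startRun, and the peer id are hypothetical.

```typescript
// Illustrative only: the interface and functions are assumptions, not
// OpenClaw's API. Only the two field names are taken from the fix description.
interface RunIdentity {
  contextSessionKey: string; // selects memory: history, compaction, transcript
  runtimePolicySessionKey: string; // selects constraints: sandbox, allowlists
}

// Each subsystem accepts only the key it owns.
function assembleHistory(contextSessionKey: string): string {
  return `history for ${contextSessionKey}`;
}

function resolveSandbox(runtimePolicySessionKey: string): string {
  return `sandbox for ${runtimePolicySessionKey}`;
}

function startRun(identity: RunIdentity): { history: string; sandbox: string } {
  return {
    history: assembleHistory(identity.contextSessionKey),
    sandbox: resolveSandbox(identity.runtimePolicySessionKey),
  };
}

const run = startRun({
  contextSessionKey: "agent:main:main",
  runtimePolicySessionKey: "agent:main:telegram:default:direct:12345",
});

console.log(run.history); // history for agent:main:main
console.log(run.sandbox); // sandbox for agent:main:telegram:default:direct:12345
```

Carrying both keys in one named structure means a call site has to pick a field by name, which makes the wrong choice visible in review.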

The operational cost is bigger than one duplicate reply

The visible duplicate reply is annoying. The invisible cost is more important. A stale memory lane inflated the prompt to an estimated 227,240 tokens, enough to exhaust the GPT-5.5 context window, before the run failed. On a smaller model, the same class of bug would present as sudden amnesia or missing context. On a larger model, it becomes expensive before it becomes obvious. Either way, the operator gets a runtime that is less predictable precisely where predictability matters most: context selection.

For teams running OpenClaw DMs against a shared main session, the practical move is straightforward: upgrade when the fix lands in the release train you use, and audit stale per-DM LCM rows if you previously changed session scoping. If you have user complaints about duplicate replies, context-window errors, or Telegram DMs that appear to remember a different history than the main lane, this issue is a good place to start.

For builders of agent infrastructure, add tests that make identity boundaries explicit. A context-engine call should receive a canonical conversation key. A sandbox resolver should receive a sandbox/policy key. A tool-policy check should receive the policy identity. A transcript mirror should know which durable lane it mirrors. Then add one regression that intentionally gives policy and context different keys and proves they stay different. That test will look boring until it saves you from shipping exactly this bug.
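One possible shape for that regression, sketched in TypeScript with hand-rolled test doubles so it runs on its own. Everything here is hypothetical: RecordingContextEngine, RecordingSandboxResolver, handleInboundMessage, and expectEqual are stand-ins for illustration, not OpenClaw APIs and not the test from the PR.

```typescript
// Hypothetical test doubles and handler; none of these names are OpenClaw APIs.
class RecordingContextEngine {
  readonly seenKeys: string[] = [];
  assemble(sessionKey: string): void {
    this.seenKeys.push(sessionKey);
  }
}

class RecordingSandboxResolver {
  readonly seenKeys: string[] = [];
  resolve(sessionKey: string): void {
    this.seenKeys.push(sessionKey);
  }
}

interface TurnKeys {
  contextSessionKey: string;
  runtimePolicySessionKey: string;
}

// Stand-in for the code under test: it must route each key to its owner.
function handleInboundMessage(
  keys: TurnKeys,
  engine: RecordingContextEngine,
  sandbox: RecordingSandboxResolver,
): void {
  sandbox.resolve(keys.runtimePolicySessionKey);
  engine.assemble(keys.contextSessionKey);
}

function expectEqual(actual: string[], expected: string[], label: string): void {
  if (JSON.stringify(actual) !== JSON.stringify(expected)) {
    throw new Error(`${label}: got ${JSON.stringify(actual)}`);
  }
}

// Deliberately different keys, so a mix-up cannot pass by coincidence.
const turnKeys: TurnKeys = {
  contextSessionKey: "agent:main:main",
  runtimePolicySessionKey: "agent:main:telegram:default:direct:12345",
};
const engine = new RecordingContextEngine();
const sandbox = new RecordingSandboxResolver();
handleInboundMessage(turnKeys, engine, sandbox);

expectEqual(engine.seenKeys, [turnKeys.contextSessionKey], "context engine");
expectEqual(sandbox.seenKeys, [turnKeys.runtimePolicySessionKey], "sandbox resolver");
console.log("identity boundaries held");
```

In a Vitest suite, the two expectEqual calls would become expect(...).toEqual(...) assertions against the real runtime with recording fakes injected.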

There is also a design smell worth naming: stringly-typed session keys are carrying too much semantic weight. If a key can mean “this user’s Telegram DM policy lane” in one file and “this conversation’s memory lane” in another, the compiler cannot help you. At minimum, wrappers or explicit parameter names should make misuse harder. Better yet, the runtime should model these identities as different concepts and force the call sites to choose.
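Branded string types are one common TypeScript idiom for this kind of wrapper. The sketch below is an assumption about how the idea could look, not a description of OpenClaw's types: ContextSessionKey, RuntimePolicySessionKey, the two as... constructors, and assembleContext are all hypothetical names.

```typescript
// Branded string types: plain strings at runtime, distinct types at compile time.
// All names are illustrative, not OpenClaw's actual types.
type ContextSessionKey = string & { readonly __brand: "ContextSessionKey" };
type RuntimePolicySessionKey = string & {
  readonly __brand: "RuntimePolicySessionKey";
};

// The only places where a raw string is allowed to become a typed key.
function asContextSessionKey(raw: string): ContextSessionKey {
  return raw as ContextSessionKey;
}

function asRuntimePolicySessionKey(raw: string): RuntimePolicySessionKey {
  return raw as RuntimePolicySessionKey;
}

// A memory-facing function that accepts only the memory identity.
function assembleContext(key: ContextSessionKey): string {
  return `assembling ${key}`;
}

const memoryKey = asContextSessionKey("agent:main:main");
const dmPolicyKey = asRuntimePolicySessionKey(
  "agent:main:telegram:default:direct:12345",
);

console.log(assembleContext(memoryKey)); // assembling agent:main:main
console.log(dmPolicyKey); // agent:main:telegram:default:direct:12345

// assembleContext(dmPolicyKey);
// Uncommenting the call above fails type checking: the brands differ, so a
// RuntimePolicySessionKey is not assignable to a ContextSessionKey.
```

The brand exists only in the type system, so there is no runtime cost. The trade-off is that every raw string has to pass through one of the constructor functions, which is also where validation of the key shape could live.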

The repo-native response was good. The issue was labeled beta-blocker, P1, impact:session-state, and impact:message-loss. ClawSweeper confirmed the source path and kept it open. PR #84954 landed roughly 85 minutes after the report, with live Telegram proof and 21 passing Vitest tests across runtime-policy and Codex context-engine shards. That is maintenance velocity worth respecting.

But the lesson should outlive the patch. The headline is not “OpenClaw fixed a Telegram bug.” The headline is that agent runtimes need typed identities for memory, policy, sandbox, and channel routing. Memory is the user’s continuity. Policy is the runtime’s constraint system. If those collapse into the same key, the assistant starts remembering through the wrong security boundary. That is not a channel bug. That is an architecture bug wearing a chat bubble.

Sources: OpenClaw issue #84936, OpenClaw PR #84954, OpenClaw v2026.5.20-beta.1, related issue #84885