OpenClaw’s Telegram Stall Bug Shows What Happens When Long-Lived Agent Sessions Outgrow Their Abstractions

Persistent agent sessions sound elegant right up until they start aging like milk. That is the lesson hiding inside a new OpenClaw Telegram bug report, which describes a bot that gradually becomes unresponsive after long-running use, then eventually needs manual surgery to recover. On the surface this looks like a Telegram problem. Underneath, it looks much more like a platform lifecycle problem: context growth, failed compaction, stuck child processes, polling instability, and transport errors all stacking until the illusion of one coherent assistant falls apart.

The issue, #68494, is strong because it ties together several layers of evidence instead of blaming generic network flakiness. The shared session behind the Telegram bot, agent:main:main, reportedly grew to roughly 295 to 325 messages before recovery. Logs show an explicit context-overflow diagnostic at 09:07 UTC, followed by automatic compaction attempts. At the same time, the runtime kept injecting a large workspace bootstrap file, with repeated warnings that MEMORY.md was 34,488 characters and had to be truncated to a 12,000-character limit. Then came the channel symptoms: a 95.72-second polling stall, forced restart cycles, and finally repeated sendMessage and editMessageText failures that left the bot effectively silent.

That sequence matters because it tells us the failure was cumulative, not random. The platform did not just lose a network call. Over time it crossed several local thresholds, and those failures compounded. Context got too large. Compaction appears not to have fully taken effect. The transport layer degraded. Long-running child processes accumulated under the gateway. By the time the operator intervened, systemctl --user status openclaw-gateway.service showed memory around 2.1 GB and a task count around 138. Recovery required killing stuck browser and capture processes, detaching or resetting the bloated session, and restarting the gateway service.

The most interesting clue is the linked discussion around a separate compaction issue, #68329, and the fix in PR #68388. The core complaint there is that CLI-backed sessions can reload pre-compaction history on the next turn, effectively undoing OpenClaw’s effort to shrink the transcript. If that diagnosis holds, it means the platform may have been compacting one view of the session while the underlying runtime kept living in another. That is the kind of abstraction leak that turns “persistent session” from a convenience into an operational hazard.
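To make the suspected failure mode concrete, here is a toy model in Python. It does not reproduce OpenClaw's code or the change in PR #68388. `Session`, `compact`, and the other names are hypothetical, and the summarizer is a placeholder. The point it illustrates is narrow: compaction only helps if it rewrites the history the runtime actually reloads on the next turn.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Session:
    # Hypothetical model of one conversation held in two places.
    runtime_history: list[str] = field(default_factory=list)   # what the CLI reloads each turn
    platform_history: list[str] = field(default_factory=list)  # the platform's bookkeeping copy


def summarize(messages: list[str]) -> str:
    # Placeholder: a real runtime would ask a model for the summary.
    return f"[summary of {len(messages)} earlier messages]"


def compact(messages: list[str], keep_last: int) -> list[str]:
    """Fold all but the last `keep_last` messages into a single summary entry."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    if len(messages) <= keep_last:
        return list(messages)
    return [summarize(messages[:-keep_last])] + messages[-keep_last:]


def compact_metadata_only(s: Session, keep_last: int) -> None:
    # Leaky variant: the platform's copy shrinks, the runtime's copy does not.
    s.platform_history = compact(s.platform_history, keep_last)


def compact_durably(s: Session, keep_last: int) -> None:
    # Compact the history the runtime actually reloads, then mirror it.
    s.runtime_history = compact(s.runtime_history, keep_last)
    s.platform_history = list(s.runtime_history)


def next_turn_context(s: Session) -> list[str]:
    # The runtime rebuilds its prompt from its own durable history.
    return list(s.runtime_history)
```

Starting from 300 messages with `keep_last=20`, the leaky variant leaves the platform reporting 21 entries while the next turn still starts from all 300. That is exactly the kind of disagreement a health check built on platform metadata would miss. A cheap regression test is to compact, reload from the runtime's own store, and assert that the length actually dropped.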

This is the part the agent market still undersells. Long-lived assistants are not just prompts with memory. They are stateful systems with aging behavior. Once you promise users that a bot will stay around, preserve context, and keep responding across days, you inherit a set of responsibilities that look suspiciously like the responsibilities of any other always-on service. You need a rotation strategy. You need reliable compaction. You need cleanup for child processes that outlive the work they were spawned for. You need transport recovery that does not depend on an operator noticing the bot has gone quiet.
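One generic way to keep helpers from outliving their work is to give each tool invocation its own process group and signal the whole group when the call ends. This is a standard POSIX pattern shown in Python, not OpenClaw's supervisor. `run_tool` and its parameters are hypothetical.

```python
import os
import signal
import subprocess
from typing import Optional, Sequence


def run_tool(argv: Sequence[str], timeout_s: float, grace_s: float = 5.0) -> Optional[int]:
    """Run a helper command, then make sure nothing it spawned outlives it.

    Returns the exit code, or None if the command hit `timeout_s` and was killed.
    POSIX-only: relies on process groups.
    """
    # start_new_session=True runs the child in a new session and process group,
    # so anything it forks (a browser, a capture tool) can be signalled together.
    proc = subprocess.Popen(list(argv), start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    finally:
        # Signal the whole group. This also terminates stray grandchildren
        # left behind by a leader that already exited normally.
        try:
            os.killpg(proc.pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # the group is already empty
        if proc.poll() is None:
            try:
                proc.wait(timeout=grace_s)
            except subprocess.TimeoutExpired:
                # The leader ignored SIGTERM: force the group down.
                os.killpg(proc.pid, signal.SIGKILL)
                proc.wait()
```

Processes that call setsid() themselves leave the group and escape this net. A cgroup-based supervisor, such as the systemd unit the gateway already runs under, is the backstop for that case.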

Telegram happens to be where this bug showed up, but the lesson travels. OpenClaw is a multi-channel runtime that routes requests through external CLIs, manages local state, supervises tools, and mediates network calls. A failure in any one of those layers can be survivable. A mismatch between their assumptions is what gets dangerous. If the compaction subsystem thinks a transcript is smaller than the provider runtime does, and the channel layer keeps reusing the same session anyway, and the gateway keeps hauling around leftover child processes, eventually the whole stack begins lying about its own health.
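One way to stop the stack from reporting health it does not have is to judge the channel by progress rather than by whether the process is running. The sketch below is a hypothetical `PollWatchdog` with an illustrative 60-second threshold. It tracks the time since the last successful poll. A supervisor would check `is_stalled()` on a timer and restart the poller when it returns True.

```python
import time
from typing import Callable


class PollWatchdog:
    """Hypothetical liveness check for a long-polling channel loop."""

    def __init__(self, stall_after_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.stall_after_s = stall_after_s
        self._clock = clock
        self._last_ok = clock()  # treat startup as the last known-good moment

    def record_success(self) -> None:
        """Call after every poll that returned, even one with zero updates."""
        self._last_ok = self._clock()

    def seconds_since_success(self) -> float:
        return self._clock() - self._last_ok

    def is_stalled(self) -> bool:
        # Stalled means no completed poll for longer than the threshold,
        # regardless of whether the process itself is still alive.
        return self.seconds_since_success() > self.stall_after_s
```

With long polling, a healthy getUpdates call can legitimately block for its full server-side timeout, so the stall threshold has to sit comfortably above that timeout. Injecting the clock keeps the check testable without real waiting.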

There is also a quieter product smell here: repeated workspace bootstrap. If the platform is re-injecting large bootstrap context over and over into the same long-lived session, it is paying a token tax to remind itself of facts that should probably be handled more structurally. That does not just increase cost. It accelerates session aging. Many agent platforms still use ever-larger prompt scaffolding to paper over missing state architecture. It works, until it doesn’t, and when it stops working it usually looks exactly like this: a bot that seems fine for a while, then becomes haunted.
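Here is a sketch of a more structural alternative. It assumes the runtime can track whether the session already holds the current bootstrap. The approach is to hash the file, inject it once, and inject it again only when its contents change or after compaction or rotation may have dropped it. `BootstrapInjector` and its methods are hypothetical. The 12,000-character default mirrors the truncation limit quoted in issue #68494.

```python
import hashlib
from typing import Optional


class BootstrapInjector:
    """Inject a workspace bootstrap file once per session, and again only if needed."""

    def __init__(self, max_chars: int = 12_000) -> None:
        self.max_chars = max_chars
        self._last_digest: Optional[str] = None

    def block_for_turn(self, bootstrap_text: str) -> Optional[str]:
        """Return the (truncated) text to prepend this turn, or None to skip it."""
        digest = hashlib.sha256(bootstrap_text.encode("utf-8")).hexdigest()
        if digest == self._last_digest:
            return None  # unchanged since the last injection: skip the token tax
        self._last_digest = digest
        return bootstrap_text[: self.max_chars]

    def reset(self) -> None:
        # Call after compaction or rotation, which may have removed the bootstrap.
        self._last_digest = None
```

With the numbers from the issue, each truncated injection carries 12,000 of 34,488 characters. About 22,500 characters, roughly two-thirds of the file, never reach the model anyway. Skipping unchanged re-injections saves the repeated cost. Deciding what belongs in those first 12,000 characters is a separate, retrieval-shaped problem.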

For operators, the practical advice is blunt. Do not assume a shared main session can grow forever just because the UI makes persistence feel natural. Test session rotation under real channel load. Verify that compaction updates the durable state of the runtime you are actually using, not just the platform’s local metadata. Watch task count and child-process accumulation alongside context size. And if your assistant relies on large workspace bootstrap files, ask whether those belong in every turn or in a more selective retrieval path.
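Below is a minimal monitor for those signals, assuming a systemd user unit named openclaw-gateway.service as in the issue. It reads the MemoryCurrent and TasksCurrent properties via `systemctl --user show`. It pairs them with a session message count, which operators have to obtain from their own deployment. The helper names and every threshold are hypothetical, chosen for illustration rather than taken from OpenClaw.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass
from typing import Optional

_UINT64_MAX = 2**64 - 1  # some systemd versions print this instead of "[not set]"


@dataclass
class Thresholds:
    # Illustrative limits only; tune per deployment.
    max_memory_bytes: int = 1_500_000_000
    max_tasks: int = 100
    max_session_messages: int = 250


def parse_systemctl_show(text: str) -> dict[str, str]:
    """Turn `systemctl show -p ...` output (one KEY=VALUE per line) into a dict."""
    props: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def read_unit_props(unit: str = "openclaw-gateway.service") -> dict[str, str]:
    """Query the user unit's current memory and task usage from systemd."""
    result = subprocess.run(
        ["systemctl", "--user", "show", unit,
         "-p", "MemoryCurrent", "-p", "TasksCurrent"],
        capture_output=True, text=True, check=True,
    )
    return parse_systemctl_show(result.stdout)


def _as_int(value: Optional[str]) -> Optional[int]:
    # "[not set]" (accounting disabled) and UINT64_MAX both mean "unknown".
    if value is None or not value.isdigit():
        return None
    n = int(value)
    return None if n == _UINT64_MAX else n


def health_alerts(props: dict[str, str], session_messages: int,
                  limits: Thresholds) -> list[str]:
    """Return human-readable alerts; an empty list means nothing crossed a limit."""
    alerts: list[str] = []
    mem = _as_int(props.get("MemoryCurrent"))
    if mem is not None and mem > limits.max_memory_bytes:
        alerts.append(f"gateway memory {mem / 2**30:.1f} GiB exceeds limit")
    tasks = _as_int(props.get("TasksCurrent"))
    if tasks is not None and tasks > limits.max_tasks:
        alerts.append(f"gateway has {tasks} tasks; look for leftover child processes")
    if session_messages > limits.max_session_messages:
        alerts.append(f"session holds {session_messages} messages; compact or rotate")
    return alerts
```

Consider input of 2254857830 bytes (about 2.1 GiB), 138 tasks, and a 310-message session, figures chosen to resemble those in the report. With the default thresholds, `health_alerts` returns three alerts.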

For platform builders, this bug is a reminder that persistent sessions need an aging strategy, not just a storage strategy. Persistence is not the same as immortality. Good runtimes decide when to fork, compact, summarize, archive, or rotate before users are forced into emergency cleanup. The alternative is what OpenClaw operators saw here: long-lived state slowly turning into long-lived fragility.
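An aging policy can start as a small, explicit decision function that runs before each turn. The sketch below covers only compact, rotate, and archive; it does not handle forking or separate summarization. `SessionStats`, `decide`, and all thresholds are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    KEEP = "keep"
    COMPACT = "compact"   # summarize older turns in place
    ROTATE = "rotate"     # open a fresh session seeded with a summary
    ARCHIVE = "archive"   # close an idle session and store its transcript


@dataclass
class SessionStats:
    messages: int
    context_fraction: float   # share of the context window in use, 0.0 to 1.0
    failed_compactions: int   # compactions that did not shrink the reloaded history
    idle_hours: float


def decide(s: SessionStats) -> Action:
    """Pick an aging action before the next turn. Thresholds are illustrative."""
    if s.idle_hours >= 72:
        return Action.ARCHIVE
    # Compaction that keeps failing, a nearly full window, or a very long
    # transcript calls for a clean break rather than another compaction pass.
    if s.failed_compactions >= 2 or s.context_fraction >= 0.95 or s.messages >= 300:
        return Action.ROTATE
    if s.context_fraction >= 0.7:
        return Action.COMPACT
    return Action.KEEP
```

The design choice worth copying is that repeated compaction failures trigger rotation instead of another compaction attempt. If compaction is not sticking, as #68329 suggests can happen, retrying it leaves the session growing.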

My take is that this is not mainly a Telegram story. It is a case study in what happens when agent platforms outgrow the abstractions that made them charming at small scale. “One assistant, one ongoing conversation” is an appealing mental model. But once the system includes compaction, external CLIs, local browser workers, and channel transports, that mental model needs real lifecycle management behind it. Otherwise persistence stops being a feature and starts being accumulated risk.

Sources: OpenClaw issue #68494, OpenClaw issue #68329, OpenClaw PR #68388, OpenClaw v2026.4.15 release notes