Discord Needs the Same Ingress Isolation Telegram Just Got

A chat bot that takes four seconds to notice a message does not feel “under load.” It feels absent. That is the product problem behind OpenClaw issue #83591, which argues that Discord needs the same ingress isolation Telegram recently received. The technical complaint is about event loops, worker threads, WebSockets, and sidecar load. The user complaint is simpler: I sent the agent a message and, for several seconds, there was no evidence it had seen me.

The report comes from a real deployment, not a synthetic benchmark: OpenClaw 2026.5.12 (f066dd2), Node 24.13.0, Debian 12 on Linux 6.1, Hetzner Cloud, systemd --user, Discord as the primary channel, and roughly 30 active crons mostly backed by zai/glm-4.7. Production liveness logs show the main process under enough pressure to make inbound pickup visibly unreliable. At 10:43 UTC, OpenClaw reported eventLoopDelayP99Ms=89.8 and eventLoopDelayMaxMs=2225.1, with an active cron model call plus active and queued Discord channel work aged 11 seconds. A follow-up warning at 11:21 UTC showed eventLoopDelayMaxMs=1149.2 and queued Discord age of 4 seconds. Recent phases included sidecars.model-prewarm:3498ms and post-attach.update-sentinel:1853ms. Translation: runtime maintenance work and channel pickup are fighting in the same room.
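Those two log fields map onto a histogram that Node exposes directly through perf_hooks.monitorEventLoopDelay. The sketch below is a minimal illustration, not OpenClaw's implementation: the field names are copied from the log lines above, while toLivenessSample, startEventLoopMonitor, and the 20 ms resolution are assumptions made for the example.

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";

// Structural subset of Node's delay histogram that this sketch relies on.
// Node reports these values in nanoseconds.
interface DelayHistogram {
  percentile(p: number): number;
  max: number;
}

// Field names mirror the quoted log lines; the shape itself is hypothetical.
interface LivenessSample {
  eventLoopDelayP99Ms: number;
  eventLoopDelayMaxMs: number;
}

const NS_PER_MS = 1e6;

// Convert nanosecond histogram readings into millisecond log fields.
function toLivenessSample(h: DelayHistogram): LivenessSample {
  return {
    eventLoopDelayP99Ms: h.percentile(99) / NS_PER_MS,
    eventLoopDelayMaxMs: h.max / NS_PER_MS,
  };
}

// Start sampling. Call the returned function on an interval to read the
// current window and reset the histogram for the next one.
function startEventLoopMonitor(resolutionMs = 20): () => LivenessSample {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();
  return () => {
    const sample = toLivenessSample(histogram);
    histogram.reset();
    return sample;
  };
}
```

Read this way, a p99 near 90 ms next to a max above 2,000 ms points at rare but severe stalls rather than uniform slowness, which is why the max is worth logging alongside the percentile.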

That fight is not just ugly internally. It changes how humans interpret the agent. If Discord shows no typing indicator and no acknowledgement for three or four seconds, users do not think “the main event loop is saturated by cron and sidecar work.” They think the bot missed the message, crashed, or is generally flaky. For a personal assistant or team workflow agent, perceived pickup latency is part of trust. Model quality starts after the message is safely received. Presence starts before that.

Telegram already proved the architecture

The reason #83591 is compelling is that OpenClaw has already shipped the pattern for another channel. PR #81746 moved Telegram Bot API polling into an isolated worker thread and durably spooled fetched updates before advancing the offset. That last part is the key. In polling systems, advancing the offset is an acknowledgement boundary. If the gateway advances it before the update is safely stored, a transient main-loop stall can become message loss. Telegram’s new worker can fetch and spool updates even while the main thread is blocked.
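The ordering is easier to see in code. The following is a sketch of the spool-before-acknowledge rule only, under stated assumptions: Update, Spool, and pollOnce are hypothetical names, the functions are synchronous for brevity where a real poller would be async, and nothing here is taken from the PR itself. The one external fact it relies on is that Telegram's getUpdates expects the next offset to be one greater than the highest update ID already received.

```typescript
// Hypothetical shapes for illustration; not OpenClaw's actual types.
interface Update {
  updateId: number;
  payload: string;
}

interface Spool {
  // Must throw if the updates could not be stored durably.
  append(updates: Update[]): void;
}

// Runs one polling step and returns the offset to use for the next fetch.
// The offset only moves forward after the spool write has succeeded.
function pollOnce(
  fetchUpdates: (offset: number) => Update[],
  spool: Spool,
  offset: number,
): number {
  const updates = fetchUpdates(offset);
  if (updates.length === 0) return offset;

  // If this throws, we never reach the return below, so the caller keeps
  // the old offset and the same updates are fetched again next time.
  spool.append(updates);

  let highest = updates[0].updateId;
  for (const u of updates) {
    if (u.updateId > highest) highest = u.updateId;
  }
  return highest + 1;
}
```

The trade-off is that a crash between the spool write and the next fetch can deliver the same update twice, so whatever consumes the spool should deduplicate by update ID. Duplicates are recoverable; a skipped offset is not.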

The proof from that PR is unusually concrete: during a deliberate five-second main-thread block, the worker still fetched and spooled an update, with output including {"ok":true,"blockMs":5000,"hitCount":64228,"firstHitOffsetMs":119,"spooledUpdateIds":[81132001]}. That is the right kind of reliability test. It does not claim the model got faster. It proves inbound pickup survived a main-thread stall. For chat agents, that distinction is everything.
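A test of that shape is cheap to reproduce in miniature. The sketch below is not the PR's harness: probeWorkerDuringBlock and workerSource are hypothetical, and a counter in shared memory stands in for "fetch and spool". It only demonstrates the underlying property, that a Node worker thread keeps making progress while the main thread is busy-waiting and running no callbacks.

```typescript
import { Worker } from "node:worker_threads";

// Worker body as plain JavaScript, run via `eval: true`. It counts loop
// iterations into shared memory for `durationMs`.
const workerSource = `
  const { workerData } = require("node:worker_threads");
  const hits = new Int32Array(workerData.shared);
  const end = Date.now() + workerData.durationMs;
  while (Date.now() < end) Atomics.add(hits, 0, 1);
`;

// Blocks the calling thread for `blockMs` and returns how many iterations
// the worker completed in the meantime.
function probeWorkerDuringBlock(blockMs: number): number {
  const shared = new SharedArrayBuffer(4);
  const hits = new Int32Array(shared);
  const worker = new Worker(workerSource, {
    eval: true,
    workerData: { shared, durationMs: blockMs },
  });

  const end = Date.now() + blockMs;
  while (Date.now() < end) {
    // Deliberate busy-wait: no timers, I/O callbacks, or messages run here.
  }

  const hitCount = Atomics.load(hits, 0);
  void worker.terminate();
  return hitCount;
}
```

A nonzero return value from something like probeWorkerDuringBlock(5000) is the miniature version of the claim: the worker did its job while the main thread could not have helped. The counter is a 32-bit integer, so very long blocks would need a wider type.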

Discord is not Telegram with different JSON. Telegram long-polling gives you discrete HTTP fetches and explicit offsets. Discord is a persistent WebSocket gateway. A commenter on #83591 confirms the code-shape gap: Telegram now has telegram-ingress-worker.ts, telegram-ingress-spool.ts, and telegram-ingress-worker.runtime.ts, while Discord still creates ws.WebSocket on the main thread in extensions/discord/src/internal/gateway.ts:173 and dispatches inbound events on that same loop. So no, this is not a copy-paste port. It is the same architecture principle expressed through a different transport.

Separate receipt from thinking

The design target should be straightforward: own the Discord socket somewhere that sidecar spikes on the main loop cannot starve it, then forward events into the core process through a bounded queue with honest backpressure. That may mean a worker-owned WebSocket that posts events back to the gateway. It may mean a smaller isolation boundary around inbound dispatch rather than the socket itself. The implementation details matter, but the product invariant matters more: receiving a user message should not be hostage to model prewarm, cron work, transcript maintenance, or a slow channel task already in progress.
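"Bounded queue with honest backpressure" has a concrete meaning: the queue has a fixed capacity, and when it is full the producer is told so instead of the queue growing silently or dropping silently. A minimal sketch, with BoundedEventQueue and its method names invented for illustration:

```typescript
type OfferResult = "accepted" | "rejected";

// Fixed-capacity FIFO between the socket owner (producer) and the core
// process (consumer). All names here are hypothetical.
class BoundedEventQueue<T> {
  private readonly items: T[] = [];
  private readonly capacity: number;
  private rejected = 0;

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity < 1) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
  }

  // Never blocks and never grows past capacity. The producer sees the
  // result and decides what to do with an event that did not fit.
  offer(event: T): OfferResult {
    if (this.items.length >= this.capacity) {
      this.rejected += 1;
      return "rejected";
    }
    this.items.push(event);
    return "accepted";
  }

  // Returns the oldest queued event, or undefined when the queue is empty.
  take(): T | undefined {
    return this.items.shift();
  }

  get depth(): number {
    return this.items.length;
  }

  // Exposed so overload shows up in metrics rather than in user reports.
  get rejectedCount(): number {
    return this.rejected;
  }
}
```

What the producer does on "rejected" is a policy choice this sketch leaves open: spool to disk, drop and count, or stop reading from the transport. The point is that the decision is explicit and observable.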

This is the same lesson every serious agent platform is learning. “Async” is not a vibe. If all inbound messages, cron jobs, model calls, sidecar setup, plugin loading, and channel dispatch share one overloaded event loop, the system behaves like a single-lane bridge with better branding. The model may be remote. The tool calls may be concurrent. The UI may show multiple sessions. But the user’s first touchpoint is still stuck behind whichever maintenance task currently owns the loop.

For practitioners running OpenClaw or similar agent systems, the action list is practical. Watch event-loop delay, not just model latency. Track time-to-first-acknowledgement separately from time-to-final-answer. Instrument channel pickup age per adapter. If you operate Discord, Slack, Telegram, WhatsApp, Signal, or Google Chat agents, test what happens when cron jobs are active, sidecars are warming, plugins are loading, and the main process is briefly blocked. A bot that works in a quiet dev session can feel broken in the exact environment where it is supposed to be useful.
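Keeping those measurements apart takes very little code. The sketch below assumes one tracker instance per channel adapter and takes timestamps as arguments so it can be tested without a real clock; PickupTracker and its method names are hypothetical, not an OpenClaw API.

```typescript
interface MessageTimings {
  receivedAtMs: number;
  ackedAtMs?: number;    // first visible sign of receipt, e.g. typing indicator
  answeredAtMs?: number; // final answer delivered
}

class PickupTracker {
  private readonly byMessage = new Map<string, MessageTimings>();

  received(id: string, nowMs: number): void {
    this.byMessage.set(id, { receivedAtMs: nowMs });
  }

  // Only the first acknowledgement counts; later ones are ignored.
  acked(id: string, nowMs: number): void {
    const t = this.byMessage.get(id);
    if (t && t.ackedAtMs === undefined) t.ackedAtMs = nowMs;
  }

  answered(id: string, nowMs: number): void {
    const t = this.byMessage.get(id);
    if (t) t.answeredAtMs = nowMs;
  }

  timeToFirstAckMs(id: string): number | undefined {
    const t = this.byMessage.get(id);
    if (!t || t.ackedAtMs === undefined) return undefined;
    return t.ackedAtMs - t.receivedAtMs;
  }

  timeToFinalAnswerMs(id: string): number | undefined {
    const t = this.byMessage.get(id);
    if (!t || t.answeredAtMs === undefined) return undefined;
    return t.answeredAtMs - t.receivedAtMs;
  }

  // Pickup age: how long the oldest unacknowledged message has waited.
  oldestUnackedAgeMs(nowMs: number): number {
    let oldest = 0;
    this.byMessage.forEach((t) => {
      if (t.ackedAtMs === undefined) {
        oldest = Math.max(oldest, nowMs - t.receivedAtMs);
      }
    });
    return oldest;
  }
}
```

With one tracker per adapter, a Discord pickup age of several seconds sitting next to a healthy Telegram one would show the isolation gap directly, independent of how long the model takes to answer.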

There is also a subtle reliability distinction worth keeping: users can tolerate “the agent is thinking” better than “the agent may not have seen me.” The first is latency with a known state. The second is uncertainty. A typing indicator, queued-state acknowledgement, or durable receipt marker changes the social contract. It tells the human the system has the message and is processing it. Without that, every second of silence makes the agent feel less dependable, even if the final answer is good.

The issue’s metadata captures the shape: labels P2 and impact:message-loss, plus a relatively small visible reaction count. That is about right for a plumbing issue that will not trend but will absolutely shape user experience. The community follow-up is more useful than social noise because it identifies the actual boundary: Telegram’s worker/spool path exists; Discord’s gateway remains main-thread-owned; WebSocket isolation is harder but necessary if Discord is going to behave like a present assistant under load.

The bigger editorial point is that agent platforms are becoming distributed systems with chat bubbles. Once an assistant has crons, sidecars, model prewarm, channel adapters, durable queues, and background work, the old chatbot mental model breaks. The runtime has to decide which work is allowed to delay which other work. Inbound user messages should be near the top of that priority list. Otherwise the agent is not “autonomous”; it is just busy in ways the user cannot see.

My take: Discord ingress isolation sounds like boring connector plumbing until you remember the product promise. The agent is supposed to be present. Presence starts at message pickup, not model output. Telegram now has a better answer inside OpenClaw. Discord should get one before users learn to interpret silence as failure.

Sources: OpenClaw issue #83591, OpenClaw PR #81746, OpenClaw PR #83575, OpenClaw issue #81132