Telegram Topics Are Stress-Testing OpenClaw’s Multi-Agent Orchestration Model

Telegram forum topics look like separate conversations to humans and like awkwardly nested metadata to software. That mismatch is exactly where agent runtimes get hurt. OpenClaw PR #83829 fixes forum-topic routing and parallelism across topic identity, text/media buffering, media-group scoping, and outbound fairness. PR #83827 fixes the neighboring queued-followup abort bug. Together, they say something larger than “Telegram adapter improved”: multi-agent orchestration is lane discipline.

The trap is that “parallel agents” sounds like a scheduler problem: spawn workers, collect results, ship replies. In real chat systems, the hard part is preserving who owns which work. A Telegram supergroup can contain multiple forum topics, and users experience those topics as separate threads. Telegram’s Bot API may omit fields you wish were always present. Rate limits are shared. Media groups have their own grouping behavior. Preview edits may not carry the thread identifiers that ordinary sends do. If the runtime keys its queues too broadly, independent topics serialize or leak state into each other. If it keys them too narrowly, shared transport limits punish everyone. If cancellation follows the wrong source signal, accepted work can die after it should have become durable.

The fix is not one queue; it is the right queues

PR #83829 fixes a specific identity failure: Bot API messages can include is_topic_message: true while omitting chat.is_forum. The old behavior could collapse real topic messages into the base group route. That is not a cosmetic routing bug. If topic identity collapses, the platform can misattribute conversation state, serialize unrelated work, or deliver responses into the wrong lane.
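To make the identity rule concrete, here is a minimal TypeScript sketch of route-key derivation for group messages. The field names chat.is_forum, is_topic_message, and message_thread_id come from the Telegram Bot API; the function name resolveTelegramRoute and the key format, which mirrors the telegram:group:<id>:topic:<thread> routes quoted from the live run, are illustrative assumptions rather than OpenClaw's actual code. The sketch trusts either flag, but still requires one of them, on the assumption that a bare message_thread_id is not enough to prove topic membership.

```typescript
// Minimal subset of Telegram Bot API fields this sketch needs (group chats only).
interface TgChat {
  id: number;
  is_forum?: boolean; // may be omitted even when the message is in a topic
}

interface TgMessage {
  message_id: number;
  chat: TgChat;
  message_thread_id?: number;
  is_topic_message?: boolean;
}

// Hypothetical helper: derive the lane key for a group message.
function resolveTelegramRoute(msg: TgMessage): string {
  const base = `telegram:group:${msg.chat.id}`;
  // Trust the per-message flag: is_topic_message can be true while
  // chat.is_forum is absent, so topic routing must not be gated on is_forum alone.
  const inTopic = msg.is_topic_message === true || msg.chat.is_forum === true;
  if (inTopic && msg.message_thread_id !== undefined) {
    return `${base}:topic:${msg.message_thread_id}`;
  }
  // Otherwise fall back to the base group route.
  return base;
}
```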

The patch replaces process-wide Telegram text-fragment and media-group promise chains with per-topic/per-buffer queues so unrelated topics can flush concurrently. It scopes media-group buffering by chat + thread + media_group_id, which prevents identical album IDs from different topics from sharing one buffer lane. It also adds a per-group fair outbound queue before grammY’s throttler, fair-sharing by message_thread_id where present and by message_id for preview edits where Telegram omits the thread ID.
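The two keying rules in that paragraph can be sketched together. Below, mediaGroupBufferKey scopes album buffers by chat, thread, and media_group_id, and FairOutboundQueue (one instance per group chat) round-robins pending operations across lanes keyed by message_thread_id, falling back to message_id for preview edits. The names, the OutboundOp shape, and the "default" lane are hypothetical, and round-robin is simply one fair policy; the PR may balance lanes differently. In a real adapter, dequeue would feed grammY's throttler instead of returning operations to the caller.

```typescript
// Hypothetical buffer key: identical album IDs in different topics
// land in different buffers because chat and thread are part of the key.
function mediaGroupBufferKey(chatId: number, threadId: number | undefined, mediaGroupId: string): string {
  return `${chatId}:${threadId ?? "root"}:${mediaGroupId}`;
}

// Hypothetical shape of a pending outbound operation.
interface OutboundOp {
  kind: "send" | "editPreview";
  messageThreadId?: number;
  messageId?: number; // target message of a preview edit
  label: string;
}

// Lane key: the thread ID when present; preview edits that lack it
// fall back to the edited message ID.
function fairLaneKey(op: OutboundOp): string {
  if (op.messageThreadId !== undefined) return `thread:${op.messageThreadId}`;
  if (op.kind === "editPreview" && op.messageId !== undefined) return `msg:${op.messageId}`;
  return "default";
}

// Round-robin across lanes so one busy topic cannot starve another
// before operations reach the shared rate limiter.
class FairOutboundQueue {
  private lanes = new Map<string, OutboundOp[]>();
  private order: string[] = []; // each non-empty lane appears exactly once

  enqueue(op: OutboundOp): void {
    const key = fairLaneKey(op);
    const lane = this.lanes.get(key);
    if (lane) {
      lane.push(op);
    } else {
      this.lanes.set(key, [op]);
      this.order.push(key);
    }
  }

  dequeue(): OutboundOp | undefined {
    const key = this.order.shift();
    if (key === undefined) return undefined;
    const lane = this.lanes.get(key)!;
    const op = lane.shift()!;
    // Requeue the lane at the back if it still has work; otherwise drop it.
    if (lane.length > 0) this.order.push(key);
    else this.lanes.delete(key);
    return op;
  }
}
```

With three replies queued in one topic and one in another, the second topic's reply goes out second rather than fourth.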

The live proof is the important part. After the patch, topic routes were observed for telegram:group:<redacted>:topic:8428 and topic:5907. The run reported message.queued=4, queue.lane.dequeue=8, max waitMs=0, heartbeats max active=2, waiting=0, queued=0, message.delivery.started=3 and completed=3, Telegram sendMessage ok=21, Telegram failures 0, and 429/retry_after 0. As a control, running only the new handler e2e tests against unpatched main produced two failing tests across two test files, including one in which the second topic did not start while the first topic was held.

The validation surface is not tiny either: git diff --check passed; node scripts/run-vitest.mjs extensions/telegram/src passed 117 test files and 1794 tests; two Telegram e2e files passed 12 tests; and the extension typechecks passed. That is the right proof shape for a concurrency fix: not one happy-path screenshot, but route identity, queue metrics, delivery counts, failure counts, and regression tests that fail on unpatched main.

Cancellation has ownership

PR #83827 is quieter but just as instructive. Once an accepted user_request followup enters the queue, the patch detaches it from the source channel's abort signal, while room_event followups stay cancellable. The old shape let queued Telegram topic user turns inherit a superseded source abort signal and die before draining. The before/after evidence is clean: running the new regression tests against the old code produced two failing tests across two test files, while the PR head passed 112 tests for the two-file command and 220 tests across four touched-surface files.

This distinction is exactly what agent runtime governance looks like in practice. Cancellation is not a boolean. A source reply fence can be superseded because a newer Telegram turn arrived, but user work already accepted into a queue should not automatically inherit that death sentence. Ambient room events are different. They can remain cancellable because they are not the same class of user-directed work. If that sounds pedantic, it is because reliable orchestration is mostly pedantry made visible.
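A minimal sketch of that policy, assuming each queued followup carries the AbortSignal its drained turn will observe. The user_request and room_event kinds reuse the names from PR #83827; acceptFollowup, drainable, and the cancel hook are hypothetical. Giving a detached user request its own AbortController is one design choice that keeps explicit cancellation possible without inheriting the source signal; the PR may handle that part differently.

```typescript
type FollowupKind = "user_request" | "room_event";

// Hypothetical queue entry: the signal is what the drained turn will check.
interface QueuedFollowup {
  kind: FollowupKind;
  text: string;
  signal: AbortSignal;
  cancel?: () => void; // explicit cancellation for detached entries
}

function acceptFollowup(kind: FollowupKind, text: string, sourceSignal: AbortSignal): QueuedFollowup {
  if (kind === "user_request") {
    // Detach: accepted user work gets its own lifetime, independent of a
    // source reply fence that a newer turn may later supersede.
    const own = new AbortController();
    return { kind, text, signal: own.signal, cancel: () => own.abort() };
  }
  // Ambient room events stay tied to the source and die with it.
  return { kind, text, signal: sourceSignal };
}

// Entries whose signal has not fired are still eligible to drain.
function drainable(queue: QueuedFollowup[]): QueuedFollowup[] {
  return queue.filter((f) => !f.signal.aborted);
}
```

Aborting the source signal after both entries are queued leaves only the user_request drainable; the room_event is dropped, as intended.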

Issue #83577 provides useful comparison context: a four-panel subagent roundtable previously lost 3 of 4 completion announcements because queue batching degraded under unresolved origins. That was not a Telegram forum topic bug, but it rhymes. The recurring problem is lane identity under concurrency. Who owns the message? Which route owns the reply? Which queue owns the work? Which abort signal is still allowed to cancel it? Which transport limit must be respected?

For builders, the takeaway is practical: test the channel shape your users actually use. Do not just send one Telegram DM and call the adapter production-ready. Send two forum-topic messages while another turn is active. Send media groups in separate topics. Trigger preview edits. Force queueing. Supersede a source reply after a user request has already been accepted into the queue. Watch delivery metrics, not just final text. Multi-agent systems fail in the seams between identity, queue policy, cancellation, and throttling: the parts most demos carefully avoid.
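The first check in that list, a second topic starting while another turn is held, can be written as a small probe. It contrasts a single process-wide promise chain with one chain per route key, the same shape change PR #83829 describes for text and media buffering. Dispatch, globalChain, perRouteChains, and secondTopicStarts are hypothetical stand-ins, not OpenClaw's handler API; the route strings reuse the topic IDs from the live run with a placeholder chat ID.

```typescript
// Hypothetical dispatcher: schedule a task on the lane for a route.
type Dispatch = (route: string, task: () => Promise<void>) => void;

// Old shape: one process-wide chain, so every route waits on every other.
function globalChain(): Dispatch {
  let tail: Promise<void> = Promise.resolve();
  return (_route, task) => {
    tail = tail.then(task).catch(() => undefined);
  };
}

// New shape: one chain per route key, so unrelated topics run concurrently.
function perRouteChains(): Dispatch {
  const tails = new Map<string, Promise<void>>();
  return (route, task) => {
    const prev = tails.get(route) ?? Promise.resolve();
    tails.set(route, prev.then(task).catch(() => undefined));
  };
}

// Probe: hold topic A's turn open, then check whether topic B starts anyway.
async function secondTopicStarts(dispatch: Dispatch): Promise<boolean> {
  let releaseA!: () => void;
  const holdA = new Promise<void>((resolve) => {
    releaseA = resolve;
  });
  let bStarted = false;

  dispatch("telegram:group:-100:topic:8428", () => holdA);
  dispatch("telegram:group:-100:topic:5907", async () => {
    bStarted = true;
  });

  // Give B a chance to run if its lane is not blocked.
  await new Promise((resolve) => setTimeout(resolve, 10));
  const started = bStarted;
  releaseA(); // finish the held turn so nothing is left pending
  return started;
}
```

Against the process-wide chain the probe returns false; against per-route chains it returns true. A regression test in this spirit fails on the old shape and passes on the new one, which matches the before/after pattern of the PR's own e2e evidence.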

The editorial line is simple: orchestration is not the architecture diagram where boxes run in parallel. It is the runtime proving that each lane keeps its identity when the chat system gets messy. Telegram topics are stress-testing OpenClaw because they expose the truth: without lane discipline, “parallel agents” degrade into serialized, dropped, or misdelivered messages with a nicer README.

Sources: GitHub PR #83829, PR #83827, PR #83790, issue #83577