OpenClaw’s Subagent Completion Regression Shows Multi-Agent Orchestration Is Still a Delivery Problem
Multi-agent orchestration is usually marketed as decomposition: split the work, spawn specialists, collect results. That story skips the part operators actually live with, which is delivery. A delegated task that finishes but never reports back is not automation. It is a very polite way to lose state.
OpenClaw issue #82370, opened May 16, is a clean example. After upgrading to 2026.5.12, a user reports that subagent completion announcements stopped reaching the requester session. The child run completes. sessions_history can prove the work is done. But after the parent calls sessions_yield, no completion handoff arrives. The announce path retries three times, gives up with retry-limit, and logs the important line: active requester session could not be woken.
That phrase should make anyone building agent systems pause. Completion is not a cosmetic notification in a multi-agent workflow. It is part of the control plane. It tells the parent that a dependency is satisfied, gives the human evidence that delegation worked, and often becomes the trigger for the next planning step. Lose that event and the child’s output becomes orphaned state: technically recoverable, operationally invisible.
“Active but not streaming” is not a delivery strategy
The report is unusually useful because it traces the suspected path rather than just saying “it broke.” The environment is OpenClaw 2026.5.12 on macOS ARM64 with Node.js v24.13.1, using the Feishu channel. The reproduction is normal usage: start a channel conversation, call sessions_spawn with mode: "run", call sessions_yield, let the subagent finish, and wait for the completion message that never appears.
The logs show Subagent announce give up (retry-limit) with three attempts ending in about five seconds. The reporter’s read of subagent-announce-delivery-DzsdC5tX.js is the key: runSubagentAnnounceDispatch chooses a direct path because expectsCompletionMessage=true. It classifies the requester as active and tries resolveQueueEmbeddedPiMessageOutcome() to wake the requester session. The wake fails because the requester is active but not streaming: either sessions_yield has already ended the current run, or the announcement lands in the timing gap between runs. Then, because requesterActivity.isActive === true, the direct path returns instead of falling through to queued delivery.
That is a classic distributed-systems bug wearing an agent-platform hoodie. The state classifier says the parent is active. The message path assumes active means wakeable. The wake fails. The fallback queue never gets the event because the active classification short-circuited the dispatch. Nothing is “wrong” locally, and the user still never sees the result.
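The short-circuit described above can be sketched in a few lines. This is a minimal illustration of the reported control flow, not OpenClaw's code: every type and function name here (Requester, tryWake, dispatchShortCircuit, dispatchWithFallthrough) is hypothetical, and the retry loop is omitted for brevity.

```typescript
// Hypothetical types: none of these names come from OpenClaw's source.
type WakeOutcome = "delivered" | "not-streaming";

interface Requester {
  isActive: boolean;    // what the state classifier reports
  isStreaming: boolean; // whether a run is currently consuming messages
}

interface DispatchResult {
  path: "direct" | "queued";
  delivered: boolean;
  queuedForNextRun: boolean;
}

// Assumption: a direct wake only works while the requester is streaming.
function tryWake(requester: Requester): WakeOutcome {
  return requester.isStreaming ? "delivered" : "not-streaming";
}

// Reported behaviour: an active requester pins dispatch to the direct path,
// so a failed wake returns without ever reaching the queue.
function dispatchShortCircuit(requester: Requester, queue: string[], event: string): DispatchResult {
  if (requester.isActive) {
    const delivered = tryWake(requester) === "delivered";
    return { path: "direct", delivered, queuedForNextRun: false };
  }
  queue.push(event);
  return { path: "queued", delivered: false, queuedForNextRun: true };
}

// Alternative: treat a failed wake as a routing condition and fall through.
function dispatchWithFallthrough(requester: Requester, queue: string[], event: string): DispatchResult {
  if (requester.isActive && tryWake(requester) === "delivered") {
    return { path: "direct", delivered: true, queuedForNextRun: false };
  }
  queue.push(event);
  return { path: "queued", delivered: false, queuedForNextRun: true };
}

// Parent state right after sessions_yield: active, but no run is streaming.
const yielded: Requester = { isActive: true, isStreaming: false };
const lostQueue: string[] = [];
const keptQueue: string[] = [];
const lost = dispatchShortCircuit(yielded, lostQueue, "child-1 done");
const kept = dispatchWithFallthrough(yielded, keptQueue, "child-1 done");
console.log(lost, lostQueue.length); // event neither delivered nor queued
console.log(kept, keptQueue.length); // event queued for the next requester run
```

The only difference between the two functions is what happens after the wake fails. In the first, the active classification has already decided the outcome; in the second, it is just one input to routing.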
The human reaction in the issue is direct and relatable: “麻烦你了claw,我在主会话派出子代理后,子代理完成任务后不会回复” (roughly, “Please help, Claw; after I dispatch a subagent from the main session, it doesn’t reply after completing the task”). That is the user-facing version of the control-plane failure. They did the right thing. The child did the work. The system lost the handoff.
Plugin externalization made delivery paths more honest
The timing matters. This lands near other OpenClaw delivery regressions around the 2026.5.12 line, including issue #82360 on isolated cron Slack announcements failing after Slack plugin externalization. PR #82371 addresses cron delivery target resolution by allowing explicit plugin bootstrap when the loaded-plugin fast path misses. Older Feishu work in issue #77712 and PR #78809 points at the same architectural pressure: once channels become externalized plugins, every outbound delivery path has to know how to bootstrap the right plugin instead of assuming it is already resident.
That does not mean plugin externalization was a mistake. Pulling heavy channel and provider stacks out of core is the right long-term direction for an agent runtime that wants to be maintainable. But it changes the invariants. Previously, sloppy delivery code could sometimes survive because everything was already loaded. In a plugin-first architecture, routing becomes explicit. A completion announcement to Feishu, a cron result to Slack, and a subagent handoff to a parent session all need durable addressing, plugin bootstrap, retry semantics, and a fallback queue that does not depend on lucky runtime state.
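To make the changed invariant concrete, here is a rough sketch of a delivery path that bootstraps a channel plugin on demand instead of assuming it is resident. The ChannelRegistry class, its methods, and the ChannelPlugin shape are all assumptions made for illustration; they are not OpenClaw's plugin API.

```typescript
// Hypothetical shapes for illustration; not OpenClaw's plugin API.
interface ChannelPlugin {
  send(target: string, text: string): string;
}

type PluginLoader = () => ChannelPlugin;

class ChannelRegistry {
  private loaded = new Map<string, ChannelPlugin>();
  private loaders: Map<string, PluginLoader>;

  constructor(loaders: Map<string, PluginLoader>) {
    this.loaders = loaders;
  }

  // Fast path: only succeeds if something else already loaded the plugin.
  getLoaded(channel: string): ChannelPlugin | undefined {
    return this.loaded.get(channel);
  }

  // Explicit bootstrap: load on demand when the fast path misses.
  resolve(channel: string): ChannelPlugin {
    const cached = this.loaded.get(channel);
    if (cached) return cached;
    const loader = this.loaders.get(channel);
    if (!loader) throw new Error(`no plugin registered for channel "${channel}"`);
    const plugin = loader();
    this.loaded.set(channel, plugin);
    return plugin;
  }
}

let bootstraps = 0;
const feishuLoader: PluginLoader = () => {
  bootstraps += 1;
  return { send: (target: string, text: string) => `feishu:${target}:${text}` };
};
const loaders = new Map<string, PluginLoader>();
loaders.set("feishu", feishuLoader);
const registry = new ChannelRegistry(loaders);

const beforeBootstrap = registry.getLoaded("feishu"); // nothing resident yet
const receipt = registry.resolve("feishu").send("chat-42", "subagent finished");
registry.resolve("feishu"); // second call reuses the already loaded plugin
console.log(beforeBootstrap, receipt, bootstraps);
```

A delivery path that only ever calls getLoaded works as long as some other code path happened to load the plugin first, which is exactly the kind of lucky runtime state externalization removes.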
For operators, the practical advice is simple: if you rely on sessions_spawn plus sessions_yield in channel sessions, verify completion independently until this class of issue is fixed in your deployment. Check sessions_history or session status before assuming a silent child failed. For cron and channel-delivered automations, pay attention to delivery-target fixes as much as model or tool fixes. A perfect answer that never reaches the requester is still a failed workflow.
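As a sketch of that operator check, the decision can be written as a small pure function: combine whether an announcement arrived with whatever the history lookup says. The ChildSnapshot shape below is an assumption; the real output of sessions_history may look different, so treat the field names as placeholders.

```typescript
// Hypothetical snapshot shape; the real sessions_history output may differ.
interface ChildSnapshot {
  status: "running" | "completed" | "failed";
  lastMessage?: string;
}

type Verdict =
  | { kind: "announced" }
  | { kind: "completed-silently"; result: string }
  | { kind: "still-running" }
  | { kind: "failed" };

// A missing announcement alone proves nothing; the history decides.
function classifyChild(announcementSeen: boolean, snapshot: ChildSnapshot): Verdict {
  if (announcementSeen) return { kind: "announced" };
  switch (snapshot.status) {
    case "completed":
      // Work is done but the handoff was lost: recover the result manually.
      return { kind: "completed-silently", result: snapshot.lastMessage ?? "" };
    case "running":
      return { kind: "still-running" };
    case "failed":
      return { kind: "failed" };
  }
}

const verdict = classifyChild(false, { status: "completed", lastMessage: "report.md written" });
console.log(JSON.stringify(verdict));
```

Only the failed verdict justifies re-running the child. The completed-silently case is the one this regression produces, and re-spawning there duplicates work that already succeeded.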
For framework builders, the design lesson is even cleaner. Multi-agent systems need durable completion events with at-least-once delivery semantics and idempotent rendering. “Active requester session could not be woken” should be a routing condition, not a terminal error. Queue it for the next requester run. Put it in a visible pending-completions inbox. Mark it delivered only after a channel renderer succeeds. Do not let an ambiguous active-state check decide whether the user ever hears back.
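One way to read that advice is as a small pending-completions inbox. The sketch below is in-memory and single-process, so it only illustrates the bookkeeping; a real implementation would need persistent storage. PendingCompletions, CompletionEvent, and Renderer are hypothetical names, not OpenClaw APIs.

```typescript
// Hypothetical design sketch; names are illustrative, not OpenClaw APIs.
interface CompletionEvent {
  id: string;        // stable id, for example derived from the child run
  requester: string; // session that should hear about the completion
  summary: string;
}

// Returns true only when the channel actually accepted the message.
type Renderer = (event: CompletionEvent) => boolean;

class PendingCompletions {
  private pending = new Map<string, CompletionEvent>();
  private delivered = new Set<string>();

  // Idempotent enqueue: re-announcing the same id never duplicates it.
  enqueue(event: CompletionEvent): void {
    if (this.delivered.has(event.id) || this.pending.has(event.id)) return;
    this.pending.set(event.id, event);
  }

  // Called at the start of each requester run. An event stays pending
  // until the renderer succeeds, which gives at-least-once delivery.
  flush(requester: string, render: Renderer): number {
    let sent = 0;
    for (const event of Array.from(this.pending.values())) {
      if (event.requester !== requester) continue;
      if (!render(event)) continue; // keep it for the next attempt
      this.pending.delete(event.id);
      this.delivered.add(event.id);
      sent += 1;
    }
    return sent;
  }

  // The visible inbox: what the requester has not yet been told.
  pendingFor(requester: string): CompletionEvent[] {
    return Array.from(this.pending.values()).filter((e) => e.requester === requester);
  }
}

const inbox = new PendingCompletions();
const event: CompletionEvent = { id: "run-1", requester: "parent", summary: "child finished" };
inbox.enqueue(event);
inbox.enqueue(event); // duplicate announce is ignored
const rendered: string[] = [];
const firstTry = inbox.flush("parent", () => false); // wake failed, nothing sent
const stillPending = inbox.pendingFor("parent").length;
const secondTry = inbox.flush("parent", (e) => { rendered.push(e.summary); return true; });
inbox.enqueue(event); // late retry after delivery is dropped
const afterDelivery = inbox.pendingFor("parent").length;
console.log(firstTry, stillPending, secondTry, rendered, afterDelivery);
```

The design choice that matters is the order of operations in flush: the event is removed from pending only after the renderer reports success, so a failed wake degrades into a delayed message rather than a lost one.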
The editorial take: the multi-agent future will not fail because agents cannot split tasks. They can already split tasks. It will fail when the parent never hears back, nobody notices, and the operator has to spelunk session history like it is a crash dump. Orchestration is delivery semantics. Everything else is the demo.
Sources: OpenClaw issue #82370, OpenClaw PR #82371, OpenClaw issue #82360, OpenClaw PR #78809