OpenClaw Stops Letting Aborted Subagents Pretend They Succeeded
A child agent that aborts did not “complete with no message.” It failed. The parent deserves to know.
That is the plain-language contract behind OpenClaw PR #85860, which fixes a multi-agent lifecycle bug where a subagent ending with stopReason: "aborted" could still be normalized into an OK completion if the older aborted: true flag was absent. The user-visible failure is nasty: a Codex GPT-5.5 subagent aborts mid-response, emits empty content, gets marked done, and the parent waits forever because no useful auto-announcement arrives.
That is not just a bug in subagent messaging. It is orchestration lying about state.
Empty output is not the same as success
The linked issue #72293 includes the sort of forensic timeline maintainers should want. On April 25, a subagent spawned at 20:28:01 AST. Its last successful tool call landed at 20:44:21. At 20:44:22.570, Codex emitted openclaw:prompt-error. The final assistant message at 20:44:22.614 had content: [], stopReason: "aborted", errorMessage: "This operation was aborted", and usage.totalTokens: 0. Two seconds later, gateway status became done with abortedLastRun=false.
That sequence tells you the bug. The runtime had enough evidence to classify the run as aborted, but one compatibility seam allowed the terminal state to look like a successful no-op. In multi-agent systems, that is poison. The parent does not need eloquent prose from a failed child. It needs a typed failure event so it can retry, escalate, summarize, or tell the user the work did not complete.
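The classification rule described above can be sketched in a few lines. This is a minimal illustration, not OpenClaw's code: the `FinalAssistantMessage` and `Outcome` types and the `classifyTerminalState` function are hypothetical, the field names are borrowed from the issue timeline, and reusing "subagent run terminated" as a fallback error string is an assumption.

```typescript
// Hypothetical shape: field names mirror those quoted in the issue timeline,
// but this is not OpenClaw's actual type.
interface FinalAssistantMessage {
  content: unknown[];
  stopReason?: string;
  errorMessage?: string;
  aborted?: boolean; // older flag; may be absent on newer runs
}

// Hypothetical typed outcome the parent can branch on.
type Outcome =
  | { status: "ok" }
  | { status: "error"; error: string };

function classifyTerminalState(msg: FinalAssistantMessage): Outcome {
  // Either signal is sufficient. Requiring the legacy flag is what lets
  // an abort slip through as a successful no-op.
  if (msg.aborted === true || msg.stopReason === "aborted") {
    return {
      status: "error",
      // Fallback text is an assumption for this sketch.
      error: msg.errorMessage ?? "subagent run terminated",
    };
  }
  return { status: "ok" };
}
```

Fed the final message from the timeline (empty content, stopReason "aborted", no aborted flag), this sketch returns an error outcome carrying "This operation was aborted" rather than a silent OK.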
PR #85860 addresses that by making aborted terminal states propagate as errors across multiple boundaries: gateway wait snapshots, dedupe cache, run-wait normalization, subagent registry, lifecycle handlers, and announcement output. The patch touches 13 files with 354 additions and 16 deletions. That breadth is the point. If only the announcement UI learns that an abort is failure, a cached wait result can still reintroduce false success. If only the registry learns it, the parent may still miss the event. Terminal-state classification has to be consistent everywhere the runtime observes, stores, dedupes, and announces work.
The before/after proof is blunt. Before the fix: input.lifecycle.stopReason=aborted, registry.endedReason=subagent-complete, registry.outcome.status=ok, parentAnnouncement.outcome.status=<missing>, and result=LEGACY_FALSE_SUCCESS. After the fix: registry.endedReason=subagent-killed, registry.outcome.status=error, registry.outcome.error=subagent run terminated, parentAnnouncement.outcome.status=error, and result=TERMINATED_FAILURE.
Multi-agent systems need typed failure semantics
This is one of those issues that seems narrow until you compare it with how production distributed systems work. A worker can succeed, fail, time out, be cancelled, abort, be killed, be skipped, or still be running. Those states are not synonyms. Job queues, workflow engines, CI systems, and container schedulers all learned this because downstream behavior depends on it. Agent platforms now have to learn the same thing while pretending the interface is just chat.
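One way to keep those states from collapsing into each other is a discriminated union that the compiler forces callers to handle exhaustively. The sketch below is illustrative only: `TerminalState`, `ParentAction`, and `decideParentAction` are hypothetical names, and the mapping from state to action is an assumed policy, not anything OpenClaw prescribes. "Still running" is left out because it is not a terminal state.

```typescript
// Hypothetical terminal states for a delegated worker.
type TerminalState =
  | { kind: "succeeded"; output: string }
  | { kind: "failed"; error: string }
  | { kind: "timed_out"; afterMs: number }
  | { kind: "cancelled"; by: string }
  | { kind: "aborted"; reason: string }
  | { kind: "killed" }
  | { kind: "skipped" };

// Hypothetical actions a parent might take; the policy below is an assumption.
type ParentAction = "use_result" | "retry" | "escalate" | "report_to_user" | "ignore";

function decideParentAction(state: TerminalState): ParentAction {
  switch (state.kind) {
    case "succeeded":
      return "use_result";
    case "timed_out":
    case "aborted":
      return "retry";
    case "failed":
    case "killed":
      return "escalate";
    case "cancelled":
      return "report_to_user";
    case "skipped":
      return "ignore";
    default: {
      // Exhaustiveness check: adding a new kind without a case above
      // makes this assignment a compile-time error.
      const unreachable: never = state;
      return unreachable;
    }
  }
}
```

The design point is that "aborted" has its own branch. It cannot fall through to "succeeded" just because the output happens to be empty.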
The chat illusion is especially dangerous here. In a normal chat session, an empty assistant message can be dismissed as a bad response. In a multi-agent runtime, the empty message may be the only user-facing symptom of a failed background worker. If the parent agent receives no structured failure, it can continue planning from a false premise: the child finished, or at least did not fail loudly. The human sees silence. The transcript offers archaeology. The orchestration graph has already lost the plot.
That matters for comparisons between OpenClaw, Claude Code, Codex, Copilot cloud agents, Gemini CLI, and every other coding-agent stack moving toward background workers. The real question is not “can it spawn a subagent?” Demos can do that. The question is what happens when the subagent dies halfway through a tool call, with only partial context, with no content, during compaction, or after a provider-side abort. Does the parent receive an actionable terminal state, or does the failure become a blank line and a support thread?
PR #85860 was still a draft at the time of the research snapshot, and it was labeled as needing proof even though proof had been supplied. That status matters. This is not shipped gospel. But the direction is correct, and the issue’s label stack (P1, impact:session-state, impact:message-loss, source reproduction, and a high issue rating) matches the severity. A parent session black-holing child failure is not a UI papercut. It is a runtime contract violation.
For practitioners, the action item is to audit your own mental model of agent delegation. If you use OpenClaw subagents for real work, inspect how failures are represented to the parent. Watch for empty child outputs, runs marked done after provider errors, missing auto-announcements, and parent turns that wait on a result that never arrives. If you build on top of OpenClaw APIs, do not treat absence of content as absence of error. Check lifecycle fields, stop reasons, and outcome status explicitly.
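For code that consumes child results, the audit can be made mechanical. The following is a rough sketch under stated assumptions: `ChildRunSnapshot` and `findFailureSignals` are hypothetical, and the field layout is modeled on the fields quoted in the issue and PR, not on a documented OpenClaw API. Check the real payloads before relying on any of these names.

```typescript
// Hypothetical snapshot of a finished child run; not a documented OpenClaw type.
interface ChildRunSnapshot {
  status: string; // gateway status, e.g. "done"
  content: unknown[];
  lifecycle?: { stopReason?: string };
  outcome?: { status?: "ok" | "error"; error?: string };
}

// Collects every reason to distrust a "done" run instead of inferring
// success from the absence of an error message.
function findFailureSignals(run: ChildRunSnapshot): string[] {
  const signals: string[] = [];
  if (run.lifecycle?.stopReason === "aborted") {
    signals.push("stopReason is aborted");
  }
  if (run.outcome?.status === "error") {
    signals.push(`outcome error: ${run.outcome?.error ?? "unknown"}`);
  }
  if (run.outcome?.status === undefined) {
    signals.push("outcome status missing");
  }
  if (run.content.length === 0) {
    signals.push("empty content");
  }
  return signals;
}
```

A run reported as done with outcome status ok, stopReason "aborted", and empty content would yield two signals here (the aborted stop reason and the empty content), which is enough for a caller to refuse to treat it as a clean completion.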
There is also a product-design lesson. Parent agents should be able to reason over failure as data. “The research subagent aborted after its last tool call” is useful. “No message” is not. Once agent frameworks make delegation easy, they must make failure equally visible. Otherwise multi-agent orchestration becomes a confidence trick: lots of workers, little accountability.
The editorial take is strict because the bug class deserves it. A child agent that aborts has failed the assigned work, even if it produced no text and consumed no final tokens. The runtime should say so loudly, consistently, and in a shape the parent can act on. Multi-agent systems do not need more optimistic normalization. They need honest terminal states.
Sources: OpenClaw PR #85860, OpenClaw issue #72293, OpenClaw PR #85651, OpenClaw issue #85712