Agent Teams Needs to Stop Calling Successful Subagents Failed Just Because the Sweeper Got There First

Agent Teams Needs to Stop Calling Successful Subagents Failed Just Because the Sweeper Got There First

Multi-agent systems are sold as planning, delegation, and parallel work. In production, they mostly become bookkeeping problems with better marketing. Did the child agent start? Did it finish? Did it produce usable output? Did the parent receive that output? Did a cleanup sweeper decide the active execution context was stale before the completion path had a chance to reconcile state? Those are not glamorous questions. They are also the difference between “agent team” and “expensive group chat with race conditions.”

OpenClaw PR #90492, opened June 5 at 00:45 UTC, addresses exactly that class of failure. The patch fixes an Agent Teams lifecycle mismatch where a parent can receive failed: subagent run lost active execution context even though the same completion event contains readable child assistant output. The proposed behavior is straightforward: before the lost-context sweeper emits a terminal failure, reconcile captured child output. If the stale active run has real assistant output, complete it as ok. If it does not, keep reporting subagent run lost active execution context.

The linked issue, #90299, makes the bug painfully concrete. A parent task named task_agent_teams_release_readiness spawned subagent agent:bench-model_square-doubao-seed-2-0-pro-260215:subagent:a29ccae0-fd1a-43c1-83cf-5af052f1010b. The parent received a failed status saying the subagent lost active execution context. In the same event, there was a long non-empty child result beginning with # ARCHITECTURE.md. A human can squint at that and recover. A parent agent may discard the output, retry the work, report failure, or route the job to another subagent. That is how state bugs become cost bugs.

The dangerous state is contradictory truth

The PR is intentionally narrow: four files changed, +130/-6, touching subagent-lost-context-completion.ts, its tests, subagent-registry.ts, and registry tests. It does not introduce a new shared-memory layer, change the Agent Teams plugin, or redesign the announce format. That restraint is good. The invariant is not “invent a more ambitious orchestration architecture.” The invariant is “if useful child output exists and can be delivered to the parent, do not mark the run failed merely because the active execution context went stale.”

The author-listed verification is also appropriately targeted: node scripts/run-vitest.mjs src/agents/subagent-lost-context-completion.test.ts plus a focused src/agents/subagent-registry.test.ts run. ClawSweeper reviewed the patch at 00:47 UTC, called the issue source-reproducible, and still blocked merge pending stronger proof. Its concern is the right one: recovery should require real visible child result text, not just any transcript artifact. Tool-call-only history should not count as success. Output lookup failure should not become optimistic completion. The predicate matters because an ok status is a product promise.

This is the hard part of agent orchestration: terminal states must match delivered work, not internal hopes. A subagent may have run tools successfully but produced no final answer. It may have written scratch content that is not suitable for the parent. It may have emitted partial logs, hidden reasoning, or tool results that require interpretation. If the parent receives ok, the runtime must be able to point to usable assistant output that satisfies the handoff contract. Otherwise the system has merely traded false failure for false success.

Sweepers should clean up state, not rewrite history

The lost-context bug sits in a broader OpenClaw pattern. PR #90065 recently addressed session write-lock failure after provider hangs by bounding abort-path lock release. The June 4 v2026.6.2-beta.1 release leaned hard into install policy, runtime bounds, and control-plane hygiene. These are all versions of the same lesson: once agent work becomes asynchronous, long-lived, and multi-party, cleanup paths are not background chores. They are part of correctness.

A sweeper is supposed to prevent orphaned work from poisoning the system. But if it races a legitimate completion path, it can create contradictory truth: the child produced output, the registry no longer thinks the active run is valid, and the parent receives both a failure status and the answer. That is worse than a clean failure because it makes automated recovery ambiguous. Should the parent retry? Should it trust the payload? Should it mark the release-readiness task blocked? Should it summarize the child output anyway? The platform should not push those semantics onto the model.

For builders using Agent Teams, the action item is to test lifecycle edges directly. Use sessions_spawn and sessions_yield in scenarios where the parent is delayed, the child completes near a cleanup boundary, and the registry sees stale active state. Assert three facts together: terminal status, delivered visible text, and retry behavior. A system that only asserts “event emitted” is not testing orchestration. It is testing that something made noise.

Operators should also watch for cost amplification. False failures trigger retries, fallback agents, duplicate searches, repeated compaction, and human intervention. In a multi-agent stack, one bad terminal state can multiply spend because the parent believes useful work was lost. The problem is not just user frustration; it is budget drift caused by lifecycle accounting that does not reflect reality.

The design standard should be blunt. A subagent is ok only when the parent has usable output. A subagent is failed when the runtime cannot deliver usable output or cannot prove the output belongs to that run. A subagent is ambiguous only if the UI and API expose ambiguity explicitly, not by smuggling both failure and answer into one event and hoping the next model turn has good instincts.

PR #90492 is a small state-machine patch with a large systems lesson. Multi-agent reliability is not about smarter prompts. It is lifecycle accounting: preserving identity, reconciling output, sequencing cleanup, and refusing to let a sweeper turn successful work into failure just because it got there first.

Sources: OpenClaw PR #90492, OpenClaw issue #90299, OpenClaw PR #90065, OpenClaw v2026.6.2-beta.1