OpenClaw’s sessions_spawn Fix Is a Small Patch for a Big Multi-Agent Truth

Chatbot-era software taught us to expect an assistant to answer with text. Multi-agent software keeps punishing that assumption. OpenClaw’s `sessions_spawn` fix is a useful little reminder: sometimes the correct parent-agent response is silence, because the real output is a child session that has already been accepted and started.

That sounds obvious once stated. It was not obvious to the runtime.

PR #85135, created on May 22, repairs a false-failure path in OpenClaw’s embedded runner. The bug appeared when a pure-relay agent successfully accepted a `sessions_spawn` child session but produced no parent text afterward. The old completeness check looked at the empty parent turn and concluded the agent had failed to generate a response. Users could see `Agent couldn't generate a response` even though the dispatcher had done exactly what it was built to do: spawn the child and stop talking.

This is not just a cosmetic bug. The older issue behind the fix, #72541, describes real false-failure instances from April 22 involving a relay agent named Claudette. Its job was simple: every received message triggered one `sessions_spawn runtime=acp` call and then ended. Seven observed false failures produced retry cascades through an architecture-audit dispatch script, duplicate Claude Code sessions, and at least one misleading fallback audit filename. A bad completion heuristic turned successful delegation into duplicate work.

A handoff is an output

PR #85135 is the merge-ready replacement for source PR #85054. It touches 24 files, including `src/agents/accepted-session-spawn.ts`, embedded runner fallback classifiers, incomplete-turn tests, subscribe handlers, lifecycle handling, and changelog surfaces. The core idea is not complicated: carry a structured accepted-spawn fact from tool completion through the embedded runner and into the fallback and replay classifiers.

The fix does not declare every empty parent turn successful. That would be the lazy patch, and it would hide real failures. Instead, a valid accepted `sessions_spawn` result now requires `status:"accepted"`, a non-empty `runId`, and a non-empty `childSessionKey`. That evidence suppresses only the false incomplete-turn path and keeps the duplicate and replay fallbacks from firing. Failed spawns, malformed spawns, parent prompt timeouts, messaging delivery errors, and unrelated tool side effects keep their existing safety behavior.
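To make the three-field requirement concrete, here is a minimal sketch of what such a check could look like. The field names `status`, `runId`, and `childSessionKey` come from the description above; the type name, the function names, and the overall shape are assumptions for illustration, not the actual contents of `src/agents/accepted-session-spawn.ts`.

```typescript
// Hypothetical type: only the three field names are taken from the article.
interface AcceptedSessionSpawn {
  status: "accepted";
  runId: string;
  childSessionKey: string;
}

// Helper (assumed): a string counts as evidence only if it is non-empty.
function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.length > 0;
}

// Type guard (assumed name): true only when all three pieces of evidence are present.
function isAcceptedSessionSpawn(result: unknown): result is AcceptedSessionSpawn {
  if (typeof result !== "object" || result === null) {
    return false;
  }
  const candidate = result as Record<string, unknown>;
  return (
    candidate.status === "accepted" &&
    isNonEmptyString(candidate.runId) &&
    isNonEmptyString(candidate.childSessionKey)
  );
}
```

A guard like this fails closed: a missing field, an empty string, or any other status leaves the result unrecognized, so the existing failure handling still applies.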

That distinction is the whole story. Runtime classifiers should not infer success from vibes. They should consume durable facts emitted by tool contracts. If the child session exists, the parent accepted the handoff, and the tool returned a run ID and session key, the parent’s lack of prose is not a failure. It is the designed terminal state.
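A rough sketch of a classifier that consumes that fact instead of guessing from prose might look like the following. Every name here (`TurnVerdict`, `ParentTurnSummary`, `classifyParentTurn`) is hypothetical, and the ordering of checks is an assumption about how the safety behavior could be preserved, not a description of OpenClaw's embedded runner.

```typescript
// Hypothetical names throughout; this is an illustration, not OpenClaw's classifier.
type TurnVerdict = "complete" | "complete_via_handoff" | "incomplete";

interface AcceptedSpawnFact {
  status: "accepted";
  runId: string;
  childSessionKey: string;
}

interface ParentTurnSummary {
  parentText: string;
  acceptedSpawn?: AcceptedSpawnFact;
  promptTimedOut: boolean;
  deliveryFailed: boolean;
}

function classifyParentTurn(turn: ParentTurnSummary): TurnVerdict {
  // Assumption: timeouts and delivery errors keep their failure handling
  // even if a spawn was accepted earlier in the turn.
  if (turn.promptTimedOut || turn.deliveryFailed) {
    return "incomplete";
  }
  if (turn.parentText.trim().length > 0) {
    return "complete";
  }
  // Empty parent text counts as success only with full handoff evidence.
  const spawn = turn.acceptedSpawn;
  if (
    spawn !== undefined &&
    spawn.status === "accepted" &&
    spawn.runId.length > 0 &&
    spawn.childSessionKey.length > 0
  ) {
    return "complete_via_handoff";
  }
  return "incomplete";
}
```

The point of the separate `complete_via_handoff` verdict is that downstream layers can tell a silent dispatcher apart from a talkative agent without ever reading the parent's text.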

This matters because multi-agent orchestration breaks the single-assistant mental model in several places at once. A content agent probably should produce content. A review agent might produce comments, a patch, or a refusal. A monitoring agent might record state and only notify on drift. A dispatcher agent might emit no user-facing text at all because its entire job is to route work to a specialized child. One global completeness rule cannot represent all of those roles without either hiding failures or retrying success.

Retries are not free in agent systems

False failure is more expensive in multi-agent software than in ordinary chat. In a chatbot, a failed turn might mean the user resends a message. In an orchestrator, a failed turn can spawn a second background agent, rerun a build, duplicate a pull request review, reopen a browser workflow, or create two competing artifacts with similar names. The #72541 report is a clean example: duplicate Claude Code sessions and misleading audit outputs from a dispatcher that had already handed off the work.

That is why “Agent couldn't generate a response” is a dangerous fallback message when the platform has not checked whether a non-textual success occurred. It trains users and supervisors to retry, and retry is a write operation in disguise. It may spend money, mutate files, send messages, run tools, or reserve external resources. A platform that cannot distinguish “no text because success was a handoff” from “no text because the model wedged” will eventually do the wrong thing automatically.

The proof discipline around this patch is encouraging. The source PR used QA Lab on macOS with a real gateway child, the qa-lab bus, and a mock OpenAI provider. Verification covered `runtime-tool-sessions-spawn` and `subagent-handoff`. Automation later ran focused Vitest suites plus `pnpm check:changed`, `pnpm lint`, and `pnpm check:test-types`. That may sound like release-process trivia, but it matters for orchestration bugs. The only convincing test is one that exercises the actual handoff path, not a unit test that merely asserts an empty string is acceptable.

For practitioners building on OpenClaw, the operational takeaway is to treat terminal success as role-specific. If you have dispatcher agents, define the success artifact: accepted child run, queued task ID, persisted ticket, sent message receipt, or whatever the role actually produces. Then make the runtime check that artifact explicitly. Do not require a parent agent to narrate “I spawned the child” just to satisfy a chatbot-era completeness checker. That text is mostly for humans; the runtime needs a contract.
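One way to express role-specific success is a small table that maps each role to the artifacts that count as a finished turn. The sketch below is illustrative only: the role names, artifact kinds, and `roleSucceeded` function are invented for this example and do not correspond to any OpenClaw configuration surface.

```typescript
// Hypothetical artifact kinds, loosely following the examples in the text.
type SuccessArtifact =
  | { kind: "text"; text: string }
  | { kind: "accepted_child_run"; runId: string; childSessionKey: string }
  | { kind: "queued_task"; taskId: string }
  | { kind: "message_receipt"; receiptId: string };

type ArtifactKind = SuccessArtifact["kind"];

// Hypothetical roles; a real platform would define its own.
type AgentRole = "content" | "dispatcher" | "notifier";

// Each role declares which artifacts count as terminal success.
const acceptedArtifactsByRole: Record<AgentRole, readonly ArtifactKind[]> = {
  content: ["text"],
  dispatcher: ["accepted_child_run", "queued_task"],
  notifier: ["message_receipt"],
};

// A turn succeeds when it produced at least one artifact its role accepts.
function roleSucceeded(role: AgentRole, artifacts: readonly SuccessArtifact[]): boolean {
  const accepted = acceptedArtifactsByRole[role];
  return artifacts.some((artifact) => accepted.some((kind) => kind === artifact.kind));
}
```

Under this arrangement a dispatcher that spawns a child and says nothing passes, while a content agent that says nothing still fails, which is the asymmetry a single global completeness rule cannot express.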

For teams building their own agent platforms, the pattern generalizes. Every tool that can create durable downstream work should return a typed fact that fallback, replay, and supervisor layers understand. `sessions_spawn` should return accepted child metadata. Message tools should return delivery receipts or explicit best-effort status. Build tools should return artifact IDs. Refusal paths should return policy facts. The more outcomes become explicit, the less the platform has to guess from assistant prose.
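As a sketch of what typed facts buy the replay layer, consider a discriminated union of tool outcomes and a function that decides whether a turn is safe to replay. The union members mirror the examples in the paragraph above, but every name and field here is hypothetical, and treating best-effort delivery as durable work is a conservative assumption of this sketch.

```typescript
// Hypothetical tool facts; names and fields are assumptions for illustration.
type ToolFact =
  | { tool: "sessions_spawn"; status: "accepted"; runId: string; childSessionKey: string }
  | { tool: "message"; delivery: "delivered"; receiptId: string }
  | { tool: "message"; delivery: "best_effort" }
  | { tool: "build"; artifactId: string }
  | { tool: "refusal"; policy: string };

// Forces a compile error if a new ToolFact variant is added but not handled.
function assertNever(value: never): never {
  throw new Error(`Unhandled tool fact: ${JSON.stringify(value)}`);
}

// Did this fact create downstream work that a retry would duplicate?
function createdDurableWork(fact: ToolFact): boolean {
  switch (fact.tool) {
    case "sessions_spawn":
      return true; // a child session is already running
    case "message":
      return true; // conservative: a best-effort send may still have gone out
    case "build":
      return true; // an artifact already exists
    case "refusal":
      return false; // a policy decision, nothing was created
    default:
      return assertNever(fact);
  }
}

// Replay is safe only when no fact from the turn created durable work.
function isSafeToReplay(facts: readonly ToolFact[]): boolean {
  return !facts.some(createdDurableWork);
}
```

The exhaustive `switch` is the design choice worth copying: when someone adds a new tool that creates downstream work, the compiler forces a decision about replay safety rather than leaving it to a prose heuristic.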

This also connects to OpenClaw’s recent lane and self-healing work. The project has been discovering, repeatedly, that “completed” is not a boolean. It can mean a channel recovered, a child accepted, a message delivered, a task recorded, a tool denied safely, or a model refused correctly. Treating all of those as one generic end state is how agents become flaky in production even when each individual tool seems fine.

The editorial point is small but durable: silence can mean success, but only if the runtime has proof. Multi-agent systems should not make parent agents write ceremonial status prose to appease a completeness heuristic. They should carry the handoff fact, suppress the false fallback, and let the child do the work. Looks boring. Ships better.

Sources: OpenClaw PR #85135, OpenClaw PR #85054, OpenClaw issue #72541, OpenClaw PR #84752