A sessions_yield Compaction Bug Shows Why Multi-Agent State Needs Branch Discipline

The dangerous bugs in multi-agent systems are rarely the theatrical ones. They are the quiet state bugs: the parent session parks, the subagent finishes, the runtime wakes the parent, and somewhere in that handoff an internal marker starts acting like the real conversation.

That is the shape of OpenClaw issue #86684. A parent session using sessions_yield can be compacted during subagent completion or direct-announce handling even when visible context usage is only about 6%. The compaction entry is parented to hidden openclaw.sessions_yield custom-message entries rather than the visible user branch. The visible number is absurd on its face: a selected session showing 63k/1.1m tokens should not be hitting destructive compaction pressure.

The reported environment is concrete: OpenClaw 2026.5.22, git SHA ef17290, Linux 6.8.0-107-generic in Docker on an Ubuntu 24.04 host, using openai-codex/gpt-5.4 through the embedded Pi runtime and an OpenAI Codex Responses API OAuth profile. The effective context window was about 1,050,000 tokens. Yet a compaction entry 9386dd5e was created with tokensBefore: 65320, fromHook: true, and a parent pointing at a hidden openclaw.sessions_yield custom message. A previous compaction showed the same pattern at 115283 tokens.
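To make those figures concrete, the arithmetic can be sketched in a few lines. The token counts and the approximate window come from the report; the helper name `utilization` is only illustrative.

```python
# Approximate effective context window quoted in the report.
CONTEXT_WINDOW = 1_050_000


def utilization(tokens_before: int, window: int = CONTEXT_WINDOW) -> float:
    """Fraction of the context window in use when compaction fired."""
    return tokens_before / window


# The two tokensBefore values reported in the issue.
for tokens in (65_320, 115_283):
    print(f"{tokens:>7} tokens -> {utilization(tokens):.1%} of the window")
```

Neither value is close to the kind of pressure that would normally justify replacing raw context with a summary.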

Context is not garbage collection

Compaction is often discussed as if it were housekeeping: summarize old context, keep the session moving, avoid hitting the model window. That framing is too casual for agent runtimes. Compaction is a state mutation. It changes what the agent can remember, what future turns can cite, and what the operator can reconstruct after something goes wrong. Done correctly, it is a controlled compression checkpoint. Done incorrectly, it is data loss with a friendly name.

The labels on #86684 are appropriately serious: bug, regression, P1, impact:session-state, and impact:data-loss. ClawSweeper’s review did not dismiss the report. It noted that current source still leaves a plausible path where a hidden sessions_yield leaf can become the parent of a compaction entry during subagent completion handoff, even though a deterministic reproduction on current main still needs more work. That is exactly the class of issue that punishes real operators: timing-sensitive, expensive when it happens, and easy to miss in simple tests.

The gateway logs add another clue: [compaction-safeguard] Compaction safeguard: using session branch messages after compaction preparation omitted real conversation content. That is the kind of log line that should make maintainers sit upright. A safeguard had to compensate because the compaction preparation path omitted real conversation content. In a chat product, that is bad. In a multi-agent workflow runtime, it is worse because hidden coordination messages, user-visible turns, subagent outputs, and tool callbacks all share the same transcript machinery.

Hidden coordination messages must not own the branch

sessions_yield is a useful primitive. A parent agent can suspend while delegated work continues, then resume when a subagent completes. But that design inevitably creates internal messages: yield markers, wake-up artifacts, direct-announce attempts, completion metadata. Those artifacts are necessary for coordination. They should not become the branch root for destructive operations.

The difference matters because branch ancestry is semantic, not just structural. A visible user conversation has one meaning. A hidden yield marker has another. If compaction attaches to the hidden marker, the runtime may summarize the wrong branch, count the wrong tokens, or decide the visible conversation is less relevant than it is. That can inflate API cost through unnecessary compaction calls, but cost is only the easy symptom. The harder problem is semantic drift: the parent agent resumes with a compressed or incomplete view of the conversation that produced the delegation in the first place.
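One way to picture the rule is a parent-resolution step that walks up from the current leaf until it reaches an entry that is not a hidden coordination marker. This is a minimal sketch under assumptions: the `Entry` shape, the field names, and `visible_ancestor` are hypothetical and are not taken from the OpenClaw source. Only the `openclaw.sessions_yield` type name comes from the issue.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Custom-message type named in the issue; treated here as hidden.
HIDDEN_CUSTOM_TYPES = {"openclaw.sessions_yield"}


@dataclass(frozen=True)
class Entry:
    """Hypothetical transcript entry; the real schema may differ."""
    id: str
    parent_id: Optional[str]
    kind: str              # e.g. "user", "assistant", "custom"
    custom_type: str = ""


def is_hidden(entry: Entry) -> bool:
    return entry.kind == "custom" and entry.custom_type in HIDDEN_CUSTOM_TYPES


def visible_ancestor(leaf_id: str, entries: Dict[str, Entry]) -> Optional[str]:
    """Return the nearest non-hidden entry at or above leaf_id, if any."""
    seen = set()
    current = entries.get(leaf_id)
    while current is not None and current.id not in seen:
        seen.add(current.id)  # guard against a corrupted, cyclic chain
        if not is_hidden(current):
            return current.id
        current = entries.get(current.parent_id) if current.parent_id else None
    return None


# Example chain: user turn -> assistant turn -> two hidden yield markers.
chain = {
    "u1": Entry("u1", None, "user"),
    "a1": Entry("a1", "u1", "assistant"),
    "y1": Entry("y1", "a1", "custom", "openclaw.sessions_yield"),
    "y2": Entry("y2", "y1", "custom", "openclaw.sessions_yield"),
}
print(visible_ancestor("y2", chain))  # prints: a1
```

With a step like this, a compaction triggered while the leaf is a yield marker would attach to the visible assistant turn rather than to the marker itself.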

This is where workflow engines have a lesson for agent frameworks. Durable systems treat coordination state and user state differently. A queue message that wakes a worker is not the same thing as the customer record the worker is updating. A transactional outbox entry that records “delivery owed” is not the same thing as the delivered payload. Agent platforms need the same separation. Internal wake-up machinery should be auditable, but it should not accidentally become the conversation branch that future reasoning depends on.

What practitioners should watch

If you run OpenClaw with subagents, especially long-running delegated work, inspect compaction entries around sessions_yield. Compare visible context usage with compaction triggers. If you see compactions at single-digit context utilization, treat that as a runtime bug, not normal maintenance. Retain enough transcript history to reconstruct the branch parent chain. Without it, you may only notice the problem later, when an agent “forgets” the reason it delegated work.
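A rough audit pass over a session transcript might look like the sketch below. It assumes a JSON-lines export in which each entry has `id`, `type`, `parentId`, and optionally `customType`; those field names and the 10% threshold are assumptions for illustration, not the documented OpenClaw format. Only `tokensBefore` and `fromHook` appear in the report.

```python
import json

HIDDEN_PARENT_TYPE = "openclaw.sessions_yield"
LOW_UTILIZATION = 0.10  # assumed cutoff for "suspiciously early" compaction


def flag_compactions(lines, context_window):
    """Return (entry id, reasons) for compaction entries that look suspicious."""
    entries = {}
    for raw in lines:
        raw = raw.strip()
        if raw:
            entry = json.loads(raw)
            entries[entry["id"]] = entry

    findings = []
    for entry in entries.values():
        if entry.get("type") != "compaction":
            continue
        reasons = []
        tokens = entry.get("tokensBefore")
        if tokens is not None and tokens / context_window < LOW_UTILIZATION:
            reasons.append("low utilization")
        parent = entries.get(entry.get("parentId"))
        if parent is not None and parent.get("customType") == HIDDEN_PARENT_TYPE:
            reasons.append("hidden sessions_yield parent")
        if reasons:
            findings.append((entry["id"], reasons))
    return findings
```

Fed the lines of an exported transcript and the model's effective window, the function returns only the compaction entries worth a closer look, which keeps the manual review small.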

Operators should also treat compaction as part of governance. A compaction event should have preconditions: context pressure, selected model window, parent branch identity, visible conversation inclusion, and an audit trail. It should ideally create a recoverable checkpoint before replacing raw context with a summary. The fact that related PR #86686 addresses adjacent budget handling (no-op compaction success, caller-resolved context budgets, and clamping to the selected compaction model’s window) reinforces the point. Budget math and branch math are runtime correctness, not polish.
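Those preconditions can be expressed as an explicit guard that runs before any destructive compaction. The sketch below is illustrative only: `CompactionRequest`, its fields, and the 80% pressure threshold are hypothetical choices, not OpenClaw's actual API or defaults.

```python
from dataclasses import dataclass
from typing import List

PRESSURE_THRESHOLD = 0.80  # assumed: compact only above 80% of the window


@dataclass(frozen=True)
class CompactionRequest:
    """Hypothetical description of a pending compaction."""
    tokens_before: int
    model_window: int
    parent_is_visible: bool       # parent is on the visible user branch
    includes_visible_turns: bool  # prepared input contains real conversation
    checkpoint_saved: bool        # raw context is recoverable afterwards


def compaction_blockers(req: CompactionRequest) -> List[str]:
    """List every unmet precondition; an empty list means it may proceed."""
    blockers = []
    if req.model_window <= 0:
        blockers.append("unknown model window")
    elif req.tokens_before / req.model_window < PRESSURE_THRESHOLD:
        blockers.append("no context pressure")
    if not req.parent_is_visible:
        blockers.append("parent is a hidden coordination entry")
    if not req.includes_visible_turns:
        blockers.append("visible conversation missing from input")
    if not req.checkpoint_saved:
        blockers.append("no recoverable checkpoint")
    return blockers
```

Returning the full list of blockers, rather than a single boolean, doubles as the audit trail: the runtime can log exactly why a compaction was refused or allowed.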

There is also a design recommendation here for every agent platform, not just OpenClaw: do not store all transcript-like things as if they were equal. User turns, assistant turns, tool results, internal scheduler markers, subagent completion notices, compaction summaries, and delivery receipts need different roles and different authority. If the system insists on putting them in one event log, the branch rules must be explicit enough that hidden operational artifacts cannot hijack user-visible state.

The forward-looking take: multi-agent systems are becoming workflow engines whether they admit it or not. Once a parent delegates, suspends, resumes, compacts, and receives delayed child output, the platform is managing stateful distributed execution. That demands branch discipline. Internal wake-up machinery can help the runtime coordinate. It must never become the thing that owns user context.

Sources: OpenClaw issue #86684, OpenClaw PR #86686, LangGraph persistence docs, Transactional outbox pattern