OpenClaw Webchat Has a Message-Loss Bug Where the Turn Disappears

The scariest agent failure is not a red error banner. It is a turn that looks accepted, runs real tools, hangs after the tool result, and then vanishes from the transcript as if the user never asked. That is the failure mode described in OpenClaw issue #86895, and it is a sharp reminder that conversational UX does not exempt agent platforms from write-ahead logging.

The report targets OpenClaw v2026.5.22 on macOS launchd gateway, using the local Claude CLI runtime with claude-opus-4-6. A webchat user asks for a long-form article summary. ToolSearch succeeds. WebFetch succeeds with is_error:false. Then the post-tool generation phase emits no progress for roughly 365 seconds. The stuck-session watchdog warns every 30 seconds, then aborts the embedded run at about 366 seconds with AbortError.

That watchdog behavior is not the bug. It is the part that worked. The ugly part is what happens next: because CLI turns are persisted only after the agent attempt completes successfully, the substantive user request leaves no durable transcript entry. The user sees a webchat lane that can answer trivial prompts but silently loses real tool-backed work. That is not just bad UX. It is observability failure dressed as chat.

The host was healthy, which makes this harder

The report does useful diagnostic work by ruling out the easy explanation. During the hang window, the gateway diagnostic timer fired exactly on schedule every 30 seconds, and no eventLoopMax spike appeared. In other words, the host event loop did not wedge. OpenClaw remained healthy enough to detect that the embedded CLI run had gone silent and to abort it on schedule.

The strongest comparison is the controlled repro detail: same URL, same prompt string, same model, same build, same auth path. The long-lived ...:main webchat session produced 365 seconds of zero progress and aborted. A fresh isolated session completed with 166 seconds of steady progress and a real reply. A large Wikipedia summarization control completed in 55.661 seconds.

That comparison moves the likely blast radius away from the source URL and toward session state. Long-lived agent sessions are where invariants decay: malformed tool pairs, stale reasoning blocks, replay mismatches, missing user entries, compaction boundaries, and raw CLI history all become suspects. If the same task succeeds fresh and stalls in the main session, “the model was slow” is not a satisfying diagnosis.

Issue #86592 explains why the failure becomes invisible. role:"user" entries are written to session JSONL only after the agent attempt completes successfully. If the attempt throws, the accepted prompt is not appended. That design is convenient until a tool-backed turn fails after doing work. Then the platform has no durable record of the user’s intent, only logs for operators willing to go spelunking.

Agents need request logs, not success-only memories

Every web engineer already knows this pattern in another domain. You do not log only successful HTTP requests. You do not persist only completed jobs. You do not enqueue a background task only after the worker finishes. Agent platforms should not get a pass because the interaction ends in prose. If anything, agent turns need stricter traceability because their failure states are less obvious and often span model calls, tool calls, replay, compaction, and channel delivery.

The operational fix is conceptually boring: persist the accepted user turn before the model attempt, or append it in a failure-safe block with an error stub that says the attempt aborted. The transcript should show “user asked X; runtime attempted; tools succeeded; generation stalled; watchdog aborted.” That record is valuable even if the final answer never arrives. Without it, users and operators are left to infer from absence.

For teams running OpenClaw, the immediate advice is to instrument for missing turns, not just failed turns. Alert when a user message is accepted by webchat but no corresponding transcript entry appears after an abort. Watch specifically for watchdog aborts following successful tool results. Treat “webchat only handles trivial prompts” as a possible symptom of lost substantive runs, not as impatience or model weakness. If you can reproduce the issue, preserve raw session JSONL before using /new or resetting; the evidence is likely in the accumulated session state.

For OpenClaw maintainers, the post-tool stall and the persistence gap should be handled as related but separable defects. The stall is intermittent and may require deep harness/session replay work. The transcript persistence behavior is source-visible and should be fixed regardless. A watchdog that aborts a silent run is useful. A watchdog that aborts and leaves no transcript is a black hole with better logging.

There is also a product trust angle. Users can forgive a visible failure: the agent tried and failed. Silent loss is different. It breaks the operator’s mental model because the system accepted work, performed some of it, and then erased the evidence from the place users expect to look. That is how an agent platform starts to feel haunted.

The lesson goes beyond webchat. Multi-agent systems, background media tasks, coding harnesses, and long-lived sessions all need write-ahead semantics for user intent. If the source article disappears, the source URL changes, or the model hangs after tool replay, the platform should still remember the work it accepted. Otherwise, observability is optional only until you need it.

Sources: GitHub issue #86895, issue #86592, OpenClaw v2026.5.22 release, issue #86239