OpenClaw’s 120-Second Bedrock Fence Failure Is a Provenance Bug, Not a Timeout Bug

The most frustrating agent-runtime bugs are the ones where the platform catches itself doing something suspicious and then punishes the user for it. OpenClaw issue #89259 is in that category: long Amazon Bedrock streaming runs can die around the 120-second mark with EmbeddedAttemptSessionTakeoverError, after the tools have already run and the model is writing the final response. The runtime appears to see its own Bedrock assistant stub as a possible foreign transcript takeover.

That is not a cosmetic failure. Session takeover fences are one of the few hard safety boundaries an agent platform has. If a transcript changes while a prompt lock is released, OpenClaw needs to know whether the change came from the active run, a sibling run, a delivery mirror, a stale process, a webhook lane, or an external edit. Accept too much and you invite cross-session contamination. Reject too much and legitimate long-running work disappears after the expensive part has already happened.

The report is unusually precise. The affected environment was OpenClaw 2026.5.22 on Linux x64 with Node 24.14.0, using @openclaw/amazon-bedrock-provider 2026.5.22 and model amazon-bedrock/zai.glm-5 through bedrock-converse-stream. The affected lanes included Slack DM socket-mode sessions, a GitHub webhook route to /hooks/github-engineering, cron-nested isolated agentTurn calls, and a named Slack thread session lane. This is not a toy repro sitting in a local shell. It is the kind of path operators actually use.

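The pattern is also consistent. A long Slack DM runs multiple exec and gh tool calls. After the last tool result, the model streams a long final reply. Roughly two minutes later, OpenClaw writes an empty assistant entry with provider amazon-bedrock and model zai.glm-5, then throws EmbeddedAttemptSessionTakeoverError. Two observed failures in the same session line up with the timing. In the first, the last tool result landed at 21:44:02.xxx and the throw at 21:46:02.573, a gap of about 120 seconds, with a reported duration of about 129017 ms. In the second, the last tool result landed at 22:02:48.xxx and the throw at 22:04:48.944, again about 120 seconds apart, with a reported duration of about 151455 ms. Both durations exceed the timestamp gap, so they presumably measure a longer span than last tool result to throw.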

A whitelist is not ownership

The source-level clue is the hard-coded benign-model whitelist. The report traces the behavior to TRANSCRIPT_ONLY_OPENCLAW_ASSISTANT_MODELS, a set containing internal mirror models such as delivery-mirror and gateway-injected. If an assistant entry appears while the prompt lock is released, OpenClaw can treat those known internal mirrors as benign. An entry with provider amazon-bedrock and model zai.glm-5 is not on that list, so it looks foreign.

The problem is that model identity is the wrong proof. The runtime does not need to know whether zai.glm-5 is generally benign. It needs to know whether this specific write was produced by the still-active run, at the expected boundary, with the expected run id, session key, transcript fingerprint, and lock lifecycle. Ownership should be narrow and auditable. A whitelist of model names is a proxy, and proxies eventually fail in exactly the places operators care about: provider adapters, streamed terminal events, failure stubs, and long-running replies.
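
The difference between the two kinds of proof can be shown with a minimal TypeScript sketch. Only the constant name TRANSCRIPT_ONLY_OPENCLAW_ASSISTANT_MODELS and its two example members come from the report. The TranscriptWrite and ActiveRun shapes, their field names, the lockEpoch counter, and both functions are hypothetical illustrations, not OpenClaw's actual code.

```typescript
// The constant name and the two members are cited in the report; the exact
// contents of the real set are an assumption.
const TRANSCRIPT_ONLY_OPENCLAW_ASSISTANT_MODELS = new Set<string>([
  "delivery-mirror",
  "gateway-injected",
]);

// Hypothetical shape of a transcript write observed by the fence.
interface TranscriptWrite {
  provider: string;
  model: string;
  runId?: string;           // hypothetical ownership stamp
  sessionKey?: string;      // hypothetical ownership stamp
  baseFingerprint?: string; // transcript fingerprint the writer started from
  lockEpoch?: number;       // hypothetical counter bumped on each lock release
}

// Hypothetical view of the run that currently owns the session.
interface ActiveRun {
  runId: string;
  sessionKey: string;
  fingerprintAtLockRelease: string;
  lockEpoch: number;
}

// Proxy check: trusts a write because of its model name alone.
function isBenignByModelName(write: TranscriptWrite): boolean {
  return TRANSCRIPT_ONLY_OPENCLAW_ASSISTANT_MODELS.has(write.model);
}

// Provenance check: trusts a write only if every ownership stamp matches
// the active run. A missing stamp is undefined and never matches, so
// unstamped writes fail closed.
function isOwnedByActiveRun(write: TranscriptWrite, run: ActiveRun): boolean {
  return (
    write.runId === run.runId &&
    write.sessionKey === run.sessionKey &&
    write.baseFingerprint === run.fingerprintAtLockRelease &&
    write.lockEpoch === run.lockEpoch
  );
}

// Made-up example values for illustration.
const exampleRun: ActiveRun = {
  runId: "run-1",
  sessionKey: "slack-dm-example",
  fingerprintAtLockRelease: "fp-a",
  lockEpoch: 3,
};

const bedrockStub: TranscriptWrite = {
  provider: "amazon-bedrock",
  model: "zai.glm-5",
  runId: "run-1",
  sessionKey: "slack-dm-example",
  baseFingerprint: "fp-a",
  lockEpoch: 3,
};

const acceptedByName = isBenignByModelName(bedrockStub);            // false
const acceptedByOwnership = isOwnedByActiveRun(bedrockStub, exampleRun); // true
```

In this sketch the Bedrock stub fails the name check and passes the ownership check, while a write carrying a trusted model name but no stamps would do the opposite.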

This bug rhymes with earlier OpenClaw session-fence work. Issue #85826 covered a hard-coded 120-second stall detector killing legitimate long local vLLM calls. PRs #86584 and #87159 worked on owned-write and session-file ownership fixes. Issue #89256 covered a session-key-bound isolated cron variant and was closed as already implemented for current main and v2026.5.28. Issue #89259 stays interesting because nothing yet proves that the Bedrock assistant-stub path is fixed.

The lesson is not “turn off the fence.” A broad fenceMode="warn" escape hatch may be useful for emergency recovery, but it cannot be the design center. Transcript takeover detection is there because agent platforms have real cross-lane hazards. Slack threads, cron jobs, webhooks, local sessions, and subagents can all touch adjacent state if identity and ownership are weak. The fix should be to prove ownership better, not to trust more writes because a provider is popular.

This is also a cost-control bug

It is tempting to classify #89259 as reliability only. That undersells the operational damage. A long model stream is expensive, especially on high-context or slower enterprise providers. If the runtime kills the turn after tool work completes and after the model has spent two minutes composing, it wastes token spend, user time, and trust in the platform. The answer never lands, but the billable work already happened.

For teams running Bedrock, vLLM, or other slower providers, observability should include more than “request started” and “request failed.” You want active work kind, last tool-result timestamp, stream-progress heartbeat, lock release and reacquire times, transcript fingerprint changes, provider and model, run id, lane identity, and whether a runtime-owned publication was accepted or rejected by the fence. Without that, a two-minute failure looks like model flakiness, provider latency, Slack delivery weirdness, or user cancellation depending on which log you happen to inspect.
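
One way to carry those fields is a single structured record per fence decision. The TypeScript sketch below is an assumption about what such a record could look like. Every type and field name in it is hypothetical and does not reflect OpenClaw's actual log schema.

```typescript
// Hypothetical log record; field names are illustrative only.
type FenceDecision = "accepted" | "rejected";

interface FenceLogRecord {
  runId: string;
  laneIdentity: string;               // e.g. a Slack DM, webhook route, or cron lane
  provider: string;
  model: string;
  activeWorkKind: "tool" | "stream" | "idle";
  lastToolResultAtMs: number;         // epoch milliseconds
  lastStreamProgressAtMs: number;     // most recent stream heartbeat
  lockReleasedAtMs: number;
  lockReacquiredAtMs: number | null;  // null if the run failed before reacquiring
  fingerprintBefore: string;          // transcript fingerprint at lock release
  fingerprintAfter: string;           // transcript fingerprint at the fence check
  runtimeOwnedPublication: FenceDecision;
}

// Serialize one record as a single JSON line so logs stay greppable.
function toFenceLogLine(record: FenceLogRecord): string {
  return JSON.stringify(record);
}

// Elapsed time since the last tool result, the number that makes a
// two-minute failure recognizable at a glance.
function msSinceLastToolResult(record: FenceLogRecord, nowMs: number): number {
  return nowMs - record.lastToolResultAtMs;
}
```

With a record like this, the last tool result, the stream heartbeat, the lock transitions, and the fence verdict all land on one line instead of being scattered across provider, delivery, and runtime logs.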

For platform authors, the durable fix is run-scoped ownership metadata. Mark runtime-owned failure stubs, streaming partials, delivery mirrors, and provider assistant entries with run identity. Accept only expected writes from the expected run. Fail closed on unowned edits. Add regression fixtures for direct request failures, streamed terminal-event shapes, and long-provider replies where the lock legitimately changes state. That is more work than adding amazon-bedrock to a benign list, but it scales. The whitelist patch will always be waiting for the next provider edge case.
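
A fail-closed evaluation with a small fixture table might look like the following TypeScript sketch. It is an illustration under stated assumptions: the entry kinds mirror the list in the paragraph above, but the StampedEntry shape, the evaluateFence function, and the fixture names are all hypothetical and not taken from the OpenClaw codebase.

```typescript
// Hypothetical kinds of runtime-owned writes.
type OwnedWriteKind =
  | "failure-stub"
  | "streaming-partial"
  | "delivery-mirror"
  | "provider-assistant";

// Hypothetical transcript entry carrying a run-scoped ownership stamp.
interface StampedEntry {
  kind: OwnedWriteKind | "unknown";
  ownerRunId: string | null; // null means nobody stamped this write
}

interface FenceVerdict {
  accept: boolean;
  reason: string;
}

// Accept only expected kinds of writes from the expected run.
// Anything unowned, foreign, or unexpected fails closed.
function evaluateFence(
  entries: StampedEntry[],
  activeRunId: string,
  expectedKinds: ReadonlySet<OwnedWriteKind>,
): FenceVerdict {
  for (const entry of entries) {
    if (entry.ownerRunId === null) {
      return { accept: false, reason: "unowned write" };
    }
    if (entry.ownerRunId !== activeRunId) {
      return { accept: false, reason: "write from another run" };
    }
    if (entry.kind === "unknown" || !expectedKinds.has(entry.kind)) {
      return { accept: false, reason: "unexpected write kind" };
    }
  }
  return { accept: true, reason: "all writes owned by active run" };
}

// Regression-style fixtures with made-up values.
const fenceFixtures: Array<{
  label: string;
  entries: StampedEntry[];
  expectAccept: boolean;
}> = [
  {
    label: "long provider reply, stub stamped by the active run",
    entries: [{ kind: "provider-assistant", ownerRunId: "run-1" }],
    expectAccept: true,
  },
  {
    label: "sibling run writes while the lock is released",
    entries: [{ kind: "provider-assistant", ownerRunId: "run-2" }],
    expectAccept: false,
  },
  {
    label: "external edit with no ownership stamp",
    entries: [{ kind: "unknown", ownerRunId: null }],
    expectAccept: false,
  },
];
```

Note that nothing in the verdict depends on the provider or model name, so a new streaming adapter needs a stamped write, not a new list entry.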

For operators, the practical advice is to treat unexplained 120-second assistant-turn failures as runtime-fence candidates, not just provider instability. Preserve logs with timestamps around the last tool result and the throw. Capture provider/model metadata and lane type. If you are tuning timeouts, avoid masking takeover errors by simply making the runtime more permissive. A passing long stream is good; a weaker fence that allows cross-session writes is not.
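
As a triage aid, a small helper can flag failed turns that deserve a fence investigation. The TypeScript sketch below is hypothetical: the FailedTurn shape, the isFenceCandidate function, and the 5-second tolerance are assumptions chosen for illustration. Only the error name and the roughly 120-second gap come from the report.

```typescript
// The gap described in the report, in milliseconds.
const SUSPECT_GAP_MS = 120000;

// Hypothetical summary of a failed assistant turn, built from preserved logs.
interface FailedTurn {
  errorName: string;
  lastToolResultAt: string; // ISO 8601 timestamp
  thrownAt: string;         // ISO 8601 timestamp
}

// A turn is a fence candidate if it threw the takeover error outright, or if
// the gap between the last tool result and the throw sits near 120 seconds.
// The default tolerance is an assumption, not a documented value.
function isFenceCandidate(turn: FailedTurn, toleranceMs: number = 5000): boolean {
  if (turn.errorName === "EmbeddedAttemptSessionTakeoverError") {
    return true;
  }
  const gapMs = Date.parse(turn.thrownAt) - Date.parse(turn.lastToolResultAt);
  // An unparseable timestamp yields NaN, which fails this comparison.
  return Math.abs(gapMs - SUSPECT_GAP_MS) <= toleranceMs;
}

// Made-up example: a generic failure exactly 120.5 seconds after the last tool result.
const exampleTurn: FailedTurn = {
  errorName: "GenericStreamFailure",
  lastToolResultAt: "2026-01-01T10:00:00.000Z",
  thrownAt: "2026-01-01T10:02:00.500Z",
};

const exampleIsCandidate = isFenceCandidate(exampleTurn); // true
```

A match here is a reason to pull the surrounding logs, not a diagnosis; provider timeouts can land near the same mark.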

The editorial read is simple: provenance beats whitelists. If OpenClaw can prove a write belongs to the active run, it should accept it. If it cannot, the fence should fire. But the proof has to be ownership, not a hard-coded list of model names that happens to work until Bedrock, vLLM, or the next streaming adapter walks through the door.

Sources: GitHub issue #89259, GitHub issue #89256, GitHub issue #85826, OpenClaw PR #86584, OpenClaw PR #87159