openclaw

OpenClaw’s Hook-Runner Regression Shows Why Hot Reload Is a Production Feature, Not Dev Sugar

Anatoliy Kolodkin

24 May 2026 • 4 min read

Hot reload is one of those features people mentally file under developer convenience until it breaks in a system that is supposed to keep running. Then it becomes lifecycle management. OpenClaw issue #86241 is a useful reminder: in an agent runtime with plugins, subagents, mobile clients, auto-updates, and persistent transcripts, restart behavior is not dev sugar. It is production infrastructure.

The reported regression is specific but familiar. In OpenClaw 2026.5.22, preserveGatewayHookRunner can suppress initializeGlobalHookRunner(registry) after subagent hot-reload cycles. The result: plugin handlers registered with api.on('before_prompt_build', ...) can stack instead of replacing the prior handler. After N restart cycles, prompt hooks fire N times. In the customer-facing report, connected mobile clients received four to six duplicate copies of every agent message after an overnight auto-update triggered a rapid restart sequence.

That sounds like a duplicate-notification bug. It is worse than that.

A prompt hook is not just a listener

The issue was filed against OpenClaw 2026.5.22 on macOS 26.3 arm64, installed globally through npm, using Anthropic Sonnet 4.6 and a plugin registering a before_prompt_build hook. The reporter says 2026.5.19 was confirmed working. The suspected root cause is in loader-CZB9kQVT.js around line 4949: preserveGatewayHookRunner evaluates true when runtimeSubagentMode === "default", the active plugin runtime mode is gateway-bindable, and a global hook runner already exists. That path skips reinitializing the global runner.

The intent is understandable. Preserving a global runner can avoid tearing down useful state during subagent transitions. But lifecycle guards are sharp tools. Preserve too little and you break continuity. Preserve too much and old handlers remain alive while new ones register. In a normal web app, that might mean duplicate event listeners. In an agent runtime, it can mean duplicated prompt injections, duplicated message deliveries, duplicated tool-side effects, and increasingly distorted context.

A before_prompt_build hook is especially sensitive because it changes what the model sees before each turn. If the hook injects context once, the model receives one copy. If the hook stacks six times after restarts, the model may receive six copies. That can waste tokens, bias outputs, cause repeated instructions, or trigger repeated downstream behavior. The user symptom might be duplicated mobile messages, but the runtime symptom is uncontrolled multiplication of pre-model side effects.

The workaround tells you what kind of bug this is. The reporter pinned to 2026.5.19 and added a plugin-side globalThis[BEFORE_PROMPT_KEY] guard to tear down the previous handler before registering again. That is a reasonable emergency patch for a plugin author, but it should not be the long-term contract. Handler replacement and idempotent registration need to be runtime-level guarantees, because not every plugin author will implement the same defensive pattern correctly.

Restart cascades are not hypothetical

The same issue bundles a second concern: session durability. The reporter alleges that .jsonl session files can be truncated if SIGTERM lands mid-write during rapid restart cascades. Two named agent sessions allegedly corrupted during the 2026.5.22 restart cascade and required manual reset. ClawSweeper correctly warned that the hook-runner problem and transcript durability problem should probably be split for focused reproduction. But operationally, they rhyme. Both are failures at the lifecycle boundary.

Agent runtimes restart for boring reasons: auto-updates, LaunchAgent supervision, Gateway reloads, subagent completion, plugin reload, process crashes, operator restarts. The fact that a restart path is “rare” in a unit test does not make it rare in production. Any system that runs continuously and updates itself will eventually collide with writes, hooks, sockets, and client delivery.

If transcript persistence is truncate-first rather than temp-write/fsync/rename, a mid-write SIGTERM can convert a routine restart into permanent context loss. That is database 101, but agent platforms have been relearning database lessons one JSONL file at a time. Conversation history is not cosmetic. It is memory, audit trail, replay material, and often the only human-readable record of why the agent took an action. Losing it during an auto-update is not a UX wart; it is operational data loss.

The maintainer posture around #86241 is appropriately cautious. The issue carries P1, impact:session-state, impact:data-loss, impact:message-loss, and a high-value issue-rating label. ClawSweeper kept it open, noted that the current source still contains the hook-runner preservation path, and found separate non-atomic/direct transcript write paths. It also declined to pretend a generated-asset patch was sufficient proof. PR #86248 opened minutes later, but it touched a generated viewer runtime asset with 21,723 additions and 162 deletions, carried size: XL, and still needed real behavior proof. Good. This is not the kind of bug you squash with a giant generated diff and vibes.

What practitioners should change

If you run custom OpenClaw plugins on 2026.5.22-era builds, watch for repeated prompt-build side effects after subagent runs, hot reloads, or auto-updates. Duplicate outbound messages are the visible symptom. Repeated context injection is the quieter one. Add plugin-side idempotency guards where you can, especially around api.on(...) registrations that mutate context or trigger external behavior.

Also treat session JSONL corruption as an incident, not as cleanup. Preserve the damaged file, collect restart and signal logs, and avoid resetting the session until maintainers can inspect the failure shape. If the platform is truncating before writing, maintainers need exact artifacts to prove and fix it. If the report is wrong, the artifacts will show that too.

The broader engineering lesson is simple: agent lifecycle is infrastructure. Hook replacement, hot reload, atomic transcript writes, and duplicate-delivery suppression are not optional polish. They are the mechanisms that keep a multi-agent system from multiplying side effects every time it restarts. OpenClaw is growing into a runtime where “just restart it” is no longer a harmless sentence.

Sources: OpenClaw issue #86241, OpenClaw PR #86248, OpenClaw issue #72536, OpenClaw issue #72176

A prompt hook is not just a listener

Restart cascades are not hypothetical

What practitioners should change

Sign up for more like this.