
A Closed OpenClaw Compaction Bug Shows Why Runtime Contracts Need Typed Semantics

Anatoliy Kolodkin

27 May 2026 • 3 min read

OpenClaw issue #87091 is closed, which is good for users on current main. But it should not be mentally closed for anyone operating agent infrastructure. The bug is a near-perfect miniature of how agent runtimes fail when plugin contracts are defined by vibes instead of typed semantics.

The report came from an OpenClaw 2026.5.22 setup using the lossless-claw context engine as plugins.slots.contextEngine. The runtime decided preflight compaction was required before answering a Telegram/shared GUI session. It delegated to the configured context engine. The engine returned a successful no-op: ok=true, compacted=false, reason=below_threshold. That should mean: nothing needs to be compacted because the conversation is already below the threshold. Instead, the released OpenClaw path treated the no-op as fatal and surfaced errors like “Preflight compaction required but failed: below threshold” and “Preflight compaction required but failed: below_threshold”.

That is a tiny boolean interpretation bug with a very real user-facing result: replies stopped.

Compaction is now part of message delivery

It is tempting to file this under “context-engine integration edge case” and move on. That would miss the point. Compaction used to be an internal maintenance detail: trim a conversation, summarize old state, keep the model inside the window. In modern agent systems, compaction sits on the critical path for message delivery. If the runtime believes it cannot safely fit the next turn, it may block the reply until compaction succeeds. That means compaction contracts are now delivery contracts.

Users do not care whether Telegram broke, the model failed, or a context engine returned a harmless no-op that the host misread. They see the agent not answering. Operators need traces that explain the chain: who decided compaction was required, which engine handled it, what result shape came back, whether the engine changed the conversation, how the token budget looked afterward, and why the runtime proceeded or stopped.
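One way to picture that chain is as a single structured trace record per preflight compaction. The sketch below is only an illustration: the CompactionTrace interface, its field names, and formatTrace are hypothetical and not part of OpenClaw, and the token numbers are made up for the example.

```typescript
// Hypothetical trace record; every field name here is an assumption for illustration.
interface CompactionTrace {
  requestedBy: string; // who decided compaction was required
  engine: string; // which engine handled it
  resultShape: Record<string, unknown>; // raw result as returned by the engine
  conversationChanged: boolean; // whether the engine changed the conversation
  tokensAfter: number; // token count after the engine returned
  tokenBudget: number; // budget the next turn must fit into
  hostDecision: "proceed" | "stop";
  hostReason: string; // why the runtime proceeded or stopped
}

// Render the record as one greppable key=value log line.
function formatTrace(trace: CompactionTrace): string {
  return [
    `requested_by=${trace.requestedBy}`,
    `engine=${trace.engine}`,
    `result=${JSON.stringify(trace.resultShape)}`,
    `changed=${trace.conversationChanged}`,
    `tokens=${trace.tokensAfter}/${trace.tokenBudget}`,
    `decision=${trace.hostDecision}`,
    `reason=${trace.hostReason}`,
  ].join(" ");
}

const exampleTrace: CompactionTrace = {
  requestedBy: "preflight",
  engine: "lossless-claw",
  resultShape: { ok: true, compacted: false, reason: "below_threshold" },
  conversationChanged: false,
  tokensAfter: 12000, // illustrative number, not taken from the issue
  tokenBudget: 32000, // illustrative number, not taken from the issue
  hostDecision: "proceed",
  hostReason: "engine reported a successful no-op",
};

console.log(formatTrace(exampleTrace));
```

With a record like this, “the agent did not answer” can be traced to a specific decision instead of a bare error string.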

In this case, the contract should have been unambiguous. ok=true is success. compacted=false reports “no change,” not a failure. reason=below_threshold is not an error condition; it is the exact reason no compaction was needed. A host runtime that treats that as fatal is not being conservative. It is rejecting valid plugin output because the semantic model is underspecified.
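Read literally, those three fields map to a host decision along the following lines. This is a minimal sketch, not OpenClaw's code: the LegacyCompactionResult shape is modeled only on the fields quoted in the issue, and interpretPreflight and PreflightDecision are hypothetical names.

```typescript
// Hypothetical result shape, modeled on the three fields quoted in the issue.
interface LegacyCompactionResult {
  ok: boolean;
  compacted: boolean;
  reason?: string;
}

type PreflightDecision =
  | { action: "proceed"; note: string }
  | { action: "block"; error: string };

function interpretPreflight(result: LegacyCompactionResult): PreflightDecision {
  // ok=false is the only signal in this shape that the engine itself failed.
  if (!result.ok) {
    return { action: "block", error: `compaction failed: ${result.reason ?? "unknown"}` };
  }
  if (result.compacted) {
    return { action: "proceed", note: "compacted" };
  }
  // ok=true with compacted=false is a successful no-op; the reason is informational.
  return { action: "proceed", note: `no-op: ${result.reason ?? "unspecified"}` };
}

const decision = interpretPreflight({ ok: true, compacted: false, reason: "below_threshold" });
console.log(decision.action);
```

Note the limit of this shape: a deferred compaction has no slot of its own here, so the host would have to guess at it from the reason string.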

ClawSweeper closed the issue because current main already returns successfully for successful no-op compaction results while still treating deferred or real failures as fatal. That is the correct behavior. But the episode remains useful because it reveals the next class of bugs agent platforms are walking into: not “can the plugin run?” but “do the host and the plugin agree on what the result means?”

Plugin ecosystems need state machines, not just JSON

Agent platforms are outsourcing more runtime responsibilities to plugins: memory, context management, retrieval, model routing, tracing, policy checks, compaction, and tool catalogs. That is the right architecture if the platform wants to stay extensible. It is also dangerous if return values are loosely interpreted. A string reason like below_threshold is fine for logs. It is not enough as the only semantic boundary between “continue the user turn” and “drop the reply.”

The fix pattern should be a typed state machine. A compaction result might be compacted, noop_below_threshold, deferred, failed_retryable, failed_fatal, or unsupported. Each state should have required fields, token-budget expectations, and host behavior. No-op success should be tested as aggressively as successful compaction because no-op is a common healthy result. If a plugin says “nothing to do,” the runtime should not have to infer whether that is good news.
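As a rough sketch, the states named above can be written as a TypeScript discriminated union with an exhaustive mapping to host behavior. The state names come from the paragraph above; the per-state fields, the HostAction names, and the mapping itself are assumptions for illustration. In particular, blocking on deferred follows the current-main behavior described earlier, while blocking on unsupported is a guess.

```typescript
// State names follow the list above; the fields on each state are hypothetical.
type CompactionResult =
  | { state: "compacted"; tokensBefore: number; tokensAfter: number }
  | { state: "noop_below_threshold"; tokens: number; threshold: number }
  | { state: "deferred"; retryAfterMs: number }
  | { state: "failed_retryable"; error: string }
  | { state: "failed_fatal"; error: string }
  | { state: "unsupported" };

// Hypothetical host behaviors.
type HostAction = "continue_turn" | "retry_compaction" | "block_reply";

function hostActionFor(result: CompactionResult): HostAction {
  switch (result.state) {
    case "compacted":
    case "noop_below_threshold":
      // Both are healthy outcomes: the turn may continue.
      return "continue_turn";
    case "failed_retryable":
      return "retry_compaction";
    case "deferred":
    case "failed_fatal":
    case "unsupported":
      // Assumed policy: anything that leaves the budget unresolved blocks the reply.
      return "block_reply";
    default: {
      // Compile-time exhaustiveness check: adding a new state without
      // handling it above makes this assignment a type error.
      const unreachable: never = result;
      throw new Error(`unhandled state: ${JSON.stringify(unreachable)}`);
    }
  }
}

console.log(hostActionFor({ state: "noop_below_threshold", tokens: 900, threshold: 4000 }));
```

The point of the design is that “nothing to do” becomes its own state with its own required fields, so the host never has to parse a reason string to decide whether to deliver the reply.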

There is a cost-control lesson here too. Compaction bugs now interact with token budgets, context windows, and fallback paths. A false fatal can force users to disable the context engine entirely, which may restore replies but lose the benefits of better memory handling. A false success can be worse, allowing an over-budget turn to proceed and fail later. The runtime needs enough telemetry to distinguish both cases. “Compaction failed” is not a diagnosis; it is the start of one.
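A minimal sketch of that distinction is a check that compares the host's decision against the token budget after the engine returns. The function and type names below are hypothetical, and the verdicts are deliberately labeled “suspected”: a blocked turn that fits the budget may still be legitimate, for example when the engine deferred.

```typescript
type HostDecision = "proceed" | "block";

type AuditVerdict =
  | "consistent"
  | "suspected_false_success" // host proceeds although the turn is over budget
  | "suspected_false_fatal"; // host blocks although the turn already fits

// Compare what the host decided with what the token budget says.
function auditDecision(
  decision: HostDecision,
  tokensAfter: number,
  tokenBudget: number,
): AuditVerdict {
  const fits = tokensAfter <= tokenBudget;
  if (decision === "proceed" && !fits) return "suspected_false_success";
  if (decision === "block" && fits) return "suspected_false_fatal";
  return "consistent";
}

// The failure mode from the issue: a reply blocked while the conversation
// was already below the threshold. Numbers are illustrative only.
console.log(auditDecision("block", 12000, 32000));
```

Emitting a verdict like this alongside the error turns “compaction failed” into something an operator can triage.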

For practitioners using OpenClaw with lossless-claw or any external context engine, the operational advice is straightforward. If you see below_threshold surfaced as an error, disable the active context-engine slot or upgrade to a build with the current-main fix. Keep the plugin installed if you need it, but do not leave it in the reply-critical path on a build that misclassifies no-op success. More broadly, stage context-engine integrations the same way you would stage auth or gateway policy changes. They can break delivery even when the model and channel are fine.

For OpenClaw maintainers, this class deserves contract tests. Every context-engine adapter should prove behavior for successful compaction, successful no-op, deferred compaction, retryable failure, fatal failure, malformed response, and post-compaction budget recheck. The test names should read like runtime policy, not implementation trivia.
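A table-driven suite is one way to get test names that read like policy. The following is a sketch under stated assumptions: the response shapes, the HostAction names, the expected outcomes, and referenceDecide are all hypothetical stand-ins, not OpenClaw's adapter API.

```typescript
type HostAction = "continue_turn" | "retry_compaction" | "block_reply";

interface ContractCase {
  name: string; // reads like runtime policy
  response: unknown; // raw engine output, possibly malformed
  tokensAfter: number;
  budget: number;
  expected: HostAction;
}

// Hypothetical policy table; the numbers are illustrative.
const contractCases: ContractCase[] = [
  { name: "continues the turn after successful compaction", response: { state: "compacted" }, tokensAfter: 500, budget: 1000, expected: "continue_turn" },
  { name: "continues the turn on a successful no-op", response: { state: "noop_below_threshold" }, tokensAfter: 500, budget: 1000, expected: "continue_turn" },
  { name: "blocks the reply when compaction is deferred", response: { state: "deferred" }, tokensAfter: 1500, budget: 1000, expected: "block_reply" },
  { name: "retries compaction on a retryable failure", response: { state: "failed_retryable" }, tokensAfter: 1500, budget: 1000, expected: "retry_compaction" },
  { name: "blocks the reply on a fatal failure", response: { state: "failed_fatal" }, tokensAfter: 1500, budget: 1000, expected: "block_reply" },
  { name: "blocks the reply on a malformed response", response: { ok: "maybe" }, tokensAfter: 500, budget: 1000, expected: "block_reply" },
  { name: "blocks the reply when still over budget after compaction", response: { state: "compacted" }, tokensAfter: 1500, budget: 1000, expected: "block_reply" },
];

type Decide = (response: unknown, tokensAfter: number, budget: number) => HostAction;

// Returns the names of the violated policies; an empty list means the contract holds.
function runContract(decide: Decide): string[] {
  return contractCases
    .filter((c) => decide(c.response, c.tokensAfter, c.budget) !== c.expected)
    .map((c) => c.name);
}

// A stand-in host decision function that satisfies the table above.
const referenceDecide: Decide = (response, tokensAfter, budget) => {
  if (typeof response !== "object" || response === null) return "block_reply";
  const state = (response as { state?: unknown }).state;
  switch (state) {
    case "compacted":
    case "noop_below_threshold":
      // Post-compaction budget recheck: success only counts if the turn fits.
      return tokensAfter <= budget ? "continue_turn" : "block_reply";
    case "failed_retryable":
      return "retry_compaction";
    default:
      // Deferred, fatal, unknown, or malformed: do not proceed.
      return "block_reply";
  }
};

console.log(runContract(referenceDecide));
```

A host that misclassifies the no-op case, as the released build did, would show up here as a single named failure: “continues the turn on a successful no-op”.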

The editorial take is simple: agent compaction is no longer an implementation detail. It is a runtime contract. If the contract is fuzzy, your chat channel becomes the integration test, and the user is the one who gets paged by silence.

Sources: GitHub issue #87091, PR #87088, PR #86993, issue #87095
