OpenClaw’s Agent-Aware Compaction Request Gets One Thing Right: Context Is Operational State, Not Garbage

Anatoliy Kolodkin

20 May 2026 • 4 min read

Context compaction is usually treated like garbage collection: necessary, invisible, and ideally boring. Issue #84571 argues for a better frame. In long-running agent work, context is operational state. It contains the failing test log, the user’s “do not touch auth” constraint, the last known-good diff, the current investigation path, and the half-formed conclusion that has not yet made it into durable memory. If the runtime silently shreds that working stack, the agent may continue fluently while losing the one fact that made the work coherent.

The request asks OpenClaw to make compaction visible and steerable from the agent side: warn before compaction, let the agent mark what should be preserved, expose a compaction_guide tool, and provide user policy knobs controlling whether agents may steer memory compression. That is more than a feature request. It is a product argument that memory management belongs in the agent control plane, not only in hidden runtime heuristics.

The proposal is opinionated in the right place

The issue was opened May 20 against OpenClaw 2026.5.19-beta.1 (ba9034b) and affects “all embedded agents.” It is labeled P2, impact:session-state, and needs-product-decision, which is exactly right. This is not a one-line bug. It changes what agents can know about their operating envelope and what authority they have over preserved context.

The proposed warning is refreshingly concrete: “[compaction] imminent — session at ~72k/200k (36%), estimated N tool-result tokens before threshold. Compaction will run automatically if no action taken.” The requested steering examples are also grounded: preserve “the last 4 tool results,” “my current reasoning chain,” or “the final findings summary only.” The proposed compaction_guide tool would include preserve, exempt, and reason fields, and could be enabled or disabled by policy.
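To make that shape concrete, here is a minimal Python sketch of what a steering payload and warning line could look like. Only the tool name and the preserve, exempt, and reason field names come from the issue. The field types, the CompactionGuide class, and the format_warning helper are assumptions for illustration, not OpenClaw's API, and the warning text is a simplified rendering of the quoted one.

```python
from dataclasses import dataclass, field


@dataclass
class CompactionGuide:
    """Hypothetical payload for the proposed compaction_guide tool.

    The field names follow the issue; the types are assumed for this sketch.
    """
    preserve: list[str] = field(default_factory=list)  # what the agent wants kept
    exempt: list[str] = field(default_factory=list)    # what should skip compression
    reason: str = ""                                   # why, for the audit trail


def format_warning(used_tokens: int, window_tokens: int) -> str:
    """Render a simplified version of the warning quoted in the issue."""
    percent = round(100 * used_tokens / window_tokens)
    return (
        f"[compaction] imminent: session at ~{used_tokens // 1000}k/"
        f"{window_tokens // 1000}k ({percent}%)"
    )


# Steering examples taken from the issue text.
guide = CompactionGuide(
    preserve=["the last 4 tool results", "my current reasoning chain"],
    reason="active investigation of a failing test",
)

print(format_warning(72_000, 200_000))
```

With the numbers from the issue, 72k of 200k tokens works out to the 36% shown in the proposed warning.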

That shape matters because the alternative is pretending a compactor can infer task salience from text alone. Sometimes it can. Often it cannot. The most important context in an investigation may be a boring line in a tool result, an invariant the user mentioned 30 turns ago, or a rejected approach whose value is precisely that the agent should not try it again. Generic summarization is good at making things shorter. It is not automatically good at preserving the operational stack of a live task.

Steering is useful. Unbounded self-preservation is not.

The obvious risk is letting agents mark everything as precious. A confused agent could preserve noisy logs, sensitive details, or irrelevant scratch work. A compromised agent could try to exempt context from compression for the wrong reasons. Even a well-intentioned agent may optimize for its current task while violating a user’s privacy preference. That is why the right design is advisory steering with policy boundaries, not agent-controlled memory preservation.

The system should let an agent propose what matters and explain why. The compactor should weigh that proposal alongside token pressure, recency, user policy, sensitivity rules, and durable-memory scope. Admins should be able to disable steering for untrusted agents, require user confirmation before preserving sensitive context, or allow steering only for specific kinds of context such as tool results and task summaries. Every steering event should be logged: compaction can run without interrupting the user and still leave an audit trail.
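One way to picture advisory steering with policy boundaries is a small gate that every proposal passes through before the compactor sees it. The sketch below is hypothetical throughout: SteeringPolicy, Proposal, Decision, and evaluate are names invented for illustration, and an "accepted" outcome here only means the proposal is eligible to be weighed, not that the context is guaranteed to survive.

```python
from dataclasses import dataclass


@dataclass
class SteeringPolicy:
    """Hypothetical operator policy; every field name is illustrative."""
    steering_enabled: bool = True
    allowed_kinds: frozenset[str] = frozenset({"tool_result", "task_summary"})
    confirm_sensitive: bool = True


@dataclass
class Proposal:
    kind: str            # e.g. "tool_result", "task_summary", "scratch"
    item_id: str
    reason: str
    sensitive: bool = False


@dataclass
class Decision:
    item_id: str
    outcome: str         # "accepted", "rejected", or "needs_confirmation"
    why: str


def evaluate(policy: SteeringPolicy, proposals: list[Proposal],
             audit_log: list[Decision]) -> list[Decision]:
    """Apply policy to each proposal and log every decision."""
    decisions = []
    for p in proposals:
        if not policy.steering_enabled:
            d = Decision(p.item_id, "rejected", "steering disabled by policy")
        elif p.kind not in policy.allowed_kinds:
            d = Decision(p.item_id, "rejected", f"kind '{p.kind}' not allowed")
        elif p.sensitive and policy.confirm_sensitive:
            d = Decision(p.item_id, "needs_confirmation", "sensitive context")
        else:
            # Accepted proposals are still advisory: a real compactor would
            # weigh them against token pressure and recency.
            d = Decision(p.item_id, "accepted", p.reason)
        audit_log.append(d)  # every steering event is logged, including rejections
        decisions.append(d)
    return decisions


log: list[Decision] = []
results = evaluate(
    SteeringPolicy(),
    [
        Proposal("tool_result", "t1", "failing test log"),
        Proposal("scratch", "s1", "half-finished notes"),
        Proposal("task_summary", "u1", "includes user details", sensitive=True),
    ],
    log,
)
for d in results:
    print(d.item_id, d.outcome, "-", d.why)
```

The point of the design is that the agent never writes to preserved memory directly. It submits reasons, policy decides, and the log records both the request and the answer.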

This connects directly to OpenClaw’s recent release direction. The v2026.5.19-alpha.1 release invests in runtime parity tiers, live-only Codex-vs-Pi canaries, tool fixture coverage, approval-denial scenarios, no-fake-progress checks, and memory-promotion shadow trials. Agent-aware compaction fits that pattern if it is measurable. Track warnings emitted, steering proposals submitted, proposals accepted or rejected, token savings, post-compaction tool failure rates, and whether users correct the agent less often after compaction. Otherwise “collaborative compaction” risks becoming another invisible heuristic with better branding.
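As a rough sketch of what "measurable" could mean in practice, the Python below counts some of the events listed above and derives two simple numbers from them. The CompactionMetrics class and the event names are hypothetical and do not correspond to any existing OpenClaw telemetry.

```python
from collections import Counter
from typing import Optional


class CompactionMetrics:
    """Hypothetical in-memory counters for compaction steering events."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.tokens_before = 0
        self.tokens_after = 0

    def record(self, event: str) -> None:
        # Illustrative event names: "warning_emitted", "proposal_submitted",
        # "proposal_accepted", "proposal_rejected", "post_compaction_tool_failure".
        self.counts[event] += 1

    def record_compaction(self, before: int, after: int) -> None:
        # Accumulate context sizes across all compaction runs.
        self.tokens_before += before
        self.tokens_after += after

    def acceptance_rate(self) -> Optional[float]:
        accepted = self.counts["proposal_accepted"]
        decided = accepted + self.counts["proposal_rejected"]
        return accepted / decided if decided else None

    def token_savings(self) -> int:
        return self.tokens_before - self.tokens_after


metrics = CompactionMetrics()
metrics.record("warning_emitted")
for _ in range(3):
    metrics.record("proposal_accepted")
metrics.record("proposal_rejected")
metrics.record_compaction(before=150_000, after=60_000)

print(metrics.acceptance_rate(), metrics.token_savings())
```

Counters like these say nothing about quality on their own. They become useful when compared against post-compaction tool failures and how often users have to correct the agent.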

For builders, this is one of the most practical vibe-debugging ideas in the current OpenClaw queue. AI-generated code often fails not because the model cannot write the next function, but because it loses the architectural map. If compaction drops the failing stack trace, the constraint about database migrations, or the fact that a previous fix broke Windows, the agent may proceed with a plausible plan that is already invalid. A compaction_guide primitive could become a workflow tool: before a large refactor, tell the agent to preserve invariants, changed files, test failures, rejected approaches, and deployment constraints. That is much more useful than pretending a 200k-token window is infinite.

For operators, the immediate checklist is to test compaction as a failure boundary, not a background detail. Create a staging conversation with real tool output. Push it across threshold. Confirm what is preserved, what is summarized, and what the agent believes after compaction. If the agent is doing long investigations, ask whether it can produce a pre-compaction handoff manually today. If not, the automated compactor is probably making a lossy handoff without the worker’s help.
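The "confirm what is preserved" step can start as something very crude. The sketch below assumes you can export the post-compaction context as plain text, which is an assumption about your setup rather than a documented OpenClaw feature, and checks that a hand-written list of must-survive facts still appears in it. The function name and the sample strings are invented for illustration.

```python
def missing_after_compaction(required_facts: list[str],
                             compacted_context: str) -> list[str]:
    """Return the required facts that no longer appear in the compacted text.

    Matching is a case-insensitive substring check, so a paraphrased fact
    will be reported as missing and should be reviewed by hand.
    """
    lowered = compacted_context.lower()
    return [fact for fact in required_facts if fact.lower() not in lowered]


# Hypothetical staging data: facts the operator decided must survive.
required = [
    "do not touch auth",
    "test_login_timeout failed",
]
compacted = (
    "Summary: user asked to fix flaky tests. "
    "Constraint: Do not touch auth."
)

lost = missing_after_compaction(required, compacted)
print("missing after compaction:", lost)
```

A substring check will produce false alarms when the compactor rewords a fact, but a false alarm in staging is cheaper than an agent that has quietly forgotten a constraint in production.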

The proposal also intersects with session-key and delivery-reliability bugs. Issue #84575 suggests concurrent requests can split memory scope under the OpenAI-compatible endpoint. Issue #84489 argues that broader subagent and Codex orchestration is not consistently inspectable or capability-correct. Those are not separate from compaction. They are all state-continuity problems. An agent platform has to know which conversation a turn belongs to, which work is active, which results were delivered, and which context survives compression. Memory is not a feature bolted onto chat. It is the continuity layer that makes the runtime coherent.

The editorial conclusion is that compaction should stop pretending to be invisible. It is an operational transition that can change the outcome of a task. Let agents safely say what state matters before the shredder starts, let users set the rules, and log the decision. That is how context management graduates from token housekeeping to runtime governance.

Sources: OpenClaw issue #84571, OpenClaw v2026.5.19-alpha.1, issue #84575, issue #84489
