OpenClaw’s Compaction Circuit Breaker Is the Right Kind of Cost Control

Anatoliy Kolodkin

26 May 2026 • 3 min read

Retry logic is where optimism goes to become a cloud bill. OpenClaw PR #86900 is small, but it fixes the right class of problem: compaction should stop hammering a summarizer once the runtime has enough evidence that the dependency is down. That is cost control in the execution path, not a sad dashboard after the tokens are already gone.

The old behavior was mechanically expensive. OpenClaw’s summarizeChunks() path used retryAsync() with three attempts per chunk. If the summarizer model was overloaded, rate-limited, misconfigured, or otherwise unavailable, each chunk could fail three times before the loop moved to the next chunk. The PR’s concrete example is blunt: five chunks times three attempts times roughly 4K tokens per call equals about 60K tokens wasted with no generated summary.

The patch adds COMPACTION_CIRCUIT_BREAKER_THRESHOLD = 2, tracks consecutive chunk failures, resets the counter after a successful chunk, and throws a CompactionCircuitBreakerError with partial summary state attached once two chunks in a row have failed. The proof script shows the before-and-after behavior: without the breaker, five chunks produce 15 failed API calls; with it, the loop stops after chunk two, at about six calls (two chunks times three attempts). That is nine fewer calls at roughly 4K tokens each, or about 36K tokens saved, and the caller still has a previous or partial summary to fall back on.
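The control flow is easier to see in code. What follows is a minimal sketch, not the PR's implementation: only COMPACTION_CIRCUIT_BREAKER_THRESHOLD and CompactionCircuitBreakerError are names taken from the PR description, while withRetries, summarizeChunksSketch, the error's field names, and the Summarize signature are hypothetical. It is also synchronous for brevity, whereas the real path goes through retryAsync().

```typescript
// Sketch under stated assumptions; the real OpenClaw code is async and differs in detail.
const COMPACTION_CIRCUIT_BREAKER_THRESHOLD = 2;
const ATTEMPTS_PER_CHUNK = 3; // three attempts per chunk, as described in the article

class CompactionCircuitBreakerError extends Error {
  readonly partialSummary: string | undefined; // hypothetical field names
  readonly completedChunks: number;
  readonly totalChunks: number;

  constructor(partialSummary: string | undefined, completedChunks: number, totalChunks: number) {
    super(`compaction circuit breaker triggered (${completedChunks}/${totalChunks} chunks)`);
    Object.setPrototypeOf(this, CompactionCircuitBreakerError.prototype);
    this.name = "CompactionCircuitBreakerError";
    this.partialSummary = partialSummary;
    this.completedChunks = completedChunks;
    this.totalChunks = totalChunks;
  }
}

type Summarize = (chunk: string, previous: string | undefined) => string;

// Hypothetical stand-in for retryAsync(): try up to `attempts` times, rethrow the last error.
function withRetries<T>(fn: () => T, attempts: number): T {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return fn();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

function summarizeChunksSketch(
  chunks: string[],
  summarize: Summarize,
  previousSummary?: string,
): string | undefined {
  let summary = previousSummary;
  let consecutiveFailures = 0;
  let completed = 0;
  for (const chunk of chunks) {
    try {
      summary = withRetries(() => summarize(chunk, summary), ATTEMPTS_PER_CHUNK);
      consecutiveFailures = 0; // a successful chunk resets the counter
      completed++;
    } catch {
      consecutiveFailures++;
      if (consecutiveFailures >= COMPACTION_CIRCUIT_BREAKER_THRESHOLD) {
        // Stop the loop and hand the caller whatever summary state exists so far.
        throw new CompactionCircuitBreakerError(summary, completed, chunks.length);
      }
    }
  }
  return summary;
}

// Usage: a summarizer that is always down, with a call counter.
let calls = 0;
const alwaysDown: Summarize = () => {
  calls++;
  throw new Error("summarizer unavailable");
};

let caught: unknown;
try {
  summarizeChunksSketch(["c1", "c2", "c3", "c4", "c5"], alwaysDown, "older summary");
} catch (err) {
  caught = err;
}
// The loop stops after two chunks (six attempts); `caught` carries "older summary".
```

The key design choice is that the counter tracks consecutive failures, so one bad chunk between good ones does not trip the breaker.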

Retries are tactics. Circuit breakers are policy.

The distinction matters. A retry says, “maybe this failed transiently.” A circuit breaker says, “we have enough evidence that the dependency is unhealthy, so continuing is actively harmful.” Agent runtimes need both, but they often ship only the first because local retries are easy to write and global failure policy is harder to think through.

Compaction is a particularly bad place to be naive. Long-lived agent sessions depend on summarization to stay within model context limits, preserve continuity, and avoid dragging full transcripts into every turn. When compaction fails, the system is already in a degraded state. Continuing to burn tokens chunk by chunk can make the state worse: more failed requests, more rate pressure, more latency, and no better summary at the end.

Issue #58838, the underlying report, was already labeled as source-reproducible and queueable. PR #86900 changes two files, adding 192 lines across src/agents/compaction.ts and src/agents/compaction.circuit-breaker.test.ts. The author lists targeted verification with OPENCLAW_VITEST_MAX_WORKERS=1 npx vitest run src/agents/compaction.circuit-breaker.test.ts, and the tests cover all-fail, partial-fail, and success-after-failure paths.

The untested part is worth stating plainly: deterministic unit tests do not prove behavior against a live summarizer model that flips between available and unavailable. That does not make the PR weak. It makes the proof honest. The unit tests prove policy shape; production still needs telemetry showing when the breaker opens, what summary state the runtime used, and whether users notice stale context.

Partial memory is better than a token bonfire, but it must be visible

The product tradeoff is subtle. Falling back to a previous or partial summary keeps the agent alive, but it can also preserve stale context. That is usually the right failure mode, provided the runtime surfaces it. The patch logs “compaction circuit breaker triggered” with completed and total chunk counts. Good start. The UI and diagnostics should eventually say something human-readable: “Compaction degraded; using summary through chunk 2 of 5.”
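One way to get from chunk counts to a message like that is a small formatter. This is a hypothetical sketch: CompactionOutcome and describeCompactionOutcome are illustrative names, not OpenClaw APIs, and the wording of the zero-progress messages is invented here.

```typescript
// Hypothetical shape; assumes completed chunks form a prefix of the chunk list.
interface CompactionOutcome {
  completedChunks: number;
  totalChunks: number;
  hasPreviousSummary: boolean;
}

function describeCompactionOutcome(o: CompactionOutcome): string {
  if (o.completedChunks >= o.totalChunks) {
    return "Compaction complete.";
  }
  if (o.completedChunks > 0) {
    return `Compaction degraded; using summary through chunk ${o.completedChunks} of ${o.totalChunks}.`;
  }
  // Nothing was summarized this round: say which fallback, if any, is in use.
  return o.hasPreviousSummary
    ? "Compaction degraded; no chunks summarized, reusing previous summary."
    : "Compaction degraded; no summary available.";
}

const degradedMessage = describeCompactionOutcome({
  completedChunks: 2,
  totalChunks: 5,
  hasPreviousSummary: true,
});
```

Distinguishing "partial summary" from "older checkpoint" in the message matters more than the exact phrasing, since those two cases lead to different debugging paths.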

Silent partial context is better than silent token burn, but not by enough to stop there. Operators need to know when the agent is proceeding with degraded memory. Developers debugging downstream weirdness need to know whether the model saw a fresh summary, a partial summary, or an older checkpoint. If a coding agent starts making decisions from stale project state, the cost problem has turned into a correctness problem.

For practitioners, the immediate lesson is broader than OpenClaw. Any agent platform with long-lived sessions should have cost-aware failure modes around summarization, memory refresh, vector indexing, tool cataloging, trace export, and background media generation. The anti-pattern is local retry loops with no global view of the failing dependency. If each chunk, file, tool, or trace batch retries independently, the runtime can stampede a broken service while pretending each small retry is reasonable.
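A global view can be as simple as one breaker per dependency, shared by every loop that calls it. The sketch below is illustrative only: DependencyBreaker, breakerFor, guardedCall, the threshold of 2, and the 30-second cooldown are all assumptions made for the example, not part of OpenClaw or the PR.

```typescript
// Hypothetical sketch: one breaker per dependency name, shared across call sites.
class DependencyBreaker {
  private consecutiveFailures = 0;
  private openedAt: number | undefined = undefined;
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(threshold: number, cooldownMs: number, now: () => number = () => Date.now()) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // Closed: always callable. Open: callable again only after the cooldown (a probe).
  canCall(): boolean {
    if (this.openedAt === undefined) return true;
    return this.now() - this.openedAt >= this.cooldownMs;
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.openedAt = undefined;
  }

  recordFailure(): void {
    this.consecutiveFailures++;
    if (this.consecutiveFailures >= this.threshold) {
      this.openedAt = this.now(); // (re)open and restart the cooldown
    }
  }
}

const breakers = new Map<string, DependencyBreaker>();

function breakerFor(dependency: string): DependencyBreaker {
  let breaker = breakers.get(dependency);
  if (!breaker) {
    breaker = new DependencyBreaker(2, 30000); // assumed threshold and cooldown
    breakers.set(dependency, breaker);
  }
  return breaker;
}

// Every call site goes through the shared breaker instead of retrying on its own.
function guardedCall<T>(dependency: string, call: () => T, fallback: () => T): T {
  const breaker = breakerFor(dependency);
  if (!breaker.canCall()) return fallback();
  try {
    const result = call();
    breaker.recordSuccess();
    return result;
  } catch {
    breaker.recordFailure();
    return fallback();
  }
}

// Two independent loops hit the same broken dependency.
let summarizerCalls = 0;
const brokenSummarizer = (): string => {
  summarizerCalls++;
  throw new Error("summarizer unavailable");
};
for (let i = 0; i < 5; i++) guardedCall("summarizer", brokenSummarizer, () => "stale");
for (let i = 0; i < 5; i++) guardedCall("summarizer", brokenSummarizer, () => "stale");
// Only the first two calls reach the summarizer; the other eight use the fallback.
```

Because the second loop consults the same breaker, it never touches the failing service at all, which is the difference between a local retry and a policy.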

Good governance is not just monthly spend charts. It is runtime behavior that prevents waste before accounting notices it. Set per-turn budgets. Clamp output tokens. Put deadlines on tool discovery. Add circuit breakers around dependencies that can fail systematically. Expose degraded-state decisions in the transcript or diagnostics. Give operators a way to distinguish “the model forgot” from “the compactor reused an old summary because the summarizer was down.”
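Per-turn budgets and output clamping can be sketched in a few lines. TurnBudget below is a hypothetical helper written for this article, not an OpenClaw API, and the token numbers are made up for illustration.

```typescript
// Hypothetical sketch of a per-turn token budget with output clamping.
class TurnBudget {
  private spent = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  get remaining(): number {
    return Math.max(0, this.limit - this.spent);
  }

  // Never request more output tokens than the turn has left.
  clampOutputTokens(requested: number): number {
    return Math.min(requested, this.remaining);
  }

  charge(tokens: number): void {
    this.spent += tokens;
  }
}

const budget = new TurnBudget(10000);
budget.charge(4000); // e.g. prompt tokens already used this turn
const firstRequest = budget.clampOutputTokens(8000); // clamped to the 6000 left
budget.charge(firstRequest);
const secondRequest = budget.clampOutputTokens(100); // budget exhausted, so 0
```

A clamp that returns zero is itself a degraded-state decision, so it deserves the same visibility in diagnostics as an open breaker.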

This PR is not a grand architecture rewrite. That is part of why it is useful. It applies a mature systems pattern at the exact point where waste happens. The runtime sees consecutive failures, stops the doomed loop, preserves partial state, and moves on with a trace. That is the kind of boring engineering agent systems need more of.

The take is simple: retries should buy resilience, not denial. Once the summarizer has made its point twice in a row, OpenClaw should stop asking the same expensive question and start degrading deliberately.

Sources: GitHub PR #86900, issue #58838, OpenClaw v2026.5.22 release, issue #86592
