
OpenClaw’s Cache-Warmer Proposal Says the Quiet Part Out Loud: Prompt Caching Is Becoming Part of Agent Runtime Design

Anatoliy Kolodkin

22 Apr 2026 • 4 min read

Prompt caching is having the same career arc connection pooling had years ago. It starts life as a performance optimization people mention in docs and ends up becoming part of runtime design. OpenClaw’s cache-warmer proposal is interesting for exactly that reason. Issue #70418 is nominally about Anthropic prompt-cache TTLs, but the deeper story is that long-lived agent systems are beginning to treat cache physics as infrastructure instead of as an incidental API behavior.

The proposal is straightforward. For models using cacheRetention: "long", Anthropic’s prompt cache expires after one hour without access. That means an agent with a large stable prefix (system instructions, tool declarations, memory scaffolding, and accumulated context) can sit idle long enough to lose the cache, forcing the next user turn to pay for a full prefix rewrite. The issue argues that OpenClaw should add a dedicated background subsystem that refreshes cache TTLs during idle periods without piggybacking on heartbeat logic.

That distinction matters more than it sounds. Heartbeats in OpenClaw are not just timers. They carry session behavior, read HEARTBEAT.md, can produce user-visible delivery behavior like HEARTBEAT_OK, and may diverge from the actual conversation prefix because they use different prompt and tool-set paths. Even if those divergences are fixed, heartbeats remain a noisy abstraction for a pure cache-keepalive job. The proposal’s best instinct is separation of concerns: a cache warmer should exist to manage cache TTL, full stop.

The cost math is not the whole story, but it is enough to make this serious

The issue includes rough economics for a 100K-token cached prefix. A refresh would cost about $0.05 in cache reads, plus negligible write and output tokens, translating to roughly $1.30/day/agent at a 50-minute interval. The counterfactual is also explicit: if that cache expires twice per day and the next user turn has to rewrite the prefix at one-hour cache-write rates, that can cost about $2/day/agent, leaving a net saving around $0.70/day/agent. Whether those exact numbers hold in every configuration is less important than the structure of the argument. Cache state now has an operating cost profile of its own.
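The structure of that argument fits in a few lines of arithmetic. The sketch below is a minimal illustration, not anything from the issue itself: the class and field names are hypothetical, and the inputs are back-derived from the rough figures above rather than taken from a price list. In particular, $1.30/day at $0.05 per refresh implies about 26 refreshes a day (a strict round-the-clock 50-minute schedule would send 28.8), and $2/day for two expiries implies about $1.00 per rewrite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WarmerEconomics:
    """Daily cost comparison for one agent, in US dollars (illustrative only)."""
    refresh_cost: float       # cost of one keepalive refresh (mostly cache reads)
    refreshes_per_day: float  # refreshes the warmer actually sends
    rewrite_cost: float       # cost of rebuilding the prefix after one expiry
    expiries_per_day: float   # cold rewrites per day without a warmer

    def warm_cost(self) -> float:
        return self.refresh_cost * self.refreshes_per_day

    def cold_cost(self) -> float:
        return self.rewrite_cost * self.expiries_per_day

    def net_saving(self) -> float:
        # Positive means the warmer costs less than the rewrites it avoids.
        return self.cold_cost() - self.warm_cost()

    def break_even_expiries(self) -> float:
        # Expiries per day at which warming and not warming cost the same.
        return self.warm_cost() / self.rewrite_cost


# Assumed inputs, back-derived from the article's rough numbers.
issue_case = WarmerEconomics(
    refresh_cost=0.05, refreshes_per_day=26,
    rewrite_cost=1.00, expiries_per_day=2,
)
print(f"warm ${issue_case.warm_cost():.2f}/day, "
      f"cold ${issue_case.cold_cost():.2f}/day, "
      f"net ${issue_case.net_saving():.2f}/day")
```

Under these assumed inputs the warmer breaks even at about 1.3 avoided expiries per day, which is why an agent that goes cold only once a day would lose money on it while one that goes cold twice would not.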

That is the quiet shift. Once you are doing cost comparisons between refresh traffic and rewrite traffic, caching has stopped being a hidden transport optimization and started becoming a workload-management concern. This is especially true for agent systems, where stable prefixes are huge and the user experience penalty of a cold cache is not just money. It is latency and interaction feel. A long-lived coding or memory-heavy agent that suddenly pays to rebuild its entire prefix after an idle hour is materially different from one that stays warm.

The proposal also lands at an interesting moment because it cross-links directly to the companion cost-accounting issue. Put the two together and you get a very modern agent-ops problem: the platform wants to optimize cache behavior, but its internal cost telemetry may not yet be accurate enough to tell operators whether the optimization is paying off. That is not a reason to avoid the feature. It is a reason to recognize how intertwined runtime engineering and FinOps have already become.

What should developers actually do with this?

If you build long-lived agent workflows, the first thing to do is stop treating prompt caching as a lucky side effect. Measure it. Look at prefix size, idle periods, cache retention settings, cold-turn latency, and model-specific cache pricing. If your workflow routinely carries a giant stable prefix, especially for coding agents, multi-agent coordinators, or memory-heavy assistants, then cache TTL is part of your runtime behavior whether you formalize it or not.
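As a starting point for that measurement, here is a small sketch that summarizes cache behavior from per-turn usage records. It assumes each record carries the token counters that Anthropic's Messages API reports in its usage object (input_tokens, cache_creation_input_tokens, cache_read_input_tokens). The function names and the sample data are hypothetical, and how OpenClaw itself stores usage is not something the issue specifies.

```python
from typing import Mapping, Sequence

Usage = Mapping[str, int]


def cache_read_share(turns: Sequence[Usage]) -> float:
    """Fraction of all prompt tokens that were served from the cache."""
    read = sum(u.get("cache_read_input_tokens", 0) for u in turns)
    written = sum(u.get("cache_creation_input_tokens", 0) for u in turns)
    uncached = sum(u.get("input_tokens", 0) for u in turns)
    total = read + written + uncached
    return read / total if total else 0.0


def cold_turns(turns: Sequence[Usage]) -> list[int]:
    """Indexes of turns that wrote to the cache but read nothing from it."""
    return [
        i for i, u in enumerate(turns)
        if u.get("cache_creation_input_tokens", 0) > 0
        and u.get("cache_read_input_tokens", 0) == 0
    ]


# Made-up sample: a first turn, a warm follow-up, then a turn after a long idle gap.
sample = [
    {"input_tokens": 200, "cache_creation_input_tokens": 100_000, "cache_read_input_tokens": 0},
    {"input_tokens": 300, "cache_creation_input_tokens": 500, "cache_read_input_tokens": 100_000},
    {"input_tokens": 250, "cache_creation_input_tokens": 100_800, "cache_read_input_tokens": 0},
]
print(cold_turns(sample))                 # [0, 2]
print(f"{cache_read_share(sample):.1%}")  # share of prompt tokens read from cache
```

The first turn of any session is cold by definition, so the interesting signal is how often later turns show up in that list and how long the idle gap before them was.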

Second, keep the separation-of-concerns lesson. A cache warmer should not pretend to be a user-visible turn, and a user-visible heartbeat should not become a billing optimization hack by accident. Systems stay comprehensible when each background task has one reason to exist. If OpenClaw adds this feature, the best version will be one that is explicitly opt-in, skips work during recent active use, and tells operators exactly what it refreshed and why.
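Those three properties (opt-in, skip when recently active, explain every decision) can be sketched as a single pure function. This is an illustrative shape under stated assumptions, not OpenClaw's design: the names WarmerPolicy, Decision, and decide are hypothetical, the one-hour TTL and 50-minute threshold are taken from the issue's example, and declining to "refresh" an already expired cache is a design choice of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WarmerPolicy:
    enabled: bool = False            # opt-in: off unless an operator turns it on
    ttl_s: float = 3600.0            # one-hour retention, per the issue
    refresh_after_s: float = 3000.0  # 50-minute threshold from the issue's example


@dataclass(frozen=True)
class Decision:
    refresh: bool
    reason: str  # surfaced to operators so every refresh or skip is explainable


def decide(policy: WarmerPolicy, idle_s: float) -> Decision:
    """Decide whether to refresh, given seconds since the prefix was last used."""
    if not policy.enabled:
        return Decision(False, "warmer disabled (opt-in)")
    if idle_s >= policy.ttl_s:
        # Nothing left to keep alive; a request now would pay a full rewrite.
        return Decision(False, "cache already expired")
    if idle_s < policy.refresh_after_s:
        return Decision(False, "recent activity already reset the TTL")
    return Decision(True, f"idle {idle_s:.0f}s, TTL is {policy.ttl_s:.0f}s")


policy = WarmerPolicy(enabled=True)
for idle in (60, 3100, 4000):
    print(idle, decide(policy, idle))
```

Keeping the decision free of I/O means the scheduling loop, the actual model call, and the operator-facing log can each be tested and reasoned about separately from the policy.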

Third, remember that any new background subsystem adds risk. A warmer creates more traffic, more model coupling, more scheduling logic, and another place where runtime state can drift from the main conversation path if implemented carelessly. That is manageable, but only if maintainers resist the temptation to bury it under vague “performance improvements” language. This should be exposed as a runtime policy choice with measurable economics, not magic behind the curtain.

There is a broader industry point here too. Agent runtimes are starting to look more like classical systems software. They already manage sessions, queues, schedules, memory, tool permissions, retries, and egress rules. Now they are learning to manage cache lifecycles the same way web platforms learned to manage keepalive connections, connection pools, and query caches. That is a sign of maturity. It is also a sign that the agent layer is moving away from “just call the model again” as an acceptable default for everything.

My take is that OpenClaw should seriously consider this feature, but only as a clearly separated subsystem with honest observability. The proposal is right that heartbeats are the wrong abstraction, and right that idle-period economics now matter for user-facing agent systems. The part to watch is whether maintainers pair any warmer design with correct cost reporting and good operator controls. A cache optimization nobody can reason about is just another invisible daemon with opinions.

Still, the headline stands: this proposal says the quiet part out loud. Prompt caching is becoming part of agent runtime design. Once that idea lands, the conversation changes from “should we cache?” to “what pieces of agent state deserve their own operational machinery?” That is a healthier, more serious question, and OpenClaw is asking it earlier than many projects will.

Sources: OpenClaw issue #70418, Anthropic prompt caching docs, Anthropic pricing docs, OpenClaw issue #70416.

