
OpenClaw’s Dreaming Feature Accidentally Became a 34-Million-Token Background Job

Anatoliy Kolodkin

18 Apr 2026 • 4 min read

The funniest OpenClaw bug of the week is also one of the most expensive. A feature called Dreaming, meant to turn messy chat history into something like durable memory, apparently spent days spawning side sessions that wrote poetic diary entries, saved nothing useful, and quietly burned through 34,151,809 tokens across 723 runs. That is a bug report with the energy of satire, but it is really a story about a category-wide problem: agent platforms are starting to operate fleets of helper agents behind the scenes, and too many of them still treat scheduler discipline like optional polish.

According to OpenClaw issue #68530, the platform’s built-in dreaming pipeline was launching dreaming-narrative-light and dreaming-narrative-rem sessions on every heartbeat poll. Each run consumed roughly 46,000 to 53,000 tokens, averaging about 47,236 tokens per session. The prompt was not subtle: “Write a dream diary entry from these memory fragments.” The output was also not especially operational. The reporter says the system produced “poetic diary entries” that showed up in the Control UI chat and were never persisted to a file, memory store, or other durable surface that would justify the spend.
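The per-run average follows directly from the two totals in the report. A small sketch of that arithmetic is below. The `estimated_cost` helper and its price parameter are assumptions added for illustration, since the report quotes tokens, not dollars.

```python
# Figures quoted from the issue report as described above.
TOTAL_TOKENS = 34_151_809
TOTAL_RUNS = 723


def average_tokens_per_run(total_tokens: int, runs: int) -> float:
    """Mean token spend per dreaming session."""
    return total_tokens / runs


def estimated_cost(total_tokens: int, usd_per_million_tokens: float) -> float:
    """Cost at a caller-supplied price. The price is NOT from the report."""
    return total_tokens / 1_000_000 * usd_per_million_tokens


avg = average_tokens_per_run(TOTAL_TOKENS, TOTAL_RUNS)
print(f"average tokens per run: {avg:,.0f}")  # average tokens per run: 47,236
```

Plug in whatever rate your provider charges to turn the token total into a dollar figure for your own stack.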

The fast follow-up matters almost as much as the bug. A linked fix in PR #68534 introduces a file-backed cooldown store at memory/.dreams/phase-cooldowns.json so these narrative phases run on an actual cadence instead of every time deep dreaming gets triggered. The implementation estimates intervals from cron expressions, then applies an 80 percent safety factor, landing around 4.8 hours for light dreaming and roughly 5.6 days for REM under the described defaults. Working backward, those figures imply scheduled intervals of about 6 hours and 7 days. That is not glamorous engineering. It is the kind of boring guardrail that separates “interesting AI behavior” from “background bill generator.”
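To make the cooldown idea concrete, here is a minimal sketch of a file-backed cooldown gate. It is not the code from the PR. The class name, the method names, and the JSON layout are hypothetical, and the 6 hour and 7 day base intervals are back-calculated from the 4.8 hour and 5.6 day figures rather than parsed from any cron expression.

```python
import json
import time
from pathlib import Path

SAFETY_FACTOR = 0.8  # the 80 percent factor mentioned in the article

# Assumption: base intervals inferred from 4.8 h / 0.8 and 5.6 d / 0.8.
# The real fix reportedly estimates these from cron expressions instead.
ASSUMED_INTERVALS_S = {
    "light": 6 * 3600,
    "rem": 7 * 24 * 3600,
}


class PhaseCooldownStore:
    """Hypothetical gate that records the last run time per phase in a JSON file."""

    def __init__(self, path, now=time.time):
        self.path = Path(path)
        self.now = now  # injectable clock, which keeps the gate testable

    def _load(self):
        try:
            return json.loads(self.path.read_text())
        except (FileNotFoundError, json.JSONDecodeError):
            return {}  # a missing or corrupt file means no phase has run yet

    def cooldown_seconds(self, phase):
        return ASSUMED_INTERVALS_S[phase] * SAFETY_FACTOR

    def try_acquire(self, phase):
        """Return True and stamp the phase if its cooldown has elapsed."""
        state = self._load()
        last = state.get(phase)
        now = self.now()
        if last is not None and now - last < self.cooldown_seconds(phase):
            return False
        state[phase] = now
        self.path.parent.mkdir(parents=True, exist_ok=True)
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(state))
        tmp.replace(self.path)  # write then rename, so readers never see a partial file
        return True
```

A scheduler would call `try_acquire("light")` before launching a narrative session and skip the run when it returns `False`. Persisting the timestamp to disk is what lets the cooldown survive process restarts, which an in-memory timer would not.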

The important point is not that OpenClaw accidentally wrote bad poetry at scale. The important point is that the platform has crossed a threshold where internal helper agents are no longer just implementation details. They are workloads. Once you have sub-agents for memory analysis, summarization, retrieval, evaluation, or diary-like reflection, you have built a control plane, whether you wanted to call it that or not. And control planes do not get to rely on vibes.

This is a good example of why the current agent market often feels more mature in demos than in production. Everybody likes the high-level pitch: the assistant remembers things, reflects on prior work, and gets smarter over time. But those features are not magic. Under the hood they are scheduled jobs with prompts, model calls, budgets, storage rules, and failure modes. If the prompt says “generate a narrative” but the scheduler forgets to ask whether it should run now, or where the output should go, you do not have intelligence. You have automation without governance.

There is also a subtler design issue here. Dreaming is defensible when it improves retrieval quality, condenses recurring themes, or surfaces context the main agent would otherwise miss. It becomes a lot harder to defend when the expensive part of the workflow is the narrative flourish rather than the memory operation itself. That is the trap several agent products are drifting toward right now. They mistake visible “personality” work for durable system value. A memory subsystem should first prove that it can reduce noise, cut repeated context costs, and make future turns better. If it wants to write a diary on top of that, fine, but only after the infrastructure basics are already nailed down.

For practitioners, the lesson is painfully portable. Audit every background lane in your stack, especially anything triggered by heartbeat polls, cron fan-out, or maintenance cycles. Ask four blunt questions. What is the maximum run frequency? Where is output persisted? What budget stops this from scaling into a surprise invoice? What visibility do operators get before usage becomes ridiculous? If you cannot answer those questions for a helper agent, it is not a helper. It is technical debt with model access.
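The four questions can be turned into a mechanical check. The sketch below is one way to do it, assuming each background lane is described by a small config record. The `BackgroundLane` type and its field names are invented for illustration and do not correspond to any OpenClaw configuration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BackgroundLane:
    """Hypothetical description of one helper-agent workload."""
    name: str
    min_interval_s: Optional[float] = None         # caps run frequency
    output_path: Optional[str] = None              # where results are persisted
    token_budget_per_day: Optional[int] = None     # hard spend limit
    alert_threshold_tokens: Optional[int] = None   # when operators get notified


# Each field maps to one of the four questions from the text.
QUESTIONS = {
    "min_interval_s": "What is the maximum run frequency?",
    "output_path": "Where is output persisted?",
    "token_budget_per_day": "What budget stops this from scaling into a surprise invoice?",
    "alert_threshold_tokens": "What visibility do operators get before usage becomes ridiculous?",
}


def unanswered(lane: BackgroundLane) -> List[str]:
    """Return the questions this lane's config leaves open."""
    return [q for field, q in QUESTIONS.items() if getattr(lane, field) is None]


lane = BackgroundLane(name="dreaming-narrative-light", min_interval_s=6 * 3600)
for question in unanswered(lane):
    print(f"{lane.name}: {question}")
```

Running a check like this in CI or at startup means a lane with an open question fails loudly before it ever reaches a scheduler.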

OpenClaw’s own incident numbers make the point better than any abstract principle. Seven hundred twenty-three sessions is not an edge-case blip. Thirty-four million tokens is not a rounding error. And the most damning line in the report is not the token count. It is that the practical output was “none.” Engineers will tolerate expensive systems when the cost buys something durable. They get much less forgiving when the money disappears into a side path that never feeds back into the product.

The community response inside the repo also tells you this is more than a meme bug. Reviewers immediately moved beyond the joke and started pushing on cooldown math, zombie-session detection, and token-budget circuit breakers. That is the right instinct. Once platforms orchestrate their own helper agents, they need the same safety rails we already expect for queues, workers, and scheduled jobs. Cooldowns. Budgets. Persistence rules. Health checks. Backpressure. The fact that the workers happen to be language models does not exempt them from operations.
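A token-budget circuit breaker of the kind reviewers asked about can be quite small. The following is a sketch under assumed semantics (a sliding time window with a fixed token ceiling), not a description of anything merged into OpenClaw. The class and method names are hypothetical.

```python
import time


class TokenBudgetBreaker:
    """Hypothetical breaker: refuse new runs once a sliding window's budget is spent."""

    def __init__(self, max_tokens, window_s, now=time.monotonic):
        self.max_tokens = max_tokens
        self.window_s = window_s
        self.now = now          # injectable clock for testing
        self._events = []       # list of (timestamp, tokens) pairs

    def _spent(self):
        # Drop events that have aged out of the window, then total the rest.
        cutoff = self.now() - self.window_s
        self._events = [(t, n) for t, n in self._events if t > cutoff]
        return sum(n for _, n in self._events)

    def allow(self, estimated_tokens):
        """True if a run of this size still fits in the current window."""
        return self._spent() + estimated_tokens <= self.max_tokens

    def record(self, tokens):
        """Log actual usage after a run completes."""
        self._events.append((self.now(), tokens))
```

With a 100,000 token hourly ceiling and sessions of roughly 47,000 tokens each, this breaker would admit two runs per hour and reject the third, regardless of how often the heartbeat fires.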

My take is simple. This bug is not mainly about Dreaming. It is about the false comfort of anthropomorphic language in system design. Call it REM, call it narrative, call it reflection, whatever you want. If it wakes up on every heartbeat, consumes tens of thousands of tokens, and leaves nothing useful behind, it is not dreaming. It is thrashing. And the teams that figure that out early will build the agent platforms people actually trust to keep running in the background.

Sources: OpenClaw issue #68530, OpenClaw PR #68534, OpenClaw v2026.4.15 release notes
