OpenClaw Dreaming Has a Context-Bloat Problem, and the Numbers Are Not Subtle
OpenClaw’s Dreaming feature has the right product instinct and the wrong failure mode. Persistent agent memory is useful only if it behaves like a retrieval system. Issue #87095 shows what happens when it behaves like an append-only prompt cannon: a long-lived workspace can accumulate enough remembered context that a brand-new session starts over budget before the user has typed a single useful thing.
The numbers in the report are not subtle. The reporter measured memory/.dreams/short-term-recall.json at 4,655,823 bytes, roughly 1.16 million tokens by their estimate. Nearby Dreaming files added more weight: events.jsonl at 529,634 bytes, phase-signals.json at 435,499 bytes, and session-ingestion.json at 424,432 bytes. Combined, the workspace was injecting something like 1.5 million tokens of Dreaming-related context before the active conversation even began. The observed symptom was grimly clean: 100% context used — 1.4M / 1M at /new.
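The arithmetic is easy to check. This minimal sketch assumes the common rule of thumb of about 4 bytes per token. Real tokenizers vary, and OpenClaw may estimate differently. Under that assumption, it reproduces the reporter's 1.16M figure for short-term-recall.json. It also puts the four files together at roughly 6 MB, or about 1.5M tokens:

```python
# ASSUMPTION: ~4 bytes per token, a common rule of thumb for English-heavy
# text and JSON. Real tokenizers vary; treat results as order-of-magnitude.
BYTES_PER_TOKEN = 4

# File sizes in bytes, as measured by the reporter in issue #87095.
DREAM_FILES = {
    "short-term-recall.json": 4_655_823,
    "events.jsonl": 529_634,
    "phase-signals.json": 435_499,
    "session-ingestion.json": 424_432,
}


def estimate_tokens(num_bytes: int, bytes_per_token: int = BYTES_PER_TOKEN) -> int:
    """Estimate a token count from a raw byte size."""
    return num_bytes // bytes_per_token


total_bytes = sum(DREAM_FILES.values())
total_tokens = estimate_tokens(total_bytes)

for name, size in DREAM_FILES.items():
    print(f"{name:<24}{size:>12,} bytes ~{estimate_tokens(size):>12,} tokens")
print(f"{'total':<24}{total_bytes:>12,} bytes ~{total_tokens:>12,} tokens")
```

At that ratio the recall file alone is larger than a 1M-token window.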
That is not memory. That is self-inflicted context exhaustion with a nicer name.
A memory system needs a budget before it needs a personality
Agent memory features tend to get marketed as continuity: the agent remembers what happened yesterday, carries preferences across sessions, and develops a stable working context. Fine. Builders want that. But continuity without a retention policy is just a slow-motion denial-of-service attack on the model's context window. Every saved reflection, event, phase signal, and session-ingestion record competes for the same window as the user's next prompt, the tool schemas, the system instructions, and the active working set.
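Here is a hypothetical budget split for a single context window that makes the competition concrete. The field names and the reserved token counts in the usage lines are illustrative assumptions, not OpenClaw's accounting. Only the 1M window and the ~1.5M-token memory estimate come from the report.

```python
from dataclasses import dataclass


@dataclass
class ContextBudget:
    """Hypothetical split of one context window, in tokens."""
    window: int        # total context window
    system: int        # system instructions
    tool_schemas: int  # tool definitions
    user_prompt: int   # the user's next message
    working_set: int   # files, tool results, live conversation

    def memory_headroom(self) -> int:
        """Tokens left for injected memory after everything else is reserved.

        Negative means the session is already over budget.
        """
        reserved = self.system + self.tool_schemas + self.user_prompt + self.working_set
        return self.window - reserved

    def fits(self, memory_tokens: int) -> bool:
        """True if this much injected memory fits in the remaining headroom."""
        return memory_tokens <= self.memory_headroom()


# Illustrative reservations; only the 1M window comes from the report.
budget = ContextBudget(
    window=1_000_000,
    system=20_000,
    tool_schemas=30_000,
    user_prompt=2_000,
    working_set=100_000,
)
print(budget.memory_headroom())  # 848000
print(budget.fits(1_511_347))    # False: the ~1.5M-token Dreaming estimate
```

Whatever the exact reservations are, memory is the only line item here that grows on its own between sessions. That makes it the one that needs a cap.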
The issue describes a plausible compounding loop. Compacted sessions feed session-ingestion.json. Those entries get promoted into short-term-recall.json. New sessions reload that recall. Without eviction, each round of summarization and promotion makes the next session heavier. Eventually the agent is not remembering better; it is arriving at every new session already buried under its own notes.
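A toy model shows why "compounding" is the right word. This sketch assumes that each session's compacted summary covers a fixed amount of new work plus some fraction of the recall that was loaded into context. That fraction is the hypothetical `reabsorb` parameter, and it means old memory gets re-ingested. Both parameters are illustrative; the point is the shape of the curve, not the numbers.

```python
from typing import List, Optional


def simulate(
    sessions: int,
    base_tokens: float,
    reabsorb: float,
    cap_tokens: Optional[float] = None,
) -> List[int]:
    """Return the recall size (tokens) each new session starts with.

    Hypothetical model: every session is compacted into a summary made of
    `base_tokens` of new work plus `reabsorb` times the recall it loaded.
    That summary is promoted back into recall. With `cap_tokens` set, the
    store is clamped to the cap, standing in for evicting old entries.
    """
    recall = 0.0
    history = []
    for _ in range(sessions):
        history.append(int(recall))                # what the new session loads
        summary = base_tokens + reabsorb * recall  # compaction re-ingests old memory
        recall += summary                          # ingestion -> promotion
        if cap_tokens is not None:
            recall = min(recall, cap_tokens)       # eviction keeps the store bounded
    return history


print(simulate(6, 1_000, 0.5))                    # grows faster every session
print(simulate(6, 1_000, 0.5, cap_tokens=3_000))  # flattens at the cap
```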
This is the same mistake teams make with logs, caches, metrics labels, and vector stores, but agents make the blast radius more visible. A log file that grows forever fills disk. A cache that grows forever eats memory. A recall file that grows forever fills the prompt. The failure shows up as latency, cost, degraded reasoning, or outright context overflow. The model looks flaky, but the storage layer is the one quietly doing damage.
ClawSweeper's source review adds a useful nuance. Current main apparently already has a capped /new startup-memory path and a Dreaming disable switch, so current builds may not behave exactly like the reporter's released build. But the review still confirms the important unresolved path: session-corpus batches can be recorded into short-term-recall.json, and the store writer has no entry, byte, or token retention cap. In other words, some injection paths may now be safer, but the underlying reservoir can still grow without a real storage contract.
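Here is what a storage contract could look like at the writer level. This minimal sketch is a recall store that enforces entry, byte, and token caps on every write and evicts the oldest entries first. The class name, the oldest-first policy, and the 4-bytes-per-token estimate are all assumptions for illustration. None of this is OpenClaw's actual store writer.

```python
import json
from collections import OrderedDict
from typing import List

BYTES_PER_TOKEN = 4  # ASSUMPTION: rough estimate, not a real tokenizer


class BoundedRecallStore:
    """Hypothetical recall store that enforces entry, byte, and token caps.

    Entries keep insertion order. After every write, the oldest entries are
    evicted until all three caps hold again.
    """

    def __init__(self, max_entries: int, max_bytes: int, max_tokens: int) -> None:
        self.max_entries = max_entries
        self.max_bytes = max_bytes
        self.max_tokens = max_tokens
        self._entries: "OrderedDict[str, str]" = OrderedDict()
        self.evicted: List[str] = []

    @staticmethod
    def _size(text: str) -> int:
        return len(text.encode("utf-8"))

    def total_bytes(self) -> int:
        return sum(self._size(t) for t in self._entries.values())

    def total_tokens(self) -> int:
        return self.total_bytes() // BYTES_PER_TOKEN

    def _over_cap(self) -> bool:
        return (
            len(self._entries) > self.max_entries
            or self.total_bytes() > self.max_bytes
            or self.total_tokens() > self.max_tokens
        )

    def put(self, key: str, text: str) -> None:
        size = self._size(text)
        # Reject entries that could never fit, instead of evicting everything.
        if size > self.max_bytes or size // BYTES_PER_TOKEN > self.max_tokens:
            raise ValueError(f"entry {key!r} alone exceeds a cap")
        self._entries.pop(key, None)  # rewriting a key moves it to newest
        self._entries[key] = text
        while self._over_cap():
            old_key, _ = self._entries.popitem(last=False)  # oldest first
            self.evicted.append(old_key)

    def keys(self) -> List[str]:
        return list(self._entries)

    def dump(self) -> str:
        """Serialize as JSON, the shape a recall file might have on disk."""
        return json.dumps(self._entries)
```

Oldest-first is the simplest eviction rule. A real store might rank by recency of use or by relevance instead, but any enforced rule beats none.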
Retrieval should retrieve, not shovel
The requested fixes are the right shape: hard cap short-term-recall.json around a bounded token budget, enforce per-session Dreaming injection budgets, use semantic top-N recall instead of wholesale injection, evict stale session-ingestion entries, add openclaw doctor warnings above thresholds, and expose a config flag to disable Dreaming injection. None of that is glamorous. All of it is what turns a memory feature from a demo into infrastructure.
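Two of those requests are simple enough to sketch together: stale-entry eviction and a switch that disables injection. The config fields and the seven-day default below are hypothetical, not OpenClaw's config keys. In this sketch the disable flag short-circuits injection entirely, and age-based eviction runs before anything reaches a new session.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


@dataclass
class IngestionEntry:
    session_id: str
    ingested_at: datetime
    text: str


@dataclass
class DreamingConfig:
    # Hypothetical knobs; OpenClaw's real config keys may differ.
    injection_enabled: bool = True
    ingestion_max_age: timedelta = timedelta(days=7)


def evict_stale(
    entries: List[IngestionEntry], now: datetime, cfg: DreamingConfig
) -> Tuple[List[IngestionEntry], List[IngestionEntry]]:
    """Split entries into (kept, evicted) by age relative to `now`."""
    cutoff = now - cfg.ingestion_max_age
    kept = [e for e in entries if e.ingested_at >= cutoff]
    evicted = [e for e in entries if e.ingested_at < cutoff]
    return kept, evicted


def memory_for_new_session(
    entries: List[IngestionEntry], now: datetime, cfg: DreamingConfig
) -> List[IngestionEntry]:
    """What a new session may see: nothing at all when injection is off."""
    if not cfg.injection_enabled:
        return []
    kept, _ = evict_stale(entries, now, cfg)
    return kept


now = datetime.now(timezone.utc)
entries = [
    IngestionEntry("s-old", now - timedelta(days=30), "stale summary"),
    IngestionEntry("s-new", now - timedelta(hours=2), "recent summary"),
]
print([e.session_id for e in memory_for_new_session(entries, now, DreamingConfig())])
print(memory_for_new_session(entries, now, DreamingConfig(injection_enabled=False)))
```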
The most important design point is semantic selection. If a user asks about an OpenClaw release, the agent does not need every remembered session, every phase signal, and every event record. It needs the few memories relevant to the task, with provenance and a visible token cost. Retrieval should be accountable: why this memory, why now, how many tokens, and what got left out because of budget. Without that, “memory” becomes a blob pasted into the system’s mouth.
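Here is a minimal sketch of accountable top-N recall under a token budget. It scores memories with bag-of-words cosine similarity, a deliberately crude stand-in for embedding similarity, and estimates cost at about 4 bytes per token. It returns a report of what was selected, with score, cost, and provenance. The report also lists what was relevant but skipped for budget. All names and the `min_score` threshold are assumptions.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Memory:
    id: str
    source: str  # provenance: which session or file produced it
    text: str


@dataclass
class RecallReport:
    # (id, score, tokens, source) for each injected memory
    selected: List[Tuple[str, float, int, str]] = field(default_factory=list)
    # (id, score, tokens) for memories that were relevant but did not fit
    skipped_budget: List[Tuple[str, float, int]] = field(default_factory=list)
    tokens_used: int = 0


def _bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def estimate_tokens(text: str) -> int:
    return max(1, len(text.encode("utf-8")) // 4)  # ASSUMPTION: ~4 bytes/token


def recall(query: str, memories: List[Memory], top_n: int,
           token_budget: int, min_score: float = 0.1) -> RecallReport:
    """Select up to top_n relevant memories that fit in token_budget."""
    q = _bag_of_words(query)
    scored = sorted(((_cosine(q, _bag_of_words(m.text)), m) for m in memories),
                    key=lambda pair: pair[0], reverse=True)
    report = RecallReport()
    for score, m in scored:
        if score < min_score or len(report.selected) >= top_n:
            break  # list is sorted, so nothing after this qualifies either
        cost = estimate_tokens(m.text)
        if report.tokens_used + cost > token_budget:
            report.skipped_budget.append((m.id, round(score, 3), cost))
            continue
        report.selected.append((m.id, round(score, 3), cost, m.source))
        report.tokens_used += cost
    return report


memories = [
    Memory("m1", "session-12", "OpenClaw release notes for version rollout"),
    Memory("m2", "session-3", "grocery list eggs milk"),
    Memory("m3", "session-9", "OpenClaw release " * 100),
]
report = recall("OpenClaw release", memories, top_n=2, token_budget=50)
print(report.selected)        # m1 injected, with score, cost, and source
print(report.skipped_budget)  # m3 was relevant but too large for the budget
```

The relevant but oversized memory lands in `skipped_budget` rather than silently eating the window, and the operator can see that it happened.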
There is also a governance angle. Agent operators need to inspect memory pressure the way they inspect disk, queue depth, and error rates. A Dreaming dashboard should show total stored bytes, estimated tokens, injection budget used, selected recall items, and capped/skipped items. If .dreams/ has grown to several megabytes, the system should not wait for the model to explode. It should warn, rotate, compact, or quarantine.
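A warn-before-it-explodes check can be a pure function over file sizes. This sketch flags individual files above hypothetical warn and critical thresholds. It also flags the total when its estimated token count, again at ~4 bytes per token, reaches the context window. The thresholds are illustrative defaults, not values from OpenClaw or the issue.

```python
from dataclasses import dataclass
from typing import Dict, List

BYTES_PER_TOKEN = 4  # ASSUMPTION: rough estimate


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical defaults for illustration only.
    warn_bytes: int = 1_000_000      # ~250k tokens at ~4 bytes/token
    critical_bytes: int = 4_000_000  # ~1M tokens: a whole 1M window on its own


def check_dreams(sizes: Dict[str, int], context_window: int,
                 t: Thresholds = Thresholds()) -> List[str]:
    """Return findings for a .dreams/ directory: per file (largest first), then total."""
    findings = []
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size >= t.critical_bytes:
            findings.append(f"CRITICAL {name}: {size:,} bytes; quarantine or compact")
        elif size >= t.warn_bytes:
            findings.append(f"WARN {name}: {size:,} bytes; rotate or compact soon")
    est_tokens = sum(sizes.values()) // BYTES_PER_TOKEN
    if est_tokens >= context_window:
        findings.append(
            f"CRITICAL total: ~{est_tokens:,} tokens vs a {context_window:,}-token window"
        )
    return findings


reported = {
    "short-term-recall.json": 4_655_823,
    "events.jsonl": 529_634,
    "phase-signals.json": 435_499,
    "session-ingestion.json": 424_432,
}
for line in check_dreams(reported, context_window=1_000_000):
    print(line)
```

Given the sizes from the report and a 1M-token window, it flags short-term-recall.json as critical and flags the ~1.5M-token total as over the window. The three smaller files stay under the warn line.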
For practitioners running OpenClaw, the immediate checklist is short. Inspect memory/.dreams/ in long-lived workspaces. If new sessions feel slow or expensive, or report heavy context use before real work starts, check short-term-recall.json first. If it is huge, quarantine the whole .dreams/ directory instead of deleting it: mv ~/.openclaw/workspace/memory/.dreams ~/.openclaw/workspace/memory/.dreams.QUARANTINE. The reporter says OpenClaw recreated an empty .dreams/ directory on the next session and normal context size returned.
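On machines with many long-lived workspaces, the inspect-then-quarantine step can be scripted. This minimal sketch assumes the ~/.openclaw/workspace/memory/.dreams layout from the report, and the function names are hypothetical. Like the mv command, it renames rather than deletes. It also refuses to overwrite an existing quarantine directory.

```python
from pathlib import Path
from typing import List, Tuple

# Path layout taken from the report; adjust for your own workspace.
DREAMS_DIR = Path.home() / ".openclaw" / "workspace" / "memory" / ".dreams"


def dreams_report(dreams_dir: Path) -> List[Tuple[str, int]]:
    """List files under .dreams/ with their sizes, largest first."""
    if not dreams_dir.is_dir():
        return []
    files = [
        (str(p.relative_to(dreams_dir)), p.stat().st_size)
        for p in dreams_dir.rglob("*")
        if p.is_file()
    ]
    return sorted(files, key=lambda item: item[1], reverse=True)


def quarantine(dreams_dir: Path) -> Path:
    """Rename .dreams/ to .dreams.QUARANTINE rather than deleting it."""
    target = dreams_dir.with_name(dreams_dir.name + ".QUARANTINE")
    if target.exists():
        raise FileExistsError(f"{target} already exists; inspect it first")
    dreams_dir.rename(target)
    return target


if __name__ == "__main__":
    for name, size in dreams_report(DREAMS_DIR):
        print(f"{size:>12,}  {name}")
    # Quarantine is a deliberate, separate step:
    # quarantine(DREAMS_DIR)
```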
The bigger lesson is not OpenClaw-specific. Every agent platform trying to sell “long-term memory” needs to answer boring questions before it earns trust: What is retained? For how long? Under what byte and token caps? What is retrieved by default? Can the operator see why? Can the user purge it safely? Does compaction reduce state, or merely produce another object that gets injected later?
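Those questions can be forced into the open by making the policy a typed object that fails review when an answer is missing. This is a hypothetical sketch, and the field names and checks are illustrative. The compaction check assumes you measure total stored bytes, including any new summary, before and after compacting.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical memory policy: one field per question the text asks."""
    retained_kinds: Tuple[str, ...]  # what is retained
    max_age_days: int                # for how long (0 or less means unset)
    max_bytes: int                   # byte cap on stored memory
    max_tokens: int                  # token cap on stored memory
    default_injection_tokens: int    # what is retrieved by default, in tokens
    explains_selection: bool         # can the operator see why
    supports_user_purge: bool        # can the user purge it safely

    def problems(self) -> List[str]:
        """Return the questions this policy leaves unanswered."""
        issues = []
        if not self.retained_kinds:
            issues.append("retained data is undefined")
        if self.max_age_days <= 0:
            issues.append("no retention window")
        if self.max_bytes <= 0 or self.max_tokens <= 0:
            issues.append("no storage cap")
        if self.default_injection_tokens > self.max_tokens:
            issues.append("default injection exceeds the stored-token cap")
        if not self.explains_selection:
            issues.append("operator cannot see why memories were selected")
        if not self.supports_user_purge:
            issues.append("no safe user purge")
        return issues


def compaction_reduces_state(bytes_before: int, bytes_after: int) -> bool:
    """True only if total stored bytes, including any new summary, shrank."""
    return bytes_after < bytes_before
```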
Agent memory is not magic. It is a database with a prompt-shaped output channel. If you do not enforce retention, the database eventually becomes the prompt. At that point the agent is not more contextual. It is just buried alive in its own continuity.
Sources: GitHub issue #87095, PR #86302, PR #87088, issue #62420