OpenClaw’s Dreaming Pipeline Has a Memory Poisoning Problem, Not Just a Cleanup Problem
OpenClaw's memory stack has crossed the line where “quality bug” is no longer an adequate label. If issue #67442 holds up, the problem is not that Dreaming occasionally writes goofy summaries. The problem is that the system can take transport noise, promote it through multiple memory stages, and finally stamp it into MEMORY.md as if it were durable truth. That is not messy UX. That is trust-boundary failure in the part of the product that is supposed to decide what matters enough to remember.
OpenClaw's own Dreaming documentation makes the promise clear: the diary is for human-readable reflection, and only grounded snippets are eligible for long-term promotion. The report says that promise is leaking. The examples are damning precisely because they are so banal: untrusted wrapper labels like “Conversation info,” sender metadata, raw message IDs, and staged-candidate scaffolding showing up where stable memory should live. A memory system does not have to be perfect to be useful, but it does have to know the difference between a fact about a person and a piece of transport plumbing.
The issue also matters because it does not appear in isolation. The reporter ties it to a chain of adjacent bugs, including earlier reports about raw session-corpus metadata surfacing in Dreaming reflections and daily memory pollution reaching memory/YYYY-MM-DD.md. That progression matters. A single noisy summary can be shrugged off. A repeated pattern across staging, daily memory, and deep-phase promotion suggests the pipeline is reinforcing contamination instead of filtering it out. Once a system starts rewarding its own intermediate artifacts, garbage stops being transient. It becomes history.
The real bug is memory laundering
What makes this more serious than a sloppy summarizer is the architecture of reinforcement. Dreaming is not just generating text for a UI panel. It is moving information through phases, rehydrating snippets from daily files, and appending promoted entries to long-term memory. That means a bad ingest decision can survive long enough to look legitimate later. The model saw it before, the pipeline scored it before, the system wrote it down before, so on the next pass it starts to resemble evidence. This is how metadata gets laundered into memory.
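The laundering loop is easier to see in miniature. The sketch below is hypothetical and does not reflect OpenClaw's actual code; all function and variable names (`ingest`, `daily_summary`, `promote`, `memory_md`) are invented to illustrate how a naive pipeline lets transport metadata ride along with a user fact all the way into durable memory.

```python
# Hypothetical sketch of the reinforcement loop described above.
# None of these names come from OpenClaw; they illustrate the failure mode.

def ingest(raw_message: dict) -> str:
    # Naive ingest: flattens the whole envelope, metadata included.
    return (f"Conversation info: sender={raw_message['sender_id']} "
            f"msg={raw_message['msg_id']} :: {raw_message['text']}")

def daily_summary(snippets: list) -> str:
    # Daily phase rehydrates staged snippets verbatim, no filtering.
    return " | ".join(snippets)

def promote(daily: str, long_term: list) -> None:
    # Deep phase: material that survived two stages "looks legitimate"
    # and is appended to long-term memory as-is.
    long_term.append(daily)

memory_md = []  # stand-in for MEMORY.md
staged = [ingest({"sender_id": 5417, "msg_id": "abc",
                  "text": "User prefers dark mode"})]
promote(daily_summary(staged), memory_md)

# The promoted entry now mixes transport plumbing with the user fact.
print(memory_md[0])
```

Nothing in this path ever asks whether `sender=5417` is a fact about a person; each stage simply trusts that the previous stage already decided.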
That is the same failure mode engineers worry about in distributed systems and data platforms: once bad data lands in the canonical store, every downstream consumer becomes harder to trust. In OpenClaw's case, those consumers are not just dashboards. They are retrieval, active memory, future summaries, and operator judgment. If MEMORY.md fills up with sender tags and message IDs, the blast radius is bigger than embarrassment. The assistant becomes more confident and less useful at the same time.
There is also a subtle product problem hiding here. Memory products are judged less by recall than by taste. Users will tolerate some forgetting. They do not tolerate being remembered incorrectly in a machine-generated, system-endorsed way. “You once mentioned project X” is helpful. “Sender metadata 5417 appears important” is the kind of thing that makes a smart assistant feel like a haunted log parser.
Why this happens in agent systems
Agent frameworks keep walking into this class of bug because the same transcript has to serve too many masters. One layer wants complete operational context. Another wants human-readable history. Another wants retrieval-friendly snippets. Another wants durable memory. If those layers share raw material but not strict normalization boundaries, the convenience of reuse turns into contamination. The system starts treating trace data, wrapper text, and user meaning as neighboring categories instead of separate ones.
OpenClaw in particular is ambitious here. It wants explainable memory, durable memory, diary views, and promotion logic that operators can inspect. That is the right ambition. But the more visible and multi-stage the pipeline becomes, the less room there is for fuzzy ingress rules. A journaling system can get away with some noise. A promotion system cannot. The moment a pipeline writes back into a long-lived memory file, every pre-filter, boundary label, and eligibility gate becomes part of the security model.
The phrase “memory poisoning” is sometimes overused, but it fits better here than softer labels like “cleanup” or “hallucinated summary.” Poisoning is what happens when untrusted or irrelevant material enters a privileged memory path and gains durability it did not earn. No attacker is required. A system can poison itself by confusing traceability with truth.
What operators should do now
If you run OpenClaw with Dreaming enabled, do not wait for a clean postmortem. First, audit the entire path. Check DREAMS.md, daily memory files, and the latest promoted block in MEMORY.md side by side. Look specifically for wrapper terms, sender labels, raw IDs, or any text that feels like channel scaffolding rather than durable context. If it appears upstream, assume the promotion path is suspect too.
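An audit like that is easy to script. This is a minimal sketch, not an OpenClaw tool: the regex patterns are assumptions based on the examples in the issue report, and you would run `audit_text` over the contents of DREAMS.md, each memory/YYYY-MM-DD.md file, and MEMORY.md.

```python
import re

# Assumed patterns, drawn from the contamination examples in the report;
# extend with whatever scaffolding your own channels produce.
SUSPECT = re.compile(
    r"Conversation info|sender[_ ]?(id|metadata)|message[_ ]?id|msg_id",
    re.IGNORECASE,
)

def audit_text(name, text):
    """Return (file name, line number, line) for each suspicious line."""
    return [
        (name, i, line.strip())
        for i, line in enumerate(text.splitlines(), 1)
        if SUSPECT.search(line)
    ]

# Demo on an inline sample; in practice, read the real files.
sample = "Prefers dark mode across tools\nConversation info: sender_id=5417\n"
hits = audit_text("MEMORY.md", sample)
for name, lineno, line in hits:
    print(f"{name}:{lineno}: {line}")
```

A single hit upstream is enough reason to inspect every promoted block downstream of it.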
Second, treat long-term memory as code, not decoration. Review diffs before you trust them. Back up MEMORY.md, keep manual edits recoverable, and do not let active retrieval run uninspected against obviously polluted memory. The cost of extra review is much lower than the cost of letting bad memory become the substrate for future automation.
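"Review diffs before you trust them" can be mechanized the same way you would for any config file. The sketch below is an assumed workflow, not an OpenClaw feature: snapshot MEMORY.md before a Dreaming run, then diff it afterwards so promoted entries get eyeballed before retrieval runs against them.

```python
import difflib
import shutil
import tempfile
from pathlib import Path

def snapshot(path: Path) -> Path:
    # Copy the memory file aside before the pipeline writes to it.
    backup = path.parent / (path.name + ".bak")
    shutil.copy2(path, backup)
    return backup

def review_diff(path: Path, backup: Path) -> list:
    # Unified diff of what promotion actually changed.
    return list(difflib.unified_diff(
        backup.read_text().splitlines(),
        path.read_text().splitlines(),
        "before promotion", "after promotion", lineterm=""))

# Demo in a temp dir standing in for the real MEMORY.md.
with tempfile.TemporaryDirectory() as d:
    mem = Path(d) / "MEMORY.md"
    mem.write_text("User prefers dark mode\n")
    bak = snapshot(mem)
    # Simulate a polluted promotion pass.
    mem.write_text("User prefers dark mode\nsender metadata 5417\n")
    diff_lines = review_diff(mem, bak)

for line in diff_lines:
    print(line)
```

Lines prefixed with `+` are exactly the ones a human should approve or revert before the next pass treats them as ground truth.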
Third, if you are building your own agent memory layer, copy the right lesson. The answer is not “use better prompts.” The answer is stronger typed boundaries. Transport metadata should be structurally impossible to promote without explicit transformation. Promotion candidates should carry provenance. Deep-phase consolidation should prefer grounded snippets over raw transcript text. And if the system is uncertain whether something is person-level memory or protocol-level noise, it should discard it.
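What "structurally impossible to promote" might look like in practice: a sketch under assumed names (`Provenance`, `Candidate`, `promote` are all invented here) where transport and pipeline material is ineligible by type, and uncertain person-level material is discarded rather than promoted.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Provenance(Enum):
    USER_UTTERANCE = auto()   # grounded in what the person actually said
    TRANSPORT = auto()        # wrapper labels, sender tags, message IDs
    PIPELINE = auto()         # staging scaffolding from earlier phases

@dataclass(frozen=True)
class Candidate:
    text: str
    provenance: Provenance
    confidence: float  # how sure the pipeline is this is person-level memory

def promote(c: Candidate, threshold: float = 0.8):
    # Transport and pipeline material is structurally ineligible.
    if c.provenance is not Provenance.USER_UTTERANCE:
        return None
    # When in doubt, discard: forgetting is cheaper than false memory.
    if c.confidence < threshold:
        return None
    return c.text

assert promote(Candidate("prefers dark mode", Provenance.USER_UTTERANCE, 0.95)) == "prefers dark mode"
assert promote(Candidate("sender=5417", Provenance.TRANSPORT, 0.99)) is None
assert promote(Candidate("maybe likes jazz?", Provenance.USER_UTTERANCE, 0.4)) is None
```

The point is not this particular schema; it is that eligibility becomes a property of the type, enforced at the boundary, instead of a hope encoded in a prompt.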
That last point is the one the whole market needs to learn. Memory is not a feature you sprinkle on top of an assistant. It is a storage system with editorial judgment attached. Once you accept that, the engineering priorities change fast. You worry less about magical recall and more about ingestion discipline, promotion criteria, and how to unwind mistakes.
OpenClaw is chasing the right problem. Useful assistants should remember more than the current turn. But memory systems do not become trustworthy because they are proactive. They become trustworthy because they are picky. If the project wants Dreaming to feel like durable context instead of durable residue, it needs much harsher rules about what gets to survive the trip from transcript to memory.
Sources: OpenClaw issue #67442, OpenClaw Dreaming docs, issue #63921, issue #67363