
OpenClaw's Boot-Time Embedding Leak Is a Nice Reminder That 'Multi-Agent' Often Means 'Multiply the Same Expensive Mistake'

Anatoliy Kolodkin

29 Apr 2026 • 4 min read

There's a specific kind of bug that only shows up when you're running the math on resource utilization instead of watching the UI. Issue #72144 is that kind of bug. The reporter — running OpenClaw v2026.4.11 through v2026.4.23 on WSL2 Linux x64 with no GPU and six configured agents — measured gateway RSS at 2,711 MB and CPU at 103% four minutes after boot. The culprit: six parallel embedding jobs, each loading its own 314 MB GGUF model copy into memory before checking whether any fresh embeddings were actually needed.

The issue is specific enough to be credible and widespread enough to matter. Anyone running multiple agents with local embedding models on a budget VPS, a home server, or a Raspberry Pi has probably noticed boot time getting slower as the agent count climbs. The conventional explanation is "more agents = more startup work." The actual explanation is that the boot path has a cache-miss pattern dressed up as a feature.

The mechanism

When OpenClaw boots, the qmd embed process kicks off embedding jobs for all configured agents in parallel. Each agent loads its own copy of whatever GGUF embedding model is configured — in this case, hf_ggml-org_embeddinggemma-300M-Q8_0.gguf, a 314 MB file. Six agents means roughly 1.88 GB of model memory before the gateway itself has finished loading. That's the model's weight memory alone. Add the runtime overhead, the gateway process, and the extra RSS that accumulates during the backoff-and-retry cycle, and the 2.7 GB measurement starts making sense.
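The arithmetic behind those figures is short enough to check by hand. The sketch below uses only the numbers reported in the issue, and it assumes that a loaded model occupies about as much resident memory as the file occupies on disk, which is an approximation.

```python
# Figures as reported in issue #72144.
MODEL_MB = 314          # size of the GGUF embedding model file
AGENTS = 6              # agents configured in the reporter's setup
MEASURED_RSS_MB = 2711  # gateway RSS four minutes after boot


def duplicated_model_mb(agents: int, model_mb: int = MODEL_MB) -> int:
    """Weight memory when every agent loads a private copy of the model."""
    return agents * model_mb


weights_mb = duplicated_model_mb(AGENTS)     # six private copies
shared_mb = duplicated_model_mb(1)           # one shared copy
remainder_mb = MEASURED_RSS_MB - weights_mb  # gateway, runtime, retry growth

print(f"per-agent copies: {weights_mb} MB ({weights_mb / 1000:.2f} GB)")
print(f"single shared copy: {shared_mb} MB")
print(f"saved by sharing: {weights_mb - shared_mb} MB")
print(f"not explained by weights: {remainder_mb} MB")
```

Under that assumption, model weights account for roughly two thirds of the measured RSS, and five of the six copies are pure duplication.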

The particularly galling part: a manual standalone qmd embed run completes in 0.66 seconds and reports that all hashes already have embeddings. The boot path loads the model before checking whether any work exists. It optimistically materializes a heavy asset because the code path that decides "do we need fresh embeddings?" runs after the model is already in memory. In a single-agent setup, that's a rounding error. In a six-agent setup, it's a memory pressure event with a 15-minute retry loop attached.
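The ordering problem is easier to see in code. The following is a minimal sketch of the check-before-load pattern, not OpenClaw's implementation: embed_pending, Embedder, FakeModel, and load_fake_model are all hypothetical names introduced for illustration.

```python
from typing import Callable, Protocol


class Embedder(Protocol):
    def embed(self, text: str) -> list[float]: ...


def embed_pending(
    chunks: dict[str, str],              # content hash -> text
    already_embedded: set[str],          # hashes that already have embeddings
    load_model: Callable[[], Embedder],  # expensive: reads the model into memory
) -> dict[str, list[float]]:
    """Embed only the chunks that lack embeddings, loading the model lazily."""
    pending = {h: t for h, t in chunks.items() if h not in already_embedded}
    if not pending:
        # Cheap exit: nothing to embed, so the model is never loaded.
        return {}
    model = load_model()
    return {h: model.embed(t) for h, t in pending.items()}


# Stand-in model and loader so the sketch runs without a real GGUF file.
class FakeModel:
    def embed(self, text: str) -> list[float]:
        return [float(len(text))]


load_calls = 0


def load_fake_model() -> FakeModel:
    global load_calls
    load_calls += 1
    return FakeModel()


chunks = {"h1": "alpha", "h2": "beta"}
up_to_date = embed_pending(chunks, {"h1", "h2"}, load_fake_model)
loads_after_noop = load_calls  # still 0: the hash check ran first
one_missing = embed_pending(chunks, {"h1"}, load_fake_model)
```

The hash comparison touches only metadata, so the no-op case stays cheap no matter how large the model is.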

When the embed job times out — and it does, at 120 seconds per attempt — the gateway backs off for 60 seconds and retries. Each retry cycle adds more RSS as the runtime holds onto allocations from the previous attempt. The reporter measured the climb from roughly 1.3 GB to 2.7 GB over about 15 minutes, which means the backoff loop is not just wasting time. It's a slow memory leak driven by model reloads that should never have been triggered in the first place.
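As a rough cross-check of those numbers, the sketch below works out how many retry cycles fit in the observed window. It assumes every attempt runs to the full timeout and that cycles follow each other back to back. The RSS figures are the reporter's approximate values, so the result is an order-of-magnitude estimate only.

```python
TIMEOUT_S = 120      # per-attempt embed timeout
BACKOFF_S = 60       # gateway backoff before the next attempt
WINDOW_S = 15 * 60   # observation window reported in the issue
RSS_START_MB = 1300  # approximate RSS at the start of the window
RSS_END_MB = 2700    # approximate RSS at the end of the window

cycle_s = TIMEOUT_S + BACKOFF_S  # one timeout plus one backoff
cycles = WINDOW_S // cycle_s     # whole cycles that fit in the window
growth_per_cycle_mb = (RSS_END_MB - RSS_START_MB) / cycles

print(f"{cycles} cycles of {cycle_s} s, about {growth_per_cycle_mb:.0f} MB per cycle")
```

Under those assumptions, the window holds five retry cycles, with an average growth of about 280 MB per cycle.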

There's also a disk waste angle. Each agent maintains its own qmd cache, which means six copies of the same 314 MB GGUF file sitting on disk across agent-local directories. For a hobbyist running OpenClaw on a 256 GB laptop SSD, that's an annoyance. For an enterprise fleet running dozens of instances on storage-constrained cloud VMs, it's a line item that compounds fast.

Why this keeps happening

This is the second article in two days about a resource-scheduling bug in OpenClaw's multi-agent layer, and that's not a coincidence. The platform's multi-agent design has been converging on "more agents, more capability" as the mental model, but the infrastructure underneath has been catching up to that reality in fits and starts. When you add an agent, the system has to decide how to allocate shared resources — embedding models, context windows, tool registries, session stores. Those decisions keep getting made correctly at runtime and incorrectly at boot, because startup paths are often older code with less rigorous resource-awareness than the main loop.

The embedding model duplication at boot is a good example. A shared embedding model service — one copy, lazy-loaded, reference-counted across agents — would solve the memory problem and the disk waste problem simultaneously. OpenClaw's architecture does support plugin-based and MCP-based embedding backends, which suggests the hooks exist for alternative providers. The issue is that the boot-time path doesn't attempt reuse before materializing per-agent copies. That's an implementation gap, not a design impossibility.

The other systemic pattern worth noticing: timeout and backoff behavior that makes the problem worse before it makes it better. When the embed job times out, the gateway backs off and retries. That's reasonable behavior when the failure is transient. It's a compounding disaster when the failure is "the model is already loaded and all embeddings are current" — because the retry will reload the model again, add more RSS, and potentially hit the same timeout. The gateway is optimistically doing expensive work it doesn't need to do, then handling the symptom of that unnecessary work by scheduling more unnecessary work.
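One way to keep backoff from compounding the problem is to treat "nothing to do" as a terminal result and to bound the number of attempts. The sketch below illustrates that policy. Outcome and run_with_backoff are hypothetical names, and the attempt callable stands in for whatever actually runs the embed job.

```python
import time
from enum import Enum
from typing import Callable


class Outcome(Enum):
    DONE = "done"
    NOTHING_TO_DO = "nothing_to_do"
    TIMED_OUT = "timed_out"


def run_with_backoff(
    attempt: Callable[[], Outcome],
    max_attempts: int = 3,
    backoff_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[Outcome, int]:
    """Retry only on timeouts; return the final outcome and attempts used."""
    for n in range(1, max_attempts + 1):
        outcome = attempt()
        if outcome is not Outcome.TIMED_OUT:
            # DONE and NOTHING_TO_DO are both terminal: no retry, no reload.
            return outcome, n
        if n < max_attempts:
            sleep(backoff_s)
    return Outcome.TIMED_OUT, max_attempts


# Demo with a recording sleep so the sketch runs instantly.
slept: list[float] = []
noop_result = run_with_backoff(lambda: Outcome.NOTHING_TO_DO, sleep=slept.append)
stuck_result = run_with_backoff(lambda: Outcome.TIMED_OUT, sleep=slept.append)
```

The injectable sleep argument exists only to make the policy easy to test. The substance is that a timeout is the only outcome that schedules more work, and even then only a bounded amount.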

What practitioners should do now

The workaround is straightforward: if you see similar symptoms (high RSS and CPU at startup with no actual embedding work being produced), disable memory.qmd.update.onBoot. That stops the parallel embed jobs from firing on every boot. It doesn't fix the underlying bug, but it removes the symptom while the architecture catches up.

More broadly, this is a reminder that local embedding models in a multi-agent setup are not free infrastructure. They're workloads with memory footprints, load times, and duplication costs. If you're running more than two or three agents on a memory-constrained host, benchmark your actual boot-time RSS before assuming the system is behaving correctly. The relationship between agent count and memory footprint is not always linear, and it's not always obvious from logs.
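On Linux, including WSL2, a process's resident set size can be read from /proc/<pid>/status without extra tooling. The sketch below is one way to sample it. parse_rss_mb and read_rss_mb are hypothetical helper names, and finding the gateway's PID is left to the reader.

```python
import re
from pathlib import Path

# The kernel reports VmRSS in "kB", where 1 kB is 1024 bytes.
_VMRSS = re.compile(r"^VmRSS:\s+(\d+)\s+kB$", re.MULTILINE)


def parse_rss_mb(status_text: str) -> float:
    """Extract resident set size, in units of 1024 kB, from status text."""
    match = _VMRSS.search(status_text)
    if match is None:
        raise ValueError("no VmRSS line found")
    return int(match.group(1)) / 1024


def read_rss_mb(pid: int) -> float:
    """Read the current RSS of a live process (Linux only)."""
    return parse_rss_mb(Path(f"/proc/{pid}/status").read_text())


# Parsing a captured sample, so the sketch also runs on non-Linux hosts.
sample = "Name:\tnode\nVmRSS:\t 2776064 kB\nVmSwap:\t       0 kB\n"
print(f"{parse_rss_mb(sample):.0f} MB")
```

Sampling read_rss_mb once a minute for the first fifteen minutes after boot is enough to tell a flat footprint from one that climbs with each retry.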

The architectural fix — shared, lazy-loaded embedding model service with reference counting — is the right direction, and it's worth watching whether this issue pushes that work forward. In the meantime, treat your embedding model like any other stateful service: one copy, loaded on demand, shared across consumers. The alternative is what the reporter described: a system that looks functional while quietly consuming three times the memory it needs.

Sources: GitHub issue #72144, v2026.4.23 release notes, HuggingFace GGML embeddinggemma-300M
