OpenClaw’s Boot-Time Embedding Leak Is a Nice Reminder That ‘Multi-Agent’ Often Means ‘Multiply the Same Expensive Mistake’

One of the quickest ways to expose whether an “agent platform” is real infrastructure or just a pile of clever demos is to look at what happens during boot. OpenClaw’s memory-leak report from April 26 is a neat stress test. Issue #72144 says the gateway launches six parallel embedding jobs at startup, one per configured agent, and each of those jobs loads its own 314 MB GGUF embedding model on CPU even when there are no new embeddings to compute. In other words, the platform is doing the expensive part first and checking whether the work is necessary second. That is not a model problem. That is a systems-design problem.

The report covers OpenClaw versions from 2026.4.11 through 2026.4.23 on WSL2 Linux x64, with no GPU and six agents configured to use hf_ggml-org_embeddinggemma-300M-Q8_0.gguf. The headline numbers are not subtle. Each agent loads a separate 314 MB copy of the model, so the boot path burns about 1.88 GB of model memory before accounting for normal gateway overhead. The reporter measured overall RSS at 2,711 MB with CPU at roughly 103 percent four minutes after boot. Then things get worse: the embed job times out after 120,000 ms, backs off for 60 seconds, retries, and lets memory climb from about 1.3 GB to 2.7 GB over roughly 15 minutes.
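The arithmetic behind those numbers is worth spelling out. The sketch below uses only the figures quoted from the report. The retry-cycle estimate rests on an assumption of mine: that every attempt runs to the full timeout before the backoff starts.

```python
# Figures quoted from the issue report.
MODEL_MB = 314        # size of one GGUF embedding model copy
AGENTS = 6            # configured agents, one embed job each
TIMEOUT_MS = 120_000  # embed job timeout
BACKOFF_S = 60        # wait before the retry

# Memory spent on duplicate model copies alone.
model_total_mb = MODEL_MB * AGENTS
print(f"model copies in RAM: {model_total_mb} MB (~{model_total_mb / 1000:.2f} GB)")

# Assumption: each attempt runs until the timeout, then waits out the backoff.
cycle_s = TIMEOUT_MS / 1000 + BACKOFF_S
cycles_in_15_min = (15 * 60) / cycle_s
print(f"one timeout+backoff cycle: {cycle_s:.0f} s, about {cycles_in_15_min:.0f} cycles in 15 minutes")
```

Under that assumption, the 15-minute climb described in the report would span roughly five timeout-and-retry cycles.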

The most damning detail is also the simplest one. A manual, standalone qmd embed run reportedly finishes in 0.66 seconds and says all hashes already have embeddings. That implies the boot path is loading the full model before checking whether any actual work exists. If that is accurate, the platform is paying startup cost as though every boot were a cold re-indexing event, even when the underlying state is already current. Anyone who has spent time around build systems, caches, or database migrations will recognize the smell immediately: this is missing preflight logic combined with duplicated heavy initialization.
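To make "check first, load second" concrete, here is a minimal sketch of a preflight gate. It is not OpenClaw's or qmd's code. The names `content_hash`, `embed_on_boot`, and the in-memory `store` are hypothetical, and the model is a stand-in callable.

```python
import hashlib
from typing import Callable

Embedder = Callable[[str], list[float]]

def content_hash(text: str) -> str:
    """Stable key for a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def embed_on_boot(docs: list[str],
                  store: dict[str, list[float]],
                  load_model: Callable[[], Embedder]) -> int:
    """Embed only documents whose hash is missing; return how many were embedded."""
    # Cheap preflight: hash and compare, no model involved.
    pending = {}
    for text in docs:
        key = content_hash(text)
        if key not in store:
            pending[key] = text
    if not pending:
        return 0  # no-op boot: the model is never loaded

    # Expensive path, reached only when there is real work.
    embed = load_model()
    for key, text in pending.items():
        store[key] = embed(text)
    return len(pending)

# Demo with a fake loader that counts how often the "model" is loaded.
load_count = {"n": 0}

def fake_loader() -> Embedder:
    load_count["n"] += 1
    return lambda text: [float(len(text))]

store: dict[str, list[float]] = {}
docs = ["alpha", "beta"]
first = embed_on_boot(docs, store, fake_loader)   # embeds both, loads once
second = embed_on_boot(docs, store, fake_loader)  # nothing pending, no load
```

The second call returns before `load_model` is touched, which is the behavior the 0.66-second manual run suggests is already possible.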

The duplicate disk cost is the smaller but still telling insult. The issue says each agent keeps its own copy of the same 314 MB model in an agent-local qmd cache. That means you are not just wasting RAM on startup. You are also multiplying the same artifact across agents on disk. Agent platforms love to talk about delegation, specialization, and autonomy. Fine. But shared infrastructure has to stay shared when the workload is identical. Loading six copies of the same CPU embedding model is not multi-agent magic. It is a cache-miss pattern with a keynote-friendly label.
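An operator can check for this kind of on-disk duplication without knowing anything about the platform's internals. The sketch below groups model files by content hash. The directory you point it at is an assumption, since the report as summarized here does not give the cache path, and `duplicate_models` is a hypothetical helper name.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large models are never read into RAM at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

def duplicate_models(root: Path, pattern: str = "*.gguf") -> dict[str, list[Path]]:
    """Map content hash -> paths, keeping only hashes that appear more than once."""
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob(pattern)):
        if path.is_file():
            by_hash[file_sha256(path)].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```

Calling `duplicate_models(Path("/path/to/agent/caches"))` with your own root returns each group of byte-identical model files, which is enough to see whether six agents really hold six copies.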

There is a broader point here about how the industry uses the phrase “multi-agent.” The optimistic version means you can parallelize distinct work across specialized contexts. The sloppy version means you take one heavy background task and accidentally run it N times because there are N agents in config. OpenClaw’s boot-time embedding behavior, if the report holds up, lands squarely in the second bucket. That matters because it tells you where platform maturity still lags product ambition. Shared embeddings, shared models, bounded concurrency, and fast no-op checks are not optional refinements once people start running these systems on home labs, small VPSs, and CPU-only boxes.

The practical operator lesson is immediate. If you self-host OpenClaw and notice slow starts, memory spikes, or suspiciously chatty qmd activity on boot, do not assume it is harmless housekeeping. Benchmark it. Measure RSS before and after startup. Inspect whether memory.qmd.update.onBoot is worth its cost in your environment. If your embeddings are already current, the right startup workload is often “confirm and move on,” not “load a large model six times just in case.” Small hosts punish waste quickly, and this particular waste pattern is very legible once you look for it.
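Measuring RSS does not need special tooling on Linux or WSL2. A minimal sketch, assuming a Linux-style `/proc` filesystem: read the `VmRSS` line from `/proc/<pid>/status`. The helper names are mine, and you would supply the gateway's process ID yourself.

```python
import os
from pathlib import Path

def parse_vmrss_kb(status_text: str) -> int:
    """Extract resident set size in kB from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like: "VmRSS:    2776064 kB"
            return int(line.split()[1])
    raise ValueError("VmRSS not found in status text")

def rss_mb(pid: int) -> float:
    """Resident set size of a running process in MB (Linux only)."""
    status = Path(f"/proc/{pid}/status").read_text()
    return parse_vmrss_kb(status) / 1024

if __name__ == "__main__" and Path("/proc/self/status").exists():
    # Sample this script's own process as a smoke test.
    print(f"own RSS: {rss_mb(os.getpid()):.1f} MB")
```

Sampling `rss_mb(pid)` once right after start and again a few minutes later, with and without the on-boot update enabled, gives a before-and-after comparison for your own host.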

The maintainers should also treat this as a design prompt, not just a bug to squash. There are at least three architectural fixes suggested by the issue. First, add a dirt-cheap preflight check before any heavy model load so no-op boots stay no-op. Second, centralize embedding execution around a shared model cache or service boundary rather than per-agent isolated loads when the configured model is identical. Third, put stricter concurrency controls around boot-time maintenance work. Parallelism is only a win when the system has spare memory, spare CPU, and genuinely independent work to do. On a CPU-only machine, six simultaneous GGUF loads are not parallelism. They are self-harm.
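The second and third fixes can be sketched together. This is an illustration under assumptions, not the maintainers' design: it assumes the embed jobs can share one process, and `SharedModelCache`, `run_boot_jobs`, and the fake loader are hypothetical names. If the jobs run as separate processes, the same idea would need a service boundary instead of an in-process cache.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Model = Callable[[str], int]

class SharedModelCache:
    """Load each model path at most once, even with concurrent callers."""

    def __init__(self, loader: Callable[[str], Model]) -> None:
        self._loader = loader
        self._lock = threading.Lock()
        self._models: dict[str, Model] = {}

    def get(self, path: str) -> Model:
        # Holding the lock during the load serializes all loads: simple,
        # and it guarantees a given path is never loaded twice.
        with self._lock:
            if path not in self._models:
                self._models[path] = self._loader(path)
            return self._models[path]

def run_boot_jobs(agents: list[str], cache: SharedModelCache,
                  model_path: str, max_parallel: int = 2) -> list[int]:
    """Run one maintenance job per agent with a hard cap on parallelism."""
    def job(agent: str) -> int:
        model = cache.get(model_path)  # shared instance, not a per-agent copy
        return model(agent)
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(job, agents))

# Demo: six agents, one identical model path, a loader that counts its calls.
load_calls = {"n": 0}

def fake_loader(path: str) -> Model:
    load_calls["n"] += 1
    return lambda text: len(text)

cache = SharedModelCache(fake_loader)
agents = [f"agent-{i}" for i in range(6)]
results = run_boot_jobs(agents, cache, "embeddinggemma.gguf")
```

With six agents and one model path, the loader runs once and at most two jobs are in flight at a time. The cap of two is arbitrary here; the point is that it is a setting rather than a side effect of the agent count.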

This bug also speaks to a category-wide blind spot. Agent frameworks increasingly ship background memory pipelines, nightly consolidation jobs, vector indexing, and cross-session retrieval because those features sound useful and, honestly, often are useful. But every background pipeline has a resource story. When the platform treats background maintenance as free, users end up discovering the real cost through swap thrash and startup timeouts. That is backwards. Resource economics should be part of feature design, not a postmortem.

There is another subtle reason this matters. Boot-time inefficiency erodes trust differently from interactive slowness. If a user asks an agent a hard question and waits a bit longer, that feels understandable. If the platform burns gigabytes of RAM before the user has asked for anything at all, it feels wasteful. Waste is much harder to forgive than latency. It suggests the system is not paying attention to what matters.

My read is that this issue is more important than many louder OpenClaw stories this week. Not because a boot leak is more glamorous than a new release or a routing fix, but because resource discipline is the part of agent infrastructure that turns experimentation into habit. Builders will tolerate a rough edge or two. They will not keep running software that treats memory like an infinite staging area. If OpenClaw wants to be the control plane for long-lived agents, it needs to get better at distinguishing real work from repeated expensive ceremony.

Sources: OpenClaw issue #72144, OpenClaw v2026.4.23 release, issue #72165, EmbeddingGemma 300M GGUF model page