
OpenClaw Is Finally Treating models.json Like Prompt-Visible Attack Surface

Anatoliy Kolodkin

18 May 2026 • 4 min read

There are security bugs that announce themselves with a CVE and a diagram. Then there are the quieter ones: a config file becomes prompt-visible, a generated catalog preserves a plaintext API key, and suddenly the language model is sitting inside your credential boundary because everyone agreed to call it “metadata.” OpenClaw PR #83592 is the unglamorous kind of fix agent platforms need more of. It treats models.json as what it effectively is in an agent runtime: part configuration, part planning surface, and therefore part attack surface.

The PR expands OpenClaw’s provider-secret hardening so newly generated plaintext provider apiKey values and sensitive provider headers do not get written into prompt-visible models.json. It is explicitly scoped as “Layer 1” of the broader #11829 security roadmap, which is the right amount of humility. Removing credentials from a catalog is not a complete secrets architecture. It is the minimum viable admission that catalogs can leak.

The concrete patch is small enough to understand and important enough not to hand-wave. It changes five files, with 357 additions and 5 deletions. It strips newly generated plaintext provider apiKey values from models.json. It avoids copying plaintext auth-profile keys into generated provider config, including rows preserved through merges. It also strips sensitive headers such as Authorization and x-api-key, while preserving non-secret markers and legacy unchanged rows. That last phrase matters: migrations that break provider identity metadata tend to create workarounds, and workarounds are where secrets go to die.
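To make the stripping rule concrete, here is a minimal Python sketch of a catalog sanitizer. It is not the PR's code: the function names, the provider dictionary shape, and the marker heuristic (an env-var-style name or a secretref-env: prefix) are assumptions for illustration, and the sketch does not model the PR's handling of legacy unchanged rows.

```python
import re

# Assumption: the article names only Authorization and x-api-key as
# sensitive headers, so this set is illustrative, not exhaustive.
SENSITIVE_HEADERS = {"authorization", "x-api-key"}

# Assumption: a non-secret marker is either an env-var-style name
# (e.g. OPENAI_API_KEY) or a structured reference (secretref-env:NAME).
ENV_MARKER = re.compile(r"[A-Z][A-Z0-9_]*")
SECRET_REF_PREFIX = "secretref-env:"


def is_marker(value: str) -> bool:
    """Return True if the value names a secret rather than containing one."""
    if value.startswith(SECRET_REF_PREFIX):
        value = value[len(SECRET_REF_PREFIX):]
    return ENV_MARKER.fullmatch(value) is not None


def sanitize_provider(provider: dict) -> dict:
    """Return a copy of a provider entry that is safe to write to a catalog."""
    clean = {k: v for k, v in provider.items() if k not in ("apiKey", "headers")}

    api_key = provider.get("apiKey")
    if isinstance(api_key, str) and is_marker(api_key):
        clean["apiKey"] = api_key  # keep the marker, never the plaintext

    headers = {}
    for name, value in provider.get("headers", {}).items():
        if name.lower() in SENSITIVE_HEADERS and not is_marker(str(value)):
            continue  # drop plaintext credentials from sensitive headers
        headers[name] = value  # routing context and references survive
    if headers:
        clean["headers"] = headers
    return clean
```

The heuristic is deliberately crude: a plaintext key made only of uppercase letters, digits, and underscores would pass as a marker, so a real implementation would need a stricter notion of what counts as a reference.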

Secret references are the right shape, not a magic spell

The PR’s proof output shows the intended behavior clearly. A custom provider key is not persisted: customApiKeyPersisted: false. A custom header object keeps X-Tenant: tenant-a, which is routing context rather than a credential. OpenAI’s API key remains a marker, OPENAI_API_KEY. An OpenAI authorization header remains a structured reference, secretref-env:OPENAI_PROXY_TOKEN. That is the useful distinction: the model and planner may need to know that a provider exists and that some auth material can be resolved at runtime. They do not need the auth material itself.

This is where a lot of agent-platform security discussion goes soft. People say “redact secrets” as if redaction is a product feature rather than an emergency brake. Redaction after a secret has already crossed into prompt context is not a boundary. It is a best-effort cleanup operation on a transcript-shaped spill. A secret reference is better because it changes the interface. The prompt-visible layer can carry intent (provider identity, route name, capability metadata, maybe an environment-variable marker) while the runtime resolves the credential outside the LLM’s working memory.
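A rough sketch of what resolving a reference outside the prompt-visible layer could look like follows. The secretref-env: prefix comes from the PR's proof output; the function names and the idea of resolving from process environment variables at request time are assumptions, not a description of OpenClaw's resolver.

```python
import os

SECRET_REF_PREFIX = "secretref-env:"


def resolve_secret_ref(value: str, env=None) -> str:
    """Resolve a secretref-env:NAME reference; pass other values through."""
    env = os.environ if env is None else env
    if not value.startswith(SECRET_REF_PREFIX):
        return value  # plain routing context such as a tenant header
    name = value[len(SECRET_REF_PREFIX):]
    if name not in env:
        # Fail loudly instead of sending an unresolved reference upstream.
        raise KeyError(f"secret reference {name} is not set")
    return env[name]


def build_request_headers(catalog_headers: dict, env=None) -> dict:
    """Build outbound headers at call time.

    The returned dict holds real credentials, so it belongs to the
    transport layer and must never be written back to the catalog.
    """
    return {k: resolve_secret_ref(v, env) for k, v in catalog_headers.items()}
```

The catalog entry is left untouched by resolution, which is the property that matters: the prompt-visible copy still holds only the reference after the request has been sent.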

That distinction is also why the broader #11829 discussion is worth reading. One user makes the obvious but often skipped point: config.get redaction is cosmetic if the agent can simply grep ~/.openclaw/openclaw.json. Another compares the right model to remote signing: the agent should get scoped authority to perform an operation, not possession of the secret material that authorizes it. That is the architecture agent systems are slowly backing into. Credentials should live at the tool, network, or broker boundary. The model should request an action. Something else should decide whether the action is allowed and attach the secret if appropriate.
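The remote-signing comparison can be sketched as a small broker. Everything here is hypothetical (the ActionRequest and Broker names, the policy as a set of allowed triples, the send callback); it only illustrates the shape in which the model expresses intent and something else holds the credential.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionRequest:
    """What the model is allowed to express: intent, not credentials."""
    actor: str
    provider: str
    operation: str


class Broker:
    """Hypothetical broker that owns secrets and enforces policy."""

    def __init__(self, secrets: dict, policy: set):
        self._secrets = secrets  # provider -> credential, never returned
        self._policy = policy    # allowed (actor, provider, operation) triples

    def execute(self, request: ActionRequest, send) -> str:
        key = (request.actor, request.provider, request.operation)
        if key not in self._policy:
            raise PermissionError(
                f"{request.actor} may not {request.operation} on {request.provider}"
            )
        credential = self._secrets[request.provider]
        # `send` performs the real call with the credential attached;
        # only its result is handed back toward the model.
        return send(request, credential)
```

The design choice is that the credential appears only as an argument to the transport callback. Nothing the agent constructs or receives contains it, so there is nothing to redact afterwards.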

The model catalog was never “just config”

The reason this PR matters beyond its diff is that models.json sits in a weird trust position. It is generated infrastructure, but it informs planning. It is local state, but parts of it can become visible to the agent. It contains provider names, model capabilities, routing metadata, auth hints, and sometimes merge-preserved custom entries. In old software, that might be treated as harmless configuration. In an agent system, anything prompt-visible can be repeated, summarized, logged, copied into memory, included in a bug report, pasted into a Slack answer, or used by the agent to choose a tool path. Configuration is not inert when the runtime reads it aloud to a model.

For practitioners, the action item is broader than “wait for this PR to merge.” Audit every place your agent stack serializes runtime configuration into prompts, transcripts, debug context, memory files, or tool descriptors. Provider catalogs are one. Tool schemas are another. MCP server descriptions, plugin manifests, auth-profile metadata, browser-state hints, channel adapter config, and “helpful” diagnostics can all become accidental disclosure surfaces. If a field contains a bearer token, API key, session cookie, OAuth refresh token, webhook secret, signed URL, private endpoint, or tenant-specific header, assume a sufficiently capable agent will eventually find a way to quote it back to the wrong place.
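One way to start that audit is a crude scanner over whatever structures get serialized into prompts. The sketch below is an assumption-heavy heuristic, not a tool from the OpenClaw project: the key-name patterns and the definition of a safe value (an env-style marker or a secretref-env: reference) are illustrative, and it will miss secrets stored under innocuous key names.

```python
import re

# Assumption: key-name fragments that suggest credential material.
SUSPECT_KEY = re.compile(
    r"(api[_-]?key|token|secret|password|cookie|authorization)", re.IGNORECASE
)
# Assumption: env-style markers and secretref-env: references are safe to show.
SAFE_VALUE = re.compile(r"(secretref-env:)?[A-Z][A-Z0-9_]*")


def find_suspect_fields(node, path=""):
    """Yield paths whose key looks sensitive and whose value is not a marker."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            if isinstance(value, str):
                if SUSPECT_KEY.search(str(key)) and not SAFE_VALUE.fullmatch(value):
                    yield child
            else:
                # Recurse into nested objects and arrays.
                yield from find_suspect_fields(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_suspect_fields(item, f"{path}[{index}]")
```

Running it over a provider catalog, a tool schema, or a plugin manifest gives a list of paths to review by hand. It is a first pass for finding candidates, not evidence that a surface is clean.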

Teams should also separate three ideas that often get mashed together: discovery, authorization, and credential material. Discovery answers “what capabilities exist?” Authorization answers “is this actor allowed to use the capability for this request?” Credential material is the token that lets the operation happen. LLMs can help with discovery and intent. They should have much less direct contact with authorization state, and ideally no direct contact with credential material. If your design requires the model to see an API key so it can decide whether to call an API, the design is upside down.

The validation details are encouraging because they focus on behavior, not vibes. The PR cites gateway shared-auth tests, agent model-config tests, oxlint over planner/normalizer/secret-helper files, and git diff --check. That is not a complete security proof, but it is the right kind of engineering posture for this layer: normalize the data, strip the sensitive fields, preserve the non-sensitive markers, and test that the catalog output has the shape you claim it has.

The limitation is the point. Layer 1 does not stop an agent with filesystem access from reading .env, project configs, shell history, or the main OpenClaw config if those paths are reachable. It does not solve OAuth tokens acquired during tool use. It does not give every skill its own least-privilege credential boundary. It does not prevent a malicious plugin from exposing secrets through its own descriptor. But a platform that cannot keep provider keys out of its own model catalog has no chance at the harder layers. Table stakes still matter because you cannot build the rest of the table in midair.

My take: this is exactly the kind of boring patch that deserves more attention than a flashy model integration. Agent platforms are full of files that began life as configuration and became prompt-adjacent once models started planning over everything. If a model can see a key, your redaction story has already failed. OpenClaw is patching one catalog leak. The next serious audit question is how many other catalogs are secret stores with nicer formatting.

Sources: OpenClaw PR #83592, OpenClaw issue #11829, OpenClaw PR #83545, OpenClaw v2026.5.16-beta.7
