OpenClaw’s Image/PDF Regression Shows Why Model Catalogs Are Runtime Infrastructure
The useful thing about OpenClaw’s image/PDF regression is that it failed in a way every agent platform will eventually fail: the model was capable, the provider was authenticated, the catalog knew the capability existed, and the runtime still said “unknown model.” That is not a model problem. It is a source-of-truth problem.
Issue #92104, opened on June 11, reported that OpenClaw’s image and PDF tool failed with errors like “Unknown model” and “Model does not support images” for Anthropic, Google, and OpenAI models that worked perfectly as agent models. The reporter was not guessing. They provided the affected versions, 2026.5.28 (e932160) and 2026.6.5 (5181e4f), along with the environment (Linux x64, Node 22.22.2, a systemd user-service Gateway), model-list output showing text+image capability and auth enabled, and five configuration permutations that still failed. That is the kind of bug report maintainers should frame and put above the coffee machine.
PR #92176 fixes the resolver path by falling back to bundled catalog image capability metadata when a model config omits input, while preserving explicit text-only overrides. The root cause was small but revealing: resolveProviderModelInput defaulted omitted input to ['text']. Inline model matches were then preferred over the bundled catalog, so the catalog-declared image capability got shadowed. In other words, an absence of config was interpreted as an explicit denial of capability.
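The shadowing described above can be reproduced in a few lines. This is a minimal sketch, not OpenClaw’s code: the types, the model id example-vision-model, and the functions normalizeBuggy and resolveInputBuggy are hypothetical names chosen for illustration. Only the two behaviors come from the PR description: omitted input defaulting to ['text'], and inline matches being preferred over the bundled catalog.

```typescript
// Hypothetical shapes for illustration; these are not OpenClaw's real types.
type InputKind = "text" | "image";

interface ModelEntry {
  id: string;
  input?: InputKind[]; // omitted means "the operator said nothing"
}

// Bundled catalog: declares the model as image-capable.
const bundledCatalog: ModelEntry[] = [
  { id: "example-vision-model", input: ["text", "image"] },
];

// Inline config: the operator lists the model but omits `input`.
const inlineConfig: ModelEntry[] = [{ id: "example-vision-model" }];

// Buggy normalization: a missing `input` is silently turned into ["text"].
function normalizeBuggy(entry: ModelEntry): Required<ModelEntry> {
  return { id: entry.id, input: entry.input ?? ["text"] };
}

// Buggy lookup: an inline match wins outright, so the catalog is never consulted.
function resolveInputBuggy(modelId: string): InputKind[] | undefined {
  const inline = inlineConfig.find((m) => m.id === modelId);
  if (inline) return normalizeBuggy(inline).input;
  return bundledCatalog.find((m) => m.id === modelId)?.input;
}

// Evaluates to false even though the bundled catalog lists "image".
const supportsImages =
  resolveInputBuggy("example-vision-model")?.includes("image") ?? false;
```

Neither step is wrong in isolation. A text-only default is a reasonable parser choice, and letting inline config win is a reasonable precedence rule. The failure only appears when the default is applied before the precedence decision, because by then the resolver can no longer tell “not specified” from “text only.”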
Model catalogs are not brochures anymore
For years, model catalogs were mostly a user-facing convenience: names, context windows, prices, maybe a short description. Agent platforms have changed that. A catalog now decides whether a tool path can run, whether a fallback is legal, whether a provider can receive an image block, whether a PDF extraction route should exist, and whether a configured model is policy-compatible with a task. Catalog metadata has become operational state.
That is why this bug matters beyond image parsing. OpenClaw’s normal chat path and openclaw models list could agree that a model supported images, while the media-understanding tool disagreed. To the user, the system looks arbitrary: “Gemini can see images in one place, but the PDF tool says Gemini is unknown.” To the operator, the diagnosis is sharper: different runtime paths are consulting different layers of model metadata with different precedence rules.
That kind of drift is poison for trust. Agent platforms already ask users to accept a lot of indirection: provider aliases, local model catalogs, global config, agent-local models.json, BYOK providers, fallback chains, media tools, MCP servers, and channel-specific delivery constraints. If every path interprets capabilities slightly differently, the operator cannot predict behavior. The next failure may not be a PDF. It may be a tool route sending structured output to a model that cannot parse it, a vision fallback that silently downgrades to text, or a policy-constrained deployment accidentally re-enabling a disabled capability.
The fix preserves operator intent, which is the hard part
The PR’s design choice is the important part. It does not simply say “if the bundled catalog says image-capable, let the image tool use it.” That would be convenient and wrong. Some teams deliberately constrain a model to text-only for cost, compliance, reliability, or provider-behavior reasons. A global catalog default should not override an explicit local policy.
Instead, PR #92176 distinguishes omission from explicit choice. If a model config omits input, the resolver may fall back to bundled catalog metadata and recover the image capability. If an operator explicitly sets input: ['text'] in models.providers or agent-local models.json, the image/PDF path continues to reject that model. That is the correct contract for config systems: defaults should fill blanks, not erase intent.
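The omission-versus-explicit rule can be sketched as follows. This is an illustrative approximation under stated assumptions, not the PR’s implementation: resolveInput, ModelEntry, and the example model id are hypothetical, and the conservative text-only result for a model that has no capability metadata anywhere is a choice made for this sketch.

```typescript
// Hypothetical shapes for illustration; these are not OpenClaw's real types.
type InputKind = "text" | "image";

interface ModelEntry {
  id: string;
  input?: InputKind[]; // undefined means the operator left the field out
}

function resolveInput(
  modelId: string,
  inlineConfig: ModelEntry[],
  bundledCatalog: ModelEntry[],
): InputKind[] | undefined {
  const inline = inlineConfig.find((m) => m.id === modelId);

  // An explicit operator choice always wins, including an explicit text-only list.
  if (inline?.input !== undefined) return inline.input;

  // The field was omitted (or the model is not in inline config): let the catalog fill the blank.
  const bundled = bundledCatalog.find((m) => m.id === modelId);
  if (bundled?.input !== undefined) return bundled.input;

  // Assumption for this sketch: a configured model with no metadata anywhere is text-only.
  return inline ? ["text"] : undefined;
}

const catalog: ModelEntry[] = [
  { id: "example-vision-model", input: ["text", "image"] },
];

// Omitted input: the catalog's image capability is recovered.
const omitted = resolveInput("example-vision-model", [{ id: "example-vision-model" }], catalog);

// Explicit text-only: the operator's restriction is preserved.
const explicitTextOnly = resolveInput(
  "example-vision-model",
  [{ id: "example-vision-model", input: ["text"] }],
  catalog,
);
```

Here omitted resolves to the catalog’s text and image inputs, while explicitTextOnly stays text-only. The key detail is that the check is on whether the field is present, not on what a normalized value happens to contain.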
The regression tests reflect that boundary. The posted verification shows src/media-understanding/image.test.ts passed 28/28 tests on Linux x64 / RHEL 8 / Node 24.13.1 / pnpm 11.2.2, including cases for explicit text-only inputs in both global provider config and agent-local model files. The tests are not just checking that the bug disappeared. They are checking that the fix did not turn catalog metadata into an override hammer.
What practitioners should do now
If you operate OpenClaw with multimodal tools, test capability resolution across all runtime paths, not only the happy path. Run normal chat with an image. Run the PDF tool. Run image extraction through an agent-local models.json. Test global models.providers. Test aliases. Test fallback chains. Then test the explicit-deny case by setting a known vision model to text-only and verifying the media tool refuses it. That last check is the one that tells you whether the platform respects policy or merely happens to work today.
If you build provider adapters or model catalogs, stop treating metadata as decorative. Capability flags need conformance tests. Every tool boundary that consumes a model capability should read from the same normalized contract, or at least prove why it does not. The UI, CLI, media tools, provider converters, and fallback planner should not each reinvent what “supports images” means.
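One way to make that conformance requirement concrete is a check that asks every capability-consuming path the same question and reports any that disagree with a reference resolver. The sketch below is hypothetical throughout: RuntimePath, findDisagreements, and the path names are invented for illustration and do not correspond to OpenClaw modules.

```typescript
// A capability check answers one question: can this model accept images?
type CapabilityCheck = (modelId: string) => boolean;

// Hypothetical description of a runtime path that consumes capability metadata.
interface RuntimePath {
  name: string;
  supportsImages: CapabilityCheck;
}

// Returns one message per (path, model) pair that disagrees with the reference.
function findDisagreements(
  modelIds: string[],
  reference: CapabilityCheck,
  paths: RuntimePath[],
): string[] {
  const problems: string[] = [];
  for (const modelId of modelIds) {
    const expected = reference(modelId);
    for (const path of paths) {
      if (path.supportsImages(modelId) !== expected) {
        problems.push(`${path.name} disagrees on ${modelId}`);
      }
    }
  }
  return problems;
}

// Example: the listing path agrees with the reference, the media path does not.
const reference: CapabilityCheck = (id) => id === "example-vision-model";
const paths: RuntimePath[] = [
  { name: "models-list", supportsImages: reference },
  { name: "media-tool", supportsImages: () => false },
];

const disagreements = findDisagreements(
  ["example-vision-model", "example-text-model"],
  reference,
  paths,
);
```

In this example disagreements contains a single entry, "media-tool disagrees on example-vision-model". Run against real resolvers in CI, a check of this shape would surface the kind of drift described in issue #92104 before a user hits it.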
There is also a documentation lesson. When users see text+image in models list, they reasonably assume every OpenClaw runtime path that requires image support will see the same fact. If that is not true, the docs should say which resolver owns which path. Better yet, the code should make that divergence impossible unless the operator explicitly asks for it.
This is a small PR with a large architecture smell attached. In 2026, model metadata is policy input, routing input, reliability input, and user expectation all at once. Catalogs are not brochures. They are part of the runtime. Treating omitted fields as text-only might be safe in a narrow parser. In an agent platform, it can make a capable model vanish.
Sources: OpenClaw PR #92176, OpenClaw issue #92104, OpenClaw v2026.6.5 release, OpenClaw v2026.5.28 release