Opus 4.8 Breaks the Old Thinking Schema, and OpenClaw's Allowlist Lag Shows the Cost of Hardcoded Model Routing
The phrase “supports Claude” is no longer a useful product claim. Which Claude? Which provider wrapper? Which thinking schema? Which tool behavior? Which region? OpenClaw PR #87835 is a small fix with a large lesson: frontier-model compatibility is now capability routing, not a boolean checkbox.
The immediate bug is narrow. OpenClaw’s supportsAdaptiveThinking() allowlist knew about Opus 4.7 but not Opus 4.8. When users configured Opus 4.8 with reasoning enabled, the Anthropic/Bedrock path could send the legacy extended-thinking schema, thinking.type: "enabled", instead of the adaptive schema required by newer Opus models: thinking.type: "adaptive" plus output_config.effort. The provider rejected the request with a 400. From the operator’s point of view, the configured primary model was available in the catalog and still failed at the transport layer.
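The difference between the two schemas is easier to see side by side. The following is a minimal sketch, not OpenClaw's actual transport code: buildThinkingFields and its parameters are hypothetical names, and budget_tokens is assumed here as the token-budget field of the legacy manual-thinking shape. The thinking.type values and output_config.effort are the fields named above.

```typescript
// Hypothetical request-shaping helper; not OpenClaw's actual code.
type LegacyThinking = {
  thinking: { type: "enabled"; budget_tokens: number };
};

type AdaptiveThinking = {
  thinking: { type: "adaptive" };
  output_config: { effort: string };
};

function buildThinkingFields(
  adaptive: boolean,
  effort: string,
  budgetTokens: number,
): LegacyThinking | AdaptiveThinking {
  if (adaptive) {
    // Adaptive schema: no explicit budget, effort travels in output_config.
    return { thinking: { type: "adaptive" }, output_config: { effort } };
  }
  // Legacy schema: the caller supplies an explicit thinking budget.
  return { thinking: { type: "enabled", budget_tokens: budgetTokens } };
}
```

The bug described above amounts to passing adaptive = false for a model that only accepts the adaptive shape.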
The issue, #87801, includes direct AWS Bedrock Converse proof against us.anthropic.claude-opus-4-8: the legacy schema fails, the adaptive schema succeeds. Anthropic’s extended-thinking docs already state that manual extended thinking is not supported for Next Opus / Opus 4.7 and returns 400; the OpenClaw report shows Opus 4.8 behaving the same way. The code simply had not learned the new name.
Hardcoded model allowlists rot on release day
The root cause is exactly what you would expect: string matching. supportsAdaptiveThinking() matched variants such as opus-4-6, opus-4.6, opus-4-7, opus-4.7, sonnet-4-6, and sonnet-4.6. It omitted opus-4-8 and opus-4.8. PR #87835 adds isClaudeOpus48Model() in both the Anthropic-native and Anthropic Vertex/Bedrock transport paths, includes 4.8 in adaptive-thinking detection, and passes the xhigh effort level through as xhigh for Opus 4.8, as it already does for Opus 4.7.
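To make the failure mode concrete, here is a rough sketch of a substring allowlist built from the variants listed above. It is an illustration under assumptions, not the real supportsAdaptiveThinking(): matchesAllowlist, BEFORE_FIX, and AFTER_FIX are hypothetical names, and the real helper may not use a plain includes() check.

```typescript
// Substrings the article lists as matched before the fix.
const BEFORE_FIX: readonly string[] = [
  "opus-4-6", "opus-4.6",
  "opus-4-7", "opus-4.7",
  "sonnet-4-6", "sonnet-4.6",
];

// The same list with the two Opus 4.8 spellings added.
const AFTER_FIX: readonly string[] = [...BEFORE_FIX, "opus-4-8", "opus-4.8"];

// Hypothetical matcher: true when any allowlisted fragment appears in the ID.
function matchesAllowlist(modelId: string, allowlist: readonly string[]): boolean {
  const id = modelId.toLowerCase();
  return allowlist.some((fragment) => id.includes(fragment));
}

// The Bedrock ID from issue #87801 misses the old list and hits the new one.
const opus48 = "us.anthropic.claude-opus-4-8";
console.log(matchesAllowlist(opus48, BEFORE_FIX)); // false
console.log(matchesAllowlist(opus48, AFTER_FIX)); // true
```

Note that a hypothetical opus-4-9 ID would miss both lists, which is the maintenance problem the rest of this piece is about.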
The patch is small: 3 files, +40/-2. Tests are focused and sensible: pnpm test src/agents/anthropic-transport-stream.test.ts passed 46 tests, and pnpm test extensions/anthropic-vertex passed 27 tests across 6 files. The PR also includes source-level proof for 4.8 detection in both core and Vertex paths. What it does not include is a live OpenClaw call through Anthropic or Bedrock after the patch, though the issue provides direct provider proof for the failing and succeeding schemas.
That gap is understandable and still worth naming. Model transport bugs sit at the boundary between local routing logic and provider reality. Unit tests can prove the runtime constructs the intended payload. They cannot prove the provider will accept it next week, in every wrapper, under every feature combination. That is not a criticism of the PR; it is an argument for compatibility harnesses that exercise real provider payloads when a new frontier model is added.
The user-visible failure mode is worse when fallback hides it. If Opus 4.8 rejects the request and OpenClaw silently routes to another model, turns reasoning off, or recovers through a different lane, the chat may still produce a decent answer. That is good for liveness and bad for trust. Operators selected Opus 4.8 for a reason: quality, reasoning behavior, vendor policy, cost assumptions, or evaluation parity. If the runtime quietly fails away from that model, the session output is no longer evidence that the configured system worked.
Catalog support is not transport support
This bug also draws a useful distinction between model catalog availability and runtime compatibility. A catalog entry can make a model selectable. That does not mean the transport can handle its reasoning schema, tool-call format, streaming behavior, context limits, safety response shape, retry semantics, or provider wrapper quirks. The issue explicitly separates catalog availability from transport compatibility. That distinction should be standard in every coding-agent comparison and every enterprise evaluation.
For practitioners, the checklist is straightforward. When adding a new model to an agent runtime, test the feature combinations you actually plan to use: reasoning on/off, tools enabled, streaming enabled, Bedrock or Vertex wrappers if applicable, long-context requests, fallback paths, and error reporting. Do not stop at “the model appears in the dropdown.” A selectable model that fails once thinking is enabled is not supported in the way users mean supported.
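One way to keep that checklist honest is to enumerate the combinations rather than pick them by hand. The sketch below is illustrative only: CompatCase and buildCompatMatrix are hypothetical names, the wrapper labels are placeholders, and it covers just three of the toggles from the checklist. Each generated case would still need to be sent to the real provider by whatever harness you use.

```typescript
// One feature combination to exercise against a real provider.
interface CompatCase {
  wrapper: string;
  reasoning: boolean;
  tools: boolean;
  streaming: boolean;
}

// Hypothetical helper: full cross product of wrappers and on/off toggles.
function buildCompatMatrix(wrappers: readonly string[]): CompatCase[] {
  const flags = [false, true];
  const cases: CompatCase[] = [];
  for (const wrapper of wrappers) {
    for (const reasoning of flags) {
      for (const tools of flags) {
        for (const streaming of flags) {
          cases.push({ wrapper, reasoning, tools, streaming });
        }
      }
    }
  }
  return cases;
}

// Three wrappers times 2^3 toggle combinations gives 24 cases.
const matrix = buildCompatMatrix(["anthropic", "bedrock", "vertex"]);
console.log(matrix.length); // 24
```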
For platform builders, the long-term answer is capability metadata, not a longer substring list. Model routers need structured, versioned knowledge of what each model supports: adaptive thinking versus manual budget tokens, effort levels, tool schemas, JSON modes, context windows, image inputs, vendor-specific transport parameters, and deprecation timelines. Ideally that metadata comes from provider APIs or a central catalog with compatibility tests. In practice, some of it will remain hand-maintained. But it should still be represented as capabilities, not scattered helper functions that need another opus-4-9 branch the morning after launch.
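As a rough illustration of what capability metadata could look like, the sketch below replaces per-model helper functions with a lookup table. Everything in it is an assumption made for the example: the type names, the key format, and the effort-level lists are placeholders, except that the models shown use adaptive thinking and that Opus 4.7 and 4.8 accept xhigh, as described earlier. The point of the design is that an unknown model returns undefined, so the caller has to fail loudly instead of guessing a schema.

```typescript
type ThinkingMode = "adaptive" | "manual-budget";

interface ModelCapabilities {
  thinking: ThinkingMode;
  effortLevels: readonly string[];
}

// Illustrative entries only; effort lists other than xhigh are placeholders.
const CAPABILITIES: Readonly<Record<string, ModelCapabilities>> = {
  "opus-4-6": { thinking: "adaptive", effortLevels: ["low", "medium", "high"] },
  "opus-4-7": { thinking: "adaptive", effortLevels: ["low", "medium", "high", "xhigh"] },
  "opus-4-8": { thinking: "adaptive", effortLevels: ["low", "medium", "high", "xhigh"] },
  "sonnet-4-6": { thinking: "adaptive", effortLevels: ["low", "medium", "high"] },
};

// Normalize IDs such as "us.anthropic.claude-opus-4-8" or "claude-opus-4.8"
// to a table key. Long digit runs (date stamps) are deliberately not matched.
function capabilityKey(modelId: string): string | null {
  const m = /(opus|sonnet|haiku)-(\d+)[-.](\d{1,2})(?!\d)/.exec(modelId.toLowerCase());
  return m ? `${m[1]}-${m[2]}-${m[3]}` : null;
}

// Returns undefined for models the table does not describe.
function resolveCapabilities(modelId: string): ModelCapabilities | undefined {
  const key = capabilityKey(modelId);
  return key === null ? undefined : CAPABILITIES[key];
}
```

With this shape, adding a new model is a data change with a matching compatibility test, not a new branch in each transport path.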
There is also an observability requirement. When a configured primary model fails because the runtime sent an unsupported schema, users should see that clearly: provider, model, feature flag, rejected field, fallback taken, and whether reasoning was disabled. Otherwise the system trains operators to trust answers while hiding the fact that the route they intended was never used. In agent systems, silent fallback is sometimes humane UX. It is also a governance risk unless it is auditable.
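A minimal sketch of such a record follows, assuming a simple structured event that the runtime would emit whenever a primary route is rejected. RouteFailureEvent, describeRouteFailure, and the field names are hypothetical; they mirror the list in the paragraph above and are not taken from OpenClaw.

```typescript
// Hypothetical audit record for a rejected primary route.
interface RouteFailureEvent {
  provider: string;
  model: string;
  feature: string;
  rejectedField: string;
  status: number;
  fallbackModel: string | null; // null when no fallback was taken
  reasoningDisabled: boolean;
}

// Render the event as a single operator-facing line.
function describeRouteFailure(e: RouteFailureEvent): string {
  const fallback =
    e.fallbackModel === null ? "no fallback taken" : `fell back to ${e.fallbackModel}`;
  const reasoning = e.reasoningDisabled ? "reasoning disabled" : "reasoning kept";
  return (
    `${e.provider}/${e.model} rejected ${e.rejectedField} ` +
    `(${e.feature}, HTTP ${e.status}); ${fallback}; ${reasoning}`
  );
}
```

Keeping the structured event alongside the rendered line means the same record can feed both the chat surface and an audit log.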
The editorial take is simple: “supports Claude” is now too imprecise to ship. Agent runtimes need capability-aware adapters because model families are moving faster than hardcoded allowlists. PR #87835 is the right immediate patch. The architectural fix is to stop treating new frontier model names as string additions and start treating them as compatibility contracts.
Sources: OpenClaw PR #87835, issue #87801, Anthropic extended thinking documentation, OpenClaw PR #70119