OpenClaw’s Reasoning Replay Fix Is a Win for Local and Self-Hosted Coding Models

“OpenAI-compatible” has become the USB-C port of local inference: everyone uses the shape, nobody quite agrees on the semantics, and the adapter drawer is somehow still full. OpenClaw PR #88617 is a small fix in that larger mess. It preserves replayed reasoning_content for OpenAI-compatible models when the selected model metadata declares reasoning: true, fixing a regression that mattered especially for local and self-hosted coding models.

The affected issue, #88068, came from a user running Qwen3.6-35B-A3B through llama.cpp’s OpenAI-compatible API on OpenClaw 2026.5.28. The reporter found that OpenClaw was stripping replayed reasoning from openai-completions history unless the model id appeared in a small internal allowlist, such as Kimi or MiniMax/MiMo paths. For a reasoning model whose multi-turn quality depends on prior reasoning blocks, that is not harmless transcript hygiene. It is silent capability loss.

The first tempting fix was a new provider-level config key: let users set dropReasoningFromHistory: false. That workaround did not work in the reported setup because the provider schema used additionalProperties: false and rejected the key. A proposed PR, #88071, moved toward adding such a public configuration surface. Maintainer review pushed back, correctly, because every new runtime knob becomes compatibility, documentation, migration, and support debt. PR #88617 lands a narrower fix: use existing model metadata instead of inventing another switch.
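The rejection described above is ordinary strict-schema behavior: with additionalProperties: false, any key not listed under properties fails validation before the runtime ever reads it. Here is a minimal sketch of that behavior, using a hand-rolled validator rather than a real JSON Schema library. The property names baseUrl and api, and the values in the sample config, are hypothetical stand-ins and not OpenClaw's actual provider schema.

```typescript
// Minimal illustration of strict-object validation. This is NOT a full JSON
// Schema implementation, and the schema below is hypothetical.
type PrimitiveType = "string" | "boolean" | "number";

interface ObjectSchema {
  properties: Record<string, { type: PrimitiveType }>;
  additionalProperties: boolean;
}

function validateObject(schema: ObjectSchema, value: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const key of Object.keys(value)) {
    // Own-property check so inherited names like "constructor" are not treated as declared.
    const declared = Object.prototype.hasOwnProperty.call(schema.properties, key);
    const prop = declared ? schema.properties[key] : undefined;
    if (prop === undefined) {
      if (!schema.additionalProperties) errors.push(`unknown key: ${key}`);
      continue;
    }
    if (typeof value[key] !== prop.type) {
      errors.push(`${key}: expected ${prop.type}`);
    }
  }
  return errors;
}

// Hypothetical provider schema that, like the one in the report, forbids extra keys.
const providerSchema: ObjectSchema = {
  properties: { baseUrl: { type: "string" }, api: { type: "string" } },
  additionalProperties: false,
};

// The attempted workaround: an undeclared key added to the provider config.
const userConfig = {
  baseUrl: "http://localhost:8080/v1", // illustrative local endpoint
  api: "openai-completions",
  dropReasoningFromHistory: false,
};

const configErrors = validateObject(providerSchema, userConfig);
// configErrors is ["unknown key: dropReasoningFromHistory"]
```

Under a strict schema, a user cannot opt in to a behavior the schema does not name, which is why the choice was between widening the schema and reusing metadata that already exists.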

Reasoning replay is now part of model compatibility

The deeper story is that “history” is not just chat text anymore. Reasoning models introduce hidden or semi-structured artifacts: reasoning_content, thinking blocks, signatures, previous response ids, model-specific tool-call traces, and provider-specific replay rules. Some systems reject those fields on replay. Some need them. Some tolerate them until a version changes. Agent runtimes cannot treat all OpenAI-shaped endpoints as if they share one transcript contract.

OpenClaw’s previous behavior tried to solve that with an allowlist. If the model id was known to benefit from replayed reasoning, keep it. Otherwise, drop it. That works for the catalog the maintainer knows about. It fails the moment a user runs a Qwen-style model behind llama.cpp, vLLM, an OpenAI-compatible proxy, or a private inference gateway with a model id the runtime has never seen.

The metadata-based approach is better because capability belongs with the model, not the provider endpoint. An OpenAI-compatible endpoint tells you protocol shape. It does not tell you whether this model expects reasoning replay. If the selected model declares reasoning: true, preserving replayed reasoning is a defensible runtime choice. The fix also preserves outbound reasoning_content serialization, which matters because compatibility bugs often appear in one direction first and then reappear in the other.
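The difference between the two policies is easy to see side by side. The sketch below is illustrative only: the types, function names, and allowlist markers are assumptions made for this example, not OpenClaw's internal code. It shows why an id-based allowlist strips reasoning for an unfamiliar model id while a metadata check keeps it.

```typescript
// Hypothetical shapes for illustration; not OpenClaw's internal types.
interface ModelMetadata {
  id: string;
  reasoning?: boolean;
}

interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  reasoning_content?: string;
}

// Allowlist policy: only ids containing a known marker keep their reasoning.
// The markers are illustrative, loosely based on the families named above.
const REPLAY_ALLOWLIST = ["kimi", "minimax", "mimo"];

function keepReasoningByAllowlist(model: ModelMetadata): boolean {
  const id = model.id.toLowerCase();
  return REPLAY_ALLOWLIST.some((marker) => id.indexOf(marker) !== -1);
}

// Metadata policy: trust what the selected model declares about itself.
function keepReasoningByMetadata(model: ModelMetadata): boolean {
  return model.reasoning === true;
}

// Returns a copy of the history, dropping reasoning_content unless it is kept.
function sanitizeHistory(history: ChatMessage[], keepReasoning: boolean): ChatMessage[] {
  return history.map((message) => {
    const { reasoning_content, ...rest } = message;
    return keepReasoning && reasoning_content !== undefined
      ? { ...rest, reasoning_content }
      : rest;
  });
}

// A self-hosted model whose id is not on the allowlist but which declares reasoning.
const localQwen: ModelMetadata = { id: "Qwen3.6-35B-A3B", reasoning: true };

const priorTurns: ChatMessage[] = [
  { role: "user", content: "Rename the helper and update its callers." },
  {
    role: "assistant",
    content: "Renamed in three files.",
    reasoning_content: "Callers live in a.ts, b.ts, and c.ts.",
  },
];

// Allowlist policy: the assistant turn loses its reasoning_content.
const underAllowlist = sanitizeHistory(priorTurns, keepReasoningByAllowlist(localQwen));
// Metadata policy: the assistant turn keeps its reasoning_content.
const underMetadata = sanitizeHistory(priorTurns, keepReasoningByMetadata(localQwen));
```

The metadata check needs no knowledge of the model's name, so it behaves the same whichever server or proxy sits in front of the model.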

The verification is appropriately scoped: src/agents/transcript-policy.test.ts, src/agents/embedded-agent-runner.sanitize-session-history.test.ts, and src/agents/openai-transport-stream.test.ts passed, covering 3 files and 330 tests. The PR explicitly did not test live llama.cpp or vLLM server acceptance. That caveat is honest and important. Deterministic serialization tests prove the runtime policy changed; they do not prove every OpenAI-compatible server will accept every replay shape.

Local coding agents need metadata discipline, not vibes

For builders running local coding agents, the action item is to audit model metadata as seriously as you audit endpoint URLs and API keys. If your model relies on reasoning replay, make sure the platform knows it is a reasoning model. If you are routing through llama.cpp, vLLM, Ollama-adjacent adapters, or private OpenAI-compatible proxies, test multi-turn agent work, not only single-prompt benchmarks. Many compatibility failures hide until replay, compaction, sanitization, or long-session recovery.
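One way to make that multi-turn check concrete is to compare the stored transcript against the messages the runtime is about to send on the next turn. The helper below is a sketch under stated assumptions: the Turn shape and the function name findDroppedReasoning are hypothetical, and it assumes the two arrays are index-aligned, so it would not apply unchanged after compaction.

```typescript
// Hypothetical transcript shape; field names follow the OpenAI-style chat format.
interface Turn {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  reasoning_content?: string;
}

// Returns the indexes of assistant turns whose stored reasoning is missing or
// altered in the outbound request. Assumes both arrays are index-aligned,
// i.e. no compaction or reordering happened between storage and replay.
function findDroppedReasoning(stored: Turn[], outbound: Turn[]): number[] {
  const dropped: number[] = [];
  stored.forEach((turn, index) => {
    // Only assistant turns that actually carried reasoning are of interest.
    if (turn.role !== "assistant" || !turn.reasoning_content) return;
    const sent = outbound[index];
    if (sent === undefined || sent.reasoning_content !== turn.reasoning_content) {
      dropped.push(index);
    }
  });
  return dropped;
}

// What the session stored after two completed assistant turns.
const storedTranscript: Turn[] = [
  { role: "user", content: "Find the failing test." },
  { role: "assistant", content: "It is in parser.test.ts.", reasoning_content: "Stack trace points at the parser." },
  { role: "user", content: "Fix it." },
  { role: "assistant", content: "Patched the tokenizer.", reasoning_content: "Off-by-one in the tokenizer loop." },
];

// What a misconfigured runtime might send on the next turn: the first
// assistant turn has lost its reasoning, the second has not.
const outboundMessages: Turn[] = [
  { role: "user", content: "Find the failing test." },
  { role: "assistant", content: "It is in parser.test.ts." },
  { role: "user", content: "Fix it." },
  { role: "assistant", content: "Patched the tokenizer.", reasoning_content: "Off-by-one in the tokenizer loop." },
];

const droppedIndexes = findDroppedReasoning(storedTranscript, outboundMessages);
// droppedIndexes is [1]
```

A check like this catches what a single-prompt benchmark cannot see, because the loss only appears once there is history to replay.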

This matters for the “open-source Copilot alternative” crowd because local coding agents are often sold on privacy, cost control, and customization. Those are real advantages. But local inference does not magically remove runtime semantics. A self-hosted model still needs accurate metadata for reasoning, tool support, context limits, streaming shape, stop conditions, and transcript replay. If the runtime guesses wrong, the model may look worse than it is.

There is also a governance angle. A config knob would have let power users patch over the issue locally, but it would also have created another place for teams to drift. One workspace preserves reasoning, another strips it, a third inherits a stale provider config, and now “Qwen through llama.cpp” means different things in every environment. Metadata centralization is less flexible in the moment, but easier to reason about across teams.

The broader lesson is that OpenAI compatibility should be treated as a transport contract, not a model contract. It tells the runtime how to send a request. It does not fully tell the runtime what state is safe, useful, or required to replay. Agent platforms need a richer capability layer above provider shape: reasoning replay, tool schema strictness, multimodal support, context behavior, previous-response semantics, and provider-specific rejection handling.
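A capability layer of that kind can be as simple as a per-model record resolved separately from the transport. The sketch below is one possible shape, not a description of any existing catalog: every field name, the fallback values, and the model id my-local-reasoner are hypothetical.

```typescript
// Hypothetical capability record; the field names are illustrative and do not
// mirror any real catalog schema.
interface ModelCapabilities {
  reasoningReplay: boolean;
  strictToolSchema: boolean;
  multimodal: boolean;
  contextWindowTokens: number;
  supportsPreviousResponseId: boolean;
}

// Fallback used when a model declares nothing. The values are illustrative.
const FALLBACK_CAPABILITIES: ModelCapabilities = {
  reasoningReplay: false,
  strictToolSchema: true,
  multimodal: false,
  contextWindowTokens: 8192,
  supportsPreviousResponseId: false,
};

// Capabilities are keyed by model id, independent of which transport serves it.
type CapabilityCatalog = Record<string, Partial<ModelCapabilities>>;

function resolveCapabilities(catalog: CapabilityCatalog, modelId: string): ModelCapabilities {
  const declared = Object.prototype.hasOwnProperty.call(catalog, modelId)
    ? catalog[modelId]
    : undefined;
  // Declared fields override the fallback; undeclared fields keep the fallback.
  return { ...FALLBACK_CAPABILITIES, ...declared };
}

// A hypothetical self-hosted model that declares only what differs from the fallback.
const exampleCatalog: CapabilityCatalog = {
  "my-local-reasoner": { reasoningReplay: true, contextWindowTokens: 32768 },
};

const knownModel = resolveCapabilities(exampleCatalog, "my-local-reasoner");
const unknownModel = resolveCapabilities(exampleCatalog, "never-seen-before");
// knownModel.reasoningReplay is true; unknownModel.reasoningReplay is false
```

The design choice worth noticing is that the transport name never appears in the lookup: two models behind the same OpenAI-compatible endpoint can resolve to different replay behavior.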

That may sound like catalog housekeeping. It is not. For coding agents, transcript continuity is the product. If a model’s prior reasoning gets stripped across turns, the agent may still answer, but it is no longer operating with the state the user thinks it has. That is the worst kind of compatibility bug: quiet, plausible, and benchmark-resistant.

LGTM take: local coding agents are not won by raw model weights alone. The runtime has to preserve the model’s conversational semantics, or “OpenAI-compatible” becomes compatibility theater.

Sources: OpenClaw PR #88617, issue #88068, PR #88071, OpenClaw transcript hygiene docs