OpenClaw’s llama.cpp Tool-Call Bug Is What OpenAI-Compatible Really Means in Production

“OpenAI-compatible” is one of those phrases that sounds precise until production gets involved. OpenClaw PR #89070 is a good example. The endpoint shape was compatible enough for a local llama.cpp-backed model to talk through OpenClaw’s OpenAI-style completions surface. The streaming semantics were different enough to corrupt nested tool-call arguments.

The visible symptom was not subtle. A local qwen-27b model routed through llama.cpp tried to call OpenClaw’s cron tools. Simple operations like cron list, cron remove, and cron runs reportedly worked. But cron.add and cron.update produced malformed parameter keys such as namePayload, scheduleKind, and sessionTargetName. The gateway rejected the call with strict validation errors like: invalid cron.add params: must have required property 'name'; at root: unexpected property 'namePayload'.

That is the kind of failure that local-agent advocates should take seriously. A model that can answer questions privately is not yet a usable local coding agent. It has to preserve structured tool calls through the runtime. If nested JSON mutates in transit, your private agent is not private infrastructure. It is a demo with sharp edges.

The bug is in the stream contract

The linked issue, #88439, was filed against OpenClaw v2026.5.27 on Fedora/Linux with local qwen-27b through llama.cpp. PR #89070 was created on June 1 and targets the streaming tool-call parsing path. The root cause is specific: llama.cpp can send complete accumulated JSON strings in every tool-call argument chunk rather than incremental deltas. OpenClaw was appending chunks with +=. That produced concatenated JSON sequences such as {"a":1}{"b":2}. The parser then kept the first complete object and silently dropped later fields.
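The failure mode is easy to reproduce in isolation. The following is a minimal sketch, not OpenClaw's code: the two chunks are made-up cumulative snapshots, and Python's json.JSONDecoder.raw_decode stands in for any parser that stops at the first complete value.

```python
import json

# Two cumulative snapshots (hypothetical data): each chunk is a complete
# JSON document, not an incremental delta.
chunks = ['{"a":1}', '{"a":1,"b":2}']

# Naive delta-style accumulation, equivalent to `buffer += chunk`.
buffer = ""
for chunk in chunks:
    buffer += chunk

# The buffer now holds two documents back to back: '{"a":1}{"a":1,"b":2}'.
# A strict parser rejects it outright.
try:
    json.loads(buffer)
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False

# A lenient parser that stops after the first complete value keeps only the
# first snapshot, so the "b" field is silently lost.
first_object, end = json.JSONDecoder().raw_decode(buffer)

print(strict_ok)      # False
print(first_object)   # {'a': 1}
print(buffer[end:])   # the discarded remainder: {"a":1,"b":2}
```

Neither outcome is good: the strict path fails the call, and the lenient path succeeds with missing fields.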

In ordinary text streaming, appending chunks is the obvious thing to do. In tool-call streaming, the runtime must know whether each provider emits deltas, snapshots, or something in between. Standard OpenAI-style streaming teaches clients to expect incremental deltas. Some compatible servers preserve the endpoint and object shapes while diverging on chunk semantics. The difference only shows up when the tool payload is complex enough to expose it.

Cron is a useful canary precisely because it is structured. A scheduled job includes a name, payload, schedule, delivery configuration, enabled state, and session target. Flattened or concatenated keys are not cosmetic failures; they change the meaning of the action. If this happened against a tool that controlled deployments, database writes, or repository operations, “unexpected property” would be the best-case outcome. The worse case is a malformed-but-valid call: one that passes validation and executes with the wrong arguments.

The fix is conservative in the right way

PR #89070 changes both processOpenAICompletionsStream and processResponsesStream. The heuristic is deliberately narrow: if appending the new chunk to the existing buffer yields valid JSON, keep normal append behavior. If the appended buffer is invalid but the new chunk alone is valid JSON, replace the buffer. Otherwise, fall back to append. That preserves standard incremental providers while tolerating cumulative-chunk servers.
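The three-way rule can be sketched in a few lines. This is an illustration of the heuristic as described, not the PR's implementation. The names merge_tool_args and is_json_object are hypothetical, and the requirement that a replacement chunk parse to a JSON object (rather than any JSON value) is an assumption added here so that a bare string or number delta such as "x" cannot replace the buffer.

```python
import json


def is_json_object(text: str) -> bool:
    """Return True if text parses as a single JSON object."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


def merge_tool_args(buffer: str, chunk: str) -> str:
    """Merge one streamed argument chunk into the accumulated buffer."""
    appended = buffer + chunk
    # 1. Appending yields a complete object: normal delta behavior.
    if is_json_object(appended):
        return appended
    # 2. Appending breaks the buffer but the chunk stands alone:
    #    treat the chunk as a cumulative snapshot and replace.
    if is_json_object(chunk):
        return chunk
    # 3. Otherwise keep appending; the object is probably still incomplete.
    return appended


def accumulate(chunks: list[str]) -> str:
    buffer = ""
    for chunk in chunks:
        buffer = merge_tool_args(buffer, chunk)
    return buffer


# Incremental deltas (hypothetical payload) still concatenate as before.
deltas = ['{"name":"night', 'ly","schedule":{"kind":', '"cron"}}']
# Cumulative snapshots replace the buffer instead of corrupting it.
snapshots = ['{"name":"nightly"}', '{"name":"nightly","schedule":{"kind":"cron"}}']

print(accumulate(deltas) == accumulate(snapshots))  # True
```

A heuristic like this has limits. In this sketch, a genuine delta that happens to be a complete object on its own, such as a nested {} value, would be mistaken for a snapshot, which is one reason to pair any such rule with regression tests per provider.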

The regression test simulates chunks that reconstruct a full cron.add payload containing delivery, enabled, name, payload, schedule, and sessionTarget. After the fix, nested fields are preserved. The reported test result was clean: two files passed, 350 tests passed. ClawSweeper had already source-reproduced the malformed-argument path on #88439, but the review on #89070 itself did not complete cleanly at research time. Treat this as a strong fix candidate, not a settled release guarantee, until it lands upstream.

The technical lesson extends beyond llama.cpp. “Compatible” APIs can diverge on streaming deltas, tool-call chunk IDs, function-call accumulation, finish events, reasoning metadata, error envelopes, and schema strictness. OpenClaw has already dealt with a related local-provider issue in #88617, preserving OpenAI-compatible reasoning replay from model metadata for Qwen-style local stacks. This is the tool-call side of the same story: adapter correctness is not a checkbox. It is a matrix.

How to test a local coding agent for real

If you are evaluating local or open coding agents, stop at “it answered a prompt” only if your goal is a screenshot. For practical use, test the tool path. Ask the model to create and update a scheduled job. Run nested MCP tool calls. Trigger multi-turn tool replay. Include partial chunks, cumulative chunks, and strict schema validation. Confirm the exact JSON that leaves the model is the exact JSON the gateway receives.
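One way to cover both chunking styles is to generate them from the same payload and feed each sequence to the runtime under test. The sketch below only builds the fixtures. The job fields mirror the cron properties named in this article, but every value, along with the helper names delta_chunks and cumulative_chunks, is hypothetical.

```python
import json

# Hypothetical cron job arguments; field names follow the article, values are invented.
JOB = {
    "name": "nightly-report",
    "payload": {"kind": "message", "text": "run the report"},
    "schedule": {"kind": "cron", "expr": "0 3 * * *"},
    "delivery": {"mode": "announce"},
    "enabled": True,
    "sessionTarget": "main",
}


def delta_chunks(payload: dict, size: int = 8) -> list[str]:
    """Split the serialized payload into fixed-size incremental fragments."""
    text = json.dumps(payload, separators=(",", ":"))
    return [text[i:i + size] for i in range(0, len(text), size)]


def cumulative_chunks(payload: dict) -> list[str]:
    """Emit complete JSON snapshots that grow by one top-level key each."""
    keys = list(payload)
    return [
        json.dumps({k: payload[k] for k in keys[:n]}, separators=(",", ":"))
        for n in range(1, len(keys) + 1)
    ]


deltas = delta_chunks(JOB)
snapshots = cumulative_chunks(JOB)

# Fidelity baseline: both streams describe exactly the same final arguments.
print(json.loads("".join(deltas)) == JOB)   # True
print(json.loads(snapshots[-1]) == JOB)     # True
```

Whatever the runtime reconstructs from either sequence should compare equal to JOB after parsing. Any difference is a fidelity bug in the adapter, not in the model.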

Also test failure visibility. If a provider emits cumulative chunks and the runtime assumes deltas, the user should not have to reverse-engineer malformed key names to find the problem. Logs should include redacted output item shapes, tool-call IDs, chunk mode decisions, parser recovery paths, and validation failures tied to the provider flavor. “Local model failed tool call” is useless. “llama.cpp cumulative argument chunk replaced prior invalid buffer and preserved schedule.name” is the kind of boring sentence that saves hours.
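A structured log line for each chunk decision might look like the sketch below. Every field name, the logger name, and the decision labels are assumptions for illustration; nothing here reflects OpenClaw's actual log format. Argument values are replaced by their type names so that the shape survives while the content stays out of the logs.

```python
import json
import logging

logger = logging.getLogger("toolcall.stream")  # hypothetical logger name


def shape_of(value):
    """Replace every leaf value with its type name, keeping keys and nesting."""
    if isinstance(value, dict):
        return {key: shape_of(item) for key, item in value.items()}
    if isinstance(value, list):
        return [shape_of(item) for item in value]
    return type(value).__name__


def log_chunk_decision(provider: str, tool_call_id: str, decision: str,
                       arguments_json: str) -> dict:
    """Log how a chunk was merged, with a redacted view of the arguments."""
    try:
        shape = shape_of(json.loads(arguments_json))
    except json.JSONDecodeError:
        shape = "unparseable"
    record = {
        "provider": provider,          # e.g. "llama.cpp"
        "tool_call_id": tool_call_id,
        "chunk_decision": decision,    # e.g. "append" or "replace"
        "argument_shape": shape,
    }
    logger.info(json.dumps(record, sort_keys=True))
    return record


record = log_chunk_decision(
    "llama.cpp",
    "call_1",
    "replace",
    '{"name":"nightly","schedule":{"kind":"cron"},"enabled":true}',
)
print(record["argument_shape"])
# {'name': 'str', 'schedule': {'kind': 'str'}, 'enabled': 'bool'}
```

With a record like this, a flattened key such as namePayload shows up directly in argument_shape, next to the provider and the merge decision that produced it.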

The privacy pitch for local coding agents is valid, but incomplete. Keeping prompts and code on your own machine is valuable. It does not remove the need for protocol discipline, adapter tests, and structured-call fidelity. A local agent that corrupts cron arguments can still make a mess locally. Privacy is not a substitute for correctness.

The take: local coding agents will not win on model weights alone. They need boring protocol compatibility. If an “OpenAI-compatible” server streams tool arguments differently, the runtime has to notice and adapt, because production compatibility is measured at the tool boundary, not at the HTTP route.

Sources: OpenClaw PR #89070, OpenClaw issue #88439, OpenClaw PR #88617, OpenClaw v2026.6.1-beta.1 release