OpenClaw’s 24 Tool-Result Cap Is a Reminder That Agent Context Is a Protocol Boundary, Not a Junk Drawer
Context is not a junk drawer. It is a protocol boundary with rules, limits, and sharp edges.
OpenClaw PR #92613 makes that boundary explicit by capping provider-boundary replay to the latest 24 historical toolResult messages and dropping the matching old assistant tool-call blocks when older results are omitted. The fix addresses issue #92315, where long-running agent sessions with roughly 25 or more completed tool results could produce invalid provider request schemas, failed retries, session-file accumulation, and gateway lock contention bad enough to block /new.
The story is not “OpenClaw trimmed some history.” The story is that agent platforms are finally running into the difference between internal state and provider protocol. A transcript can be valid as durable session history and invalid as an LLM request. Treating those as the same object is how long agent runs become haunted.
The latest 24 results are a policy, not magic
The PR is open, mergeable, and labeled around agents, P1 severity, and session-state merge risk. It changes two files with 280 additions and one deletion. The key behavior is narrow: persisted session input remains untouched, but when OpenClaw serializes history for the provider boundary, it keeps the newest 24 toolResult messages. To avoid dangling pairs, it also drops the corresponding old assistant tool-call blocks. That pairing detail is the actual fix. Providers do not merely count messages; they validate relationships between tool calls and tool results.
The regression proof used 30 completed assistant tool-call / toolResult pairs. The output kept 24 tool results, with firstOutputToolResult: call_6 and lastOutputToolResult: call_29. It also preserved 24 assistant tool calls and reported noDanglingAssistantToolCalls: true. That is the number that matters. A transcript with fewer messages but broken pairing would simply fail in a smaller font.
Issue #92315 connects the schema bug to operational fallout: malformed API requests, unclear retries, accumulating session files, and manual recovery. The report mentions more than 6,000 session files and queue lock contention blocking new sessions. That is the classic shape of an agent-runtime failure. The original problem is a serialization edge case. The symptom is platform instability. The operator sees “why can’t I start a new session?” while the cause is buried 25 tool calls deep in a different run.
Bigger context windows do not repeal lifecycle management
It is tempting to file this under “models need larger context.” That misses the point. Larger context windows help with token pressure, but tool history has structure beyond tokens. Tool calls must match results. Provider schemas differ. Retries need a request shape the API will accept. Old outputs may be useful to the human record while being poisonous to the next provider call if replayed naïvely.
The cap is pragmatic, but operators should treat it as a runtime policy. Old tool output can disappear from the model’s immediate view even though it remains in the persisted session. That is usually reasonable because recent tool results tend to be most relevant. But long-running tasks are not always recent-first. An early discovery — a config invariant, a test failure root cause, a customer constraint, a dependency version — may remain important 40 tool calls later. If that fact only exists as an old tool result, truncation becomes accidental amnesia.
The fix, then, is not “never truncate.” The fix is to promote durable facts out of disposable tool chatter. Agents need summarization, memory, knowledge layers, and explicit state artifacts. A test runner output can be summarized into “failing integration test X due to missing env var Y.” A repository inspection can become a short architecture note. A long search can produce a cited finding. The raw tool result may not deserve eternal replay; the distilled fact might.
This is where the recent Stack Overflow for Agents framing is relevant. External validated knowledge layers can reduce some brute-force transcript stuffing by letting agents query known answers instead of rediscovering them. But they do not remove the need for local runtime hygiene. Agents still inspect repos, run tests, call CLIs, and handle tool outputs. The question is what becomes immediate context, what becomes memory, what becomes an external lookup, and what gets discarded.
What engineers should change tomorrow
If you operate long-running agents, monitor tool-call count as a first-class signal alongside token usage, cost, and latency. A run that is 40 tool calls deep is in a different risk class than a three-turn chat. The platform should expose when it is truncating tool results, how it preserves pairing, and whether provider requests are failing schema validation before retry loops amplify the problem.
Tool authors should also tighten outputs. Returning giant blobs because “the model might need it” is lazy API design. Prefer concise, structured, durable results: paths changed, tests failed, commands run, important excerpts, and next-step hints. If the agent needs full logs, store them as artifacts and reference them. The model does not need the entire haystack in every retry.
Teams comparing coding agents should ask uncomfortable but practical questions: how does the platform handle old tool results? Does it summarize? Does it preserve tool-call/result pairing? Does it fail loudly on invalid provider requests? Can it recover without growing thousands of session files? Does the UI show what context was omitted? These answers matter more in production than another polished demo of an agent making a TODO app.
The editorial take: OpenClaw’s 24-result cap is small, sensible, and incomplete in the way all real runtime fixes are. It closes a failure mode and exposes the next design requirement. Agent context has a lifecycle. Platforms that treat it as structured state will survive long tasks. Platforms that treat it as an infinite chat log will keep discovering protocol boundaries by crashing into them.
Sources: OpenClaw PR #92613, OpenClaw issue #92315, Stack Overflow for Agents, SD Times roundup on Stack Overflow for Agents