
OpenClaw’s Trace-Chaining PR Points at the Next Agent Observability Layer

Anatoliy Kolodkin

11 Jun 2026 • 4 min read

Agent observability has outgrown the comforting fiction that logs are enough. A modern OpenClaw run can enter through a gateway request, route through a channel identity, assemble prompt context, invoke an embedded harness, stream model events, call tools, hit MCP servers, compact state, retry provider failures, hand off to background work, and finally deliver a reply somewhere else. If those pieces do not share a causal trace, operators do not have observability. They have a box of timestamped fragments and a hope that grep will be kind.

That is why OpenClaw PR #92161 is worth covering even though it was not merge-ready at research time. The patch tries to connect gateway diagnostic traces into embedded agent runs so request → invocation → harness → model/tool spans remain linked instead of fragmenting across runtime layers. It adds trusted-boundary-aware gateway trace helpers, run.invocation and run.invocation.completed diagnostic events, diagnostics-otel span recording, and tests for trusted versus untrusted traceparent handling.

The PR also has a real blocker: review found that invocation completion can be emitted before a later post-compaction abort guard throws. That means a run could be recorded as completed and then fail. In observability code, that is not a cosmetic ordering nit. It is the difference between a dashboard that tells the truth and a dashboard that confidently lies.

The trace has to cross the same boundaries the agent crosses

Traditional web-service tracing is comparatively tidy. An HTTP request enters a service, calls a database, maybe calls another service, then returns. Agent runtimes are messier because the unit of work is not a request in the old sense. A run may include human messages, system prompts, memory recall, tool approvals, provider streaming, local shell execution, browser automation, MCP resources, retries, cancellation, compaction, and channel delivery. Some of those events happen synchronously. Some happen after the original request returned. Some happen under a different identity or execution harness.

PR #92161 is aimed at that seam. The diff touches extensions/diagnostics-otel/src/service.ts, src/gateway/request-diagnostic-trace.ts, HTTP and WebSocket gateway handlers, diagnostic event definitions, and focused tests. The important design choice is that it chains the existing DiagnosticTraceContext instead of inventing a second run/request scope. That matters because agent platforms already suffer from too many parallel notions of “session,” “run,” “task,” “request,” and “thread.” Observability should reduce ambiguity, not add another ID hierarchy for operators to reconcile.
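A minimal sketch of what chaining one context through the runtime layers could look like. The SketchTraceContext shape and the rootContext and childOf helpers are hypothetical stand-ins for illustration. They are not the real DiagnosticTraceContext definition or any OpenClaw API.

```typescript
// Hypothetical shape for illustration only; NOT OpenClaw's real DiagnosticTraceContext.
interface SketchTraceContext {
  traceId: string;       // shared by every span in the run
  spanId: string;        // unique per layer
  parentSpanId?: string; // links a span to the layer that started it
  layer: string;
}

// Non-cryptographic hex ID generator, good enough for a sketch.
function sketchHexId(byteCount: number): string {
  let out = "";
  for (let i = 0; i < byteCount; i++) {
    out += ("0" + Math.floor(Math.random() * 256).toString(16)).slice(-2);
  }
  return out;
}

// The gateway request opens the one and only trace scope.
function rootContext(layer: string): SketchTraceContext {
  return { traceId: sketchHexId(16), spanId: sketchHexId(8), layer };
}

// Each deeper layer derives a child from the existing context
// instead of opening a second, parallel run/request scope.
function childOf(parent: SketchTraceContext, layer: string): SketchTraceContext {
  return {
    traceId: parent.traceId,
    spanId: sketchHexId(8),
    parentSpanId: parent.spanId,
    layer,
  };
}

// request → invocation → harness → tool, all on one trace ID.
const requestCtx = rootContext("gateway.request");
const invocationCtx = childOf(requestCtx, "run.invocation");
const harnessCtx = childOf(invocationCtx, "harness");
const toolCtx = childOf(harnessCtx, "tool.call");
```

Because every layer keeps the parent's trace ID and records the parent's span ID, an operator can walk from a tool call back to the gateway request without reconciling a second ID hierarchy.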

The PR’s direction also aligns with the broader OpenClaw release train. Recent work around trusted diagnostics capture, first-assistant-event traces, slow initial reply warnings, cached model metadata, lazy slash-command loading, and channel-state hardening all point to the same thing: the platform is trying to make agent behavior explainable after the fact. That is the correct priority. Once agents run longer than a single chat turn, postmortem quality becomes a product feature.

Do not let public trace IDs become trusted facts

The most security-relevant part of the PR is its treatment of traceparent. Trace context is useful, but externally supplied trace IDs can become spoofing inputs if accepted blindly. Public webhooks, local direct clients, Slack, Telegram, browser bridges, ACP harnesses, and internal gateway calls do not deserve the same trust level. If an unauthenticated public request can inject a trace ID that the diagnostics system treats as trusted lineage, an attacker can pollute telemetry, confuse incident response, or make unrelated events appear causally connected.

PR #92161 intentionally ignores inbound traceparent by default and only honors it for local-direct trusted gateway boundaries. That is the right conservative posture. Observability systems are often treated as passive recorders, but they become active security surfaces the moment operators rely on them for attribution, audit, or policy decisions. A forged trace is not just “bad metadata.” It can change what a human believes happened.
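The policy above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: the GatewayBoundary values other than "local-direct", the GatewayTrace shape, and the function names are all hypothetical. The header format is the W3C Trace Context version 00 layout (version, 32-hex trace ID, 16-hex parent ID, 2-hex flags), where all-zero IDs are invalid.

```typescript
// Hypothetical boundary labels; only "local-direct" is named in the PR description.
type GatewayBoundary = "local-direct" | "public-webhook" | "channel-plugin";

// Hypothetical result shape for this sketch.
interface GatewayTrace {
  traceId: string;
  spanId: string;
  parentSpanId?: string;
  inheritedFromCaller: boolean;
}

// W3C traceparent, version 00 only: 00-<trace-id>-<parent-id>-<flags>, lowercase hex.
const TRACEPARENT_V00 = /^00-([0-9a-f]{32})-([0-9a-f]{16})-[0-9a-f]{2}$/;

// Non-cryptographic ID generator for illustration only.
function newHexId(byteCount: number): string {
  let out = "";
  for (let i = 0; i < byteCount; i++) {
    out += ("0" + Math.floor(Math.random() * 256).toString(16)).slice(-2);
  }
  return out;
}

function parseTraceparent(
  header: string | undefined,
): { traceId: string; parentSpanId: string } | undefined {
  if (header === undefined) return undefined;
  const match = TRACEPARENT_V00.exec(header.trim());
  if (!match) return undefined;
  const traceId = match[1];
  const parentSpanId = match[2];
  if (!traceId || !parentSpanId) return undefined;
  // The spec treats all-zero trace and parent IDs as invalid.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentSpanId)) return undefined;
  return { traceId, parentSpanId };
}

// Default: ignore the inbound header. Honor it only on the trusted boundary.
function startGatewayTrace(boundary: GatewayBoundary, traceparent?: string): GatewayTrace {
  const inbound = boundary === "local-direct" ? parseTraceparent(traceparent) : undefined;
  if (inbound) {
    return {
      traceId: inbound.traceId,
      parentSpanId: inbound.parentSpanId,
      spanId: newHexId(8),
      inheritedFromCaller: true,
    };
  }
  // Untrusted or malformed input: start a fresh trace, never adopt the caller's lineage.
  return { traceId: newHexId(16), spanId: newHexId(8), inheritedFromCaller: false };
}
```

The same header sent to a "public-webhook" boundary yields a fresh trace ID, so a forged traceparent cannot attach itself to an existing run. Recording inheritedFromCaller makes the trust decision itself visible in the telemetry.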

This is especially true for agent systems because identity is already complicated. The actor may be a Slack user, a bot, a channel plugin, a cron job, a subagent, an ACP harness, or a local process. The trace layer should preserve those boundaries. It should not flatten them into one convenient but misleading chain because a header happened to arrive.

Completion events must mean terminal completion

The ClawSweeper review caught the right bug. The PR emits run.invocation.completed before a later post-compaction abort guard can throw due to postCompactionAbortError. That breaks the semantic contract of the event. “Completed” cannot mean “we reached a point in the function before a later terminal failure.” It has to mean the invocation reached a terminal success state.

Bad telemetry is worse than missing telemetry because it creates false confidence. If a run fails after a “completed” span, dashboards undercount failures, SLOs look healthier than reality, and incident responders chase the wrong edge. The right fix is not merely to rename the event. The event lifecycle has to match the runtime lifecycle exactly: start when invocation begins, attach child spans through harness/model/tool execution, and close only after all late abort, compaction, cancellation, and delivery semantics are resolved.
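One way to express that lifecycle rule, as a sketch rather than the PR's implementation: emit the completion event only after every step that can still throw has run, and emit exactly one terminal event per invocation. The run.invocation and run.invocation.completed names come from the PR description. The failed and aborted event names, the InvocationSteps shape, and the AbortError naming convention are assumptions made for this example.

```typescript
type InvocationEventType =
  | "run.invocation"
  | "run.invocation.completed"
  | "run.invocation.failed"   // hypothetical name
  | "run.invocation.aborted"; // hypothetical name

interface InvocationEvent {
  type: InvocationEventType;
  runId: string;
  error?: string;
}

// Hypothetical stand-ins for the real runtime steps.
interface InvocationSteps {
  execute: () => Promise<void>;    // harness, model, and tool work
  postCompactionGuard: () => void; // may throw after the main work is done
}

// Assumption: aborts are signalled by an Error whose name is "AbortError".
function isAbortError(err: unknown): boolean {
  return err instanceof Error && err.name === "AbortError";
}

async function runInvocation(
  runId: string,
  emit: (event: InvocationEvent) => void,
  steps: InvocationSteps,
): Promise<void> {
  emit({ type: "run.invocation", runId });
  try {
    await steps.execute();
    // Every late guard runs BEFORE completion is reported.
    steps.postCompactionGuard();
  } catch (err) {
    emit({
      type: isAbortError(err) ? "run.invocation.aborted" : "run.invocation.failed",
      runId,
      error: err instanceof Error ? err.message : String(err),
    });
    throw err;
  }
  // Reached only when nothing above threw, so "completed" means terminal success.
  emit({ type: "run.invocation.completed", runId });
}
```

With this ordering, a guard that throws after compaction produces a start event followed by an aborted event, and never a completed event that later turns out to be false.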

The review also demanded redacted real runtime proof from an actual gateway request invoking an agent run and showing the resulting trace chain. Good. Unit tests can prove helpers behave. They cannot prove the full runtime path produces a useful trace under real gateway conditions. Agent observability needs both: narrow tests for trust-boundary logic and real traces showing operators can reconstruct the run without reading source code.

What teams should ask of agent observability

If you are evaluating OpenClaw or any agent platform, ask for one connected story per run. Can you see the inbound identity, authorization decision, prompt assembly, tool approvals, model/provider calls, MCP interactions, compaction events, retries, cancellation, background handoff, and final delivery? Can you distinguish trusted internal trace context from untrusted external headers? Can you tell whether a run completed, failed, aborted, or was superseded after compaction? Can you export enough redacted evidence for a postmortem without leaking secrets?
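The terminal-state question can be turned into a mechanical check on exported telemetry. The sketch below assumes a simplified, hypothetical event record (RunEventRecord) rather than any real OpenClaw export format, and flags runs whose events do not tell exactly one terminal story.

```typescript
// Hypothetical terminal states, taken from the question above.
type TerminalState = "completed" | "failed" | "aborted" | "superseded";

// Hypothetical, simplified event record for one run, in emission order.
interface RunEventRecord {
  runId: string;
  kind: "started" | TerminalState;
}

// Returns a list of human-readable problems; an empty list means the
// run's events tell one consistent story.
function auditRunEvents(events: RunEventRecord[]): string[] {
  const problems: string[] = [];
  const first = events[0];
  if (!first || first.kind !== "started") {
    problems.push("run does not begin with a start event");
  }
  const terminals = events.filter((e) => e.kind !== "started");
  if (terminals.length === 0) {
    problems.push("run never reached a terminal state");
  }
  if (terminals.length > 1) {
    const kinds = terminals.map((e) => e.kind).join(", ");
    problems.push("run reported " + terminals.length + " terminal states: " + kinds);
  }
  return problems;
}
```

A run that logs "completed" and then "failed" would be reported here as having two terminal states, which is the kind of contradiction an operator wants surfaced before a dashboard averages it away.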

Those questions are more practical than “does it support OpenTelemetry?” Support is a checkbox. Causal traceability is an operating property. The difference shows up at 2 a.m. when an agent claims it finished a task, the user says it did not, and the logs disagree with the dashboard.

PR #92161 belongs in “needs review,” not “approved,” because its lifecycle semantics still need correction. But the architecture is pointing in the right direction. The next serious agent-platform feature is not another model adapter. It is a trace that tells one honest story from gateway ingress to terminal run state, across trusted boundaries, without pretending every span is equally trustworthy.

Sources: OpenClaw PR #92161, OpenClaw v2026.6.6-beta.1 release, OpenClaw v2026.6.5 release, OpenTelemetry trace concepts
