
OpenClaw’s Observability Patch Moves Agent Operations from Logs to Alertable Signals

Anatoliy Kolodkin

25 May 2026 • 4 min read

OpenClaw’s latest observability patch is not the kind of change that wins demos. That is exactly why it matters.

PR #86682 takes a set of gateway events that previously lived mostly in logs — model failover, blocked tool executions, oversized payloads, webhook ingress, webhook errors, stale sessions, and liveness warnings — and promotes them into OpenTelemetry and Prometheus signals. It also fixes shared OTLP endpoint URL resolution so /v1/traces, /v1/metrics, and /v1/logs are inserted before query strings and fragments instead of after them. That sounds like plumbing because it is plumbing. Agent platforms are now old enough to need plumbing.
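To make the endpoint fix concrete, here is a minimal TypeScript sketch of the general technique: parse the base endpoint, append the signal path to the path component only, and let the query string and fragment stay at the end. The helper name resolveOtlpSignalUrl and the example URL are assumptions for illustration, not code taken from the PR.

```typescript
// Hypothetical helper for illustration; not OpenClaw's actual implementation.
type OtlpSignal = "traces" | "metrics" | "logs";

function resolveOtlpSignalUrl(baseEndpoint: string, signal: OtlpSignal): string {
  // Parsing splits the endpoint into path, query string, and fragment.
  const url = new URL(baseEndpoint);
  const suffix = `/v1/${signal}`;

  // Drop trailing slashes so the result never contains "//v1/...".
  const basePath = url.pathname.replace(/\/+$/, "");

  // If the caller already passed a signal-specific endpoint, keep the path as-is.
  url.pathname = basePath.endsWith(suffix) ? basePath : `${basePath}${suffix}`;

  // url.search and url.hash were never touched, so they remain after the path.
  return url.toString();
}

// Naive string concatenation would append the path after the fragment:
//   "https://collector.example.com:4318?token=abc#frag" + "/v1/traces"
// The parsed version places it before the query string instead.
const exampleTracesUrl = resolveOtlpSignalUrl(
  "https://collector.example.com:4318?token=abc#frag",
  "traces",
);
console.log(exampleTracesUrl);
// prints "https://collector.example.com:4318/v1/traces?token=abc#frag"
```

The design choice worth noting is that the function never edits the endpoint as a string. Once the URL is parsed, the query string and fragment cannot end up in the middle of the path.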

The patch was created at 2026-05-26T00:01:35Z and merged roughly ten minutes later. It changed 17 files, with 666 additions and 64 deletions, across the gateway and the diagnostics extensions for OpenTelemetry and Prometheus. The labels tell the story: gateway, extensions: diagnostics-otel, extensions: diagnostics-prometheus, docs, maintainer, and size: L. This was not a drive-by metric. It was a runtime contract being made observable.

Logs are for forensics; metrics are for intervention

Agent operators have been too willing to accept logs as observability. That made sense when the system was a single local assistant with a terminal and a human watching. It stops making sense when the system can run cron jobs, call paid models, invoke tools, receive webhooks, block dangerous actions, and keep long-lived sessions warm across channels. By the time someone is grepping logs for a stale session or a model failover burst, the incident has already escaped the control plane.

The new signal list is pointed. Model failover tells you when the routing layer is no longer doing what the operator expected. Blocked tools tell you when safety policy is being exercised, either because a user is pushing boundaries or because a workflow is misconfigured. Oversized payload events identify the class of agent failures that look like model weirdness but are really transport or context-envelope problems. Webhook ingress and error counters make external integrations visible. Stale sessions and liveness warnings are the difference between “the process is up” and “the agent is actually making progress.”

That distinction is not academic. A gateway can be running, listening on a port, and still be operationally dead: event loop starved, session locked, webhook retries piling up, or provider failover silently draining budget. Traditional health checks often miss that because they answer the easiest question: did the process respond? Agent runtimes need the harder questions: did the intended model run, did the tool policy fire, did the session advance, did the callback land, did the retry loop stop?
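The gap between "responding" and "progressing" can be sketched in a few lines. The example below assumes each session records the time of its last completed step; the SessionProgress shape, the field names, and the idle threshold are hypothetical and are not OpenClaw's actual session model.

```typescript
// Hypothetical session record for illustration only.
interface SessionProgress {
  sessionId: string;
  // Assumed to be updated whenever the session completes a step,
  // not merely when the process answers a health probe.
  lastProgressAtMs: number;
}

// Returns the ids of sessions that have made no progress for longer than maxIdleMs.
function findStaleSessions(
  sessions: SessionProgress[],
  nowMs: number,
  maxIdleMs: number,
): string[] {
  return sessions
    .filter((s) => nowMs - s.lastProgressAtMs > maxIdleMs)
    .map((s) => s.sessionId);
}
```

A process-level health check would report this gateway as healthy no matter what the function returns, which is why a stale-session count deserves its own signal.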

Observability has a trust boundary too

The most interesting part of #86682 is not the counter list. It is diagnostic provenance. The patch carries provenance so Prometheus records core gateway stability events while dropping plugin-spoofed diagnostics. That is the right instinct for a plugin-heavy platform.

OpenClaw’s architecture makes plugins powerful by design. They can expose tools, integrate channels, touch external APIs, and participate in runtime behavior. But that also means metrics are not neutral. If a plugin can emit something that looks like a core gateway liveness warning, it can pollute dashboards, hide real incidents, or create false ones. Observability data is not just output; it becomes input to operators, alerts, autoscaling, incident review, and trust decisions. Bad telemetry can be its own failure mode.

This is a lesson the broader agent ecosystem should steal. As agent platforms externalize more capability into plugins, MCP servers, channel adapters, and provider shims, the monitoring layer needs source identity. A blocked tool event from the gateway is not the same as a plugin claiming a blocked tool event happened. A stale-session warning emitted by core scheduling code is not the same as an extension trying to be helpful. If the dashboard cannot tell those apart, the dashboard is part of the attack surface.
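One way to picture source identity is to attach provenance at the emit path rather than trusting anything in the event payload. The sketch below is an assumption about how such a filter could work, not a description of the PR's code; the type names, event names, and the CORE_ONLY_EVENTS set are all hypothetical.

```typescript
// Hypothetical shapes for illustration; OpenClaw's real event types may differ.
type DiagnosticSource = "core" | "plugin";

interface DiagnosticEvent {
  name: string;
  // What the payload says about itself. Never trusted by the filter below.
  claimedSource: DiagnosticSource;
}

interface StampedEvent {
  event: DiagnosticEvent;
  // Set by the emitter that was handed out, not by the payload.
  provenance: DiagnosticSource;
}

// Core code and plugins receive different emitters, so provenance is decided
// by which emitter was called, not by what the caller claims.
function makeEmitter(provenance: DiagnosticSource, sink: StampedEvent[]) {
  return (event: DiagnosticEvent): void => {
    sink.push({ event, provenance });
  };
}

// Assumed names for events that only core gateway code may report.
const CORE_ONLY_EVENTS = new Set(["session.stale", "liveness.warning", "tool.blocked"]);

// Core-only events are recorded only when they arrived through the core emitter.
function shouldRecord(stamped: StampedEvent): boolean {
  if (!CORE_ONLY_EVENTS.has(stamped.event.name)) return true;
  return stamped.provenance === "core";
}
```

In this sketch a plugin that sets claimedSource to "core" on a liveness warning is still dropped, because the stamp comes from the emitter it was given rather than from the payload it wrote.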

What operators should actually do

If you run OpenClaw in anything beyond hobby mode, the action item is not “nice, metrics exist.” Wire them into an actual monitoring path. Alert on model-failover rate, not just provider outage. Alert on blocked-tool spikes, especially after skill or plugin changes. Watch webhook error bursts separately from general gateway errors. Track stale-session counts as a reliability SLO. Put liveness warnings on the same board as CPU, memory, and event-loop health.
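Alerting on a rate rather than a raw count means comparing two samples of a monotonically increasing counter. The following sketch shows that arithmetic, including the usual handling of a counter that resets on restart. The CounterSample shape, the function names, and the threshold are assumptions for illustration; in practice this calculation would normally live in the monitoring system's query language rather than in application code.

```typescript
// Hypothetical sample of a counter, e.g. a model-failover total scraped at a point in time.
interface CounterSample {
  value: number;
  timestampMs: number;
}

// Events per minute between two samples of the same counter.
function ratePerMinute(prev: CounterSample, curr: CounterSample): number {
  const elapsedMs = curr.timestampMs - prev.timestampMs;
  if (elapsedMs <= 0) return 0;

  // A counter that went down was reset (for example by a restart),
  // so everything in the current value counts as new.
  const increase = curr.value >= prev.value ? curr.value - prev.value : curr.value;

  return (increase * 60000) / elapsedMs;
}

// Fires when the observed rate exceeds an operator-chosen threshold.
function shouldAlert(prev: CounterSample, curr: CounterSample, maxPerMinute: number): boolean {
  return ratePerMinute(prev, curr) > maxPerMinute;
}
```

For example, a failover counter that moves from 10 to 16 over two minutes yields a rate of 3 per minute, which fires an alert at a threshold of 2 but not at 5.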

Then correlate these signals with cost. Coding agents increasingly fail in expensive ways: retrying a stuck tool, falling back to a pricier model, reprocessing oversized context, or repeatedly reviving a stale session. A model-failover counter without cost context tells you something broke. A model-failover counter next to spend tells you how much the broken thing is costing while you sleep.

The verification on this patch is also a useful benchmark. Focused runs covered 26 infra diagnostic-event tests, 65 diagnostics-otel tests, and 17 diagnostics-prometheus tests. The pnpm check:changed run passed the core, core tests, extensions, extension tests, and docs lanes. An in-process smoke test observed OTel spans, metrics, logs, traces, metric requests, and log requests, and a collector-backed smoke test used otel/opentelemetry-collector:0.104.0. That is what good observability changes should look like: not “we added a metric,” but “we proved the metric path works with the collector shape users actually deploy.”

The editorial take is simple: agent governance becomes real when failures become measurable. Permission prompts are useful, but they are not enough. The mature control plane is the one that can say which model failed over, which tool was blocked, which webhook started erroring, which session went stale, and whether that signal came from trusted core code or a plugin with opinions. OpenClaw is moving in that direction. Good. The agent category needs fewer vibes and more counters.

Sources: OpenClaw PR #86682, OpenTelemetry Metrics, Prometheus instrumentation guidance, OpenClaw OpenTelemetry docs, OpenClaw Prometheus docs
