Claude Code Finally Gets the Prometheus Dashboard Its Enterprise Pilots Were Going to Build Anyway
Claude Code now has the kind of dashboard that only appears after a tool stops being a toy.
Rock Martel-Langlois published a Prometheus/Grafana dashboard for Claude Code metrics this morning, and on the surface it is infrastructure glue: a port of an existing Azure Application Insights dashboard into PromQL. That sounds small because dashboard ports usually are small. This one is more interesting because it marks a phase change. Claude Code is no longer just something an engineer runs in a terminal and judges by vibes. It is becoming a system that teams need to observe, budget, debug, and govern.
The new Grafana dashboard, listed as dashboard 25255, consumes Anthropic’s published OpenTelemetry metrics for Claude Code and works with Prometheus-compatible backends including Prometheus, VictoriaMetrics, Grafana Mimir, and Thanos. The pipeline is exactly what platform teams want to see: Claude Code emits OTLP, an OpenTelemetry Collector receives it, the Collector exposes /metrics, Prometheus scrapes it, and Grafana renders the dashboard. No exotic vendor path. No “copy this CSV into finance once a week.” Just the same metrics plumbing most infrastructure teams already operate.
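To make the last hop of that pipeline concrete, here is a minimal sketch of reading the Collector's /metrics output and grouping cost by model. The sample text, label names, and values are hypothetical stand-ins for a real scrape, and the parser deliberately ignores escaped quotes and commas inside label values.

```python
from collections import defaultdict

# Hypothetical sample of what the Collector's /metrics endpoint might return.
# Label names and values are illustrative, not taken from a real scrape.
SAMPLE_METRICS = """\
# TYPE claude_code_cost_usage_USD_total counter
claude_code_cost_usage_USD_total{model="model-a",user="alice"} 1.25
claude_code_cost_usage_USD_total{model="model-b",user="alice"} 0.40
claude_code_cost_usage_USD_total{model="model-a",user="bob"} 2.10
"""


def parse_labels(raw: str) -> dict:
    """Parse 'k="v",k2="v2"' into a dict (assumes no escaped quotes or commas)."""
    labels = {}
    for pair in filter(None, raw.split(",")):
        key, _, value = pair.partition("=")
        labels[key.strip()] = value.strip().strip('"')
    return labels


def cost_by_label(text: str, metric: str, label: str) -> dict:
    """Sum the samples of `metric`, grouped by a single label."""
    totals = defaultdict(float)
    for line in text.splitlines():
        if not line.startswith(metric + "{"):
            continue  # skips comments and other metrics
        head, _, value = line.rpartition(" ")
        raw_labels = head[len(metric) + 1 : -1]  # text between '{' and '}'
        totals[parse_labels(raw_labels)[label]] += float(value)
    return dict(totals)


by_model = cost_by_label(SAMPLE_METRICS, "claude_code_cost_usage_USD_total", "model")
print(by_model)
```

In a real deployment Prometheus does this scraping and Grafana does the grouping; the sketch only shows that the data arriving at the end of the pipeline is plain, labeled counters.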
That boringness is the point. The previous community dashboard, Grafana dashboard 25052 by 1w2w3y, targeted Azure Monitor/Application Insights using KQL. Useful if your company lives in that stack; less useful if your default operational language is PromQL. Martel-Langlois explicitly frames the new work as a port, not an invention: same panel intent, rebuilt against Anthropic’s metric names for the open observability stack. That is the kind of unglamorous translation work that makes a tool adoptable outside the vendor demo path.
The dashboard is really a governance surface
The dashboard’s five sections tell you what Claude Code has become. The Overview tracks sessions, users, total cost, total tokens, commits, pull requests, lines added and removed, active time, tokens by type, and tool decisions. Leaderboards show top users by spend and token volume, top sessions by cost, cost by model, edit decisions by language, and sessions by terminal. Cost & Tokens adds time series views. Activity & Productivity looks at active time, lines of code per hour, and accept/reject decisions. Cost Breakdown shows cost by query source, cost by effort, and cache hit ratio.
That is not a “developer productivity” dashboard in the hand-wavy executive sense. It is closer to a control panel for an expensive runtime that happens to write code. The moment a coding agent can spend real money, open pull requests, mutate files, call tools, and run for hours, operators need to know who is using it, which models are eating budget, whether cache behavior is sane, and which sessions are turning into runaway cost centers.
The metric list is concrete. The dashboard queries names such as claude_code_session_count_total, claude_code_token_usage_tokens_total, claude_code_cost_usage_USD_total, claude_code_active_time_seconds_total, claude_code_lines_of_code_count_total, claude_code_commit_count_total, claude_code_pull_request_count_total, and claude_code_code_edit_tool_decision_total. It filters by labels including organization, user, model, session, terminal type, token type, language, decision, query source, and effort. Put differently: this is enough data to answer operational questions without interrogating every developer in Slack.
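As an illustration of the kind of question those names and labels can answer, the following sketch builds leaderboard-style PromQL strings. It is not the dashboard's actual query text. The helper `topk_by_label` is hypothetical, and `user_email` is an assumed label name that should be checked against your own /metrics output.

```python
def topk_by_label(metric: str, label: str, k: int = 10, window: str = "7d") -> str:
    """Build a PromQL expression ranking label values by counter increase."""
    return f"topk({k}, sum by ({label}) (increase({metric}[{window}])))"


# "user_email" is an assumed label name; "model" is named in the article.
top_spenders = topk_by_label("claude_code_cost_usage_USD_total", "user_email", k=5)
cost_by_model = topk_by_label("claude_code_cost_usage_USD_total", "model", k=5)

print(top_spenders)
print(cost_by_model)
```

Either string can be pasted into the Prometheus expression browser or a Grafana panel to sanity-check that the expected labels are present.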
That matters because AI coding-agent spend does not behave like old developer-tool spend. A compiler does not charge more because a senior engineer spends the afternoon refactoring a service. Claude Code does. A static analyzer does not quietly switch from a cheap mode to a high-effort reasoning path because the task got gnarly. Agentic tools can. If teams cannot see that behavior, they will manage it with blunt policy: lower limits, fewer seats, or a procurement fight disguised as an engineering standards discussion.
Prometheus support changes who can adopt this cleanly
The important adoption detail is not that Grafana can display a pretty dashboard. It is that Prometheus-family shops can put Claude Code next to the rest of their infrastructure telemetry. For many teams, Azure Application Insights is not where incident response, cost review, or service health lives. Prometheus, Mimir, Thanos, VictoriaMetrics, and Grafana are. If Claude Code telemetry has to live somewhere else, it becomes a side quest. If it lands in the existing observability estate, it becomes another operational signal.
Anthropic’s own monitoring documentation supports the larger model. Claude Code can export metrics as time-series data, events through the OpenTelemetry logs protocol, and traces, which are currently in beta. The default export interval is 60 seconds for metrics and 5 seconds for logs. Administrators can distribute telemetry settings through managed settings files, including the OTLP endpoint and headers. That means a serious rollout does not need every engineer hand-tuning shell variables. The platform team can centrally define where telemetry goes and how it authenticates.
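A centrally managed configuration might look roughly like the output of the sketch below. The environment variable names and the top-level "env" key follow Anthropic's monitoring documentation as commonly described, but treat them as assumptions to verify against the current docs. The endpoint, the header value, and the helper `build_managed_settings` are hypothetical.

```python
import json


def build_managed_settings(endpoint: str, auth_header: str) -> dict:
    """Assemble a managed-settings payload that turns on OTLP metric export."""
    return {
        "env": {
            "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
            "OTEL_METRICS_EXPORTER": "otlp",
            "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
            "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
            "OTEL_EXPORTER_OTLP_HEADERS": auth_header,
            # 60000 ms matches the documented 60-second default for metrics.
            "OTEL_METRIC_EXPORT_INTERVAL": "60000",
        }
    }


settings = build_managed_settings(
    "http://otel-collector.example.internal:4317",  # hypothetical Collector address
    "Authorization=Bearer REPLACE_ME",              # placeholder, not a real token
)
print(json.dumps(settings, indent=2))
```

Generating the file from code rather than editing it by hand keeps the endpoint and credentials in one reviewed place, which is the point of distributing settings centrally.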
There is one subtle boundary teams should notice: Anthropic says Claude Code does not pass OTEL_* variables to subprocesses it spawns, including Bash commands, hooks, MCP servers, and language servers. That is good separation. Claude Code telemetry and telemetry for commands Claude runs are not automatically the same thing. If your agent launches tests or internal tools that also emit OpenTelemetry, those processes need their own instrumentation and exporter configuration. Do not assume the agent’s OTLP endpoint is inherited by the code it runs.
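The practical consequence is that any wrapper launching instrumented tools has to hand them exporter settings explicitly. A minimal sketch, assuming a hypothetical Collector address and service name, with `run_with_otel` as an illustrative helper rather than anything Claude Code provides:

```python
import os
import subprocess
import sys

# Hypothetical exporter settings for the child process's own telemetry.
CHILD_OTEL_ENV = {
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://otel-collector.example.internal:4317",
    "OTEL_SERVICE_NAME": "integration-tests",
}


def run_with_otel(cmd: list, otel_env: dict) -> subprocess.CompletedProcess:
    """Run a command with OTEL_* variables set explicitly, never inherited."""
    # Drop any OTEL_* values from the parent so the child sees only what we pass.
    env = {k: v for k, v in os.environ.items() if not k.startswith("OTEL_")}
    env.update(otel_env)
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


# A stand-in child that just reports which service name it received.
probe = [sys.executable, "-c", "import os; print(os.environ['OTEL_SERVICE_NAME'])"]
result = run_with_otel(probe, CHILD_OTEL_ENV)
print(result.stdout.strip())
```

Stripping inherited OTEL_* values first makes the child's configuration deterministic regardless of what the parent shell happened to export.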
The gotchas are where production lives
The dashboard documentation calls out several gotchas that are worth treating as launch blockers, not footnotes. First, pin metrics temporality to cumulative with OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE=cumulative. Prometheus-family systems expect cumulative counters, and a temporality mismatch can produce quietly wrong rates instead of a clean failure. Observability bugs are worst when they look like data.
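A toy calculation shows why a mismatch is dangerous. The function below is a simplified model of how a Prometheus-style backend reads a counter, treating any drop as a reset; it ignores timestamps and extrapolation, and the sample numbers are invented. Whether delta values actually reach your backend unconverted depends on the Collector setup, so this illustrates the failure mode rather than predicting it.

```python
def counter_increase(samples: list) -> float:
    """Total increase across samples, treating any drop as a counter reset."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # On a drop, assume the counter restarted from zero and reached `cur`.
        total += cur - prev if cur >= prev else cur
    return total


deltas = [5.0, 3.0, 7.0, 2.0]        # usage per export interval (invented)
cumulative = [5.0, 8.0, 15.0, 17.0]  # running total of the same usage

print(counter_increase(cumulative))  # 12.0, the true usage after the first sample
print(counter_increase(deltas))      # 9.0, plausible-looking but wrong
```

Nothing errors in the second case. The number is simply wrong, which is why pinning temporality belongs in the rollout checklist rather than the troubleshooting guide.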
Second, enable resource_to_telemetry_conversion on the Collector’s Prometheus exporter if labels derived from OpenTelemetry resource attributes are missing. Without it, panels such as sessions by terminal may appear empty even though Claude Code is emitting data. Third, understand that the pull request counter only increments when Claude Code itself opens the PR. If developers use Claude Code locally and then open PRs manually, a zero in that panel does not mean the agent did no useful work.
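For reference, a Collector configuration with that option enabled could be generated as in the sketch below. Collector configs are normally written as YAML; this emits JSON, which YAML parsers generally accept, purely to stay self-contained. The listen address 0.0.0.0:8889 is an assumed value, and the structure should be checked against the Collector version you run.

```python
import json

# Minimal metrics pipeline: OTLP in, Prometheus scrape endpoint out.
collector_config = {
    "receivers": {"otlp": {"protocols": {"grpc": {}, "http": {}}}},
    "exporters": {
        "prometheus": {
            "endpoint": "0.0.0.0:8889",  # assumed scrape address
            # Copies resource attributes onto each metric as labels.
            "resource_to_telemetry_conversion": {"enabled": True},
        }
    },
    "service": {
        "pipelines": {
            "metrics": {"receivers": ["otlp"], "exporters": ["prometheus"]}
        }
    },
}

print(json.dumps(collector_config, indent=2))
```

If panels that depend on resource-derived labels stay empty after this change, inspecting the raw /metrics output for those labels is the quickest way to tell whether the problem is upstream or in the dashboard.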
Fourth, treat cost as an estimate. Claude Code computes cost from token counts and model prices on the client side. That should be good enough for dashboards, trend detection, budget alerts, and “which model burned the week?” analysis. It is not a finance ledger, especially when cached-token billing or price changes enter the picture. Reconcile against official billing data for anything that has to be exact.
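The arithmetic behind such an estimate is simple, which is also why it drifts. The sketch below uses a made-up price table and assumed token-type names; it is not Claude Code's actual implementation or any model's real pricing.

```python
# Hypothetical prices in USD per million tokens; not real rates for any model.
PRICES_PER_MTOK = {
    "example-model": {
        "input": 2.0,
        "output": 10.0,
        "cacheRead": 0.2,
        "cacheCreation": 2.5,
    },
}


def estimate_cost(model: str, tokens: dict) -> float:
    """Estimate USD cost as the sum of tokens times per-token price, by type."""
    prices = PRICES_PER_MTOK[model]
    return sum(count * prices[kind] / 1_000_000 for kind, count in tokens.items())


usage = {"input": 200_000, "output": 50_000, "cacheRead": 1_000_000, "cacheCreation": 0}
estimate = estimate_cost("example-model", usage)
print(round(estimate, 4))
```

Any client holding a stale price table, or a billing rule the table does not model, will produce a number that looks precise and is slightly off.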
The label story deserves extra discipline. The source article notes that teams can attach custom dimensions through OTEL_RESOURCE_ATTRIBUTES, such as team=platform, project=billing-svc, or cost_center=eng-123. That is exactly the right path for per-team, per-project, and per-repository views. It is also how teams accidentally create a cardinality problem. Add project and cost_center. Think carefully before adding branch names, ticket IDs, prompt hashes, or anything else that turns Prometheus into a very expensive junk drawer.
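One way to enforce that discipline is to generate the variable from an allowlist instead of letting each team write it freehand. A minimal sketch, where the allowlist and the helper `build_resource_attributes` are hypothetical policy choices:

```python
# Keys expected to have few, stable values. Extend deliberately, not casually.
ALLOWED_KEYS = {"team", "project", "cost_center"}


def build_resource_attributes(attrs: dict) -> str:
    """Render OTEL_RESOURCE_ATTRIBUTES, refusing keys outside the allowlist."""
    rejected = sorted(set(attrs) - ALLOWED_KEYS)
    if rejected:
        raise ValueError(f"unapproved or high-cardinality keys: {rejected}")
    return ",".join(f"{key}={value}" for key, value in sorted(attrs.items()))


value = build_resource_attributes(
    {"team": "platform", "project": "billing-svc", "cost_center": "eng-123"}
)
print(value)  # cost_center=eng-123,project=billing-svc,team=platform

try:
    build_resource_attributes({"team": "platform", "branch": "feature/tmp-1234"})
except ValueError as err:
    print(err)
```

Failing loudly at configuration time is cheaper than discovering a label explosion in the time-series database a month later.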
What engineering teams should do now
If your company is piloting Claude Code, importing this dashboard should be less controversial than deploying the agent itself. Start with one team. Configure telemetry centrally through managed settings or a blessed shell/bootstrap path. Route OTLP to an existing Collector. Add stable resource attributes for team, project, and cost center. Pin temporality. Verify the scrape target. Then build exactly one alert before anyone gets dashboard-happy: spend velocity above an expected threshold.
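A spend-velocity alert could be expressed along the lines of the sketch below, which builds a Prometheus alerting rule as a Python dictionary and prints it as JSON (readable by YAML parsers). The threshold, window, hold duration, rule name, and severity label are all assumptions to tune, not recommendations from the dashboard's author.

```python
import json


def spend_velocity_rule(usd_per_hour: float, window: str = "15m") -> dict:
    """Build an alerting rule that fires when estimated spend exceeds a rate."""
    # rate() yields USD per second for a USD counter; multiply to get USD per hour.
    expr = (
        f"sum(rate(claude_code_cost_usage_USD_total[{window}])) * 3600 "
        f"> {usd_per_hour}"
    )
    return {
        "groups": [
            {
                "name": "claude-code-spend",
                "rules": [
                    {
                        "alert": "ClaudeCodeSpendVelocityHigh",
                        "expr": expr,
                        "for": "10m",
                        "labels": {"severity": "warning"},
                        "annotations": {
                            "summary": "Claude Code estimated spend rate is above threshold"
                        },
                    }
                ],
            }
        ]
    }


rule = spend_velocity_rule(25)
print(json.dumps(rule, indent=2))
```

Because the underlying cost metric is a client-side estimate, the threshold is best set from a week or two of observed baseline rather than from the invoice.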
That last piece is more useful than ten vanity panels. If Claude Code becomes part of daily engineering work, finance will eventually ask why the bill moved. A spend-rate alert gives platform teams a way to catch runaway sessions or bad defaults before the answer becomes “we have no idea.” The same data can also support better policy: which model should be the default, when high-effort mode is justified, where cache hit ratio is poor, and whether certain workflows belong in automation at all.
There is also a security-adjacent benefit. OpenTelemetry does not sandbox MCP servers, validate repo instructions, or prevent a model from making a bad call. But it can show usage patterns: active time spikes, unusual model selection, unexpected terminal types, high-cost sessions, or edit-decision patterns that deserve inspection. Observability is not a security boundary. It is the smoke alarm. Teams still need sandboxing, permission policy, secret scanning, MCP review, and sane trust prompts. But a smoke alarm beats smelling burning plastic after the fact.
The broader story is that coding agents are entering the same lifecycle as every other production tool. First they impress individuals. Then teams adopt them informally. Then spend, reliability, security, and support questions arrive. The tools that survive that transition are not necessarily the flashiest models; they are the ones that can be operated without everyone pretending the terminal is a black box.
This Prometheus dashboard is not a major Anthropic launch. It will not get the attention of a model release or a pricing fight. But it is exactly the kind of artifact that tells you where the market is going. Claude Code has crossed the “needs dashboards” line. Once a developer tool crosses that line, it is no longer just a clever assistant. It is infrastructure. Treat it accordingly.
Sources: DEV Community, Grafana Labs dashboard 25255, GitHub source repository, Anthropic Claude Code monitoring docs