
Claude Code 2.1.161 Turns Observability and MCP Redaction Into Shipping Work

Anatoliy Kolodkin

02 Jun 2026 • 5 min read

Claude Code 2.1.161 is the kind of release note that looks ignorable until you have to explain an agent bill, a leaked MCP credential, or a corrupted JSON stream in a production automation pipeline.

That is the theme. Not a model jump. Not a new demo. Not another screenshot of a terminal doing a refactor with suspiciously clean tests. Anthropic’s late-day patch turns observability, MCP redaction, and background-agent correctness into first-class shipping work. For teams trying to move Claude Code from “personal terminal assistant” to “engineering runtime we can govern,” that is the work that matters.

The release landed at 2026-06-02T21:58:22Z, roughly twenty hours after 2.1.160, which tightened acceptEdits around execution-adjacent config files. Read together, the two releases tell a coherent story: Anthropic is sanding down the places where coding agents stop being cute and start being operationally risky. Yesterday’s magic trick was “the agent edits code.” Today’s question is “can we measure it, can we keep secrets out of logs, and can automation consume its output without getting garbage spliced into stdout?”

Telemetry labels are not dashboard cosmetics

The headline operational change is small but important: OTEL_RESOURCE_ATTRIBUTES values are now included as labels on metric datapoints. In plain English, Claude Code usage metrics can carry custom dimensions: team, repo, environment, workflow, cost center, or whatever taxonomy your observability stack already uses.

That sounds like plumbing because it is plumbing. Good plumbing is why serious systems do not flood the basement.

Aggregate token totals are almost useless once agents become shared infrastructure. Engineering leaders need sharper questions answered: which repository is driving spend, which team is triggering expensive retries, which background workflow fans out too aggressively, whether a provider route is being used as intended, and where latency or tool-call failures cluster. Without labels, those questions turn into spreadsheet archaeology. With labels, Claude Code activity can be sliced using the same operational vocabulary teams already apply to services, jobs, and queues.
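To make the labeling concrete, here is a minimal sketch of building the variable's value. It assumes the general OpenTelemetry convention that OTEL_RESOURCE_ATTRIBUTES is a comma-separated list of key=value pairs with percent-encoded values. The helper name and the label taxonomy (team, repo, environment, cost_center) are hypothetical, not something the release prescribes.

```python
from urllib.parse import quote


def build_resource_attributes(attrs: dict[str, str]) -> str:
    """Render attributes as a comma-separated key=value list,
    percent-encoding each value."""
    parts = []
    for key, value in attrs.items():
        if not key or not value:
            raise ValueError(f"empty key or value for attribute {key!r}")
        # safe="" also encodes ',' and '=' so a value cannot break the list syntax.
        parts.append(f"{key}={quote(value, safe='')}")
    return ",".join(parts)


# Hypothetical taxonomy: these keys and values are illustrative only.
labels = {
    "team": "payments",
    "repo": "billing-api",
    "environment": "ci",
    "cost_center": "eng-42",
}
print(f"export OTEL_RESOURCE_ATTRIBUTES={build_resource_attributes(labels)}")
```

Generating the string from one shared function, rather than hand-typing it per job, is what keeps label names consistent enough to group by later.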

This is where the coding-agent market is quietly moving. The winner will not simply be the agent that writes the cleverest patch in a benchmark. The winner will be the one whose work can be measured, attributed, audited, budgeted, and killed cleanly when it misbehaves. Model capability gets the launch-day applause. Observability gets the enterprise renewal.

2.1.161 also fixes a race where OpenTelemetry log events named user_prompt, api_request, tool_result, and tool_decision could be silently dropped if emitted before telemetry initialization completed. That bug class is brutal because it creates false confidence: teams believe they are logging the important edges, while the earliest and often most diagnostic events vanish. Agent runtimes need boring startup semantics. If the first prompt, first API request, or first tool decision can disappear from telemetry, your audit trail has a hole exactly where incidents like to begin.
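One common way to get boring startup semantics is to buffer early events and flush them once the exporter is ready. The sketch below illustrates that general pattern only. It is not Claude Code's implementation, and the BufferedEmitter class and its methods are hypothetical names.

```python
import threading
from collections import deque
from typing import Callable, Optional

Exporter = Callable[[str, dict], None]


class BufferedEmitter:
    """Hold events emitted before initialization, then flush them in order."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: deque = deque()
        self._export: Optional[Exporter] = None

    def emit(self, name: str, attributes: dict) -> None:
        with self._lock:
            if self._export is None:
                # Exporter not ready yet: queue the event instead of dropping it.
                self._pending.append((name, attributes))
                return
            export = self._export
        export(name, attributes)

    def initialize(self, export: Exporter) -> None:
        with self._lock:
            # Drain the backlog first so early events keep their order.
            while self._pending:
                export(*self._pending.popleft())
            self._export = export


received = []
emitter = BufferedEmitter()
emitter.emit("user_prompt", {"seq": 1})   # emitted before initialization
emitter.emit("api_request", {"seq": 2})   # emitted before initialization
emitter.initialize(lambda name, attrs: received.append(name))
emitter.emit("tool_result", {"seq": 3})   # emitted after initialization
print(received)  # ['user_prompt', 'api_request', 'tool_result']
```

The backlog here is unbounded for simplicity. A real runtime would likely cap it, and would have to decide which events to sacrifice when the cap is hit.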

MCP redaction is a boundary, not a courtesy

The most directly security-relevant fix is in Claude Code’s MCP commands. claude mcp list, claude mcp get, and claude mcp add no longer print secrets to the terminal: ${VAR} references are no longer expanded, and credential headers plus URL secrets are redacted.

That is not merely nicer terminal output. MCP is becoming the connector layer for agent tools, and connector layers are secret-handling systems whether their marketing pages admit it or not. If an MCP command expands environment variables or prints bearer headers into a terminal, those secrets can land in shell scrollback, copied debugging transcripts, CI logs, screen shares, terminal recorders, issue comments, support bundles, or whatever observability wrapper is collecting stdout. “It was only local” is not a security model. It is a confession that nobody traced the data path.

The fix supports a more mature MCP checklist: keep credentials in environment references or secret stores, redact at display boundaries, avoid storing plaintext config where possible, log connector actions without logging connector secrets, and review MCP server configuration as seriously as you review CI credentials. MCP adoption has been moving faster than MCP governance. This patch closes one leak path, but the larger lesson is that every agent tool surface needs a “what can this print?” review, not only a “what can this call?” review.
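"Redact at display boundaries" can be sketched as a pair of small functions that run just before anything is printed. This is an illustration under stated assumptions, not Claude Code's actual redaction logic. The header and query-parameter name lists, the mask string, and the function names are all hypothetical.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

MASK = "***"
# Hypothetical deny-lists; a real deployment would maintain its own.
SENSITIVE_HEADERS = {"authorization", "proxy-authorization", "x-api-key", "cookie"}
SENSITIVE_PARAMS = {"token", "access_token", "api_key", "apikey", "key", "secret"}
# Matches "${VAR}" optionally preceded by a scheme word, e.g. "Bearer ${VAR}".
ENV_REF = re.compile(r"(?:[A-Za-z]+ )?\$\{[A-Za-z_][A-Za-z0-9_]*\}")


def redact_header(name: str, value: str) -> str:
    """Return a display-safe header value without ever expanding ${VAR}."""
    if ENV_REF.fullmatch(value):
        return value  # a reference carries no secret; show it unexpanded
    if name.lower() in SENSITIVE_HEADERS:
        return MASK
    return value


def redact_url(url: str) -> str:
    """Mask userinfo and sensitive query parameters in a URL."""
    parts = urlsplit(url)
    netloc = parts.netloc
    if "@" in netloc:
        # Drop "user:password" but keep the host (and port) visible.
        netloc = f"{MASK}@{netloc.rsplit('@', 1)[1]}"
    query = [
        (key, MASK if key.lower() in SENSITIVE_PARAMS else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(
        (parts.scheme, netloc, parts.path, urlencode(query, safe="*"), parts.fragment)
    )


print(redact_header("Authorization", "Bearer sk-live-123"))
print(redact_header("Authorization", "Bearer ${MCP_TOKEN}"))
print(redact_url("https://user:pw@mcp.example.com/sse?token=abc123&region=eu"))
```

A deny-list like this only catches secrets that travel under known names, which is why the "what can this print?" review still has to happen per connector.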

There is also a practitioner nuance here: redaction must be designed into both human-facing and machine-facing paths. A terminal command that is safe for a human to inspect may still be unsafe if its output is piped into an agent transcript, stored in telemetry, or attached to an error report. Agent runtimes collapse UI, logs, prompts, and tools into one loop. That makes old-school “do not paste secrets into chat” advice insufficient. The system itself has to avoid producing secret-shaped text in the first place.

Stdout cleanliness is production readiness

Several fixes target background-agent correctness. Background subagent output no longer corrupts claude -p stdout when using --output-format text or json. Parallel tool calls now fail independently, so a failed Bash command no longer cancels other calls in the same batch. claude agents now shows done/total for fanned-out work and peeks at the longest-running item. Background sessions dispatched from claude agents now boot on the model in settings.json instead of a stale model from the daemon environment. Completed subagents should no longer get stuck showing as running after finalization errors.

None of that will trend on Hacker News. All of it matters if you are wiring Claude Code into scripts, CI-style jobs, evaluation harnesses, or scheduled agent workflows.

JSON stdout is a contract. If background chatter corrupts it, downstream parsers fail or, worse, parse the wrong thing. Model selection is a contract. If a background daemon silently uses a stale environment model instead of the configured one, reproducibility and cost controls are fantasy. Fan-out status is a contract. If operators cannot see how much work is done or which item is longest-running, they cannot make sane decisions about cancellation, retries, or escalation.
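Treating stdout as a contract means the consumer should reject anything that is not exactly one JSON document. A minimal sketch of that check follows. The function name is hypothetical, and it deliberately makes no assumption about which fields the JSON contains. In practice the raw string would be the captured stdout of a command such as claude -p --output-format json.

```python
import json
from typing import Any


def parse_single_json_document(raw: str) -> Any:
    """Accept stdout only if it is exactly one JSON document plus whitespace."""
    text = raw.lstrip()
    if not text:
        raise ValueError("stdout was empty")
    try:
        # raw_decode parses one document and reports where it ended.
        value, end = json.JSONDecoder().raw_decode(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"stdout does not start with valid JSON: {exc}") from exc
    trailing = text[end:].strip()
    if trailing:
        # Anything after the document means something else wrote to stdout.
        raise ValueError(f"unexpected content after JSON: {trailing[:40]!r}")
    return value


clean = '{"ok": true}\n'
contaminated = '{"ok": true}\n[background] subagent finished\n'

print(parse_single_json_document(clean))
try:
    parse_single_json_document(contaminated)
except ValueError as err:
    print(f"rejected: {err}")
```

Failing loudly here is the point: a pipeline that grabs "the last line that looks like JSON" is the one that ends up parsing the wrong thing.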

The independent parallel-tool failure behavior is also a meaningful runtime design choice. In an agent loop, batched tools often mix independent reads, searches, inspections, and shell operations. Letting one failed Bash command cancel unrelated calls turns recoverable partial progress into total failure. Returning each result independently gives the agent and the operator a more accurate view of reality: this command failed, those reads succeeded, proceed accordingly. Distributed systems learned this lesson years ago. Agent harnesses are relearning it in public.
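The difference between the two behaviors is easy to show with asyncio. This is a generic sketch of independent failure handling, not Claude Code's harness. The tool functions and the run_batch helper are hypothetical stand-ins.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def run_batch(calls: dict[str, Callable[[], Awaitable[Any]]]) -> dict[str, dict]:
    """Run every call concurrently and report each outcome on its own."""
    # return_exceptions=True keeps one failure from cancelling its siblings.
    results = await asyncio.gather(
        *(call() for call in calls.values()), return_exceptions=True
    )
    outcome = {}
    for name, result in zip(calls, results):
        if isinstance(result, BaseException):
            outcome[name] = {"ok": False, "error": repr(result)}
        else:
            outcome[name] = {"ok": True, "value": result}
    return outcome


# Hypothetical stand-ins for a mixed batch of tool calls.
async def read_file() -> str:
    return "file contents"


async def failing_bash() -> str:
    raise RuntimeError("exit status 1")


async def search() -> list[str]:
    return ["match.py"]


outcome = asyncio.run(
    run_batch({"read": read_file, "bash": failing_bash, "search": search})
)
for name, result in outcome.items():
    print(name, result)
```

The failed shell command shows up as one failed entry, while the read and the search still return their values for the agent to act on.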

Policy compatibility is part of the product

The release also fixes a regression introduced in 2.1.146 where managed login policies such as forceLoginOrgUUID and forceLoginMethod could block third-party provider sessions on Bedrock, Vertex, Foundry, and Mantle alongside the organization pin. That is an enterprise-sounding detail with a real adoption consequence.

Companies do not deploy coding agents into a vacuum. They have identity policy, provider approvals, model routing rules, procurement constraints, and regional infrastructure requirements. If an org pin or login-method policy breaks sessions routed through approved third-party provider surfaces, the practical result is not “more secure.” It is broken governance. People work around broken governance. The right behavior is compatibility between identity controls and approved provider paths, not a policy system that accidentally pushes users back toward unmanaged setups.

For teams running Claude Code seriously, the action items are concrete. Upgrade to 2.1.161. Add useful OTEL_RESOURCE_ATTRIBUTES now, before your usage story becomes a mystery novel. Tag by team and repository at minimum; add workflow or environment if you run background agents. Verify that MCP command output no longer exposes secrets in your terminal, logs, and wrappers. Test claude -p --output-format json in the exact automation path you use, not in a happy-path demo. Check background-agent model selection after daemon restarts. And if your organization relies on Bedrock, Vertex, Foundry, or Mantle, validate managed-login behavior instead of assuming the policy layer is boring.

The broader read is simple: Claude Code is becoming less interesting as a chatty CLI and more interesting as an observable, policy-bound agent runtime. That is good. The terminal assistant era made the product beloved. The runtime era will decide whether teams can trust it with real engineering work.

2.1.161 is not glamorous. It is labels, redaction, stdout hygiene, policy fixes, and background queue correctness. In other words: the stuff that tells you whether a vendor understands production adoption after the demo ends.

Sources: Anthropic Claude Code v2.1.161 release, Claude Code monitoring usage docs, Claude Code MCP docs, Claude Code v2.1.160 release
