Claude Code 2.1.152 Moves Agent Governance Into the Skill and Hook Layer

Anatoliy Kolodkin

27 May 2026 • 6 min read

Claude Code 2.1.152 looks like a patch release until you read it as an operations document. The headline feature is easy to demo: /code-review --fix can now apply review findings directly to the working tree. The more important shift is quieter. Anthropic is moving more of Claude Code’s control plane into skills, slash commands, hooks, plugin marketplaces, telemetry, and usage accounting — exactly the surfaces teams have to govern once coding agents stop being toys and start touching real repositories every day.

That distinction matters because most organizations do not fail at agent adoption because the model cannot write a loop. They fail because nobody can answer boring questions with confidence: what tools was the agent allowed to use, which plugin suggested that command, why did this session cost twice as much as the last one, and did the review loop produce a real improvement or just self-approval theater? Version 2.1.152 is not glamorous. It is useful in the way a good incident runbook is useful: it gives operators more places to put policy before the blast radius gets interesting.

Least privilege is starting to move into the workflow itself

The most consequential line in the release is that skills and slash commands can now set disallowed-tools in frontmatter, removing tools from the model while that skill or command is active. That is a small primitive with large implications. A documentation skill should not retain shell and write powers if all it needs is project context. A triage command may need issue access and log reads, but not deployment credentials. A migration workflow may need tests and file writes, but not a broad web-fetching tool pointed at arbitrary content.
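The primitive is easy to model. The sketch below is not Claude Code's implementation; it is a minimal illustration of the subtraction the release describes, with a hypothetical `Skill` type and placeholder tool names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Skill:
    """Hypothetical in-memory view of a skill and its frontmatter constraint."""
    name: str
    disallowed_tools: frozenset[str] = field(default_factory=frozenset)


def effective_tools(session_tools: set[str], active_skill: Skill | None) -> set[str]:
    """Return the tools visible to the model while `active_skill` is active.

    Disallowing only removes tools; it never grants one the session lacked.
    """
    if active_skill is None:
        return set(session_tools)
    return set(session_tools) - active_skill.disallowed_tools


# Tool names are illustrative placeholders, not a verified tool list.
SESSION_TOOLS = {"Read", "Grep", "Write", "Bash", "WebFetch"}
docs_skill = Skill("summarize-docs", frozenset({"Write", "Bash", "WebFetch"}))

docs_tools = effective_tools(SESSION_TOOLS, docs_skill)
```

Here `docs_tools` keeps only the read-oriented tools, and the full set returns once the skill is no longer active. The constraint travels with the workflow instead of living in a global setting.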

This is the right direction because agent permissions should not be a single global switch. Human engineers already work with contextual authority: read-only dashboards, scoped cloud roles, branch protections, database replicas, production break-glass, and CI jobs with specific service accounts. Coding agents need the same shape. The agent’s powers should change depending on whether it is writing docs, editing CI, patching auth code, running a security scanner, or summarizing a design doc.

disallowed-tools is not the whole governance story. Teams still need signed skills, provenance, permission previews, approval policy, sandboxing, audit logs, and a way to enforce organization rules across local developer machines. But it is a useful composable lever. If skills are going to become reusable workflow packages, they need to carry not just instructions, but constraints. A prompt package that says “do this task” without also saying “do not use these tools” is only half a policy object.

The related hook changes belong in the same bucket. Claude Code now has a /reload-skills command, and SessionStart hooks can return reloadSkills: true so newly installed or generated skills become available without restarting the session. SessionStart hooks can also set the session title via hookSpecificOutput.sessionTitle, and a new MessageDisplay hook can transform or hide assistant message text as it is displayed.
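As a rough sketch of what such a SessionStart hook could emit, the Python below builds a JSON response with both fields. The field names `reloadSkills` and `hookSpecificOutput.sessionTitle` come from the release notes. Their exact nesting, the `hookEventName` field, and the `cwd` input key are assumptions to verify against the hooks docs.

```python
import json
from pathlib import PurePosixPath


def build_session_start_output(title: str, reload_skills: bool) -> dict:
    """Build the JSON object a SessionStart hook would print on stdout.

    Assumed shape: `reloadSkills` at the top level, `sessionTitle` nested
    under `hookSpecificOutput`. Check the hooks docs before relying on it.
    """
    return {
        "reloadSkills": reload_skills,
        "hookSpecificOutput": {
            "hookEventName": "SessionStart",  # assumed field
            "sessionTitle": title,
        },
    }


def render(stdin_text: str) -> str:
    """Turn the hook's stdin payload into the stdout response."""
    event = json.loads(stdin_text) if stdin_text.strip() else {}
    cwd = event.get("cwd", "unknown-project")  # `cwd` key is an assumption
    title = PurePosixPath(cwd).name or "unknown-project"
    return json.dumps(build_session_start_output(title, reload_skills=True))
```

A hook script would end with something like `sys.stdout.write(render(sys.stdin.read()))`. Keeping the logic in `render` makes the hook testable without a live session.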

That last feature deserves both appreciation and suspicion. Display hooks can be excellent for labeling, redaction, routing, or local workflow hygiene. They are also capable of changing what the human sees. If a hook can reshape assistant output, it belongs in the same review bucket as shell hooks, editor extensions, pre-commit scripts, and CI glue. Useful automation that sits between the model and the developer is still automation that can mislead the developer if it is buggy or hostile.
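One way to keep a display hook reviewable is to make its core a small pure function that marks every change it makes. The sketch below shows only that transformation. How a MessageDisplay hook receives and returns text is not shown here, because that schema is not covered in this article, and the patterns are illustrative rather than a vetted secret-detection list.

```python
import re

# (pattern, replacement) pairs; illustrative only.
REDACTIONS = [
    # Strings shaped like an AWS access key ID.
    (re.compile(r"AKIA[0-9A-Z]{16}"), "[REDACTED]"),
    # key=value style secrets; keep the key name, hide the value.
    (re.compile(r"(api[_-]?key\s*[=:]\s*)\S+", re.IGNORECASE), r"\1[REDACTED]"),
]


def redact(text: str) -> tuple[str, int]:
    """Return the redacted text and the number of substitutions made.

    Every change leaves a visible [REDACTED] marker, so the reader can
    tell the displayed text differs from what the model produced.
    """
    total = 0
    for pattern, replacement in REDACTIONS:
        text, count = pattern.subn(replacement, text)
        total += count
    return text, total
```

Returning the substitution count gives the hook something to log, which matters if the team later needs to explain why a message looked different on screen.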

/code-review --fix is a pre-review pass, not a rubber stamp

The feature developers will notice first is /code-review --fix. Claude Code can now apply review findings to the working tree after review, including reuse, simplification, and efficiency suggestions. Anthropic also changed /simplify so it invokes /code-review --fix, making code review the canonical cleanup path rather than a separate command family.

That is a sensible product move. A lot of useful code review is not deep architecture judgment; it is “this helper already exists,” “this branch can be simplified,” “this allocation is unnecessary,” “this error path is inconsistent,” or “you repeated the same idea in three places.” Agents are good at applying those patches when the scope is clear. Letting the review pass produce a diff is faster than making the human manually translate every comment into edits.

The trap is letting the same agent ecosystem become author, reviewer, fixer, and approver. /code-review --fix should be treated like a pre-review cleanup pass. Run it, inspect the diff, run tests, and then send the result through normal review. Do not confuse an agent applying its own suggestions with independent validation. The most useful pattern is separation: one workflow writes the patch, another narrower review process checks it, deterministic tools run tests and scans, and the human reviewer evaluates the result with the full trace available.

Anthropic’s security guidance plugin docs point in that direction. The plugin reviews Claude’s code at three depths: deterministic per-edit pattern checks, model-backed end-of-turn diff review, and deeper agentic review on Claude-made commits or pushes. Anthropic says internal rollout and benchmarks saw a 30–40% decrease in security-related PR comments for PRs opened using the plugin. That is a meaningful signal, but the docs also state the plugin does not block writes or commits, caps commit and push reviews at 20 per rolling hour, covers up to 30 changed files per turn, loads custom guidance up to 8 KB, and supports up to 50 custom pattern rules.
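Those limits are worth checking before a team assumes its custom rules are in force. The following is a minimal sketch with hypothetical names. The numeric limits come from the figures quoted above. Reading 8 KB as 8,192 bytes and treating each rule as a string are assumptions, and the docs remain the authority on what happens when a limit is exceeded.

```python
MAX_GUIDANCE_BYTES = 8 * 1024  # "up to 8 KB"; the exact byte count is assumed
MAX_PATTERN_RULES = 50
MAX_FILES_PER_TURN = 30


def check_plugin_limits(guidance: str, pattern_rules: list[str], changed_files: int) -> list[str]:
    """Return human-readable warnings for inputs that exceed the quoted limits."""
    warnings = []
    guidance_bytes = len(guidance.encode("utf-8"))
    if guidance_bytes > MAX_GUIDANCE_BYTES:
        warnings.append(f"custom guidance is {guidance_bytes} bytes, over {MAX_GUIDANCE_BYTES}")
    if len(pattern_rules) > MAX_PATTERN_RULES:
        warnings.append(f"{len(pattern_rules)} pattern rules, over {MAX_PATTERN_RULES}")
    if changed_files > MAX_FILES_PER_TURN:
        warnings.append(f"{changed_files} changed files in one turn, over {MAX_FILES_PER_TURN}")
    return warnings
```

A check like this fits in CI next to the rules file, so an oversized guidance document fails loudly instead of silently losing coverage.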

The lesson is not “Claude can grade Claude, ship it.” The lesson is that layered review works when each layer has a narrow job. Teams should add project-specific security rules for their actual risks: tenant isolation, auth checks, logging restrictions, workflow-file permissions, cryptographic comparison rules, framework-specific injection bugs, and anything their codebase has historically gotten wrong. Generic security guidance is a starting point. Your threat model is the product.

The cost and telemetry work is boring in exactly the right way

Version 2.1.152 also continues Claude Code’s recent push toward usage attribution. /usage now includes large session files, scanned with streaming reads so memory usage stays flat. That follows v2.1.149’s per-category breakdown for skills, subagents, plugins, and per-MCP-server cost. Cache creation input tokens now report correctly when the API returns nested cache-creation breakdowns. Operators can also opt into emitting the session entrypoint as an OpenTelemetry metric attribute with OTEL_METRICS_INCLUDE_ENTRYPOINT=true.

This is the unsexy infrastructure that makes agent adoption survivable. If a team cannot attribute cost to skills, subagents, plugins, MCP servers, large session files, cache behavior, and entrypoints, “AI budget” becomes a fog machine attached to a corporate card. Usage visibility is not just finance hygiene. It is product feedback. Expensive workflows are often sloppy workflows: repeated reads, huge tool outputs, broad MCP queries, unnecessary subagent fan-out, or prompts that force the model to rediscover project context every run.
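To make the attribution point concrete, here is a small sketch that ranks spend by category and name. The record format is hypothetical; it is not the output format of /usage or of any Claude Code export, and the field names would need to be mapped from whatever a team's telemetry pipeline actually provides.

```python
from collections import defaultdict


def rank_cost(records: list[dict]) -> list[tuple[tuple[str, str], float]]:
    """Sum cost per (category, name) and return entries sorted by cost, highest first."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for record in records:
        totals[(record["category"], record["name"])] += record["cost_usd"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical records for illustration only.
records = [
    {"category": "mcp", "name": "issue-tracker", "cost_usd": 0.5},
    {"category": "skill", "name": "code-review", "cost_usd": 0.25},
    {"category": "mcp", "name": "issue-tracker", "cost_usd": 1.0},
]
ranked = rank_cost(records)
```

The first entry in `ranked` is the workflow to investigate: in this made-up data, one MCP server accounts for most of the spend.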

The release also fixes several operational papercuts that matter more in production than in demos: plugin MCP servers with the same command but different environment variables are no longer incorrectly deduplicated; remote MCP servers reconnect in Claude Code Remote sessions when an egress proxy is enabled; stale plugin registry updates are restored; and when the primary model is not found, Claude Code now switches to the configured --fallback-model for the rest of the session instead of failing every request.
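The deduplication fix is a good example of how small identity bugs become operational surprises. The sketch below is not Claude Code's code. It only illustrates the general idea, using a hypothetical server-config shape: two servers launched with the same command but different environment variables are different servers, so the environment has to be part of the dedup key.

```python
def server_identity(server: dict) -> tuple:
    """Build a hashable identity from command, args, and environment."""
    return (
        server["command"],
        tuple(server.get("args", ())),
        tuple(sorted(server.get("env", {}).items())),
    )


def dedupe_servers(servers: list[dict]) -> list[dict]:
    """Drop exact duplicates while preserving order."""
    seen = set()
    unique = []
    for server in servers:
        key = server_identity(server)
        if key not in seen:
            seen.add(key)
            unique.append(server)
    return unique


# Hypothetical configs: same command, different tenants.
servers = [
    {"command": "db-mcp", "env": {"TENANT": "staging"}},
    {"command": "db-mcp", "env": {"TENANT": "prod"}},
    {"command": "db-mcp", "env": {"TENANT": "prod"}},
]
unique_servers = dedupe_servers(servers)
```

Keying on the command alone would collapse the staging and prod entries into one, which is the failure mode the release notes describe.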

None of those make a good launch tweet. All of them reduce the ways an agent runtime can surprise an operator. The same is true for the new managed setting pluginSuggestionMarketplaces, which lets admins allowlist organization marketplaces whose plugins may be suggested via context-aware tips. Plugin discovery is a supply-chain surface. If a tool can suggest plugins based on context, administrators need a way to define which marketplaces are in bounds.

The rollout advice is straightforward. Upgrade a test machine first. Review team skills and slash commands for tool authority, then add disallowed-tools wherever a workflow does not need shell, write, network, or high-risk MCP access. Inventory hooks, especially anything running at SessionStart or affecting displayed output. Treat /code-review --fix as a cleanup assistant, not a substitute for review. Turn on usage and telemetry where policy allows, and watch for the workflows that burn cost without producing better diffs.
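The skills review can start as a script. The sketch below walks a directory tree and lists skill files whose frontmatter has no `disallowed-tools` key. It assumes skills live in files named `SKILL.md` with frontmatter delimited by `---` lines, and it uses a deliberately naive key parser instead of a YAML library, so treat it as a starting point for an inventory, not a policy engine.

```python
from pathlib import Path


def parse_frontmatter_keys(text: str) -> set[str]:
    """Return top-level frontmatter keys, or an empty set if none is found."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        # Skip indented lines and list items; only top-level keys count.
        if line.startswith((" ", "\t", "-")):
            continue
        key, sep, _ = line.partition(":")
        if sep:
            keys.add(key.strip())
    return set()  # unterminated frontmatter is treated as absent


def skills_without_constraints(root: Path) -> list[Path]:
    """List SKILL.md files under `root` that do not declare disallowed-tools."""
    missing = []
    for path in sorted(root.rglob("SKILL.md")):
        keys = parse_frontmatter_keys(path.read_text(encoding="utf-8"))
        if "disallowed-tools" not in keys:
            missing.append(path)
    return missing
```

A missing key is not automatically a problem, since some workflows do need broad tools. The list is a prompt for a human decision about each skill.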

Claude Code 2.1.152 is not a model-quality story. It is a control-plane story. The frontier is no longer just “can the agent write code?” It is “can the team constrain, observe, price, review, and explain what the agent did?” This release moves several of those answers from vibes into configuration. LGTM — with the usual condition that somebody actually reads the diff.

Sources: Claude Code GitHub release v2.1.152, Claude Code changelog, Claude Code security guidance docs, Claude Code hooks docs, Claude Code skills docs, Claude Code monitoring usage docs
