Claude Code’s Security Plugin Turns AI Code Review Into an In-Session Feedback Loop

Anatoliy Kolodkin

27 May 2026 • 5 min read

Claude Code’s new security-guidance plugin is not interesting because it promises to make AI-written code safe. That would be the usual product-demo overclaim, and thankfully Anthropic’s own docs do not make it. The interesting part is more practical: Claude Code is starting to treat security review as something that happens inside the agent session, while the code is still warm, rather than after the agent has generated a tidy pull request and moved on to the next task.

That matters because AI coding changes the timing of review. A human developer usually writes slowly enough that feedback from tests, linters, colleagues, and muscle memory can interrupt bad ideas. An agent can edit five files, run shell commands, spawn subagents, and prepare a commit before anyone has mentally loaded the diff. If the first security signal arrives as a PR bot comment, the human reviewer is already doing janitorial work for generated code.

The plugin’s design is a useful correction. It adds three review layers to Claude Code: deterministic per-edit pattern checks, end-of-turn diff review, and a deeper commit or push review when Claude itself runs git commit or git push. The install path is straightforward: run /plugin install security-guidance@claude-plugins-official, then /reload-plugins. It requires Claude Code CLI 2.1.144 or later and Python 3.8+, plus a git repository for the diff and commit review layers. The per-edit checker works outside git because it only looks at the edit stream.

The useful part is the cheap layer

The least glamorous layer may be the most immediately valuable. The per-edit checker does not call a model, which means it adds no model cost and should not become a budget argument. It looks for obvious dangerous patterns: eval(, new Function, os.system, child_process.exec, pickle, dangerouslySetInnerHTML, direct .innerHTML = assignments, document.write, and edits under .github/workflows/.
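To make the "no model call" point concrete, here is a minimal sketch of what a deterministic per-edit check could look like. The pattern names come from the list above; the regexes, the RULES table, and the check_edit function are illustrative assumptions, not the plugin's actual implementation.

```python
import re
from typing import List

# Hypothetical rule table. The labels mirror the patterns named in the article;
# the regexes are illustrative guesses, not the plugin's real rules.
RULES = [
    ("eval call", re.compile(r"\beval\(")),
    ("new Function", re.compile(r"\bnew\s+Function\b")),
    ("os.system", re.compile(r"\bos\.system\b")),
    ("child_process.exec", re.compile(r"\bchild_process\.exec\b")),
    ("pickle", re.compile(r"\bpickle\b")),
    ("dangerouslySetInnerHTML", re.compile(r"dangerouslySetInnerHTML")),
    # Match assignment (=) but not comparison (== or ===).
    ("innerHTML assignment", re.compile(r"\.innerHTML\s*=(?!=)")),
    ("document.write", re.compile(r"\bdocument\.write\b")),
]

WORKFLOW_PREFIX = ".github/workflows/"


def check_edit(path: str, new_text: str) -> List[str]:
    """Return labels for every risky pattern found in a single edit.

    No model call and no git access: only the edited path and text are inspected.
    """
    findings = []
    # Normalize Windows separators so the prefix check works on any platform.
    if path.replace("\\", "/").startswith(WORKFLOW_PREFIX):
        findings.append("workflow edit")
    for label, pattern in RULES:
        if pattern.search(new_text):
            findings.append(label)
    return findings
```

Calling check_edit("app.py", "os.system(cmd)") in this sketch returns a single "os.system" finding. The check is a handful of regex searches per edit, which is why a layer like this can run on every write without anyone noticing the cost.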

That is not sophisticated security analysis. Good. Sophisticated analysis is more expensive, slower, and easier to argue with. A coding agent should not need a model-backed seminar to be told that shell execution, deserialization, raw HTML injection, or workflow mutation deserves scrutiny. This is the same reason mature teams still run linters and secret scanners even when they also have senior reviewers: cheap deterministic guardrails catch the boring mistakes before humans waste attention on them.

The second layer is where Claude starts doing what regex cannot. At the end of a turn, the plugin computes a git diff for everything changed during that turn, including edits made through tools, Bash commands, and subagents. The docs say it can catch issues such as authorization bypass, IDOR, injection, SSRF, and weak cryptography, covering up to 30 changed files per turn and firing at most three times in a row before yielding. That limit is a small but important operational detail. Security review that loops forever is just denial-of-service with better intentions.
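The two limits in that paragraph can be sketched as a small gate. The numbers (30 files, three consecutive firings) come from the docs as quoted above; the TurnReviewGate class and its reset behavior are assumptions made for illustration, since the plugin's internals are not described here.

```python
from typing import List

MAX_FILES_PER_TURN = 30       # from the article
MAX_CONSECUTIVE_REVIEWS = 3   # from the article


class TurnReviewGate:
    """Hypothetical gate deciding whether an end-of-turn review should fire."""

    def __init__(self) -> None:
        self.consecutive = 0

    def select(self, changed_files: List[str]) -> List[str]:
        """Return the files to review this turn; an empty list means 'yield'."""
        if not changed_files:
            # Nothing changed, so the streak of reviews is broken.
            self.consecutive = 0
            return []
        if self.consecutive >= MAX_CONSECUTIVE_REVIEWS:
            # Assumed behavior: yield once, then start counting again.
            self.consecutive = 0
            return []
        self.consecutive += 1
        return changed_files[:MAX_FILES_PER_TURN]
```

In this sketch a turn that touches 40 files gets the first 30 reviewed, and a fourth consecutive review is skipped so the session can move on. How the real plugin picks which 30 files to keep, or when it resets its counter, is not stated in the source.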

The third layer runs when Claude commits or pushes through its Bash tool. This reviewer can inspect surrounding code, callers, sanitizers, and related files to reduce false positives, and it is capped at 20 reviews per rolling hour. By default, model-backed reviews use Claude Opus 4.7, with SECURITY_REVIEW_MODEL controlling the end-of-turn reviewer and SG_AGENTIC_MODEL controlling the commit reviewer. Teams that want more paranoia can set SG_DUAL_OR=on to run two parallel review calls and union the findings, at roughly twice the API cost.
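A cap of 20 reviews per rolling hour and a "union the findings" mode are both simple mechanisms, sketched below under stated assumptions. RollingHourCap and union_findings are hypothetical names, and findings are modeled as plain strings; the plugin's real data structures are not documented in this article.

```python
import time
from collections import deque
from typing import Iterable, List, Optional


class RollingHourCap:
    """Sliding-window limiter: at most `limit` events in any `window_seconds` span."""

    def __init__(self, limit: int = 20, window_seconds: float = 3600.0) -> None:
        self.limit = limit
        self.window = window_seconds
        self._stamps = deque()  # timestamps of reviews that were allowed

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the rolling window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) >= self.limit:
            return False
        self._stamps.append(now)
        return True


def union_findings(first: Iterable[str], second: Iterable[str]) -> List[str]:
    """Order-preserving union of two reviewers' findings, without duplicates."""
    seen = set()
    merged = []
    for finding in list(first) + list(second):
        if finding not in seen:
            seen.add(finding)
            merged.append(finding)
    return merged
```

The union is the reason the dual mode costs roughly twice as much: both calls always run, and a finding is kept if either reviewer reports it. That trades more false positives for fewer misses.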

“Claude reviews Claude” is the wrong mental model

The lazy critique is that this is Claude grading its own homework. There is a real risk there, but it is not the whole design. Anthropic’s docs describe separate review calls with fresh context and security-focused prompts, plus a deterministic layer that does not depend on model judgment at all. That separation matters. The useful pattern is not self-approval; it is adversarial-ish review inside the workflow, with narrower instructions than the code-writing agent had.

Still, the plugin should not be treated as authority. The docs are explicit: it does not block writes or commits, and findings are fed back into the writing Claude as instructions. The review model can miss issues. The commit reviewer only triggers when Claude runs the git command through its Bash tool. Custom guidance is advisory. If a policy must block a merge, it belongs in CI, protected-branch rules, hooks, dependency scanning, secrets scanning, SAST, DAST, or human review, not in a polite note appended to an agent session.
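For contrast with the advisory plugin, a blocking rule in CI can be very small. The sketch below assumes a hypothetical policy (changes under .github/workflows/ need human approval) and a hypothetical script that receives changed paths as arguments; it is not part of the plugin.

```python
import sys
from typing import Iterable, List

# Hypothetical policy: these path prefixes may not change without human approval.
PROTECTED_PREFIXES = (".github/workflows/",)


def blocked_paths(changed: Iterable[str]) -> List[str]:
    """Return the changed paths that fall under a protected prefix."""
    return [path for path in changed if path.startswith(PROTECTED_PREFIXES)]


def main(argv: List[str]) -> int:
    """Return a process exit code: 1 blocks the merge, 0 lets it through."""
    hits = blocked_paths(argv)
    for path in hits:
        print("blocked: %s requires human approval" % path, file=sys.stderr)
    return 1 if hits else 0
```

A CI job could pass the output of git diff --name-only for the pull request range to main and use the return value as its exit status. The difference from the plugin is the exit code: a non-zero status can fail a required check, while a finding fed back to the agent cannot.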

That distinction is where teams will either use this well or turn it into theater. A security plugin inside Claude Code is excellent as an early-warning system. It is weak as a compliance boundary. The right question is not “Can we trust Claude to secure its own code?” The right question is “Can we reduce the number of obvious security defects that survive long enough to hit PR review?” On that question, Anthropic has a plausible claim: Help Net Security quoted Anthropic’s developer account as saying that internal rollout and benchmarks showed a 30–40% decrease in security-related PR comments for PRs opened using the plugin.

That number is worth treating as directional, not universal. A 30–40% reduction in PR comments could mean better code, fewer obvious mistakes, reviewer behavior changes, or a benchmark that maps imperfectly to your stack. But the category is correct. The earlier a tool catches an auth bypass, unsafe deserialization call, workflow permission change, or missing tenant filter, the cheaper the fix and the less likely the review conversation turns into archaeology.

Project-specific guidance is where this becomes useful

The plugin supports custom guidance from ~/.claude/claude-security-guidance.md, .claude/claude-security-guidance.md, or .claude/claude-security-guidance.local.md, with a combined 8 KB cap. It also loads up to 50 custom pattern rules from .claude/security-patterns.yaml, .yml, or .json, skipping regexes prone to catastrophic backtracking. That is the feature teams should spend time on, not the install command.
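The rule-loading behavior described above (a 50-rule cap, with risky regexes skipped) can be sketched as follows. Everything about the file shape is an assumption: the source does not give the schema, so this example uses a hypothetical JSON list of objects with "name" and "regex" keys, and a deliberately crude heuristic for nested quantifiers.

```python
import json
import re

MAX_RULES = 50  # from the article

# Crude, illustrative heuristic: a group containing + or * that is itself
# followed by a quantifier, such as (a+)+ or (\w*)*. Real detectors are
# more thorough.
NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*][^()]*\)\s*[+*{]")


def load_rules(raw_json: str):
    """Parse a hypothetical rule list into (name, compiled_regex) pairs."""
    rules = []
    for entry in json.loads(raw_json):
        if len(rules) >= MAX_RULES:
            break  # assumed behavior: ignore anything past the cap
        pattern = entry["regex"]
        if NESTED_QUANTIFIER.search(pattern):
            continue  # skip patterns that look prone to catastrophic backtracking
        try:
            compiled = re.compile(pattern)
        except re.error:
            continue  # skip patterns that do not compile
        rules.append((entry["name"], compiled))
    return rules
```

In this sketch a rule with the regex (a+)+$ is dropped, while a simple house rule such as one matching a raw SELECT statement is kept. Whether the plugin counts skipped rules toward the cap is not stated in the source; here they do not count.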

Generic security advice is table stakes. The valuable guidance is local: every query must include tenant_id; never log access tokens or full request bodies; do not touch .github/workflows/ without human approval; only use the approved SSRF-safe HTTP client; password reset tokens must be single-use and hashed at rest; cryptographic comparisons must use constant-time helpers; admin checks must use the central authorization middleware, not route-level improvisation. These are the rules a senior engineer remembers because they have seen the incident. Put them where the agent will see them before it creates the next diff.

There is a privacy footgun here too. The official plugin README says review data includes changed file paths, diff hunks, relevant file contents, and any extra files the agentic reviewer pulls through Read, Grep, or Glob. It also warns not to put secrets in guidance files. Treat those files as prompt material, not a vault. If your security guidance requires secret values to be useful, the problem is your process, not the plugin.

For practitioners, the rollout plan is boring in the best way. Enable the plugin in a non-critical repository first. Add five to ten project-specific rules based on real incidents and common review comments. Watch what it flags, what it misses, and whether it reduces repeated PR feedback. Use deterministic custom patterns for house rules that are easy to express. Use CI gates for anything that must block. Keep human review for trust-boundary changes: auth, payments, tenant isolation, workflow files, deserialization, crypto, infrastructure, and anything that grants a tool or agent new authority.

The bigger story is that AI-generated-code review is becoming part of the agent runtime control plane. Claude Code already has hooks, plugins, slash commands, usage accounting, /code-review --fix, and managed plugin surfaces. Security-guidance fits that direction: earlier feedback, narrower review context, visible cost controls, and project-specific policy hooks. This is what responsible agent adoption looks like when it stops being a demo and starts becoming infrastructure.

My take: install it, but do not canonize it. The plugin is valuable precisely because it lowers the latency of security feedback without pretending to replace the rest of the security program. If Claude is going to write production diffs, it should hear from a security reviewer before the PR exists. Then the humans, CI, scanners, and deployment controls still get their turn. That is not redundant. That is defense in depth with less cleanup.

Sources: Anthropic Claude Code security guidance docs, Help Net Security, anthropics/claude-plugins-official, Claude Code hooks docs, Claude Code code review docs
