
HOL Guard Turns Agent Security From ‘Trust Me’ Into a Local Preflight Gate

Anatoliy Kolodkin

01 Jun 2026 • 4 min read

Agent security is quietly moving out of the prompt box and into the preflight checklist. That is the right direction. Once a coding agent can install packages, rewrite config, trust MCP servers, run shell commands, and fetch arbitrary web content, “the model asked for approval” is no longer a security model. It is a modal dialog with optimistic branding.

That is why HOL Guard is worth paying attention to. The project shipped v2.0.408 on June 1; the repository was pushed at the same timestamp, and a same-day commit “allow[s] bounded secret term searches” for Codex. The patch note is small. The pattern is not: a local guard layer is emerging because agent harnesses are becoming too powerful to secure only with trust prompts and good intentions.

HOL Guard positions itself as “AI antivirus” for coding-agent environments. Strip away the pitch and the architecture is more interesting: it tries to detect local harness configuration, record a baseline before trust, pause on new or changed artifacts before launch, queue blocked changes in a localhost approval center, and store receipts for later review. It also ships a companion plugin-scanner for CI checks on plugins, skills, MCP servers, and marketplace packages.

That scope matters because the attack surface is no longer one tool. The README lists Claude Code, Codex, Copilot CLI, Cursor, Gemini, OpenCode, Hermes, plugins, skills, MCP servers, and marketplace packages. In other words: the modern agent stack is a pile of local config, remote packages, subprocesses, model-side instructions, tool schemas, and human approval loops. If that sounds like a supply chain, congratulations, you have read the diff correctly.

The approval prompt is not a sandbox

The industry keeps wanting approval prompts to do more work than they can. They are useful. They force a pause before the agent executes a command or edits a file. But dangerous actions rarely arrive wearing a cartoon villain badge. They look like ordinary development chores: add this MCP server, install this package, fetch this documentation page, update this hook, search for a token-like string, or trust this skill bundle.

HOL Guard’s recent fixes are exactly the kind of boring work that decides whether a security tool survives real use. v2.0.407 tightened an HTTP fetch false-positive classifier. v2.0.408 allowed bounded secret-term searches. Those are not marketing bullets; they are workflow survival patches. A guard that blocks every search for the word “token” teaches developers to bypass it. A guard that shrugs at unbounded secret scraping is decorative. The hard product problem is distinguishing “I am finding references to a secret key name so I can remove it” from “I am assembling an exfiltration payload.”

The project’s protection levels make that tradeoff explicit: Gentle, Balanced, Strict, and Paranoid. That is better than pretending one default policy fits a toy repo, a bank’s internal monorepo, and a weekend Obsidian vault. The stricter modes can block unrecognized MCP server actions; the gentler modes focus on high-confidence secrets and exfiltration patterns. Good security tooling gives teams a dial, not a sermon.
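
One way to picture that dial, as a minimal sketch and not HOL Guard’s actual policy engine: the four level names come from the project, while the finding kinds, the BLOCKS_AT table, and the decide function are hypothetical names invented here for illustration.

```python
from enum import IntEnum


class Level(IntEnum):
    """Protection levels named in the article, ordered from loosest to tightest."""
    GENTLE = 1
    BALANCED = 2
    STRICT = 3
    PARANOID = 4


# Hypothetical table: the lowest level at which each finding kind blocks.
# The kinds and thresholds are assumptions, not HOL Guard's real rules.
BLOCKS_AT = {
    "high_confidence_secret": Level.GENTLE,
    "exfiltration_pattern": Level.GENTLE,
    "unrecognized_mcp_action": Level.STRICT,
}


def decide(kind: str, level: Level) -> str:
    """Return "block" or "allow" for a finding kind at a given level."""
    threshold = BLOCKS_AT.get(kind)
    if threshold is None:
        # Assumption: unknown finding kinds block only in the tightest mode.
        return "block" if level == Level.PARANOID else "allow"
    return "block" if level >= threshold else "allow"
```

Under these assumed thresholds, an unrecognized MCP action passes at Balanced and stops at Strict, while a high-confidence secret stops at every level.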

The more important idea is baselining. If a developer has already reviewed a known set of Claude Code hooks, Codex plugins, MCP servers, and skills, then a new or changed artifact deserves a stop sign before launch. That is the same operational instinct behind dependency lockfiles and CI policy checks. The agent ecosystem needs fewer vibes and more receipts.
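
The lockfile comparison can be made concrete. The sketch below is an assumption about how a baseline check could work, not a description of HOL Guard’s internals: it hashes each artifact file, compares the result with a previously recorded snapshot, and pauses if anything is new or changed. Every function name here is hypothetical.

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def snapshot(paths):
    """Map each existing artifact path to its content hash."""
    return {str(p): fingerprint(p) for p in paths if p.is_file()}


def diff_against_baseline(baseline: dict, current: dict) -> dict:
    """Classify artifacts as new, changed, or removed relative to the baseline."""
    return {
        "new": sorted(k for k in current if k not in baseline),
        "changed": sorted(
            k for k in current if k in baseline and baseline[k] != current[k]
        ),
        "removed": sorted(k for k in baseline if k not in current),
    }


def should_pause(diff: dict) -> bool:
    """Pause before launch when any artifact is new or has changed."""
    return bool(diff["new"] or diff["changed"])
```

A reviewed snapshot would be stored once, and the same snapshot function would run again before each launch. Removed artifacts are reported but do not trigger a pause in this sketch; a stricter policy could treat them as changes too.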

Agent packages need CI, not folklore

The plugin-scanner side of HOL Guard may end up being as important as the local runtime gate. The scanner exposes a GitHub Action with controls such as fail_on_severity: high and min_score: 80, emits SARIF and verification payloads, and tracks provenance for skills, MCP config, and Codex plugins. That is the right shape: make agent packages reviewable in the same places teams already review code.
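
To show what those two controls imply, here is a hedged sketch of threshold gating. The names fail_on_severity and min_score come from the scanner’s Action; the severity ordering, the 0 to 100 score range, the shape of a finding, and the gate function itself are assumptions made for illustration, and the real scanner’s semantics may differ.

```python
# Assumed severity scale, lowest to highest; not taken from the scanner.
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]


def gate(findings, score, fail_on_severity="high", min_score=80):
    """Return (passed, reasons) for a scan result.

    findings: list of dicts with a "severity" key drawn from SEVERITY_ORDER.
    score: overall package score, assumed here to run from 0 to 100.
    """
    limit = SEVERITY_ORDER.index(fail_on_severity)
    reasons = []
    # Any finding at or above the configured severity fails the check.
    blocking = [
        f for f in findings if SEVERITY_ORDER.index(f["severity"]) >= limit
    ]
    if blocking:
        reasons.append(f"{len(blocking)} finding(s) at or above {fail_on_severity}")
    # A low overall score fails the check even with no severe finding.
    if score < min_score:
        reasons.append(f"score {score} is below min_score {min_score}")
    return (not reasons, reasons)
```

The two conditions are independent on purpose: a package with one severe finding fails even at a high score, and a package with many small findings can fail on score alone.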

This is where practitioners should be blunt with themselves. If your team lets Claude Code install or consume skills, MCP servers, or plugin bundles from external repositories, those artifacts belong in code review. A SKILL.md file can shape tool choice and process. An MCP server can expose new capabilities. Hooks can alter lifecycle behavior. Marketplace convenience is still dependency intake.

HOL Guard had 350 stars, 5 forks, and 1 open issue at research time — not a giant project, but large enough to show that developers recognize the gap. The more convincing signal is the surrounding anxiety. Recent security discussions have focused on agents following plausible fetched instructions into package installs or file access. That is not hypothetical paranoia. It is the predictable failure mode of giving a model a browser, a shell, a package manager, and a trusting human.

For engineering teams, the action item is not “install HOL Guard and declare victory.” The action item is to decide where a gate belongs. You probably need checks before harness launch, before MCP server trust, before plugin or skill release, before config writes, and before network egress from suspicious tool calls. You need local receipts for what was approved. You need CI scanning for agent packages. You need policy that says which agent surfaces can touch company repos, credentials, and production-adjacent scripts.

Solo developers should care too. Keep the guard local. Do not expose approval centers beyond loopback. Treat receipts as debugging artifacts, not just compliance theater. Use stricter modes on repos with secrets. And if a tool blocks something that feels annoying, inspect why before turning the whole thing off. Security friction is bad; invisible agent authority is worse.
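
The loopback rule is easy to check mechanically. This is a minimal sketch using Python’s standard ipaddress module; the function name is hypothetical, and rejecting hostnames outright (including “localhost”) is a deliberately strict assumption, since a name can be made to resolve somewhere else.

```python
import ipaddress


def is_loopback_only(host: str) -> bool:
    """Return True only when host is a literal loopback IP address."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not a literal IP address (for example a hostname): reject it,
        # because this sketch does not trust name resolution.
        return False
```

A bind address of 127.0.0.1 or ::1 passes; 0.0.0.0, which listens on every interface, does not.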

The larger read is simple: agent security is moving below prompts and above the OS. That middle layer — local baselines, approval receipts, package scanning, MCP policy, and CI gates — is where serious adoption will either harden or get weird. HOL Guard is one implementation. The requirement is broader than the repo.

Sources: HOL Guard GitHub repository, HOL Guard v2.0.408 release, Claude Code MCP security docs, Reddit security discussion
