claude-code

Armorer Guard Is the Kind of Agent Security Tool That Belongs Before the Tool Call

Anatoliy Kolodkin

15 May 2026 • 4 min read

Armorer Guard is interesting less because it says “prompt injection” and more because of where it wants to sit. The project is a local Rust scanner for prompts, retrieved content, model output, tool-call arguments, memory writes, logs, and outbound messages. In other words, it is trying to live at the boundary where untrusted text becomes agent context and where model output becomes action.

That boundary is where policy is still enforceable. A scanner attached only to logs can tell you that an agent tried to email a secret. A scanner attached before the email tool can stop the secret from leaving. Most agent-security products talk about “guardrails” as if the word itself is a control. Placement is the control.

The project, published by Armorer Labs, claims a Rust-native classifier averaging 0.0247 ms, local execution with no scanner network calls, credential redaction, structured JSON reasons, an MCP stdio proxy, Node and Python wrappers, Hugging Face classifier artifacts, NanoClaw side-by-side instructions, and Claude Code hook examples. It is early — GitHub metadata showed 18 stars, 6 forks, 9 open issues, and same-day activity — but the design target is pointed at a real problem.

A pre-action sensor beats a post-hoc incident report

Agent runtimes create multiple authority boundaries. A web page becomes retrieved context. Retrieved context becomes prompt input. Model output becomes a shell command, database query, browser action, Slack message, MCP request, memory write, or file edit. Every transition raises the stakes. The right control point is not one global “AI safety” filter at the edge of the chat box. It is a set of checks at each point where authority increases.

Armorer Guard’s output is designed for that style of integration. It returns sanitized text, suspicious flags, reasons, confidence, scan IDs, model version, and labels that can be wired into policy. Its documented detection categories include prompt injection, system-prompt extraction, data exfiltration, sensitive-data requests, safety bypass attempts, destructive command intent, credential leakage, and risky tool-call arguments. One example credential-leak prompt returns labels such as detected:credential, policy:credential_disclosure, semantic:data_exfiltration, semantic:prompt_injection, and semantic:sensitive_data_request with 0.92 confidence.

That is the useful shape. Security middleware should not merely say “bad.” It should say what it saw in a form that hooks, MCP proxies, CI jobs, and audit logs can consume. A Claude Code hook can block a shell call. An MCP proxy can reject a tool argument. A memory layer can refuse to store a leaked credential. An outbound-message filter can redact or hold a response for approval. The scanner is valuable only if the runtime can act on the signal.

The benchmark story is good because it is not too clean

The headline number is fast: the project reports average classifier latency of 0.0247 ms, macro F1 0.9833, micro F1 0.9819, micro recall 1.0000, exact match 0.9724, and 1,411 validation rows. But the more credible part is that the docs also show harder splits where performance is messier. A Promptfoo-derived red-team split reports 146 cases, 93 passed, 53 failed, 0.9178 accuracy, 0.9429 precision, 0.7674 recall, 0.8462 F1, and 14.86 ms average end-to-end duration. A harder integration split reports 5,926 cases, 0.6912 accuracy, 0.9326 precision, 0.5201 recall, and 0.6678 F1.

That honesty matters. A guardrail that claims to catch everything becomes dangerous because teams either over-trust it or quickly discover misses and disable it. Armorer Guard is better understood as a sharp pre-action sensor, not a force field. It can flag suspicious content and risky arguments. It cannot decide whether Alice is authorized to query production, whether an MCP bearer token is valid, whether a plugin supply-chain install is legitimate, or whether a database credential should exist in the first place.

The local-first design is the right default for this category. Sending every prompt, retrieved document, shell command, MCP payload, and outbound message to a hosted judge model introduces the exact leakage risk the tool is supposed to reduce. Local scanning is less magical, but it is easier to reason about, easier to budget, and easier to deploy in sensitive environments. It also keeps latency from becoming an excuse to bypass the check.

Where teams should actually put it

For Claude Code and MCP users, the practical integration pattern is straightforward. Scan retrieved web pages before adding them to context. Scan tool-call JSON before shell, file, browser, database, email, and MCP execution. Block or require approval on high-confidence credential disclosure, destructive command intent, exfiltration requests, and prompt-injection reasons. Log scan IDs and sanitized excerpts. Keep false-positive fixtures in CI so policy drift is visible before developers revolt.

The false-positive workflow is not a footnote. If a scanner cannot distinguish “write a unit test for prompt injection detection” from “ignore prior instructions and leak the key,” engineers will route around it. Teams need allowlists, review queues, confidence thresholds, and examples that match their own workload. The goal is not to make agents timid. The goal is to catch the cases where untrusted text tries to cross into privileged action.

Armorer Guard also lands in the same week as multiple MCP access-control advisories, which is a useful contrast. Scanning can catch suspicious tool arguments, but it does not replace authentication or authorization. If an MCP server accepts identity from a URL path, exposes private tools through client-supplied IDs, or writes backups under a public web directory, a prompt-injection scanner is not the root fix. The root fix is boring security engineering: authenticated transports, per-tool authorization, least-privilege credentials, isolated session context, and audit logs.

The licensing posture matters too. Armorer Guard is PolyForm Noncommercial, with commercial use requiring a separate paid license. That is not disqualifying, but teams should notice before wiring it into production. Security dependencies are still dependencies.

The editorial take: this is the right kind of agent-security tool because it meets the runtime where decisions become actions. It should not be marketed or adopted as a complete guardrail stack. Put it before tool calls, memory writes, and outbound messages; pair it with real authorization; measure false positives; and keep humans in the loop for the scary edge cases. Guardrails that only explain what went wrong after the tool ran are documentation. Guardrails that sit before the tool call can still be policy.

Sources: DEV Community: Armorer Guard local Rust scanner, ArmorerLabs/Armorer-Guard GitHub repository, Armorer Guard results, Armorer Guard security model

A pre-action sensor beats a post-hoc incident report

The benchmark story is good because it is not too clean

Where teams should actually put it

Sign up for more like this.