The Attack Surface Nobody Audits: AI Coding Agents and the Third Layer of Your Supply Chain

The Attack Surface Nobody Audits: AI Coding Agents and the Third Layer of Your Supply Chain

The Attack Surface Nobody Audits: AI Coding Agents and the Third Layer of Your Supply Chain

There is a layer of your AI coding stack that nobody is auditing. Not the model — everyone watches the model. Not the API infrastructure — that's logged, monitored, token-scoped. The layer nobody is watching is the integration plane: the SKILL.md file that extends your agent's capabilities, the MCP server configuration that gives it access to your tools, the plugin registry where anyone with a GitHub account can publish a new skill in minutes. None of these look like code. All of them execute like code. And between February and May 2026, a cluster of research documented exactly how badly that gap matters.

The findings are not speculative. Snyk audited 3,984 agent skills from public registries and found that 1,467 — more than a third — contained at least one security flaw. Seventy-six contained confirmed malicious payloads: credential theft, reverse shells, data exfiltration. These were not misconfigured permissions or accidental exposures. They were deliberate payloads embedded in what looked like documentation files. The ClawHavoc campaign, first documented by Koi Security in January 2026, eventually expanded to 1,184 compromised packages after Antiy CERT finished the count — professional-grade malware distributed through skill definitions with names like solana-wallet-tracker and polymarket-tracker, crafted to match what developers actually search for.

The DDIPE paper — Document-Driven Implicit Payload Execution — published by researchers at Griffith University, NTU, UNSW, and the University of Tokyo in April 2026, provides the technical mechanism that makes this class of attack work at scale. The technique achieved bypass rates between 11.6% and 33.5% across four agent frameworks and five large language models. Most samples were caught by static analysis. But 2.5% evaded all four detection layers simultaneously. That number sounds small until you multiply it by the number of agent skills being published daily, which jumped from under 50 in mid-January to over 500 by early February — the direct result of a low barrier to entry and a suddenly valuable attack surface.

The authorization plane is flatter than you think

The most important finding in the research is not the vulnerability count. It is the structural insight about how compromised skill definitions move inside a running agent session. Carter Rees, VP of AI at Reputation, put it plainly: the flat authorization plane of an LLM fails to respect user permissions. A compromised SKILL.md file running inside a session that already has read access to a repo and write access to an npm package does not need to escalate privileges. It is already operating at the authorization level of the agent itself. That is a fundamentally different threat model than traditional supply-chain attacks on compiled code, where the attacker needs to find a path from the compromised dependency to code execution. Here, the payload inherits whatever the agent already has.

On Cursor specifically — CVE-2026-22708, CVSS 7.0 — Pillar Security demonstrated that implicitly trusted shell built-in commands including export and typeset could be poisoned through indirect prompt injection. The attack converted benign developer commands into arbitrary code execution vectors. Users saw only the final command. The poisoning happened through other commands the IDE never surfaced for approval. The agent presented a clean output. The attack happened in the gap between what the agent routed and what the user approved.

In a documented in-the-wild case from April 2026, a crafted GitHub issue title triggered an AI triage bot wired into Cline. The bot exfiltrated a GITHUB_TOKEN. The attacker used it to publish a compromised npm dependency that installed a second agent on approximately 4,000 developer machines for eight hours. One issue title. Eight hours of access. No human approved any of it.

MCP is the accelerant nobody planned for

The Model Context Protocol was designed to solve a real problem: connecting AI agents to the tools and data they need in a standardized way. Anthropic donated MCP to the Linux Foundation in December 2025. That is the right institutional move for a protocol intended to become a standard. What nobody planned for was how fast the attack surface would scale once the protocol had a marketplace.

OX Security reported in April 2026 that researchers poisoned nine out of eleven MCP marketplaces using proof-of-concept servers. The vulnerability class affects Anthropic's MCP SDK across Python, TypeScript, Java, and Rust. Trend Micro found 492 MCP servers exposed to the internet with zero authentication; by April that number had grown to 1,467. OX Security estimates the ripple at 150 million-plus downloads, 7,000-plus publicly accessible servers, and up to 200,000 vulnerable instances in total.

The root issue, as The Register reported, lies in Anthropic's MCP SDK transport mechanism. Anthropic has characterized some of the reported behavior as a design feature rather than a defect — which is a defensible position for certain aspects of the protocol, and a concerning one for the authentication gap that lets anyone spin up an unauthenticated MCP server on the public internet. These two things can both be true simultaneously, which is exactly what makes the governance problem hard.

CLI-Anything, a tool from the University of Hong Kong's Data Intelligence Lab that analyzes any repo and generates a structured CLI that AI coding agents can operate with a single command, launched in March 2026 and has accumulated over 30,000 GitHub stars. It supports Claude Code, Codex, OpenClaw, Cursor, and GitHub Copilot CLI. The same mechanism that makes software agent-native opens the door to agent-level poisoning, because CLI-Anything generates SKILL.md files — the same artifact type that Snyk found laced with malicious payloads. Thirty thousand GitHub stars means thirty thousand repos being converted into agent-readable formats that could carry payloads.

The detection gap is finally closing — slowly

Cisco's security team confirmed the structural gap directly: "Traditional application security tools were not designed for this. SAST scanners analyze source code syntax. SCA tools check dependency versions. Neither understands the semantic layer where MCP tool descriptions, agent prompts, and skill definitions operate." Merritt Baer, CSO of Enkrypt AI and former Deputy CISO at AWS, told VentureBeat the situation is "very similar to early container security, but we're still in the 'we'll get to it' phase across most orgs."

The first purpose-built detection tools shipped in April 2026: Cisco's open-source Skill Scanner and Snyk's mcp-scan. The window between the vulnerability being documented — February through April 2026 — and the first detection tools shipping — April 2026 — is the pre-exploitation window security teams are racing to close. It is also a useful reminder that vulnerability documentation and vulnerability remediation are not the same event, and the gap between them is where real exposure lives.

For developers and security teams, the practical takeaway is that the model is not the attack surface. The infrastructure around the model is. SKILL.md files, MCP configurations, plugin registries, and natural-language instruction sets are how developers extend and customize AI coding agents, and none of them have existed long enough for a security category to mature around them. The result is that the average development team has no idea what their agent is actually executing when it ingests a new skill or connects to an MCP server.

The kill chain is worth understanding precisely because each step is individually invisible. A SKILL.md file submitted to an open-source project looks like documentation. A code reviewer waves it through. The agent ingests it because no verification layer exists. The agent executes embedded instructions using its own legitimate credentials. EDR sees an approved API call from an authorized process and passes it. The exfiltration or configuration change happens through channels the monitoring stack considers normal. No single security tool in the average enterprise stack catches this today.

What to actually do about it

The good news is that the attack surface is auditable, even if it is not yet well-audited. Treat skill definition files as untrusted executable intent, even when they are just text. Audit every SKILL.md and MCP configuration the same way you audit a new npm package. Instrument your runtime — know what data your agent is accessing and what actions it is taking. The arrival of Cisco's Skill Scanner and Snyk's mcp-scan is the starting gun, not the finish line.

The harder question is what this means for the broader trajectory of AI coding tools. The research does not say "don't use these tools." It says "the trust model you are using — install a skill, get new capabilities — has a gap where traditional security tooling does not look. Close that gap before you scale." The teams that figure out how to audit, isolate, and observe their agent integration layers will be the ones who get to keep the productivity gains without the corresponding exposure.

The attack surface is real. The documentation is solid. The detection tooling is arriving. What is still missing in most organizations is the institutional acknowledgment that the third layer — the integration plane between the model and the tools — needs its own security category, its own review process, and its own monitoring discipline. That acknowledgment is the hardest part, because it requires treating as a threat something that looks like a productivity feature. That is always where security gets hard.

Sources: VentureBeat, Snyk ToxicSkills report, OX Security MCP vulnerability report, Pillar Security CVE-2026-22708 analysis, DDIPE academic paper (Griffith/NTU/UNSW/University of Tokyo)