Microsoft’s Agentic AI Failure Taxonomy Is a Production Checklist, Not a PDF to File Away
Microsoft’s latest agentic AI failure taxonomy is useful because it refuses to pretend the problem is the model. After a year of red-team work against deployed agent systems, the company is describing the failure surface practitioners actually see: plugins, MCP servers, persistent memory, approval prompts, visual computer-use interfaces, delegated agents, tool descriptions, and the quiet supply chain of natural-language instructions that now sits next to code.
That framing matters. The industry still talks about agent security as if “prompt injection” is the bug and everything else is downstream cleanup. Microsoft’s update says the more accurate model is systems security with an LLM in the loop. Once an agent can call tools, read memory, delegate work, browse screens, or approve actions, the prompt is no longer just text. It is a control surface.
The new categories sound like incident reports, not taxonomy theater
The v2.0 taxonomy adds seven failure-mode categories after twelve months of red-team engagements: Agentic Supply Chain Compromise, Goal Hijacking, Inter-Agent Trust Escalation, Computer Use Agent Visual Attack, Session Context Contamination, MCP/Plugin Abuse, and Capability/Architecture Disclosure. That list is refreshingly concrete. These are not abstract “AI might be bad” risks; they map directly to how modern coding agents and orchestration frameworks are being deployed.
Agentic supply chain compromise is the right place to start. A modern agent stack can ingest tool manifests, MCP server descriptions, skill files, prompt templates, memory entries, browser state, plugin metadata, and delegated messages from other agents. Some of those artifacts are markdown. Some are JSON. Some look like documentation. Operationally, they are closer to dependencies than docs, because they can change behavior without changing application code.
That should change how teams audit agent systems. A software bill of materials that stops at npm, PyPI, container images, and OS packages is incomplete for an agent runtime. Microsoft explicitly calls for SBOM-style inventory across plugins, MCP servers, prompt templates, tool descriptions, and related runtime components. The uncomfortable implication: your “configuration” may now need provenance, version pinning, review history, and rollback procedures.
MCP and plugin abuse land in the same bucket. MCP’s pitch is interoperability: expose tools and context through a common protocol. That is useful. It is also a way to make tool descriptions, remote servers, and resource reads part of the agent’s trusted execution environment. A malicious or sloppy MCP server does not need to exploit a kernel bug to cause damage; it can advertise persuasive instructions, expose over-broad tools, leak context through resource names, or launder a dangerous operation through an innocent-looking capability.
Human approval is not a security boundary by default
The strongest line in Microsoft’s post is its treatment of human-in-the-loop bypass. Microsoft says this was among the most consistently exploited failure modes, including consent fatigue, probabilistic invocation manipulation, incremental escalation, and zero-click chains that begin from external input after the initial agent invocation. That should sting, because many agent products still treat the approval dialog as the magic safety layer.
Approval UX usually fails because it summarizes the agent’s story, not the underlying action. “Update the issue” sounds harmless. “POST this 4,000-character body containing private stack traces to a public GitHub issue” is a different decision. “Run tests” sounds safe. “Execute this shell command in a repo with production credentials in the environment” is not the same action. If the approval prompt is written from the agent’s natural-language rationale instead of structured tool metadata, the user is approving a narrative.
The fix is not “ask more often.” That creates fatigue and trains users to click through. The fix is to tier approvals by blast radius and build prompts from facts: tool name, exact target, data leaving the system, files modified, permissions used, reversibility, and whether the action was triggered by trusted or untrusted context. For coding agents, approval transcripts should be test artifacts. A team should be able to replay a run and answer: what did the user approve, what did the agent actually execute, and did the prompt disclose the risky part?
This is where Microsoft’s taxonomy connects cleanly with OWASP’s GenAI guidance around excessive agency, insecure plugin design, prompt injection, supply-chain vulnerabilities, and sensitive-information disclosure. The agent-specific twist is composition. One approval may be harmless. Ten small approvals chained through poisoned memory, a compromised MCP resource, and a delegated subagent can become exfiltration. Security review has to evaluate trajectories, not just individual tool calls.
Multi-agent systems need identity, not role-play
Inter-agent trust escalation is another category that deserves more attention from framework builders. Multi-agent demos often assign roles in prompts: planner, reviewer, coder, security analyst. That is not identity. If an orchestrator grants privileges because a subagent says “I am the security reviewer,” the system has recreated a confused-deputy problem in markdown.
Production orchestration needs scoped capabilities at the boundary between agents. A planner does not automatically get write access because it delegated to an executor. A reviewer does not become trusted because its system prompt says “reviewer.” A browser-use agent should not be allowed to pass visual instructions into a shell-capable coding agent without context provenance. The boring controls are the important ones: signed or otherwise verifiable agent identity, explicit capability grants, least-privilege handoffs, logs that preserve which agent requested which action, and policies that distinguish trusted system context from user, web, tool, and peer-agent context.
Session context contamination and memory poisoning make this worse over time. A one-time injected instruction can become persistent if the agent stores it as a user preference, project note, or memory summary. After that, every future run starts compromised but looks normal. Teams building memory into agents should track provenance and trust level for memory entries, make memory reviewable, and avoid mixing external content with durable behavioral instructions. “Remember this” is not a harmless feature when the thing being remembered can steer tools.
For practitioners, the immediate checklist is straightforward. Inventory every tool, MCP server, plugin, skill, prompt template, and memory source your agent can read. Pin versions where possible. Separate trusted configuration from untrusted runtime context. Build approvals from structured action metadata, not model prose. Log tool exposure and skill loading. Treat computer-use agents as UI attack surfaces. Test whether a poisoned web page, issue comment, email, or screenshot can alter future behavior. If the system cannot answer where an instruction came from, it should not treat that instruction as authority.
Microsoft’s taxonomy is not valuable because it names seven new categories. It is valuable because it pushes agent teams toward a more adult threat model. The model may be probabilistic, but the controls around it do not get to be vibes. If your agent stack cannot inventory capabilities, verify provenance, isolate untrusted context, and explain approvals from real tool calls, it is not production-ready. It is a prompt injection lab with a roadmap.
Sources: Microsoft Security, Microsoft taxonomy v2.0 PDF, OWASP GenAI Security Project