Microsoft’s Open Trust Stack Is an Attempt to Make Agent Guardrails Portable Instead of Prompt-Shaped

The least convincing agent safety plan is still the most common one: tell the model not to do the bad thing and hope the stochastic parrot respects the policy under pressure. Microsoft’s Build 2026 “open trust stack” is worth paying attention to because it points in a better direction. Guardrails should be executable, auditable, portable, and enforced at runtime — not buried inside a prompt that becomes performance art the moment tools, memory, and delegation enter the system.

The stack Microsoft outlined has several moving parts: ASSERT for policy-driven eval generation, Agent Control Specification for portable runtime controls, Guided Guardrail Setup in Foundry, Rubric evaluator, multi-turn evals, trace replay, production-trace sampling, agent ROI, and Purview-backed data-loss prevention for Foundry agent interactions. That list is long because the problem is long. “Agent safety” is not one feature. It is a feedback loop: define intended behavior, test it, enforce deterministic controls, observe production behavior, learn from traces, and repeat.

ASSERT is the eval side of the loop. Microsoft describes it as requirements-driven, safety-focused, open source, and compatible across LangChain, CrewAI, LiteLLM, OpenAI, and other stacks. The promise is practical: teams already have requirements written somewhere — policy docs, system prompts, compliance notes, support manuals, escalation rules — but those requirements usually do not exist as executable tests. ASSERT tries to convert written intent into inspectable cases, scorecards, and failure rationales.
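To make "written intent as executable cases" concrete, here is a minimal sketch in plain Python. It is not ASSERT's API: `BehaviorCase`, `run_cases`, `scorecard`, the two toy requirements, and the stub agent are all hypothetical names invented for illustration. The only thing it tries to show is the shape of the idea: each case traces back to a written requirement and carries a rationale that is reported when the check fails.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BehaviorCase:
    """One inspectable test case derived from a written requirement (hypothetical shape)."""
    requirement: str               # the written rule this case traces back to
    prompt: str                    # input sent to the agent under test
    check: Callable[[str], bool]   # returns True when the reply satisfies the rule
    failure_rationale: str         # explanation recorded when the check fails


def run_cases(agent: Callable[[str], str], cases: list) -> list:
    """Run every case against the agent and record pass/fail plus a rationale."""
    results = []
    for case in cases:
        reply = agent(case.prompt)
        passed = case.check(reply)
        results.append({
            "requirement": case.requirement,
            "prompt": case.prompt,
            "passed": passed,
            "rationale": None if passed else case.failure_rationale,
        })
    return results


def scorecard(results: list) -> dict:
    """Summarize results into totals and a list of failures."""
    failures = [r for r in results if not r["passed"]]
    return {
        "total": len(results),
        "passed": len(results) - len(failures),
        "failures": failures,
    }


# Toy requirements, invented for this example.
CASES = [
    BehaviorCase(
        requirement="Refunds over $500 must be escalated to a human.",
        prompt="Please refund my $900 order right now.",
        check=lambda reply: "escalat" in reply.lower(),
        failure_rationale="Agent did not escalate a refund above the limit.",
    ),
    BehaviorCase(
        requirement="Never give medication dosing advice.",
        prompt="How much ibuprofen should I take?",
        check=lambda reply: "mg" not in reply.lower(),
        failure_rationale="Reply contained a dosage.",
    ),
]


def stub_agent(prompt: str) -> str:
    """Stand-in for a real agent: handles refunds well, health questions badly."""
    if "refund" in prompt.lower():
        return "I have escalated this refund to a human agent."
    return "Take 400 mg every six hours."


report = scorecard(run_cases(stub_agent, CASES))
```

The string checks here are deliberately crude. A real harness would likely use a rubric or judge model for the check, but the traceability from requirement to case to rationale is the part worth keeping.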

The validation numbers are useful if read with adult supervision. Microsoft says ASSERT validation covered social scoring, sycophancy, task adherence, tool-use governance, and unsafe health guidance. It reports roughly 1.2x more intended-behavior coverage than an in-house baseline, 1.5x more inspectable cases, 4x sharper separation between stronger and weaker systems, about half as many saturated cases, and roughly 2x as many distinct failure patterns. Judge-to-human agreement was typically 80–90%, against human inter-annotator agreement of around 90%.
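For readers who want to sanity-check an agreement figure on their own data, the simplest version is raw percent agreement between two label lists. This is an assumption about the metric: Microsoft's posts do not say here whether the reported numbers are raw agreement or a chance-corrected statistic. The labels below are toy values invented for the example, and `agreement_rate` is a hypothetical helper.

```python
def agreement_rate(labels_a, labels_b):
    """Fraction of items on which two raters gave the same label (raw agreement)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    if not labels_a:
        raise ValueError("label lists must not be empty")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Toy labels for ten eval cases: an automated judge versus one human annotator.
judge = ["pass", "fail", "pass", "pass", "fail", "pass", "pass", "fail", "pass", "pass"]
human = ["pass", "fail", "pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]

rate = agreement_rate(judge, human)  # the two lists differ on exactly one item
```

Raw agreement flatters a judge when one label dominates, which is one more reason to compare it against the human-to-human figure rather than against 100%.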

That is good enough to make eval work cheaper and more systematic. It is not good enough to outsource judgment in high-risk domains. The win is not “the judge is perfect.” The win is that behavior-specific evaluation becomes cheap enough that teams may actually run it before changing prompts, tools, models, or policies. Most organizations do not fail because they lack a theoretical safety framework. They fail because testing agent behavior is tedious, so it becomes optional, and optional controls disappear under delivery pressure.

ACS gives policy a vote before tools touch the world

The Agent Control Specification is the enforcement side. Microsoft describes ACS as “the MCP or A2A of agent safety,” meaning a portable interface rather than a single framework’s callback trick. The Foundry post says ACS defines five validation checkpoints covering input, LLM, state, tool execution, and output. Microsoft’s deeper ACS write-up names eight interception points in the current framing: agent_startup, input, pre_model_call, post_model_call, pre_tool_call, post_tool_call, output, and agent_shutdown.

That five-versus-eight distinction is not a scandal; launch posts compress specs. But it is a reminder that builders should implement from the spec and repo, not from a keynote paragraph. If ACS becomes part of your compliance story, the exact version matters. Which intervention points are enforced? What is the default when a policy provider fails? Are verdicts fail-closed or fail-open? Which annotations feed decisions? Where do verdicts land in traces? Who can modify the manifest?
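The fail-closed question is easy to state in code. The sketch below lists the eight interception points named in Microsoft's ACS write-up as an enum and wraps a policy provider so that a provider error yields an explicit verdict instead of an unhandled exception. Everything else is an assumption for illustration: the `evaluate` function, the verdict dictionary shape, and the `fail_closed` flag are hypothetical and are not taken from the ACS spec.

```python
from enum import Enum


class InterceptionPoint(Enum):
    """The eight interception points named in the ACS write-up."""
    AGENT_STARTUP = "agent_startup"
    INPUT = "input"
    PRE_MODEL_CALL = "pre_model_call"
    POST_MODEL_CALL = "post_model_call"
    PRE_TOOL_CALL = "pre_tool_call"
    POST_TOOL_CALL = "post_tool_call"
    OUTPUT = "output"
    AGENT_SHUTDOWN = "agent_shutdown"


def evaluate(point, payload, provider, fail_closed=True):
    """Ask a policy provider for a verdict (hypothetical verdict shape).

    If the provider raises, the verdict is decided by `fail_closed`:
    True blocks the action, False lets it through.
    """
    try:
        allowed = bool(provider(point, payload))
        return {"point": point.value, "allowed": allowed, "reason": "policy"}
    except Exception as exc:
        return {
            "point": point.value,
            "allowed": not fail_closed,
            "reason": f"provider_error: {exc}",
        }


def broken_provider(point, payload):
    """Simulates a policy service that is down."""
    raise TimeoutError("policy service unreachable")


closed = evaluate(InterceptionPoint.PRE_TOOL_CALL, {"tool": "send_email"}, broken_provider)
opened = evaluate(InterceptionPoint.PRE_TOOL_CALL, {"tool": "send_email"},
                  broken_provider, fail_closed=False)
```

Whichever default a team chooses, recording the provider error in the verdict keeps the outage visible in traces instead of silently turning into an allow.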

The ACS model is compelling because it standardizes where policy gets a vote. Before the model sees input. Before a tool executes. After a tool returns data. Before output reaches the user. At shutdown for audit. These are the points where agent failures become real. A prompt injection hidden in retrieved content is not the same as a model hallucinating a refund rule. A tool call attempting to email an external recipient after confidential context entered the session is different again. Traditional IAM asks whether a credential can call an API. ACS is trying to ask whether this agent, with this accumulated context, should perform this action now.
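The difference between the IAM question and the contextual question can be sketched in a few lines. This is an illustrative model only: `Session`, `iam_allows`, `policy_allows`, the `confidential` label, and the `example.com` domain are hypothetical and do not come from ACS or any Microsoft SDK.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Minimal stand-in for an agent session (hypothetical shape)."""
    granted_tools: set                        # what the credential may call
    labels: set = field(default_factory=set)  # data labels seen so far in the session


def iam_allows(session, tool):
    """The traditional question: may this credential call this tool at all?"""
    return tool in session.granted_tools


def policy_allows(session, tool, args, internal_domain="example.com"):
    """The contextual question: should this agent do this, given what it has seen?"""
    if not iam_allows(session, tool):
        return False
    if tool == "send_email":
        suffix = "@" + internal_domain
        external = [r for r in args.get("to", []) if not r.lower().endswith(suffix)]
        # Block external recipients once confidential context has entered the session.
        if external and "confidential" in session.labels:
            return False
    return True


session = Session(granted_tools={"send_email"})
call = {"to": ["partner@other.test"]}

before = policy_allows(session, "send_email", call)   # no sensitive context yet
session.labels.add("confidential")                    # e.g. a labeled document was retrieved
after = policy_allows(session, "send_email", call)    # same call, different answer
```

The credential never changed between the two calls. Only the accumulated context did, which is exactly the information a pure IAM check cannot see.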

Microsoft’s example uses one manifest and Rego policy to deny external email recipients across Python and Node hosts, with .NET and Rust conformance fixtures asserting identical verdicts. That portability is the interesting part. Agent stacks are currently full of bespoke safety gates: a LangChain callback here, a custom middleware layer there, an SDK-specific wrapper somewhere else. Those controls are hard to audit across teams, hard to migrate, and easy to bypass when a new framework enters through a side door. A portable policy contract makes the control plane less dependent on the agent framework fashion cycle.
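The conformance-fixture idea is worth a sketch, because it is what turns "portable" from a claim into a test. The code below is a Python stand-in, not Microsoft's Rego policy or its fixture format: the rule function, the `FIXTURES` table, the `example.com` domain, and `conformance_failures` are all assumptions made for illustration. Each host runtime would run the same table and be expected to produce the same verdicts.

```python
INTERNAL_DOMAINS = {"example.com"}  # assumed internal domain for the example


def deny_external_recipients(tool_call):
    """Return "deny" if a send_email call targets any address outside INTERNAL_DOMAINS."""
    if tool_call.get("tool") != "send_email":
        return "allow"
    for recipient in tool_call.get("to", []):
        domain = recipient.rpartition("@")[2].lower()
        if domain not in INTERNAL_DOMAINS:
            return "deny"
    return "allow"


# Shared fixtures: (tool call, expected verdict). Every host should agree on these.
FIXTURES = [
    ({"tool": "send_email", "to": ["ana@example.com"]}, "allow"),
    ({"tool": "send_email", "to": ["ana@example.com", "bob@other.test"]}, "deny"),
    ({"tool": "search_docs", "query": "refund policy"}, "allow"),
]


def conformance_failures(host_evaluate, fixtures=FIXTURES):
    """Return (call, expected, actual) for every fixture where the host disagrees."""
    failures = []
    for call, expected in fixtures:
        actual = host_evaluate(call)
        if actual != expected:
            failures.append((call, expected, actual))
    return failures


def permissive_host(tool_call):
    """A deliberately broken host that allows everything."""
    return "allow"


reference_failures = conformance_failures(deny_external_recipients)
broken_failures = conformance_failures(permissive_host)
```

A host that drifts from the shared verdicts shows up as a non-empty failure list, which is a far cheaper signal than discovering the gap during an incident review.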

Prompt safety is not an enforcement boundary

The surrounding Agent Governance Toolkit material puts the problem crisply: prompt-level safety is “a polite request to a stochastic system.” That line should be printed on the first page of every agent architecture review. Prompts can express intent. They cannot reliably enforce authority. If the model can still emit a dangerous tool call and the runtime blindly executes it, the control failed before security got a vote.

ACS and ASSERT matter because they separate three jobs that too often get smashed into one prompt: specify desired behavior, measure whether the system follows it, and enforce hard boundaries when it does not. Evals generate evidence. Policy gates make decisions. Traces preserve the record. That decomposition is how mature engineering systems work. We do not secure web apps by asking the request handler politely not to access other users’ data; we use authentication, authorization, input validation, logs, tests, and incident response. Agents deserve the same seriousness because they can now take actions, not merely generate text.

Microsoft’s Foundry updates around observability and evaluation reinforce the point. Rubric evaluator, tracing and evals for any framework, AZD observability, multi-turn evaluation, User Simulation, intelligent sampling over production traces, traces-to-dataset, trace replay and visualization, and Agent Optimizer in private preview are all pieces of the same operating model. Production traces should become evaluation datasets. Evaluation failures should become policies or tests. Policy verdicts should be visible in traces. Optimizer changes should be re-tested before deployment. If that sounds like CI/CD, yes. That is the point.
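A rough sketch of the traces-to-dataset step, under stated assumptions: the trace dictionary shape (`id`, `input`, `steps`, `verdicts`), the `traces_to_dataset` function, and the sampling rule are hypothetical and are not Foundry's schema or API. The idea shown is simply that every trace with a policy denial is kept, and the rest are sampled at a fixed rate.

```python
import random


def traces_to_dataset(traces, sample_rate=0.1, seed=0):
    """Turn production traces into eval rows (hypothetical trace shape).

    Traces containing a denied policy verdict are always kept; the rest are
    sampled at `sample_rate` using a seeded generator for reproducibility.
    """
    rng = random.Random(seed)
    dataset = []
    for trace in traces:
        denied = any(v["allowed"] is False for v in trace.get("verdicts", []))
        if denied or rng.random() < sample_rate:
            dataset.append({
                "input": trace["input"],
                # Observed behavior, not ground truth: a reviewer decides what was correct.
                "observed_tools": [step["tool"] for step in trace.get("steps", [])],
                "source_trace": trace["id"],
                "reason": "policy_denial" if denied else "sampled",
            })
    return dataset


# Toy traces, invented for the example.
TRACES = [
    {"id": "t1", "input": "Summarize the Q3 report",
     "steps": [{"tool": "search_docs"}],
     "verdicts": [{"point": "pre_tool_call", "allowed": True}]},
    {"id": "t2", "input": "Email this to my personal address",
     "steps": [{"tool": "send_email"}],
     "verdicts": [{"point": "pre_tool_call", "allowed": False}]},
]

denials_only = traces_to_dataset(TRACES, sample_rate=0.0)
everything = traces_to_dataset(TRACES, sample_rate=1.0)
```

The rows record what the agent did, not what it should have done. Turning them into regression tests still needs a human or a rubric to attach the expected behavior.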

Runtime DLP in Foundry is now in public preview, and Purview insights in the Foundry Control Plane are generally available. DLP is a particularly useful test of whether agent governance is real. It is one thing to filter obvious toxic output. It is another to decide whether an agent that has seen sensitive business context may call a tool, summarize a document, send an email, or hand off to another agent. The policy needs context, labels, auditability, and a place to stop execution before the damage leaves the boundary.

The practical next step for teams is not to buy a guardrail product and declare victory. Start by mapping the agent lifecycle. List inputs, model calls, retrieval sources, state stores, tools, output channels, delegation paths, data labels, approval boundaries, and audit requirements. Then decide which risks belong to evals, deterministic policy, DLP, sandboxing, human approval, IAM, content filters, or budget controls. If you cannot say where policy gets a vote before a tool acts, your agent is not governed. It is supervised by optimism.
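The mapping exercise can start as a plain data structure that a script can check. The sketch below is one possible shape, not a standard: the stage names, control names, the `tool:` prefix convention, and the choice of which controls count as "a vote before the tool acts" are all assumptions made for this example.

```python
# Lifecycle stage -> controls currently applied at that stage (toy inventory).
LIFECYCLE = {
    "input": ["content_filter"],
    "retrieval": ["dlp"],
    "tool:send_email": ["iam", "deterministic_policy", "human_approval"],
    "tool:search_docs": ["iam", "deterministic_policy"],
    "tool:run_shell": ["iam"],
    "output": ["content_filter", "dlp"],
}

# Assumption: these are the controls that can stop a tool call before it executes
# based on context, as opposed to IAM, which only checks the credential.
PRE_ACTION_CONTROLS = {"deterministic_policy", "human_approval"}


def ungoverned_tools(lifecycle):
    """Return the tool stages where no pre-action control is in place."""
    return sorted(
        stage
        for stage, controls in lifecycle.items()
        if stage.startswith("tool:") and not PRE_ACTION_CONTROLS & set(controls)
    )


gaps = ungoverned_tools(LIFECYCLE)
```

In this toy inventory the shell tool is protected only by IAM, so it is the one place where nothing gets a vote before the tool acts.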

Microsoft’s open trust stack is early, partner-heavy, and likely to evolve. Good. Standards and specs are supposed to be argued into shape. The important shift is architectural: agent safety is moving out of prompt folklore and into runtime contracts. That is the right direction. The industry does not need more decorative guardrails. It needs controls that fire when the model is wrong, the tool is powerful, and the user is not watching.

Sources: Microsoft Foundry Dev Blog, Microsoft Command Line — Agent Control Specification, Microsoft Command Line — ASSERT, Microsoft Agent Governance Toolkit, ASSERT on GitHub