ai-frameworks

Microsoft Agent Framework’s 1.1.1 Patch Says the Real Framework Battle Is Happening in Runtime Boundaries

Anatoliy Kolodkin

23 Apr 2026 • 4 min read

Patch releases are usually where framework companies hide the awkward truth. The launch blog gets the architecture diagram. The patch gets the bug that explains whether the architecture survives contact with reality. Microsoft Agent Framework 1.1.1 is that kind of patch. On paper, it is a modest update to the Python package. In practice, it reads like a map of where agent frameworks are actually winning or losing in 2026: at the seams between UI protocols, workflow runtimes, tool boundaries, and sandbox execution.

The headline additions in Agent Framework 1.1.1 are not glamorous. Microsoft added ground-truth expected_output support to evaluate_workflow for similarity evaluators. It added a SKIP_PARSING sentinel so FunctionTool.invoke can return raw function results without forcing them through the framework’s content-wrapping path. It propagated thread_id and forwarded_props from AG-UI through to A2A context. It also fixed a Hyperlight sandbox threading issue that could trigger a pyo3_runtime.PanicException when a WASM sandbox was touched from the wrong OS thread. None of that sounds like keynote material. All of it sounds like production.

That matters because Microsoft is not really selling Agent Framework as just another SDK. The company’s documentation keeps expanding the frame: agents, tools, conversations, memory and persistence, workflows, hosting, DevUI, A2A, migration paths from both AutoGen and Semantic Kernel. That is a runtime thesis, not a library thesis. If that thesis is going to hold, context has to survive boundary crossings without getting mangled.

The AG-UI to A2A handoff is more important than the version number

The most revealing change in 1.1.1 may be the AG-UI and A2A propagation work. Microsoft’s patch explicitly pushes thread_id and forwarded_props through AG-UI into A2A context. In plain English, that means session identity and important metadata are less likely to disappear when a workflow moves from a frontend protocol into a cross-agent transport layer.

This sounds small until you remember what real agent systems look like. They are not one loop talking to one model in one terminal. They are UI sessions, approval prompts, agent-to-agent delegation, long-running state, and sometimes hosted execution on the other side. The moment thread identity gets dropped, operators lose traceability. Frontends lose continuity. Humans reviewing an approval step have less context than they think they do. Debugging turns into archeology.

A lot of “multi-agent” marketing still behaves as if handoff is a conceptual problem. It is not. The industry has plenty of handoff metaphors. The hard part is preserving state through the transport layers that connect those metaphors to real interfaces. Microsoft fixing this path is a sign that the framework team understands where the operational bill comes due.

Raw tool outputs are a control-surface issue, not a convenience tweak

The SKIP_PARSING addition is similarly revealing. Frameworks love to normalize everything into one friendly content abstraction because it makes demos look elegant. But when a tool returns a raw result that should stay raw, forcing it through a wrapper and reparsing it later is not elegance. It is information loss with extra steps.

Microsoft’s own release notes tie this change to Hyperlight host callbacks and public API cleanup. That is the right place to focus. Once frameworks start wrapping tool outputs too aggressively, you get the worst of both worlds: a runtime that promises structure and a tool layer that quietly degrades it. For enterprise teams, raw result fidelity is not an implementation detail. It is how approvals, auditability, and downstream workflow logic remain trustworthy.

This also puts Agent Framework in the broader 2026 context. OpenAI’s Agents SDK has been thickening its sandbox and resume semantics. LangChain keeps paying down core output-shape and tracing debt. PydanticAI is treating prompt ownership and typed boundaries as real surfaces. The best framework teams are all converging on the same uncomfortable lesson: runtime boundaries matter more than another orchestration slogan.

The sandbox panic fix says Microsoft is serious about hosted execution

The Hyperlight threading fix deserves more attention than it will get. According to the release notes, Microsoft now thread-confines WasmSandbox interactions using a per-entry ThreadPoolExecutor to eliminate an unsendable PyO3 panic from asyncio worker threads. There is a lot of jargon packed into that sentence, but the point is simple. A sandbox that crashes because the wrong thread touched it is not a sandbox you can quietly trust in production.

Hosted execution is where frameworks stop being developer toys and start acting like platforms. As soon as code runs inside isolated runtimes, with tools, files, and approval steps nearby, thread safety and transport correctness become product features. Microsoft clearly wants Agent Framework to sit in that layer, especially with its growing stack of hosting, workflow, and integration surfaces. Fixes like this are the tax you pay for that ambition.

There is also a strategic signal here. Microsoft has already merged AutoGen and Semantic Kernel into one broader Agent Framework story. That merger only works if the resulting platform can own more of the runtime surface than either predecessor did alone. A patch full of boundary and sandbox work suggests the company is still moving in exactly that direction.

What practitioners should do next

If you are evaluating or already running Microsoft Agent Framework, this release is a reminder to test the parts of your system that demos skip. First, validate thread continuity across AG-UI, workflow state, and any A2A paths you use. Make sure session identifiers survive the trip and remain visible in your traces. Second, inspect any place where tool results are wrapped, converted, or reparsed. If you depend on structured outputs for approvals or routing, this is worth explicit regression coverage. Third, if you use Hyperlight or any hosted execution surface, test sandbox interactions under concurrent load rather than just happy-path local runs.

More broadly, stop treating patch releases as housekeeping. In agent infrastructure, patch releases are where maintainers reveal what is breaking at scale. That is often more useful than a roadmap deck.

My read is straightforward. Microsoft Agent Framework 1.1.1 is not interesting because it shipped a few fixes. It is interesting because the fixes all point at the same thesis: the framework is increasingly competing on whether context survives handoffs, tool boundaries, and hosted runtimes without being silently degraded. That is a much better place to compete than on who can draw the prettiest box-and-arrow diagram.

Sources: Microsoft Agent Framework 1.1.1 release notes, Microsoft Agent Framework documentation

The AG-UI to A2A handoff is more important than the version number

Raw tool outputs are a control-surface issue, not a convenience tweak

The sandbox panic fix says Microsoft is serious about hosted execution

What practitioners should do next

Sign up for more like this.