
Google ADK 2.1.0 Makes Sandboxes, MCP Failures, and Tool Schemas the Real Agent-Runtime Story

Anatoliy Kolodkin

23 May 2026 • 5 min read

Google’s ADK 2.1.0 release is not the kind of framework update that demos well on stage. Good. The post-I/O agent stack does not need another abstraction diagram; it needs fewer ways for production agents to misunderstand their own runtime. This release is about the things that decide whether an agent platform is operable after the first impressive demo: which sandbox it booted, whether an MCP failure kills the session, whether a typed tool call arrives as the type the developer declared, and whether live-model metadata survives long enough to be audited.

That sounds unglamorous because runtime work usually does. But in the current Google agent story — ADK, Gemini, Managed Agents, Agent Runtime, Agent Platform, AI Studio, and the Interactions API — these are not edge cases. They are the product surface.

Google published adk-python v2.1.0 on May 23, with five features and fifteen bug fixes. The headline items are sandbox creation from templates and snapshots, better telemetry identity, live transcription and grounding fixes for Gemini 3.1, chart generation in the data-agent sample, and a set of tool/runtime fixes that should matter to anyone building with MCP or Pydantic-heavy tools. A companion v1.34.1 backports several operational fixes, including grounding metadata handling, transcription-finished events, and MCP session-drop prevention.

Sandbox provenance is becoming part of the API contract

The most strategically important change is sandbox creation from named templates and snapshots. The implementation adds AgentEngineSandboxComputer support for configuring a sandbox from either VMAAS_SANDBOX_TEMPLATE_NAME or VMAAS_SANDBOX_SNAPSHOT_NAME. In plain English: an agent run can now be tied to a known environment shape or a known captured state, rather than a vague “start a sandbox” instruction.

That distinction matters because coding agents and managed agents are environment-sensitive. A run may depend on a toolchain version, a repository checkout, a browser profile, a seeded database, fixture files, or a deliberately restricted credentials boundary. If the environment is implicit, reproducibility becomes a guessing game. If the environment is named and logged, the sandbox becomes something closer to infrastructure: versioned, reviewable, and debuggable.

This is also where ADK connects to Google’s broader Managed Agents push. Google’s Managed Agents announcement described Antigravity-backed agents running inside isolated, ephemeral Linux environments with code execution, browsing, file management, and server-side state. Once that model becomes a cloud primitive, sandbox provenance stops being an implementation detail. It becomes part of the security model. Teams need to answer: which template did this agent use, who approved it, what network rules were attached, what secrets were available, and can we reproduce the run?

Engineers should treat the new template/snapshot support as a prompt to update their own run metadata. Log the sandbox template or snapshot identifier beside the model, user, tool set, repo SHA, and approval state. If your incident review cannot reconstruct the environment, you do not have an agent audit trail. You have a transcript with vibes.
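One way to act on that advice is to make sandbox provenance a required field of the run record rather than an optional log line. The sketch below is hypothetical throughout. It assumes the two names from the release notes can be read from a plain mapping such as os.environ; the release names them, but this sketch does not confirm how ADK consumes them. `sandbox_provenance` and `RunRecord` are illustrative names, not ADK APIs, and requiring exactly one of template or snapshot is this sketch's own policy.

```python
import json
from dataclasses import asdict, dataclass
from typing import Mapping

# Assumption: the names from the release notes are treated here as keys in a
# plain mapping (for example os.environ). This is not confirmed ADK behavior.
TEMPLATE_KEY = "VMAAS_SANDBOX_TEMPLATE_NAME"
SNAPSHOT_KEY = "VMAAS_SANDBOX_SNAPSHOT_NAME"


def sandbox_provenance(config: Mapping[str, str]) -> tuple[str, str]:
    """Return ("template", name) or ("snapshot", name); refuse ambiguous configs."""
    template = config.get(TEMPLATE_KEY)
    snapshot = config.get(SNAPSHOT_KEY)
    # Policy choice for this sketch: exactly one source must be named.
    if bool(template) == bool(snapshot):
        raise ValueError("set exactly one of the sandbox template or snapshot name")
    return ("template", template) if template else ("snapshot", snapshot)


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical audit record, one per agent run."""
    run_id: str
    model: str
    user_id: str
    tools: tuple[str, ...]
    repo_sha: str
    approval_state: str
    sandbox_kind: str  # "template" or "snapshot"
    sandbox_name: str

    def to_json(self) -> str:
        # Stable key order makes records easy to diff during incident review.
        return json.dumps(asdict(self), sort_keys=True)


kind, name = sandbox_provenance({TEMPLATE_KEY: "py311-toolchain"})
record = RunRecord(
    run_id="run-0001",
    model="example-model",
    user_id="user-42",
    tools=("search_logs", "read_runbook"),
    repo_sha="3f2c9e1",
    approval_state="approved-by:oncall",
    sandbox_kind=kind,
    sandbox_name=name,
)
print(record.to_json())
```

Refusing to start when the sandbox source is ambiguous is deliberately strict. A team could relax it, but then the record should state which value was actually used.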

MCP failures should fail as tools, not as sessions

The MCP error-handling fix is the kind of operational hardening that only looks small until it is your live session dying with a WebSocket 1006 (abnormal closure) error. The relevant change enables graceful MCP errors by default and retrieves background session-context task exceptions so unhandled AnyIO TaskGroup transport failures do not bubble through the event loop and terminate streaming sessions.

This is the correct failure boundary. MCP tools are distributed systems components. They will crash, hang, return malformed data, fail auth, lose pipes, or throw background exceptions. When that happens, the agent should see a tool failure it can reason about: retry, ask for help, switch tools, or continue with degraded capability. The user should not see “the agent disappeared.”

That difference is especially important for live, multi-turn agent products. If a single tool server exception drops the full session, operators lose the surrounding conversation, partial plan, in-flight context, and often the clean error signal. It is worse UX and worse observability. The right abstraction is not “MCP always works.” The right abstraction is “tool failure is contained at the tool boundary.” ADK 2.1.0 moves closer to that.

Practitioners should test this deliberately. Run an MCP server that exits mid-call. Return an error payload. Kill the transport. Simulate a timeout. Then verify the ADK session stays alive, the failure is visible in traces, and the agent does not hallucinate tool success. If the framework cannot survive a broken tool server, it is not ready for a tool ecosystem.
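A fault-injection harness for those checks can be small. The sketch below uses plain asyncio rather than ADK or the MCP SDK: `call_tool_contained`, `ToolResult`, `run_session`, and the fake tools are hypothetical stand-ins for MCP-backed tools, and treating a payload with an `error` key as a failure is an assumption about the payload shape. It shows the boundary the section argues for: each failure becomes a recorded result, and the session loop keeps going.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable

ToolFn = Callable[[dict], Awaitable[Any]]


@dataclass
class ToolResult:
    ok: bool
    value: Any = None
    error: str = ""


async def call_tool_contained(tool: ToolFn, args: dict, timeout: float) -> ToolResult:
    """Hypothetical boundary: every tool failure becomes a ToolResult, never a crash."""
    try:
        value = await asyncio.wait_for(tool(args), timeout=timeout)
    except asyncio.TimeoutError:
        return ToolResult(ok=False, error="timeout")
    except Exception as exc:  # crashed server, broken transport, failed auth, ...
        return ToolResult(ok=False, error=f"{type(exc).__name__}: {exc}")
    # Assumed payload shape: an explicit error payload is a failure, not success.
    if isinstance(value, dict) and "error" in value:
        return ToolResult(ok=False, error=str(value["error"]))
    return ToolResult(ok=True, value=value)


# Hypothetical fake tools, one per failure mode from the checklist.
async def exits_mid_call(args: dict) -> Any:
    await asyncio.sleep(0)
    raise ConnectionResetError("transport closed")


async def returns_error_payload(args: dict) -> Any:
    return {"error": "permission denied"}


async def hangs(args: dict) -> Any:
    await asyncio.sleep(10)


async def healthy(args: dict) -> Any:
    return {"rows": 3}


async def run_session(calls: list[tuple[str, ToolFn]], timeout: float = 0.05) -> list[tuple[str, ToolResult]]:
    """Hypothetical session loop: records every outcome in a trace and keeps going."""
    trace = []
    for name, tool in calls:
        result = await call_tool_contained(tool, {}, timeout)
        trace.append((name, result))
    return trace


trace = asyncio.run(run_session([
    ("exits_mid_call", exits_mid_call),
    ("error_payload", returns_error_payload),
    ("hang", hangs),
    ("healthy", healthy),
]))
for name, result in trace:
    print(name, result.ok, result.error or result.value)
```

`asyncio.CancelledError` is deliberately not caught, so cancelling the whole session still works; only tool-level failures are converted into results the agent can reason about.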

Typed tool schemas are promises, not decoration

The Pydantic Union fix is the most developer-facing correctness bug in the release. Issue #5799 documented a mismatch introduced by ADK 2.0’s default-on JSON_SCHEMA_FOR_FUNC_DECL. A tool parameter typed as a Union of Pydantic BaseModel subclasses could generate a usable schema, so the model could call the tool, but the runtime path still passed raw dictionaries into the Python function. The fix uses pydantic.TypeAdapter to validate against the full Union before execution and adds substantial test coverage around the behavior.

This is not pedantry. Tool schemas are one half of the tool-calling contract; runtime conversion is the other half. If a framework tells the model “this tool accepts one of these typed objects,” then hands application code an unvalidated dict, it has shifted the burden back onto every tool author. Worse, it creates a false sense of safety: the schema appears strict, the generated tool call appears valid, and then the actual function receives a different shape than declared.

That pattern is common across agent frameworks. Schema generation gets attention because it improves model calls. Runtime type fidelity gets less attention because it is ordinary software engineering. But production tools often rely on model constructors, defaults, validators, discriminated unions, and branch-by-type behavior. If those semantics are not preserved, agents become an expensive path to rediscover validation bugs.

The action item is simple: add tests where your tools receive the exact runtime types you declared, especially for unions, nested models, optional fields, and defaults. Do not only inspect the JSON schema. Call the tool through the framework. If the framework cannot round-trip types correctly, put validation at the tool boundary yourself.
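If the framework does not round-trip types, validation at the tool boundary can use the same `pydantic.TypeAdapter` technique the fix relies on. The decorator below is a hypothetical sketch assuming Pydantic v2, not ADK's implementation; `validate_at_boundary`, `notify`, `EmailTarget`, and `SlackTarget` are invented for illustration. It applies defaults, then validates every annotated parameter against its declared type, including the full Union, before the function body runs.

```python
import functools
import inspect
from typing import Literal, Optional, Union

from pydantic import BaseModel, TypeAdapter


class EmailTarget(BaseModel):
    kind: Literal["email"] = "email"
    address: str


class SlackTarget(BaseModel):
    kind: Literal["slack"] = "slack"
    channel: str
    thread: Optional[str] = None


def validate_at_boundary(fn):
    """Hypothetical decorator: coerce raw model-supplied kwargs to declared types."""
    sig = inspect.signature(fn)
    # One TypeAdapter per annotated parameter, built once at decoration time.
    adapters = {
        name: TypeAdapter(param.annotation)
        for name, param in sig.parameters.items()
        if param.annotation is not inspect.Parameter.empty
    }

    @functools.wraps(fn)
    def wrapper(**raw_args):
        bound = sig.bind(**raw_args)
        bound.apply_defaults()
        validated = {
            name: adapters[name].validate_python(value) if name in adapters else value
            for name, value in bound.arguments.items()
        }
        return fn(**validated)

    return wrapper


@validate_at_boundary
def notify(target: Union[EmailTarget, SlackTarget], message: str, priority: int = 1) -> str:
    # Branch-by-type only works if the runtime really delivers a model instance.
    if isinstance(target, EmailTarget):
        return f"email:{target.address}:{priority}"
    return f"slack:{target.channel}:{priority}"


print(notify(target={"channel": "#ops"}, message="deploy done"))
print(notify(target={"address": "oncall@example.com"}, message="page", priority="2"))
```

The `isinstance` branch inside `notify` is exactly the behavior that breaks silently when a raw dict arrives. Calling the tool with dicts and asserting on the branch it took is a cheap version of the round-trip test recommended here.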

Live agents are event streams now

The Gemini 3.1 Live fixes reinforce a broader point: modern agent runs are no longer a neat request/response exchange. ADK 2.1.0 fixes grounding metadata that could be silently discarded when response packets contained only grounding data. It fixes the events that signal input and output transcription has finished. It preserves transcription event order in the conversation trajectory. It also adds user.id to gen_ai.user.message telemetry records.
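The grounding fix describes a familiar streaming bug: a consumer that only keeps packets containing text silently drops packets that carry nothing but metadata. The accumulator below is a hypothetical sketch; the packet shape (`text` and `grounding` keys) is assumed for illustration and is not the Gemini Live wire format. Each field is handled independently, so a grounding-only packet survives, and both lists keep arrival order.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable


@dataclass
class TurnAccumulator:
    """Hypothetical accumulator for one streamed model turn."""
    text: list[str] = field(default_factory=list)
    grounding: list[dict] = field(default_factory=list)

    def add(self, packet: dict[str, Any]) -> None:
        # The bug class described above: an early
        # `if not packet.get("text"): return` would drop grounding-only packets.
        if packet.get("text"):
            self.text.append(packet["text"])
        if packet.get("grounding"):
            self.grounding.append(packet["grounding"])


def accumulate(packets: Iterable[dict[str, Any]]) -> TurnAccumulator:
    acc = TurnAccumulator()
    for packet in packets:
        acc.add(packet)
    return acc


stream = [
    {"text": "The release shipped on May 23."},
    {"grounding": {"source": "release-notes"}},  # grounding-only packet
    {"text": " It includes fifteen bug fixes."},
]
turn = accumulate(stream)
print("".join(turn.text), turn.grounding)
```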

None of that changes the model’s benchmark score. All of it changes whether an engineer can debug a real run. Grounding metadata matters when you need to verify why an answer cited something. Transcription order matters when voice or live sessions become part of the product. User identity in telemetry matters when you need to connect an agent action to an authenticated actor without scraping it from surrounding application logs.

This is where framework maturity is heading. The winners will not be the libraries with the prettiest Agent() constructor. They will be the runtimes that preserve causality across streams: user message, model step, tool call, approval, sandbox state, MCP failure, grounding metadata, and final output. If any part disappears, the trace becomes a story with missing pages.
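A trace that preserves causality can also check itself for missing pages. The sketch below is hypothetical: the event kinds, the `call_id` attribute, and stamping `user.id` on every event are illustrative choices. The release adds `user.id` to `gen_ai.user.message` records specifically; putting it on every event is this sketch's assumption. Each event gets a monotonic sequence number, and `missing_pages` reports required steps that never appeared and tool calls that never resolved as either a result or a failure.

```python
import itertools
from dataclasses import dataclass, field
from typing import Any

# Hypothetical minimum chain for a complete run.
REQUIRED_KINDS = ("user_message", "model_step", "final_output")


@dataclass
class TraceEvent:
    seq: int
    kind: str
    attrs: dict[str, Any]


@dataclass
class RunTrace:
    user_id: str
    events: list[TraceEvent] = field(default_factory=list)
    _counter: itertools.count = field(default_factory=itertools.count, init=False, repr=False)

    def record(self, kind: str, **attrs: Any) -> TraceEvent:
        # This sketch stamps the authenticated user on every event.
        event = TraceEvent(next(self._counter), kind, {"user.id": self.user_id, **attrs})
        self.events.append(event)
        return event

    def missing_pages(self) -> list[str]:
        seen = {e.kind for e in self.events}
        problems = [f"missing {k}" for k in REQUIRED_KINDS if k not in seen]
        # A tool call with no matching result or failure is a gap in the story.
        opened = {e.attrs["call_id"] for e in self.events if e.kind == "tool_call"}
        closed = {e.attrs["call_id"] for e in self.events if e.kind in ("tool_result", "tool_failure")}
        problems += [f"unresolved tool call {c}" for c in sorted(opened - closed)]
        return problems


run = RunTrace(user_id="u-123")
run.record("user_message", text="summarize the incident")
run.record("model_step", step=1)
run.record("tool_call", call_id="c1", tool="search_logs")
run.record("tool_failure", call_id="c1", error="timeout")
run.record("tool_call", call_id="c2", tool="read_runbook")
run.record("final_output", text="summary")
print(run.missing_pages())
```

Recording a tool failure as its own event, instead of omitting it, is what lets the checker tell a failed call apart from a call whose outcome was lost.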

ADK 2.1.0 is worth the upgrade for teams on ADK 2.0 that use MCP, Gemini Live, or typed tool definitions. It is also worth reading as a map of where agent frameworks are going. The syntax layer is settling. The real competition is moving into runtime guarantees: sandbox provenance, error containment, type fidelity, telemetry identity, and auditable event streams.

Google’s agent stack is making an implicit bet: production agents need first-party runtime surfaces, not just orchestration libraries. ADK 2.1.0 does not prove that bet by itself, but it does show the right instincts after the big platform push. Less “look, an agent.” More “can this agent fail in ways we can understand?” That is the part worth shipping.

Sources: Google ADK v2.1.0 release, Google ADK v1.34.1 release, ADK documentation, Google ADK issue #5799, Google Managed Agents announcement
