Google ADK 2.2 Makes the Default Model a Migration Decision, Not a Footnote

Anatoliy Kolodkin

05 Jun 2026 • 5 min read

Google ADK 2.2.0 looks like a normal framework release until you hit the default model change. Then it stops being a changelog and becomes an operations memo.

The Python release moves LlmAgent.DEFAULT_MODEL from gemini-2.5-flash to gemini-3-flash-preview. Google has a rational reason: Gemini 2.5 Flash has a shutdown date of October 16, 2026. But rational is not the same as harmless. In agent systems, defaults are not cosmetic. They are production behavior for every team that did not explicitly pin a model.

That makes ADK 2.2 a useful line in the sand for the agent-framework market. We have spent the last year pretending that model selection is a configuration detail while building systems where the model decides whether to call a tool, ask a clarification question, compact memory, hand work to another agent, or emit something that an evaluator treats as success. A default model swap changes more than prose quality. It can change latency, cost, safety behavior, tool-call frequency, structured-output adherence, and the shape of failures. If that happens because someone ran a package upgrade, the framework has turned dependency management into behavioral migration.

A preview model should not arrive by accident

The new default is gemini-3-flash-preview, while DEFAULT_LIVE_MODEL remains unchanged. That distinction matters. Preview models are where providers move quickly, test behavior, and gather signal. They are useful precisely because they are not frozen. Production agents usually want the opposite: boring behavior, controlled rollouts, and testable deltas.

The engineering move is simple: search your ADK codebase for agents that rely on implicit defaults and add model= anywhere output behavior matters. Do it before upgrading, not after your eval dashboard starts looking weird. Then test Gemini 3 Flash Preview as a deliberate candidate, with named experiments and rollback criteria. The fact that Google is giving the old default a shutdown clock means migration is unavoidable. It does not mean the migration should be silent.

This is especially important for teams using ADK as part of a broader model-routing or agent orchestration layer. If you are comparing Google ADK against LangGraph, Pydantic AI, CrewAI, AutoGen, or Microsoft Agent Framework, the syntax is not the strategic question. The strategic question is whether your runtime makes model choice visible and auditable. An agent framework that hides model churn behind a class constant is convenient until the class constant becomes the incident.

The rest of the release reads like production scar tissue

The model change is the headline, but ADK 2.2 is not just about Gemini migration. The release adds AutoTracingPlugin for OpenTelemetry auto-instrumentation, native gen_ai.client.* metrics, a RubricBasedMultiTurnTrajectoryEvaluator, BigQuery Agent Analytics reliability fixes, A2A metadata preservation, clearer A2A input-required versus auth-required distinctions, custom metadata propagation into run config, and request_input standardization for proactive clarification.

That list is not demo bait. It is the shape of agent systems after they leave notebooks. Basic input/output monitoring is not enough when the failure can happen three tool calls deep, after a context compaction step, during an agent-to-agent handoff, while the final answer still looks plausible. Google’s own ADK observability docs say agent observability needs reasoning traces, tool calls, latent model outputs, logs, metrics, and traces. That is the correct bar. If your observability stops at “prompt in, answer out,” you do not have observability. You have a receipt.

The trajectory evaluator is particularly important. Most teams still test agents like chatbots: provide an input, inspect a final answer, maybe assert on a string or JSON schema. But agents can reach the right answer by taking the wrong path. They can call the wrong tool, leak context into a handoff, ignore a clarification opportunity, or burn tokens in a loop before stumbling into a usable response. Multi-turn trajectory evaluation treats the path as part of the product. That is where serious agent testing has to go.
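To make "the path is part of the product" concrete, here is a hand-rolled sketch of a trajectory assertion. It is not ADK's RubricBasedMultiTurnTrajectoryEvaluator and does not use any ADK API. `ToolCall`, `check_trajectory`, and the tool names in the usage note are hypothetical, and it assumes you can already extract an ordered list of tool calls from a recorded run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    turn: int   # conversation turn in which the call happened
    tool: str   # tool name as recorded in your trace


def check_trajectory(calls, *, required_order, forbidden=frozenset(), max_calls=10):
    """Return a list of problems with the path; an empty list means it passed."""
    problems = []
    names = [c.tool for c in calls]

    if len(names) > max_calls:
        problems.append(f"too many tool calls: {len(names)} > {max_calls}")

    for name in names:
        if name in forbidden:
            problems.append(f"forbidden tool called: {name}")

    # required_order must appear as a subsequence of the observed calls.
    # `step in remaining` consumes the iterator up to and including the match,
    # so later steps can only match calls that came afterwards.
    remaining = iter(names)
    for step in required_order:
        if step not in remaining:
            problems.append(f"missing or out-of-order step: {step}")
            break
    return problems
```

A refund workflow might require `["lookup_account", "issue_refund"]` in that order and forbid `delete_user`. A run that issues the refund before looking up the account then fails even if its final message reads perfectly.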

The OpenTelemetry pieces also shift ADK closer to the infrastructure world that developers already understand. Traces and metrics are not glamorous, but they are what let you compare model upgrades, measure tool-call explosion, detect retry storms, and prove whether a new orchestration pattern is actually better or just more theatrical. If you are running agents as services, wire the traces into your normal stack. Do not keep agent observability in a separate toy dashboard that only the AI team checks when something feels cursed.

Security fixes are the part mature teams should read twice

ADK 2.2 also includes a cluster of reliability and security fixes: aborting API-server runs when clients disconnect, blocking path traversal in Agent Builder file tools, fixing Zip Slip-style path traversal in GCS skill extraction, restricting unpickling of v0 action blobs, enforcing session ownership in delete-session paths, preventing MCP initialization hangs and task-group leaks, and terminating infinite retry loops in LoadSkillResourceTool on RESOURCE_NOT_FOUND.

That is a very specific bug list. It is also a useful reminder that agent frameworks are not just prompt wrappers. They are file systems, web servers, workflow engines, sandbox boundaries, serialization layers, plugin hosts, MCP clients, observability emitters, and permission systems. Every one of those nouns has a security history. Agents did not repeal it.
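The unpickling item is a good example of that history. The sketch below is the allowlist pattern from the Python `pickle` documentation, not ADK's actual fix. The `ALLOWED_GLOBALS` entries are placeholders. If you control the format, a non-pickle serialization is the safer choice, and this only narrows the attack surface where pickle cannot be avoided.

```python
import io
import pickle

# Placeholder allowlist: (module, name) pairs that legacy blobs may reference.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks to import a global.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads, but refuses any global outside the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers such as dicts, lists, and strings never reach `find_class`, so they load normally. Anything that tries to import a callable outside the allowlist raises instead of executing.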

The path traversal and Zip Slip fixes deserve attention because agent builders tend to normalize “let the agent read and write files” faster than they normalize “the file tool is a security boundary.” If your ADK deployment lets agents manipulate project files, extracted skills, uploaded artifacts, or cloud-backed resources, you should review root containment, path normalization, symlink handling, and ownership checks. These are boring controls until they are the reason an agent touched the wrong directory.
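A minimal sketch of root containment and a Zip Slip-resistant extraction follows, using only the standard library. It is not ADK's code. `resolve_inside`, `safe_extract`, and `PathEscapeError` are names invented here. It rejects suspicious entries outright and validates the whole archive before writing anything. It does not address races where a symlink is swapped in between the check and the write.

```python
import shutil
import zipfile
from pathlib import Path


class PathEscapeError(ValueError):
    """Raised when an untrusted path would land outside the allowed root."""


def resolve_inside(root: Path, untrusted: str) -> Path:
    """Resolve `untrusted` under `root`, refusing anything that escapes it."""
    root = root.resolve()
    # resolve() collapses ".." and follows existing symlinks; an absolute
    # `untrusted` replaces root entirely and is caught by the check below.
    candidate = (root / untrusted).resolve()
    if not candidate.is_relative_to(root):  # Python 3.9+
        raise PathEscapeError(f"{untrusted!r} escapes {root}")
    return candidate


def safe_extract(archive: zipfile.ZipFile, dest: Path) -> list[Path]:
    """Extract every member under `dest`, or nothing if any member escapes."""
    members = archive.infolist()
    # Validate all names first so a bad entry cannot leave a partial extraction.
    targets = [resolve_inside(dest, info.filename) for info in members]
    for info, target in zip(members, targets):
        if info.is_dir():
            target.mkdir(parents=True, exist_ok=True)
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        with archive.open(info) as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return targets
```

The same `resolve_inside` check belongs in front of every file tool an agent can call, not only archive extraction. Ownership checks still need to be layered on top.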

The MCP leak and initialization fixes land in the same category. MCP has become the de facto adapter layer for agent tools, but it also introduces long-lived processes, initialization state, resource loading, and failure modes that look more like distributed systems than chat completions. Teams adopting MCP through ADK should log which servers initialize, how long they take, which tools are advertised, which resources are read, and what happens when a resource disappears. “The MCP server hung” should be a traceable failure, not folklore in Slack.
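As a sketch of what "traceable" can mean at the simplest level, the wrapper below puts a deadline and a timing log line around server initialization. It uses only `asyncio` and `logging`. The `connect` argument is a hypothetical stand-in for whatever coroutine your MCP client exposes to start a session, and `init_with_deadline` is a name invented here, not an ADK or MCP SDK function.

```python
import asyncio
import logging
import time

log = logging.getLogger("mcp.init")


async def init_with_deadline(server_name, connect, timeout_s=10.0):
    """Await connect() with a deadline and log how long initialization took.

    `connect` is a zero-argument coroutine function (hypothetical stand-in
    for the real MCP client initialization call).
    """
    start = time.monotonic()
    try:
        session = await asyncio.wait_for(connect(), timeout=timeout_s)
    except asyncio.TimeoutError:
        log.error("mcp init timed out server=%s after=%.2fs",
                  server_name, time.monotonic() - start)
        raise
    log.info("mcp init ok server=%s took=%.2fs",
             server_name, time.monotonic() - start)
    return session
```

`asyncio.wait_for` cancels the pending coroutine when the deadline passes, so a hung server becomes a logged, attributable error rather than a stalled startup. Extending the log line with advertised tool names and resource reads follows the same shape.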

What engineers should do before bumping the package

Treat this release like a migration even if semver makes it feel routine. First, pin the model explicitly on every production LlmAgent. Second, run a diffed eval suite against the old and new models, including tool-call counts, latency, cost, safety refusals, structured-output validity, and final answer quality. Third, add a test or startup assertion that fails loudly if your runtime is using an implicit framework default where you expect a pinned model.
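The second step can be reduced to a gate that compares two eval summaries and names what got worse. The sketch below assumes your eval harness already produces these aggregate numbers. `EvalSummary`, `regressions`, and every threshold in `LIMITS` are illustrative placeholders, not recommended values and not measurements of any model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalSummary:
    model: str
    tool_calls_per_task: float
    p95_latency_s: float
    cost_per_task_usd: float
    refusal_rate: float
    schema_valid_rate: float
    answer_score: float


# metric -> (which direction is better, tolerated absolute worsening).
# Placeholder thresholds: set these from your own rollback criteria.
LIMITS = {
    "tool_calls_per_task": ("lower", 0.5),
    "p95_latency_s": ("lower", 0.5),
    "cost_per_task_usd": ("lower", 0.002),
    "refusal_rate": ("either", 0.02),   # a shift in either direction is suspicious
    "schema_valid_rate": ("higher", 0.01),
    "answer_score": ("higher", 0.02),
}


def regressions(baseline: EvalSummary, candidate: EvalSummary, limits=LIMITS) -> dict:
    """Return {metric: delta} for every metric that worsened beyond its limit."""
    flagged = {}
    for metric, (better, tolerance) in limits.items():
        delta = getattr(candidate, metric) - getattr(baseline, metric)
        if better == "lower":
            worsening = delta
        elif better == "higher":
            worsening = -delta
        else:
            worsening = abs(delta)
        if worsening > tolerance:
            flagged[metric] = round(delta, 6)
    return flagged
```

Run the same suite once with the agent pinned to gemini-2.5-flash and once pinned to gemini-3-flash-preview, then treat a non-empty result as a blocked rollout. The value of the gate is less the arithmetic than the fact that the rollback criteria are written down before the upgrade.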

Fourth, turn on the observability work instead of admiring it from the changelog. Use OpenTelemetry traces and gen_ai.client.* metrics to track agent paths, not just final responses. Fifth, add multi-turn trajectory tests around the workflows where mistakes are expensive: authentication handoffs, file operations, MCP calls, clarification flows, and anything that writes to an external system. Finally, review the security fixes against your own threat model. If ADK had path traversal, session ownership, unpickling, and MCP leak fixes in one release, your application code probably has similarly unglamorous edges.

Google ADK is becoming more coherent as a production framework: multi-language ambitions, Google Cloud Agent Runtime and Cloud Run/GKE deployment paths, A2A and MCP integration, context management, token tracking, evals, BigQuery analytics, and now better tracing. For Google Cloud shops, that is a strong story. For teams that prize provider neutrality, the default-model move is exactly why portability has to include explicit model routing, not just “supports multiple providers” in a README.

The editorial take is straightforward: ADK 2.2 is a good release, but it proves a rule the whole agent ecosystem should adopt. Frameworks should make model swaps loud. Pin behavior, trace behavior, evaluate trajectories, and migrate on purpose. Defaults are fine for tutorials. Production agents deserve receipts.

Sources: Google ADK Python 2.2.0 release, Google ADK documentation, ADK observability docs, Gemini API deprecations, google/adk-python
