
Pydantic AI 1.95.0 Turns Tool Search Into a First-Class Agent Primitive

Anatoliy Kolodkin

13 May 2026 • 4 min read

Pydantic AI’s latest release is easy to misread as another framework changelog: a few naming migrations, a provider feature, some instrumentation cleanup, a dependency note. That would miss the useful part. Version 1.95.0 is really about admitting a production truth most agent demos still dodge: once an agent has a serious tool surface, “just put every tool in the context window” stops being architecture and starts being expensive negligence.

The headline is native Tool Search for Anthropic and OpenAI, plus custom search strategies for every other provider. In practical terms, Pydantic AI agents can now mark tools with defer_loading=True, keeping them out of the model’s initial context until they are discovered by keyword. PR #5143 closes two long-running issues and gives large toolsets a shape that looks more like a runtime index than a prompt appendix.
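To make the mechanism concrete, here is a minimal, framework-independent sketch of deferred loading. It does not use Pydantic AI's API: `ToolSpec`, `ToolIndex`, and the keyword-overlap matching are hypothetical names and choices made for illustration, and only the `defer_loading` flag name is borrowed from the release.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical tool record for this sketch; not a Pydantic AI type."""
    name: str
    description: str
    defer_loading: bool = False


@dataclass
class ToolIndex:
    """Tracks which tools the model can currently see."""
    tools: list
    discovered: set = field(default_factory=set)

    def visible(self) -> list:
        # Eager tools are always visible; deferred ones only after discovery.
        return [t.name for t in self.tools
                if not t.defer_loading or t.name in self.discovered]

    def search(self, query: str) -> list:
        # Naive keyword match: a deferred tool is a hit if any query word
        # appears as a word in its name or description.
        words = set(query.lower().split())
        hits = []
        for t in self.tools:
            if not t.defer_loading:
                continue
            text = f"{t.name} {t.description}".lower().replace("_", " ")
            if words & set(text.split()):
                hits.append(t.name)
        self.discovered.update(hits)
        return hits


index = ToolIndex([
    ToolSpec("read_file", "Read a file from the workspace"),
    ToolSpec("issue_refund", "Issue a refund for an order", defer_loading=True),
    ToolSpec("deploy_service", "Deploy a service to production", defer_loading=True),
])
before = index.visible()              # only the eager tool
hits = index.search("refund order")   # discovers the refund tool
after = index.visible()               # eager tool plus the discovered one
```

The deployment tool never enters the visible set here because nothing searched for it, which is the property the rest of this piece leans on.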

That distinction matters. Early agent examples usually have three to five tools: search, read file, write file, maybe call an API. Real systems do not. Real systems accumulate internal service endpoints, repo tools, deployment helpers, observability queries, MCP servers, CRM actions, eval runners, compliance checks, and business-specific functions with names that only make sense inside one company. If every one of those function schemas is injected into every model call, the agent pays in tokens, latency, model confusion, and attack surface before it has even decided what it needs.

Tool discovery is not prompt decoration

The important design move in 1.95.0 is that Pydantic AI supports two paths. Where Anthropic or OpenAI can handle provider-native tool search, the framework can let the provider manage visibility on the wire. Where that support does not exist, custom provider-agnostic search strategies can fill the gap. That is exactly the split a serious framework needs: use native affordances when they are available, but do not make your architecture hostage to one provider’s current API.
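The two-path split can be sketched as a small dispatch decision. This is an illustration under stated assumptions, not the framework's implementation: the provider keys, `plan_tool_search`, and `keyword_strategy` are hypothetical, and the native set simply mirrors the two providers named in the release.

```python
from typing import Callable, Optional

# Assumption: these keys stand in for providers with native tool search.
NATIVE_TOOL_SEARCH = frozenset({"anthropic", "openai"})

# A strategy takes a query and a {tool name: description} catalog.
SearchStrategy = Callable[[str, dict], list]


def keyword_strategy(query: str, catalog: dict) -> list:
    """Fallback strategy: substring match of query words on descriptions."""
    words = query.lower().split()
    return [name for name, desc in catalog.items()
            if any(w in desc.lower() for w in words)]


def plan_tool_search(provider: str,
                     custom: Optional[SearchStrategy] = None) -> tuple:
    """Return ("native", None) or ("custom", strategy) for a provider."""
    if provider in NATIVE_TOOL_SEARCH:
        # The provider manages deferred-tool visibility on the wire.
        return ("native", None)
    # No native support: search runs client-side with a pluggable strategy.
    return ("custom", custom or keyword_strategy)


catalog = {
    "issue_refund": "Issue a refund for an order",
    "query_metrics": "Query observability metrics",
}
native_mode, _ = plan_tool_search("anthropic")
custom_mode, strategy = plan_tool_search("some-other-provider")
found = strategy("refund", catalog)
```

Keeping the strategy as a plain callable is one way to make the fallback swappable (embeddings, BM25, a hand-written allowlist) without touching the provider-native path.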

This is where Pydantic AI’s broader posture shows through. The same release starts the June v2 migration path by renaming “built-in tools” to “native tools” and consolidating capability registration through capabilities= using primitives like NativeTool(...), WebSearch(), WebFetch(), MCP(), and ImageGeneration(). The old builtin_tools= and Builtin* APIs still work in 1.x, but they now emit PydanticAIDeprecationWarning. That is the right kind of annoying: early enough to let teams migrate before v2, explicit enough that hidden compatibility magic does not become a production surprise.

PR #5331 adds the new provider-adaptive capability style with options such as local='duckduckgo', local=True, and builtin=True. The framework now warns on implicit fallback paths that will change in v2. That looks like API housekeeping, but it is really portability discipline. “Supports many providers” is a weak claim if capability behavior silently changes when the selected model lacks one native affordance. Teams need to know when a web search is provider-native, locally substituted, or unavailable. Otherwise the runtime becomes a vibes-based compatibility layer.
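What "warn on implicit fallback" buys you is easier to see in code. The sketch below is a guess at the shape of the idea, not the library's behavior: `resolve_web_search`, `ImplicitFallbackWarning`, and the meaning given to each `local` value are assumptions made for illustration.

```python
import warnings
from typing import Optional, Union


class ImplicitFallbackWarning(DeprecationWarning):
    """Hypothetical warning type for this sketch."""


def resolve_web_search(native_supported: bool,
                       local: Optional[Union[str, bool]] = None) -> str:
    """Say explicitly where web search will run for the selected model."""
    if native_supported:
        return "native"
    if local is None:
        # The caller never chose a fallback: substitute one, but say so.
        warnings.warn(
            "web search fell back to a local backend implicitly; "
            "pass local= explicitly",
            ImplicitFallbackWarning,
            stacklevel=2,
        )
        return "local:default"
    if local is False:
        return "unavailable"
    if local is True:
        return "local:default"
    return f"local:{local}"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    outcomes = [
        resolve_web_search(True),
        resolve_web_search(False, local="duckduckgo"),
        resolve_web_search(False, local=False),
        resolve_web_search(False),  # implicit fallback, triggers the warning
    ]
```

Every call returns one of three explicit states (native, locally substituted, unavailable), and the only silent path is the one that warns.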

There is a security angle here too. A giant always-visible tool list does not just waste context; it broadens the set of actions the model can be induced to consider. Deferring visibility until search is not a complete policy system, but it is a cleaner primitive for one. If a sensitive deployment tool is not visible until a specific discovery path is taken, you have a better chance of instrumenting, logging, gating, and reviewing that path. If every tool is always present, your policy layer starts from a worse default.

The observability question moves one layer earlier

Pydantic AI 1.95.0 also adds an Instrumentation capability while deprecating Agent(instrument=...). The docs already position Logfire instrumentation as tracing each run, including spans for model calls and tool execution, using OpenTelemetry and the GenAI semantic conventions. With Tool Search, observability needs to capture more than calls. It needs to capture visibility.

For production teams, the debugging question is no longer just “which tool did the model call?” It is “which tools were available, which tools were deferred, what search query exposed them, and did the model select from the right candidate set?” That is a materially different audit trail. If an agent fails because it never finds the refund API, that is not the same class of bug as finding it and calling it with bad arguments. One is a discovery problem. The other is an execution problem. Frameworks that blur those together will make operators reconstruct incidents from incomplete traces.
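A trace that records visibility makes that split mechanical. The record below is a hypothetical shape, not a Logfire or OpenTelemetry schema: `ToolTrace`, its field names, and `classify` are invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolTrace:
    """Hypothetical per-run record of tool visibility and use."""
    eager: list                    # tools visible from the first model call
    deferred: list                 # tools hidden until discovered
    search_queries: list = field(default_factory=list)
    exposed: list = field(default_factory=list)  # deferred tools surfaced by search
    called: Optional[str] = None
    call_ok: bool = False


def classify(trace: ToolTrace, expected_tool: str) -> str:
    """Name the stage at which a run went wrong, if any."""
    candidates = set(trace.eager) | set(trace.exposed)
    if expected_tool not in candidates:
        return "discovery"   # the model never had the tool in front of it
    if trace.called != expected_tool:
        return "selection"   # the tool was available but not chosen
    if not trace.call_ok:
        return "execution"   # chosen, but the call itself failed
    return "ok"


never_found = ToolTrace(
    eager=["read_file"], deferred=["issue_refund"],
    search_queries=["billing"], exposed=[],
)
bad_arguments = ToolTrace(
    eager=["read_file"], deferred=["issue_refund"],
    search_queries=["refund"], exposed=["issue_refund"],
    called="issue_refund", call_ok=False,
)
first = classify(never_found, "issue_refund")
second = classify(bad_arguments, "issue_refund")
```

Both runs end without a refund, but only the second one ever reached the tool, and the recorded search queries show why the first did not.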

The release includes several other operational details: support for Gemini 3 structured output combined with tools, restored Bedrock client swapping, normalized Bedrock model IDs for capability lookup, and mistral reinstated as a default dependency with the compromised mistralai==2.4.6 excluded. None of those is the lead, but they reinforce the pattern. Pydantic AI is doing the less glamorous work of making capability declarations, provider differences, dependency hygiene, and instrumentation line up.

The repo’s scale makes that work worth watching: at research time, Pydantic AI had 17,042 stars, 2,070 forks, and 533 open issues. This is no longer a cute typed-wrapper experiment. It is one of the frameworks developers are actually evaluating when they want agent code that feels closer to normal Python engineering than orchestration theater.

The action item for builders is straightforward. If you use Pydantic AI today, audit your tool registration before v2 lands. Find every place you rely on implicit provider fallback, every place built-in/native tools are registered through old names, and every agent with a tool list large enough that prompt stuffing has become the default. Then decide what your visibility policy should be. Which tools should be eagerly loaded? Which should be searchable? Which should require approval? Which should never be exposed through keyword discovery at all?
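One way to run that audit is to write the policy down as data and diff it against what is actually registered. Everything in this sketch is hypothetical, including the four visibility levels, the tool names, and the `audit` helper; it is a starting shape, not a Pydantic AI feature.

```python
from enum import Enum


class Visibility(Enum):
    EAGER = "eager"            # always in the model's context
    SEARCHABLE = "searchable"  # deferred, discoverable by keyword
    APPROVAL = "approval"      # discoverable, but a human must approve calls
    HIDDEN = "hidden"          # never exposed through discovery


# Hypothetical policy table for illustration.
POLICY = {
    "read_file": Visibility.EAGER,
    "query_metrics": Visibility.SEARCHABLE,
    "deploy_service": Visibility.APPROVAL,
    "rotate_credentials": Visibility.HIDDEN,
}


def audit(registered: list, policy: dict) -> tuple:
    """Group registered tools by policy and list those with no decision."""
    report = {level: [] for level in Visibility}
    unclassified = []
    for name in registered:
        level = policy.get(name)
        if level is None:
            unclassified.append(name)
        else:
            report[level].append(name)
    return report, unclassified


registered_tools = [
    "read_file", "query_metrics", "deploy_service",
    "rotate_credentials", "export_crm_contacts",
]
report, unclassified = audit(registered_tools, POLICY)
```

The useful output is the `unclassified` list: any tool that lands there is one nobody has made a visibility decision about yet.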

The bigger read: Tool Search is not a convenience feature. It is the point where tool surfaces stop being static prompt furniture and become runtime infrastructure. That is where agent frameworks need to go if they want to survive contact with real systems. Pydantic AI 1.95.0 is not perfect proof that the category has grown up. It is a useful sign that at least one serious framework is optimizing for the world where agents have more tools than a context window should politely carry.

Sources: Pydantic AI v1.95.0 release, PR #5143, PR #5331, PR #5338, Pydantic AI docs, Logfire instrumentation docs, OpenTelemetry GenAI semantic conventions
