Pydantic AI 1.95.1 Fixes the Observability Regression That Durable Agents Cannot Afford


Pydantic AI 1.95.1 is a two-line release note with a production-sized warning label: if your agent framework treats observability as an optional plugin, your durable workflows will eventually make that lie expensive.

The patch shipped on May 13 with two bug fixes. First, Pydantic AI now eagerly imports the dependencies used by current_otel_traceparent(), which fixes agent runs inside Temporal workflows. Second, it un-deprecates Agent.instrument and InstrumentedModel, because Pydantic’s own Logfire integration still relies on those public surfaces. That sounds like internal cleanup. It is not. This is the seam where tracing, deterministic workflow execution, public API migration, and capability-driven agent runtimes collide.

The Temporal failure is the sharper lesson. Pull request #5422 describes workflow tasks failing with WorkflowWorkerUnhandledFailure because opentelemetry.trace was imported after the initial workflow load. Temporal’s sandbox rejected the late import, Temporal retried the workflow task, and the retries kept failing until CI hit a six-hour cap. The repro path involved capabilities=[CodeMode()] from pydantic-ai-harness inside a TemporalAgent workflow. That is exactly the sort of composed, non-happy-path setup that production systems grow into after the demo works.
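The failure mode can be made concrete with a toy model in plain Python. `SealedImportSandbox`, `run_with_retries`, and the two helper factories are hypothetical, and the sandbox does not reproduce Temporal's real rules. It models only the property that mattered in #5422: a module first requested after load is rejected, so retrying the same task reproduces the same failure. `json` stands in for `opentelemetry.trace`.

```python
import importlib
from typing import Callable


class LateImportError(RuntimeError):
    """Raised when code asks for a module after the sandbox has been sealed."""


class SealedImportSandbox:
    """Toy stand-in for a deterministic workflow sandbox (hypothetical; not Temporal's).

    Modules requested before seal() are allowed. Any module first requested
    after seal() is rejected, mimicking a runtime that forbids late imports.
    """

    def __init__(self) -> None:
        self.allowed: set = set()
        self.sealed = False

    def import_module(self, name: str):
        if self.sealed and name not in self.allowed:
            raise LateImportError(f"late import of {name!r} after workflow load")
        module = importlib.import_module(name)
        self.allowed.add(name)
        return module

    def seal(self) -> None:
        self.sealed = True


def run_with_retries(task: Callable[[], str], max_attempts: int):
    """Retry a failing workflow task with a bounded budget; return (result, attempt)."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task(), attempt
        except LateImportError as exc:
            last_error = exc  # same code, same failure: retrying cannot fix it
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


def make_lazy_helper(sandbox: SealedImportSandbox) -> Callable[[], str]:
    def helper() -> str:
        json = sandbox.import_module("json")  # first requested at call time: too late
        return json.dumps({"ok": True})
    return helper


def make_eager_helper(sandbox: SealedImportSandbox) -> Callable[[], str]:
    json = sandbox.import_module("json")  # requested while the workflow is loading
    def helper() -> str:
        return json.dumps({"ok": True})
    return helper


# Load phase: build the helper, then seal the sandbox before running workflow code.
lazy_box = SealedImportSandbox()
lazy = make_lazy_helper(lazy_box)
lazy_box.seal()
try:
    run_with_retries(lazy, max_attempts=3)
except RuntimeError as exc:
    print(exc)  # gave up after 3 attempts

eager_box = SealedImportSandbox()
eager = make_eager_helper(eager_box)
eager_box.seal()
print(run_with_retries(eager, max_attempts=3))  # ('{"ok": true}', 1)
```

The lazy helper fails on every attempt because nothing changes between retries; only the eager helper, which resolves its dependency while the workflow is loading, succeeds. That is roughly the shape of the 1.95.1 fix, shrunk to a sandbox small enough to read. The bounded budget is what the real incident lacked: the retries ran until the CI cap instead of failing fast.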

Durable execution is not a friendly runtime

Temporal is useful precisely because it is strict. Workflow code has to be deterministic, replayable, and careful about side effects. Model calls, tool calls, MCP communication, and other I/O belong in activities; the workflow loop is supposed to orchestrate, not improvise. That model is great for long-running agents that need retries, resumability, and auditability. It is also merciless toward framework code that assumes normal Python import behavior is harmless.
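The workflow/activity split is easier to see in a toy. The sketch below is hypothetical and far simpler than Temporal; `ReplayRuntime`, `execute_activity`, and `agent_workflow` are illustrative names, not Temporal's or Pydantic AI's API. The runtime records each activity result in a history, and a replay of the same workflow reads results back instead of re-running the activity. It also raises if a replayed workflow asks for a different activity, which is the kind of non-determinism real replay is designed to catch.

```python
from typing import Any, Callable


class ReplayRuntime:
    """Toy durable runtime: activities run at most once, replays read from history."""

    def __init__(self, activities: dict) -> None:
        self.activities = activities
        self.history: list = []  # (activity name, argument, result)
        self.calls = 0           # how many times real activity code has executed
        self._cursor = 0

    def execute_activity(self, name: str, arg: Any) -> Any:
        if self._cursor < len(self.history):
            # Replay: the workflow must request the same activity, in the same order.
            recorded_name, recorded_arg, result = self.history[self._cursor]
            if (recorded_name, recorded_arg) != (name, arg):
                raise RuntimeError(
                    f"non-determinism: expected {recorded_name}({recorded_arg!r}), "
                    f"got {name}({arg!r})"
                )
        else:
            self.calls += 1
            result = self.activities[name](arg)
            self.history.append((name, arg, result))
        self._cursor += 1
        return result

    def run(self, workflow: Callable[["ReplayRuntime"], Any]) -> Any:
        self._cursor = 0  # every run, first or replay, starts at the top of history
        return workflow(self)


def agent_workflow(rt: ReplayRuntime) -> str:
    # Orchestration only: the model call and the tool call both go through activities.
    plan = rt.execute_activity("call_model", "What is 2 + 2?")
    answer = rt.execute_activity("call_tool", plan)
    return f"answer={answer}"


rt = ReplayRuntime({
    "call_model": lambda prompt: "add:2,2",  # stand-in for an LLM call
    "call_tool": lambda plan: sum(int(x) for x in plan.split(":")[1].split(",")),
})
print(rt.run(agent_workflow))  # answer=4
print(rt.run(agent_workflow))  # answer=4 (replayed from history)
print(rt.calls)                # 2: each activity executed exactly once
```

Because the model and tool calls sit behind `execute_activity`, replaying returns the same answer without calling the model again. Everything in the workflow body itself, including any tracing helper it calls, runs again on every replay, which is why that code has to be as well-behaved as the orchestration around it.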

A traceparent helper that lazily imports OpenTelemetry may be fine in a FastAPI route or a notebook. Inside Temporal’s deterministic sandbox, it can become a poison pill. The bug is small at the implementation level — move imports earlier — but the architecture lesson is larger: observability cannot be bolted onto durable agents after the runtime contract is set. The trace boundary is part of the execution boundary. If the tracing code changes when and how modules load, it is no longer “just instrumentation.” It is runtime behavior.
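One cheap guard for this class of bug, sketched under the assumption that the test process has already loaded what a worker loads before workflow code runs, is to call the tracing helper and diff `sys.modules`. `modules_first_imported_by` and `lazy_color_helper` are hypothetical, and the stdlib module `colorsys` stands in for a tracing dependency such as `opentelemetry.trace`.

```python
import sys
from typing import Callable


def modules_first_imported_by(fn: Callable[[], object]) -> set:
    """Return module names that appear in sys.modules only after fn() runs.

    A non-empty result means calling fn changed module-loading state, which is
    what a strict sandbox may reject once the workflow has loaded.
    """
    before = set(sys.modules)
    fn()
    return set(sys.modules) - before


def lazy_color_helper() -> tuple:
    import colorsys  # deferred import: runs on the first call, not at module load
    return colorsys.hsv_to_rgb(0.0, 1.0, 1.0)


first = modules_first_imported_by(lazy_color_helper)   # typically {'colorsys'} in a fresh process
second = modules_first_imported_by(lazy_color_helper)  # set(): the module is now cached
```

The first call is the one that matters: a helper whose first call leaves `sys.modules` unchanged is not altering import state behind the runtime's back. The check is only as good as the warm-up that precedes it, so it complements running the helper inside the real worker rather than replacing it.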

That matters because durable agent frameworks are becoming the serious end of the agent market. Toy agents can fail and restart. Production agents need to resume work after a worker crash, preserve state across long-running jobs, explain tool choices after the fact, and survive replay without repeating side effects such as model or tool calls. Once you adopt that model, anything that participates in the agent loop (tracing, capability wrappers, context propagation, tool adapters) has to be tested inside the durable runtime, not merely around it.

Deprecation warnings are a contract smell

The second fix looks softer but is almost as important. Pydantic AI 1.95.0 pushed instrumentation toward the newer capabilities=[Instrumentation(...)] path. Architecturally, that direction makes sense. Capabilities are becoming the organizing primitive for agent frameworks: tool search, native tools, code execution modes, instrumentation, and runtime extensions all want a composable place in the request pipeline.
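A capability pipeline can be sketched as a list of wrappers around a request handler. The names here (`Capability`, `AddTool`, `Trace`, `build_pipeline`, `fake_model`) are hypothetical and are not Pydantic AI's capabilities API. The sketch only shows why the pattern composes well: each capability sees the request on the way in and the result on the way out, and its position in the list decides nesting.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import Callable, Protocol

Handler = Callable[["Request"], str]


@dataclass
class Request:
    prompt: str
    tools: list = field(default_factory=list)
    spans: list = field(default_factory=list)  # stand-in for emitted trace spans


class Capability(Protocol):
    def wrap(self, next_handler: Handler) -> Handler: ...


class AddTool:
    def __init__(self, tool: str) -> None:
        self.tool = tool

    def wrap(self, next_handler: Handler) -> Handler:
        def handler(req: Request) -> str:
            req.tools.append(self.tool)  # expose one more tool to the model
            return next_handler(req)
        return handler


class Trace:
    def wrap(self, next_handler: Handler) -> Handler:
        def handler(req: Request) -> str:
            req.spans.append("start")
            try:
                return next_handler(req)
            finally:
                req.spans.append("end")  # recorded even if an inner layer raises
        return handler


def build_pipeline(capabilities: list, model: Handler) -> Handler:
    # The first capability in the list becomes the outermost wrapper.
    return reduce(lambda inner, cap: cap.wrap(inner), reversed(capabilities), model)


def fake_model(req: Request) -> str:
    return f"{req.prompt} with tools={req.tools}"


run = build_pipeline([Trace(), AddTool("search"), AddTool("run_code")], fake_model)
req = Request("hello")
print(run(req))   # hello with tools=['search', 'run_code']
print(req.spans)  # ['start', 'end']
```

The property that makes this composable is also what makes it risky: every wrapper, tracing included, executes inside whatever runtime hosts the pipeline, so its import and I/O behaviour becomes part of that runtime's contract.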

But issue #5400 exposed the migration problem. The official Logfire integration still supported documented usage patterns like logfire.instrument_pydantic_ai(my_agent), which writes to Agent.instrument, and model instrumentation paths that construct InstrumentedModel. After earlier deprecations, those official paths emitted visible PydanticAIDeprecationWarnings. That is not merely cosmetic. If the documented observability path warns every user, teams learn to ignore warnings from the exact layer that is supposed to tell them when production behavior is drifting.

Pydantic’s fix is the grown-up choice: keep the migration direction, but restore the public contract beneath the official integration. PR #5427 un-deprecates the Agent.instrument getter and setter along with InstrumentedModel, keeps the constructor-level instrument= deprecations in place, consolidates the legacy entry points into _resolve_instrumentation_settings(), and removes 82 lines in the process. That is the right sort of cleanup. Migration should keep the old path boring until the new path is ready, not make it noisy because the framework got ahead of its own ecosystem.
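The shape of that consolidation can be sketched with a hypothetical `MiniAgent`. This is not Pydantic AI's source, and `resolve_settings` is not `_resolve_instrumentation_settings()`. It only illustrates the policy described above: the constructor keyword keeps warning, the attribute an integration writes to stays silent, and every entry point funnels through one resolver. The precedence order is an assumption made for the sketch.

```python
import warnings
from dataclasses import dataclass

_UNSET = object()  # sentinel: "caller did not pass this argument"


@dataclass(frozen=True)
class Settings:
    enabled: bool
    source: str  # which entry point the settings came from


def resolve_settings(*, capability=None, attribute=None, constructor=_UNSET) -> Settings:
    """Single place that turns every entry point into settings.

    Precedence (capability, then attribute, then constructor) is assumed for this
    sketch, not documented Pydantic AI behaviour.
    """
    if capability is not None:
        return Settings(bool(capability), "capability")
    if attribute is not None:
        return Settings(bool(attribute), "attribute")
    if constructor is not _UNSET:
        return Settings(bool(constructor), "constructor")
    return Settings(False, "default")


class MiniAgent:
    def __init__(self, *, instrument=_UNSET, instrumentation=None) -> None:
        if instrument is not _UNSET:
            # Constructor-level keyword: still deprecated, so it still warns.
            warnings.warn(
                "instrument= is deprecated; pass an instrumentation capability instead",
                DeprecationWarning,
                stacklevel=2,
            )
        self._constructor = instrument
        self._capability = instrumentation
        self._attribute = None

    @property
    def instrument(self):
        return self._attribute

    @instrument.setter
    def instrument(self, value) -> None:
        # The attribute an external integration writes to: supported and silent.
        self._attribute = value

    def settings(self) -> Settings:
        return resolve_settings(
            capability=self._capability,
            attribute=self._attribute,
            constructor=self._constructor,
        )


agent = MiniAgent()
agent.instrument = True  # the kind of write an integration performs; no warning
print(agent.settings())  # Settings(enabled=True, source='attribute')
```

Funnelling every path through one resolver is what lets a framework change which paths warn without changing what any of them do.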

The practical point for engineering teams is simple: treat observability APIs as production APIs. They are not decoration. They are how you reconstruct agent behavior, inspect cost, evaluate regressions, correlate tool calls, and defend a decision after something goes wrong. Breaking or warning on official instrumentation paths has an operational cost, even when the agent’s final answer still looks correct.

Test the composition, not the feature

The more interesting pattern across both fixes is composition risk. Pydantic AI’s own Temporal tests did not hit the capability-driven agent-run path that exposed the lazy import failure. That is not surprising, and it is not unique to Pydantic. Modern agent frameworks increasingly work by stacking features: a model wrapped with instrumentation, a toolset exposed through MCP, a code-mode capability, a durable execution backend, a tracing exporter, and a cache or approval layer sitting somewhere in the middle. Each component passes its own tests. Then the composed system fails because one component makes an assumption the next runtime forbids.
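One way to reason about composition risk is to make each component's runtime assumptions explicit and check them pairwise. The sketch is hypothetical: real frameworks do not declare `assumes`/`forbids` sets, so in practice the equivalent is to actually run each combination inside the target runtime. The component names are illustrative labels, not Pydantic AI classes.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Component:
    name: str
    assumes: frozenset = frozenset()  # runtime behaviour this component relies on
    forbids: frozenset = frozenset()  # runtime behaviour this component disallows


def conflicts(stack) -> list:
    """(assumer, forbidder, behaviour) for every assumption another member forbids."""
    return [
        (a.name, f.name, behaviour)
        for a in stack
        for f in stack
        for behaviour in sorted(a.assumes & f.forbids)
    ]


def pairwise_failures(components) -> list:
    # Test every pair, because per-component suites never build these stacks.
    found = []
    for pair in combinations(components, 2):
        found.extend(conflicts(pair))
    return found


stack = [
    Component("lazy_trace_helper", assumes=frozenset({"late_import"})),
    Component("code_mode_capability"),
    Component("durable_workflow", forbids=frozenset({"late_import", "wall_clock"})),
]
print([conflicts((c,)) for c in stack])  # [[], [], []]: each passes alone
print(pairwise_failures(stack))
# [('lazy_trace_helper', 'durable_workflow', 'late_import')]
```

Every component is clean in isolation; the only failure is a pair that no single-component test ever constructs.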

Builders should take three concrete actions from this release. Upgrade to 1.95.1 if you are on 1.95.0 and using Temporal, Logfire, or capability-heavy Pydantic AI agents. Add integration tests that run tracing inside the actual worker environment — Temporal workflows, background queues, sandboxed containers, serverless cold starts, whatever your production shape is. And audit your warning budget: deprecation warnings in observability paths should be investigated, not normalized into log wallpaper.
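The warning-budget audit can live in an ordinary test. `assert_within_warning_budget`, `quiet_setup`, and `noisy_setup` are hypothetical helpers built only on the standard `warnings` module. By default the helper treats any warning as a failure; pass a narrower category (for example `DeprecationWarning`, assuming the framework's deprecation class derives from it) to police only deprecations.

```python
import warnings
from typing import Callable


def assert_within_warning_budget(fn: Callable[[], object], category: type = Warning) -> object:
    """Run fn and raise AssertionError if it emits any warning of `category`."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" stops the default once-per-location filter from hiding repeats.
        warnings.simplefilter("always")
        result = fn()
    offenders = [w for w in caught if issubclass(w.category, category)]
    if offenders:
        detail = "; ".join(f"{w.category.__name__}: {w.message}" for w in offenders)
        raise AssertionError(f"warning budget exceeded: {detail}")
    return result


def quiet_setup() -> str:
    # Hypothetical stand-in for wiring up instrumentation cleanly.
    return "configured"


def noisy_setup() -> str:
    # Hypothetical stand-in for an official integration hitting a deprecated path.
    warnings.warn("legacy instrumentation path is deprecated", DeprecationWarning, stacklevel=2)
    return "configured"


print(assert_within_warning_budget(quiet_setup))  # configured
try:
    assert_within_warning_budget(noisy_setup)
except AssertionError as exc:
    print(exc)  # warning budget exceeded: DeprecationWarning: legacy instrumentation path is deprecated
```

Running the whole suite with `python -W error::DeprecationWarning` is a blunter version of the same policy; a helper like this lets you apply it specifically to observability setup without failing on unrelated warnings elsewhere.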

The broader bet is that Pydantic AI is moving in the right direction. A typed agent framework with durable execution, Logfire/OpenTelemetry instrumentation, MCP support, and composable capabilities is aiming at the real production problem, not the prompt-demo problem. But 1.95.1 is useful because it shows where that ambition actually breaks: not when the model says something silly, but when a trace helper imports too late or an official integration warns on its own public API.

That is the boring work agent frameworks now have to do. The next phase is not about showing that agents can call tools. We know they can. It is about making the runtime predictable enough that teams can replay, inspect, upgrade, and trust those tool calls six months later. Pydantic AI 1.95.1 is a small patch with the right lesson attached: observability is part of the runtime contract. Treat it that way, or Temporal will do the code review for you.

Sources: Pydantic AI v1.95.1 release notes, PR #5422, PR #5427, issue #5400, Pydantic AI Temporal docs, Pydantic AI overview