Pydantic AI 1.94.0 Fixes the OpenAI-Compatible Lie Hiding in Agent Portability

“OpenAI-compatible” has become one of the most useful lies in AI infrastructure.

Useful, because it gave local models, hosted inference providers, enterprise gateways, and framework authors a common shape to build around. A lie, because anyone who has tried to move a real agent from OpenAI’s API to a LiteLLM route, a vLLM deployment, or a Qwen-flavored backend knows the contract is not “same behavior.” It is “close enough until your framework emits the first message shape the backend did not expect.”

That is why Pydantic AI v1.94.0, released May 12 at 06:52 UTC, is more interesting than its two-line changelog suggests. The headline change is a new openai_chat_supports_multiple_system_messages profile flag for OpenAI chat models. When set to False, Pydantic AI merges consecutive initial system and developer messages into one message separated by double newlines before sending them to stricter OpenAI-compatible backends.

That sounds like plumbing. It is plumbing. But plumbing is where production agent systems usually flood the basement.

The compatibility bug hiding inside layered prompts

The failure mode in PR #5375 is wonderfully specific: some OpenAI-compatible backends reject requests with multiple initial system messages and errors such as “System message must be at the beginning.” Pydantic AI can naturally produce that shape because agent instructions are no longer a single string at the top of a prompt. They are composed from layers.

A realistic agent may have a static system_prompt, dynamic instructions, one or more @agent.system_prompt functions, tool-specific policy, user preference context, and maybe a repo profile or organization safety note. In an agent framework, that composition is a feature: it lets platform teams separate global policy from task-level guidance and runtime context. Sent to a strict OpenAI-compatible backend, the same composition becomes a request shape the server may reject outright.
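To make the failure concrete, here is a minimal, hypothetical sketch. The Message type, compose_request, and strict_backend_accepts are illustrative names, not Pydantic AI APIs. The acceptance rule (at most one system or developer message, and only in first position) is an assumption modeled on the error quoted above, not a documented rule of any particular backend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    role: str      # "system", "developer", "user", "assistant", ...
    content: str


def compose_request(layers: list[str], user_prompt: str) -> list[Message]:
    """Hypothetical composer: one system message per non-empty instruction layer."""
    messages = [Message("system", layer) for layer in layers if layer]
    messages.append(Message("user", user_prompt))
    return messages


def strict_backend_accepts(messages: list[Message]) -> bool:
    """Assumed strict rule: no system/developer message, or exactly one at index 0."""
    positions = [i for i, m in enumerate(messages) if m.role in ("system", "developer")]
    return positions in ([], [0])


# Three layered instructions produce three initial system messages.
request = compose_request(
    ["You are a support agent.", "Today is Tuesday.", "Never share API keys."],
    "Reset my password",
)
print(strict_backend_accepts(request))
```

The layering that makes the agent easy to maintain is exactly what trips the strict check.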

Pydantic AI’s fix is conservative in the right way. OpenAI behavior remains the default. Developers opt into the compatibility path through the model profile. The mapper only merges consecutive initial system/developer role messages inside OpenAIChatModel._map_messages; it intentionally skips user-role system prompts because the mapper cannot reliably distinguish those from real user turns. That restraint matters. The worst framework fixes are the ones that normalize too aggressively and quietly change semantics while pretending to improve portability.
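The merge rule described above can be sketched in a few lines. This is an illustration under assumptions, not Pydantic AI's implementation: the Message type and merge_initial_system_messages are hypothetical names, and keeping the first message's role for the merged result is a guess at a reasonable choice rather than confirmed library behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    role: str
    content: str


MERGEABLE_ROLES = frozenset({"system", "developer"})


def merge_initial_system_messages(messages: list[Message]) -> list[Message]:
    """Collapse only the leading run of system/developer messages into one.

    Everything after the first non-system message passes through unchanged,
    including user-role messages that happen to carry system-prompt text.
    """
    prefix_len = 0
    while prefix_len < len(messages) and messages[prefix_len].role in MERGEABLE_ROLES:
        prefix_len += 1
    if prefix_len <= 1:
        # Already compliant: nothing to merge.
        return list(messages)
    prefix = messages[:prefix_len]
    # Assumption: the merged message keeps the role of the first message.
    merged = Message(prefix[0].role, "\n\n".join(m.content for m in prefix))
    return [merged, *messages[prefix_len:]]
```

Only the leading run is rewritten. A system message that appears after the first user turn passes through untouched, which mirrors the restraint the mapper shows.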

This is the real agent-portability story: endpoint shape is not capability. A provider can expose an OpenAI-style chat-completions API and still differ on multiple system messages, developer-role support, tool-call schema details, structured-output modes, streaming event shape, usage metadata, retry behavior, and instruction precedence. Those are not cosmetic differences. They are runtime behavior.

Model-agnostic frameworks need capability profiles, not vibes

Pydantic AI’s pitch is explicitly model-agnostic. Its docs list support across OpenAI, Anthropic, Gemini, DeepSeek, Grok, Cohere, Mistral, Perplexity, Azure AI Foundry, Bedrock, Vertex AI, Ollama, LiteLLM, Groq, OpenRouter, Together AI, Fireworks, Cerebras, Hugging Face, GitHub, Heroku, Vercel, Nebius, OVHcloud, Alibaba Cloud, SambaNova, and custom models. That is a large surface area. It is also an implicit warning: no abstraction that broad can rely on marketing-level compatibility claims.

The useful pattern here is not “merge system messages.” It is “make provider capabilities explicit.” A profile flag gives teams a place to encode known backend behavior. That is less magical than auto-detection, but it is more reviewable. A platform engineer can look at a model profile and understand the contract: this backend does or does not support multiple initial system messages. That is the kind of configuration that belongs in code review, because it affects how the agent receives authority, policy, and task context.
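One way to keep capability flags reviewable is to make them plain, frozen data with OpenAI-like defaults, so a diff shows exactly where a backend departs from the baseline. The BackendProfile fields, the deviations helper, and STRICT_ROUTE below are hypothetical stand-ins; in Pydantic AI the real setting is the openai_chat_supports_multiple_system_messages flag on the model profile.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class BackendProfile:
    """Hypothetical capability flags; defaults mirror OpenAI's own behavior."""
    supports_multiple_system_messages: bool = True
    supports_developer_role: bool = True


OPENAI_DEFAULT = BackendProfile()


def deviations(profile: BackendProfile, baseline: BackendProfile = OPENAI_DEFAULT) -> dict[str, bool]:
    """Return only the flags where this profile departs from the baseline."""
    return {
        f.name: getattr(profile, f.name)
        for f in fields(profile)
        if getattr(profile, f.name) != getattr(baseline, f.name)
    }


# A strict backend opts out explicitly instead of relying on auto-detection.
STRICT_ROUTE = replace(OPENAI_DEFAULT, supports_multiple_system_messages=False)
print(deviations(STRICT_ROUTE))
```

Because the default is the OpenAI behavior, a reviewer only has to read the deviations to understand what a given route changes.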

The second change in v1.94.0, dropping mistralai as a direct dependency of pydantic-ai, follows the same philosophy. It is not the feature anyone will tweet about, but model-agnostic frameworks should be careful about their default dependency surface. Every provider SDK pulled into the base package has consequences: container size, cold starts, vulnerability scans, license review, dependency conflicts, and the boring enterprise security questions that arrive right before deployment. Keeping the core lean is not glamorous. It is how framework authors avoid turning “supports everything” into “installs everything.”
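A common way to keep provider SDKs out of a base install is to import them lazily and fail with an actionable message when the optional extra is missing. The sketch below shows that generic pattern; require_optional and the example-framework package name are hypothetical, and the code is not taken from Pydantic AI's source.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str, package: str = "example-framework") -> ModuleType:
    """Import a provider SDK only when that provider is actually used.

    `package` is a hypothetical distribution name used in the hint message.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is not installed. Install it with "
            f"`pip install \"{package}[{extra}]\"` to use this provider."
        ) from exc
```

The cost of a provider SDK is then paid only by teams that select that provider, rather than by every install of the core package.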

That matters more in 2026 because the framework layer is becoming the place where agent systems decide what they are allowed to do. Pydantic AI is not just wrapping model calls. Its documentation emphasizes Logfire and OpenTelemetry observability, evals, MCP, A2A, tool approval, durable execution, streamed structured output, and graph support. Once a framework owns that much of the runtime, compatibility bugs are not isolated API annoyances. They affect tracing, approvals, eval reproducibility, and incident diagnosis.

What builders should do before switching providers

The practical takeaway is simple: stop treating OpenAI-compatible endpoints as interchangeable until you have tested realistic agent traffic against them. A single chat completion is not a compatibility test. Neither is a toy tool call. The test should include your actual prompt layering, your tool schemas, your structured-output mode, streaming, retries, telemetry, and the eval suite you use for release gates.

If you run local or private agents through vLLM, Ollama, LiteLLM, or an enterprise router, maintain a provider capability matrix. At minimum, track whether each backend supports multiple system messages, the developer role, parallel tool calls, strict JSON schema, streaming tool-call deltas, usage accounting, image/audio inputs if relevant, and consistent error codes. Then wire that matrix into model profiles instead of leaving it in someone’s memory or a Slack thread. “It works on my local Qwen route” is not a deployment strategy.
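A capability matrix can be as simple as one typed row per backend plus a check that runs before a route is promoted. The field names follow the list above; the Capabilities type, the missing_capabilities helper, the route name, and every value in MATRIX are hypothetical placeholders, not measurements of any real backend.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical capability-matrix row; one per backend route."""
    multiple_system_messages: bool
    developer_role: bool
    parallel_tool_calls: bool
    strict_json_schema: bool
    streaming_tool_call_deltas: bool
    usage_accounting: bool
    image_input: bool
    consistent_error_codes: bool


# Placeholder values for illustration only.
MATRIX = {
    "example-private-route": Capabilities(
        multiple_system_messages=False,
        developer_role=False,
        parallel_tool_calls=True,
        strict_json_schema=False,
        streaming_tool_call_deltas=True,
        usage_accounting=True,
        image_input=False,
        consistent_error_codes=False,
    ),
}


def missing_capabilities(backend: Capabilities, required: set[str]) -> list[str]:
    """List the required capabilities this backend does not declare."""
    known = {f.name for f in fields(Capabilities)}
    unknown = required - known
    if unknown:
        # Catch typos in requirement lists instead of silently passing.
        raise ValueError(f"unknown capability names: {sorted(unknown)}")
    return sorted(name for name in required if not getattr(backend, name))
```

A release gate can then refuse to route an agent to a backend whenever this list is non-empty, which keeps the knowledge in code rather than in a Slack thread.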

Teams should also run portability evals as behavior tests. If the same agent behaves differently when moved from OpenAI to a local backend, the question is not only whether the model is weaker. It may be receiving different instruction precedence because the message mapper changed the prompt shape. That is the kind of issue that gets misdiagnosed as “the local model is bad” when the actual bug is “our abstraction leaked and nobody noticed.”
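For portability evals, it helps to record the shape of the request each backend actually received, so a behavior change can be traced to the mapper rather than blamed on the model. The helpers below are a hypothetical sketch: comparing role sequences and the joined instruction text is one simple heuristic, not a complete measure of instruction precedence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    role: str
    content: str


INSTRUCTION_ROLES = ("system", "developer")


def prompt_shape(messages: list[Message]) -> tuple[str, ...]:
    """Role sequence of the request actually sent to the backend."""
    return tuple(m.role for m in messages)


def shape_drift(reference: list[Message], candidate: list[Message]) -> list[str]:
    """Explain how the candidate request's shape differs from the reference."""
    notes = []
    ref_shape, cand_shape = prompt_shape(reference), prompt_shape(candidate)
    if ref_shape != cand_shape:
        notes.append(f"role sequence changed: {ref_shape} -> {cand_shape}")
    # Heuristic: join instruction text the same way a merge would, then compare.
    ref_text = "\n\n".join(m.content for m in reference if m.role in INSTRUCTION_ROLES)
    cand_text = "\n\n".join(m.content for m in candidate if m.role in INSTRUCTION_ROLES)
    if ref_text != cand_text:
        notes.append("system/developer instruction text changed")
    return notes
```

A request that was merged for a strict backend would report a changed role sequence but identical instruction text, which is exactly the kind of difference worth logging next to eval scores.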

Pydantic AI 1.94.0 is a small release, but it catches an important class of production failure. Agent frameworks are now expected to run across cloud models, local models, proxies, routers, and private deployments. The ones that survive will not be the ones that pretend every OpenAI-compatible endpoint is the same. They will be the ones that model the footnotes directly.

Sources: Pydantic AI v1.94.0 release notes, PR #5375, PR #5384, Pydantic AI docs, Pydantic Logfire AI observability docs