
LiteLLM 1.85 Ships the AI Gateway Release That Treats Routing, MCP Auth, and Supply Chain Verification as the Same Production Problem

Anatoliy Kolodkin

17 May 2026 • 5 min read

LiteLLM’s latest release is not interesting because it added another knob to a model router. It is interesting because the project is starting to look like the place where the uncomfortable parts of production AI systems all collide: routing policy, MCP authentication, observability semantics, budget controls, and supply-chain verification. That is the correct mess. If an organization has agents talking to models and tools through one gateway, that gateway is no longer a convenience wrapper. It is part of the security perimeter.

Version v1.85.0, published on May 17, lands with a cluster of fixes and operational changes that point in the same direction. LiteLLM hardened Docker image verification with Cosign instructions, fixed MCP OAuth handling on tool routes, blocked path traversal SSRF paths in several provider clients, advanced OpenTelemetry GenAI semantic-convention support, and began splitting the gateway, UI backend, and UI into separately scalable services. A few minutes later, v1.86.0-rc.1 followed with weighted-routing failover and more control-plane work.

That may sound like release-note soup. It is not. The story is that the “LLM gateway” category is becoming the agent runtime’s choke point. LiteLLM’s docs already position it as a unified interface for 100+ LLMs, with OpenAI-compatible APIs, retry and fallback behavior through Router, virtual keys, cost tracking, and an admin UI. Once teams route agent traffic through that layer, it becomes the thing that knows who called, which model was selected, which credentials were used, what the request cost, where the trace went, and which backend received the retry. That is infrastructure, not glue code.

MCP auth bugs are gateway bugs, not agent weirdness

The most important fix in v1.85.0 is the one that preserves OAuth2 machine-to-machine authentication for StreamableHTTP MCP tool routes. The underlying issue was plain and dangerous: callers authenticated to LiteLLM with one bearer token, while upstream MCP servers required a different OAuth2 client-credentials token. In affected paths, the caller’s LiteLLM API key could overwrite the upstream token, causing 401 Unauthorized failures and breaking the intended auth boundary.

This is exactly the kind of bug agent teams should expect as MCP adoption accelerates. There are two identities in play: the user or client calling the gateway, and the service identity the gateway uses when it talks to the tool backend. Confuse those and the system either fails closed in confusing ways or, worse, starts forwarding credentials into places they do not belong. Prompt injection gets the headlines, but credential mixing is the more boring failure mode that actually shows up in production incident reviews.

The lesson for practitioners is straightforward: document which layer owns which credential. A LiteLLM API key, an upstream provider key, an MCP delegated auth token, and a user OAuth grant are not interchangeable just because they all appear in an Authorization header. If your gateway supports MCP servers, test tool listing and tool calling separately, especially when StreamableHTTP routes and delegated auth are involved. The happy path is not enough; verify that the same credential policy holds across discovery, invocation, and retries.
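To make the two-identity boundary concrete, here is a minimal sketch of the gateway-to-tool hop. It is not LiteLLM's implementation: the function name `build_upstream_headers`, the `FORWARDABLE_HEADERS` allowlist, and the token strings are all hypothetical, chosen only to show the caller's Authorization header being dropped rather than forwarded.

```python
# Hypothetical allowlist: only these inbound headers may cross the boundary.
FORWARDABLE_HEADERS = {"accept", "content-type", "x-request-id"}


def build_upstream_headers(inbound_headers: dict[str, str], upstream_token: str) -> dict[str, str]:
    """Build headers for the gateway -> MCP server request.

    The caller's credential authenticates the caller to the gateway only.
    The upstream hop always uses the gateway's own service token.
    """
    if not upstream_token:
        # Fail closed instead of falling back to the caller's credential.
        raise ValueError("missing upstream token")

    # Header names are case-insensitive, so compare in lowercase.
    forwarded = {
        name: value
        for name, value in inbound_headers.items()
        if name.lower() in FORWARDABLE_HEADERS
    }
    forwarded["Authorization"] = f"Bearer {upstream_token}"
    return forwarded
```

Running the same assertion against both the tool-listing and tool-calling code paths is a cheap way to catch the class of bug described above: the outbound Authorization value should never equal the inbound one.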

The other security fix blocks path traversal SSRF patterns in BitBucket, Arize Phoenix, and AssemblyAI clients. The PR describes authenticated SSRF paths where user-controlled identifiers such as prompt_id or prompt_version_id could inject ../, #, or ? into service-credentialed request URLs. This is not exotic. It is ordinary URL construction risk in a new costume. If an agent can influence identifiers that become provider API paths, those identifiers need the same treatment as untrusted web input.
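One way to treat such identifiers as untrusted input is a strict allowlist applied before the value ever reaches URL construction. The sketch below is an illustration under assumptions, not the code from the PR: the helper names, the `/prompts/` route, and the permitted character set are hypothetical.

```python
import re
from urllib.parse import quote

# Assumed policy: identifiers are letters, digits, underscore, and hyphen only.
# Dots are excluded, so ".." can never appear in a path segment.
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")


def safe_path_segment(identifier: str) -> str:
    """Return the identifier if it is safe to use as one URL path segment."""
    if not _SAFE_ID.fullmatch(identifier):
        raise ValueError(f"unsafe identifier: {identifier!r}")
    # Percent-encode as a second layer in case the allowlist is later relaxed.
    return quote(identifier, safe="")


def prompt_url(base_url: str, prompt_id: str) -> str:
    """Build a provider URL from a fixed base and a validated identifier."""
    return f"{base_url.rstrip('/')}/prompts/{safe_path_segment(prompt_id)}"
```

Rejecting the input outright, rather than trying to normalize it, keeps the failure visible: a `prompt_id` of `../admin`, `x?y=1`, or `x#frag` raises instead of quietly producing a different request.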

Routing is reliability policy now

The v1.86.0-rc.1 weighted-routing failover work is a reliability feature with governance implications. After a failure, LiteLLM can retry the same model group on a different deployment (for example, another Azure region) while still respecting configured weights for the initial pick. That is the difference between “we abstracted model providers” and “we can operate this under load.”
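The behavior is easier to reason about as a sketch. The following is not LiteLLM's Router code; it is a standalone illustration with hypothetical names (`call_with_failover`, the `send` callback, the deployment dicts). It assumes positive weights and, as a simplification, also applies the weights when choosing the retry target.

```python
import random
from typing import Any, Callable, Optional


def call_with_failover(
    deployments: list[dict],
    send: Callable[[str], Any],
    rng: Optional[random.Random] = None,
) -> tuple[Any, list[str]]:
    """Pick a deployment by weight; on failure, retry on one not yet tried.

    Returns the response plus the ordered list of deployments attempted,
    so the retry path can be attached to a trace or log line.
    """
    rng = rng or random.Random()
    remaining = list(deployments)
    attempts: list[str] = []
    last_error: Optional[Exception] = None

    while remaining:
        weights = [d["weight"] for d in remaining]
        choice = rng.choices(remaining, weights=weights, k=1)[0]
        attempts.append(choice["name"])
        try:
            return send(choice["name"]), attempts
        except Exception as exc:  # a real gateway would filter retryable errors
            last_error = exc
            remaining.remove(choice)  # never retry the deployment that just failed

    raise RuntimeError(f"all deployments failed: {attempts}") from last_error
```

Returning the attempt list alongside the response is the design choice worth copying: whatever does the retrying should also be the thing that records which deployment finally answered.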

For agent systems, this matters more than it does for one-off chat calls. A failed model request in the middle of a multi-step workflow can waste tool state, confuse the planner, or leave a human approval flow half-complete. Failover reduces that pain, but it can also hide important differences. Two deployments of the “same” model may have different latency, quota behavior, content-filtering responses, or provider-specific quirks. If the retry path is invisible in traces, operators will debug ghosts.

That is why the OpenTelemetry GenAI semantic-convention opt-in is the quieter long-term signal. LiteLLM added support for OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental and corresponding configuration APIs. The point is not that everyone should flip the flag today. The point is that agent observability needs shared language: model spans, provider attributes, request attributes, content events, retry metadata, and tool-call context should not be reinvented by every vendor. The experimental opt-in is the right rollout shape because telemetry migrations are easy to break and hard to unwind.
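For readers who have not met this variable before, OpenTelemetry defines OTEL_SEMCONV_STABILITY_OPT_IN as a comma-separated list of opt-in values. A minimal sketch of how an instrumentation might check for the GenAI value follows; the function name is hypothetical and this is not LiteLLM's parsing code.

```python
import os
from typing import Mapping, Optional

GEN_AI_OPT_IN = "gen_ai_latest_experimental"


def gen_ai_semconv_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the GenAI experimental conventions are opted into."""
    env = os.environ if environ is None else environ
    raw = env.get("OTEL_SEMCONV_STABILITY_OPT_IN", "")
    # The variable may carry several opt-ins, so split and trim each entry.
    values = {item.strip() for item in raw.split(",")}
    return GEN_AI_OPT_IN in values
```

Because the variable is a list, adding the GenAI value next to any existing opt-ins is safer than overwriting whatever the deployment already sets.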

Practitioners should enable the new semantic conventions in a staging observability pipeline first. Check span names, attribute cardinality, payload handling, and any privacy controls around prompt or completion events. Standardization is good; accidentally dumping sensitive prompts into a trace backend is not.
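The cardinality check in particular is easy to automate against a sample of exported spans. The sketch below assumes each span has already been reduced to a plain dict of attributes; the function names and the threshold are hypothetical, and a real pipeline would read from its own exporter or trace backend.

```python
from collections import defaultdict
from typing import Any, Iterable, Mapping


def attribute_cardinality(spans: Iterable[Mapping[str, Any]]) -> dict[str, int]:
    """Count distinct values seen per attribute key across a span sample."""
    seen: dict[str, set] = defaultdict(set)
    for attributes in spans:
        for key, value in attributes.items():
            # repr() makes unhashable values such as lists comparable.
            seen[key].add(repr(value))
    return {key: len(values) for key, values in seen.items()}


def high_cardinality_keys(spans: Iterable[Mapping[str, Any]], limit: int) -> list[str]:
    """Return attribute keys whose distinct-value count exceeds the limit."""
    counts = attribute_cardinality(spans)
    return sorted(key for key, count in counts.items() if count > limit)
```

Keys that show up in this list deserve a second look for both cost and privacy reasons: an attribute with one distinct value per request is often carrying an identifier or free text that was not meant to be indexed.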

The Kubernetes scaffolding is the product signal

LiteLLM also began splitting the gateway, UI backend, and UI into independently scalable services, with separate Dockerfiles and Helm chart scaffolding. This is the kind of change that does not demo well but ages well. A monolithic proxy plus admin UI is fine when a team is experimenting. It is less fine when one service handles production traffic, another handles administrative workflows, and a third serves humans browsing budgets, keys, logs, and configuration.

Separate deployments and HPAs let teams scale the hot path without scaling the UI, isolate blast radius, and apply different network and auth policies. That is not overengineering; it is what happens when a developer convenience becomes shared platform infrastructure. The same logic applies to Cosign verification. LiteLLM now documents verification for release Docker images, including a path that pins the public key to an immutable commit and a convenience path that reads cosign.pub from the release tag. If the gateway holds provider credentials and brokers tool access, unsigned image pulls are not a harmless shortcut.
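In a deploy script, the verification step can be a hard gate rather than a manual check. The sketch below only assembles a `cosign verify --key` invocation; the helper name, the image reference, and the key path in the usage note are placeholders, not values from LiteLLM's documentation, which should be treated as the source of truth for the real ones.

```python
import shlex


def cosign_verify_command(image_ref: str, public_key: str) -> list[str]:
    """Build the argv for verifying an image signature with a public key."""
    for value in (image_ref, public_key):
        # Refuse empty values and anything that could be parsed as a flag.
        if not value or value.startswith("-"):
            raise ValueError(f"invalid argument: {value!r}")
    return ["cosign", "verify", "--key", public_key, image_ref]


def as_shell(argv: list[str]) -> str:
    """Render the argv as a copy-pasteable shell line for logs."""
    return shlex.join(argv)
```

For example, `cosign_verify_command("registry.example.test/litellm:v1.85.0", "cosign.pub")` yields an argument list that can be passed to `subprocess.run(..., check=True)`, so a failed verification stops the rollout instead of being logged and ignored.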

The concrete checklist for teams running LiteLLM is not glamorous, which is usually a sign it is useful. Verify images with Cosign before deployment. Upgrade if you rely on MCP OAuth, especially StreamableHTTP tool routes. Audit any provider integration where user-controlled identifiers become URL path components. Test weighted failover with tracing enabled and confirm the retry target is visible. Evaluate GenAI semantic conventions in staging. If you run the admin UI near production traffic, look at the new component split and decide whether your current topology still makes sense.

The broader take: LiteLLM is crossing from compatibility layer to control plane. That is good, but it changes the bar. A control plane does not get to be casually configured because it speaks an OpenAI-shaped API. It needs signed artifacts, explicit auth boundaries, traceable retries, budget enforcement, and boring deployment hygiene. The agent ecosystem keeps talking about autonomy. This release is a reminder that autonomy is only useful when the plumbing underneath knows exactly who is allowed to do what, where, and with whose credentials.

Sources: LiteLLM v1.85.0 release, LiteLLM v1.86.0-rc.1 release, LiteLLM documentation, MCP OAuth fix, SSRF hardening PR, OpenTelemetry GenAI semconv PR, componentization PR
