Anthropic’s TypeScript SDK Gets Middleware — The Control Plane Claude API Apps Were Missing

Anatoliy Kolodkin

05 Jun 2026 • 4 min read

Middleware is one of those features nobody asks for in a keynote and every serious team eventually builds badly for itself. Anthropic’s TypeScript SDK finally has it. With @anthropic-ai/sdk 0.101.0, released June 5, Anthropic added first-class middleware to the Claude client — and pushed the same surface into its Bedrock, Vertex, Foundry, and AWS adapter packages minutes later.

That sounds like plumbing. It is plumbing. But plumbing is what separates a weekend Claude wrapper from a production agent system that can be observed, budgeted, audited, and debugged when the model call at the center of everything starts behaving like infrastructure instead of a chat box.

The release itself is compact: “client: add support for middleware,” plus streaming fixes around stop_details accumulation and partial JSON parsing for scientific-notation numbers. The details matter more than the label. Middleware now wraps every HTTP request made by the client. It can inspect or modify requests, observe or replace responses, short-circuit calls, and even call next() multiple times for custom retry behavior. It runs on each HTTP attempt, including the SDK’s automatic retries, with the attempt number exposed through the X-Stainless-Retry-Count header.
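To make those four capabilities concrete, here is a minimal sketch of the general middleware pattern. It is not the SDK's API: the Req, Res, Next, and Middleware types, the compose helper, and the three example middlewares are all hypothetical names invented for illustration, and the real signatures in @anthropic-ai/sdk may look quite different.

```typescript
// Hypothetical shapes for illustration only; the SDK's real types may differ.
type Req = { url: string; headers: Record<string, string> };
type Res = { status: number; body: string };
type Next = (req: Req) => Promise<Res>;
type Middleware = (req: Req, next: Next) => Promise<Res>;

// Wrap the transport so that the first middleware in the list runs outermost.
function compose(middlewares: Middleware[], transport: Next): Next {
  return middlewares.reduceRight<Next>(
    (next, mw) => (req) => mw(req, next),
    transport,
  );
}

// Modify the request: stamp a header before handing it on.
const stampHeader = (name: string, value: string): Middleware =>
  (req, next) => next({ ...req, headers: { ...req.headers, [name]: value } });

// Short-circuit: answer locally, without calling next(), when the gate is closed.
const gate = (isOpen: () => boolean): Middleware =>
  async (req, next) =>
    isOpen() ? next(req) : { status: 429, body: "budget exhausted" };

// Custom retry: call next() again while the response is a 5xx.
const retryOn5xx = (maxAttempts: number): Middleware =>
  async (req, next) => {
    let res = await next(req);
    for (let attempt = 1; attempt < maxAttempts && res.status >= 500; attempt++) {
      res = await next(req);
    }
    return res;
  };
```

In this sketch, compose([gate(isOpen), retryOn5xx(3), stampHeader("x-my-app-request", id)], transport) would check the gate once per call, while the header stamp runs on every attempt because it sits inside the retry loop. Ordering is a design decision, not a detail.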

The extension seam teams were already faking

Production LLM applications do not simply “call Claude.” They attach tenant IDs. They stamp trace IDs. They log usage. They hash sensitive fields. They enforce per-customer budgets. They add internal request IDs so an API complaint can be correlated with application logs. They reject requests that do not include the right metadata. They sample payloads for debugging. They sometimes route calls differently depending on region, product tier, or compliance boundary.

Without an official interception layer, teams end up doing this by wrapping every client method, monkey-patching fetch, routing everything through an internal proxy, or forking their own SDK abstraction. Each of those works until the first edge case: streaming responses, automatic retries, abort signals, cloud-provider adapter differences, or a response body that gets consumed once by a logger and then disappears before application code can parse it.

Anthropic’s middleware example is refreshingly practical. It shows request/response timing logs, adding a per-request x-my-app-request UUID header, and logging token usage by calling ctx.parse<Anthropic.Message>(response) without consuming the response body. The docs also warn developers not to consume the response body they return. That warning is not pedantry; it is the exact class of bug that turns instrumentation into production breakage.
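The body-consumption hazard is easy to reproduce with the standard Fetch API alone. The sketch below does not use the SDK's ctx.parse helper; it swaps in the plain Response.clone() technique to show the same idea, under the assumption of a hypothetical middleware shape (the Next type and the logUsage function are invented here). The usage field names follow the Messages API response format.

```typescript
// Hypothetical middleware shape built on standard Fetch API types.
type Next = (req: Request) => Promise<Response>;
type Usage = { input_tokens?: number; output_tokens?: number };

async function logUsage(
  req: Request,
  next: Next,
  log: (line: string) => void,
): Promise<Response> {
  const res = await next(req);
  const contentType = res.headers.get("content-type") ?? "";
  // Skip event streams and other non-JSON bodies; parsing them here would fail.
  if (!contentType.includes("application/json")) return res;
  // Read a clone, so the body of the response we return is still unread.
  const data = (await res.clone().json()) as { usage?: Usage };
  log(`input=${data.usage?.input_tokens ?? 0} output=${data.usage?.output_tokens ?? 0}`);
  return res;
}
```

Calling res.json() directly instead of res.clone().json() would log the same line and then hand the caller a response whose body can no longer be read.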

The most important fix may be the least glamorous one: timeout semantics now apply the request timeout to the underlying fetch, not to the surrounding middleware chain. That distinction is subtle until you have middleware doing real work. A budget check, audit log write, signing step, or custom retry loop should not silently eat the transport timeout budget and make a network failure look like an application failure. Middleware still needs its own guardrails — any hook can hang if you write it badly — but Anthropic has separated two clocks that should never have been collapsed.
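The difference between the two clocks can be sketched in a few lines. Everything here is an illustrative assumption rather than the SDK's implementation: the types, withTimeout, slowAudit, and the two wiring helpers are hypothetical, and a real client would abort the in-flight request instead of merely rejecting.

```typescript
// Hypothetical shapes for illustration only.
type Req = { url: string };
type Res = { status: number };
type Next = (req: Req) => Promise<Res>;
type Middleware = (req: Req, next: Next) => Promise<Res>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Reject if the wrapped call has not settled within `ms` milliseconds.
// This sketch only rejects; it does not cancel the underlying work.
function withTimeout(call: Next, ms: number): Next {
  return (req) =>
    new Promise<Res>((resolve, reject) => {
      const timer = setTimeout(() => reject(new Error("timeout")), ms);
      call(req).then(
        (res) => { clearTimeout(timer); resolve(res); },
        (err) => { clearTimeout(timer); reject(err); },
      );
    });
}

// A middleware that does slow work (say, an audit-log write) before the call.
const slowAudit = (ms: number): Middleware => async (req, next) => {
  await sleep(ms);
  return next(req);
};

// Clock on the transport only: middleware time does not count against it.
const timeoutOnTransport = (mw: Middleware, transport: Next, ms: number): Next =>
  (req) => mw(req, withTimeout(transport, ms));

// Clock on the whole chain: slow middleware can trigger a "timeout"
// even though the network call itself was fast.
const timeoutOnChain = (mw: Middleware, transport: Next, ms: number): Next =>
  withTimeout((req) => mw(req, transport), ms);
```

With a 150 ms audit step, an instant transport, and a 50 ms limit, the transport-only wiring succeeds and the whole-chain wiring reports a timeout that never happened on the network.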

Cloud portability is the strategic tell

The same middleware surface landed across @anthropic-ai/bedrock-sdk 0.30.0, @anthropic-ai/vertex-sdk 0.17.0, @anthropic-ai/foundry-sdk 0.3.0, and @anthropic-ai/aws-sdk 0.4.0. The GitHub API timestamps are almost comically synchronized: the direct SDK release at 19:48:24 UTC, Vertex at 19:48:32, Bedrock at 19:48:38, Foundry at 19:48:49, and AWS at 19:49:00. The npm package followed at 19:51.

That matters because enterprise Claude usage is increasingly split across procurement paths. Some teams call Anthropic directly. Some are routed through AWS Bedrock. Some are routed through Google Vertex. Some are experimenting with Microsoft Foundry-style enterprise paths. If every path has different hooks, headers, logging semantics, and retry observability, your internal governance layer becomes a maze of provider-specific adapters. By pushing middleware across the family, Anthropic is signaling that instrumentation and policy should be portable even when the endpoint is not.

For builders, the immediate move is to stop treating middleware as a place to add cute logging and start treating it like part of the runtime contract. Use it to attach stable request IDs. Capture model, token usage, latency, retry count, and tenant metadata. Enforce that every production call carries an application-level owner. Add redaction before anything leaves the process. Reject calls that target the wrong base URL. And test streaming paths specifically; a middleware implementation that works for normal JSON responses but breaks event streams is not production-ready.
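Two of those checks, owner enforcement with base-URL pinning and redaction, fit in a short sketch. The Req shape, the x-my-app-owner header name, the allowed base URL constant, and both function names are assumptions made for illustration; teams on Bedrock, Vertex, or Foundry would pin a different endpoint, and a real redactor would cover far more than email addresses.

```typescript
// Hypothetical request shape for illustration only.
type Req = { url: string; headers: Record<string, string>; body: string };

// Illustrative policy values; both are assumptions for this sketch.
const ALLOWED_BASE_URL = "https://api.anthropic.com/";
const OWNER_HEADER = "x-my-app-owner";

// Return the list of policy violations; an empty list means the call may proceed.
function policyViolations(req: Req): string[] {
  const violations: string[] = [];
  if (!req.url.startsWith(ALLOWED_BASE_URL)) violations.push("unexpected base URL");
  if (!req.headers[OWNER_HEADER]) violations.push(`missing ${OWNER_HEADER} header`);
  return violations;
}

// Mask email-like strings before a payload is logged or sampled.
function redactEmails(text: string): string {
  return text.replace(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, "[redacted-email]");
}
```

A middleware built on these would refuse to call next() whenever policyViolations returns a non-empty list, and would pass sampled payloads through redactEmails before they reach any log sink.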

There is also a security footgun here. Middleware can make Claude calls safer, but it can also smuggle secrets into headers, mutate prompts invisibly, retry non-idempotent operations, or hide response transformations from the rest of the codebase. Treat it the way you would treat hooks in a coding agent: powerful, useful, and audit-worthy. Put middleware definitions in reviewed code, not scattered helper files. Log which middleware chain was active for a request. If your organization has an agent approval policy, SDK middleware belongs in scope.

The streaming fixes in this release reinforce the same theme. Anthropic fixed accumulation of beta message_delta stop_details, and fixed partial JSON parsing for numbers such as 8.2156e-15, 1.5e+10, 2E8, and -1.5e-3. That is not headline material, but structured output systems live and die on boring parser correctness. If a model emits scientific notation in a JSON stream and your parser mangles it, the agent did not “hallucinate.” Your runtime broke reality on the way out.
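Why scientific notation trips up streaming parsers becomes clearer once the JSON number grammar from RFC 8259 is written out: a fragment can be a valid number, a valid prefix of one, or neither. The sketch below is not the SDK's parser; the two regular expressions and the classifyNumberFragment function are hypothetical and only illustrate the grammar.

```typescript
// JSON number grammar per RFC 8259: [-] int [frac] [exp].
const COMPLETE = /^-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?$/;
// Any prefix of a valid number, such as "-", "1.", or "8.2156e-".
const PREFIX =
  /^-?(?:(?:0|[1-9]\d*)(?:\.(?:\d+(?:[eE][+-]?\d*)?)?|[eE][+-]?\d*)?)?$/;

type NumberState = "complete" | "incomplete" | "invalid";

// Classify a fragment seen so far in a stream.
// "complete" means parseable as-is; later chunks may still extend it.
function classifyNumberFragment(fragment: string): NumberState {
  if (COMPLETE.test(fragment)) return "complete";
  if (PREFIX.test(fragment)) return "incomplete";
  return "invalid";
}
```

A stream cut off at 8.2156e- is neither garbage nor a finished value. A partial parser has to wait for more input rather than emit 8.2156 or throw.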

There was essentially no public reaction in the first hour after the release. That is fine. Middleware releases do not trend. They become important six months later when every serious Claude application has the same three questions: what did we send, who paid for it, and why did this request retry twice before the user saw a timeout?

The verdict: this is a small SDK release with a large operational implication. Anthropic is giving TypeScript Claude apps a real interception layer. Use it for observability, cost controls, and policy enforcement — then review it like production infrastructure, because that is exactly what it is.

Sources: Anthropic TypeScript SDK release, middleware example, middleware implementation, npm package metadata, Anthropic client SDK docs
