azure-ai

Microsoft’s Stateless MCP-on-App-Service Pattern Is What Happens When Agent Tools Grow Up

Anatoliy Kolodkin

18 May 2026 • 4 min read

Microsoft publishing a guide to scaling stateless MCP servers on Azure App Service sounds like infrastructure plumbing because it is. That is the point. The Model Context Protocol is moving from “local tool server I started in a terminal” to “production API an agent depends on,” and production APIs need load balancing, identity, observability, deployment strategy, and fewer sticky-session hacks.

The TechCommunity post and sample show a stateless FastAPI MCP server running on three Azure App Service instances behind the platform load balancer, with a staging slot, Application Insights, and a k6 load-test script. The key idea is simple web architecture: each JSON-RPC request carries what the server needs, any instance can handle it, and the server does not rely on long-lived client state in memory. Microsoft calls out the App Service setting clientAffinityEnabled: false, because leaving the default ARRAffinity cookie on pins a client to one instance and defeats the point of horizontal scale.

This is not glamorous. It is also one of the more useful agent-infrastructure patterns Microsoft has published lately.

MCP’s first wave was artisanal. Production will not be.

MCP became popular because it gave agents a standard-ish way to reach tools: docs, databases, issue trackers, internal services, log systems, and whatever else developers wired up over a weekend. That local phase was necessary. It proved the interface. But a tool server running next to one developer’s editor is not the same thing as a service used by a fleet of agents across an organization.

Once agents depend on tools to do work, the tool server becomes part of the runtime. If it is down, the agent is less useful. If it leaks data, the agent becomes a distribution mechanism for that leak. If it scales poorly, model latency gets blamed for infrastructure failure. If it cannot be deployed safely, every update becomes an outage risk. If it lacks logs, “the agent made a bad decision” becomes impossible to debug because the evidence disappeared inside a process nobody instrumented.

That is why stateless HTTP matters. Persistent sessions are comfortable in local development, but they are hostile to cloud operations. If a server keeps client state in memory, scaling requires sticky sessions, deploys risk breaking active work, failures become harder to recover from, and load balancing becomes mostly decorative. Stateless JSON-RPC over HTTP is boring because it inherits decades of operational practice. Put instances behind a load balancer. Health-check them. Roll them through slots. Capture traces. Scale out. Roll back when needed.

Microsoft’s sample is intentionally modest: example tools such as whoami, lookup_fact, and compute_primes, a P0v3 App Service plan with capacity 3, alwaysOn: true, a /health endpoint, Python 3.11, staging slot, and Application Insights. The k6 example expects roughly even distribution across three instances — about 614, 612, and 616 hits in a sample 1,842-request run — and the post includes a Kusto query over Application Insights requests summarized by cloud_RoleInstance to prove traffic spread.

Those details are the article. Agent tooling needs fewer manifestos and more settings that decide whether the thing actually works.

The load balancer is the easy part. Identity is next.

The sample keeps authentication simple for deployment, but a real MCP server should not be an anonymous internet endpoint with a cute protocol name. MCP exposes capabilities. Capabilities require identity, authorization, and logs.

For Azure teams, that means putting Entra ID or another real auth layer in front of the service, identifying the calling user or agent, scoping which tools each principal can invoke, and logging every tool call with correlation IDs. If an agent calls lookup_customer, restart_service, or query_logs, the organization needs to know who delegated the work, which agent acted, what arguments were supplied, what the tool returned, and whether the result influenced a code or operations change. Without that, MCP becomes a beautifully standardized way to lose accountability.

Secrets also need discipline. Do not let every team bake credentials into local config files or environment variables with unclear ownership. Use managed identity where possible, Key Vault for secrets, and per-tool authorization rather than one broad service credential. Outbound network access should be treated as part of the threat model. An MCP server that can call internal APIs, external SaaS systems, and package registries has a larger blast radius than a plain documentation lookup service.

Observability is the other production requirement. “The agent failed” is not an incident report. Teams need request spans, tool names, latency, errors, role instances, status codes, downstream dependencies, and ideally a link between the agent session and the tool-call trace. Application Insights and OpenTelemetry are not optional niceties here. They are how you separate model confusion from tool failure, auth failure, bad arguments, slow dependencies, and scale problems.

Standardize the tool-server pattern before every team invents one

The practical takeaway for Azure AI builders is to stop treating MCP servers as per-agent accessories. If multiple teams are building agents, platform teams should provide a blessed hosting pattern: stateless-by-default, authenticated, observable, deployed through slots, backed by health checks, with clear guidance for durable state when a tool genuinely needs it.

Not every tool can be pure. Some require durable external storage, queues, locks, or long-running jobs. That is fine. The rule should be explicit: keep the MCP process stateless when possible, push durable state into managed backing services when necessary, and never depend on in-memory session tables for correctness. If a tool needs workflow state, use a database or queue designed for that job. Do not smuggle production state into a Python process and then wonder why scale-out behaves strangely.

This also matters for cost and reliability. Agents are bursty. A developer may trigger several sessions, a CI workflow may invoke tools repeatedly, or a support agent may fan out across internal systems during an incident. App Service scale-out, Always On, health checks, and deployment slots give teams a familiar operating model. The alternative is a collection of bespoke tool servers that fail differently, log differently, and require different security reviews. That is how agent platforms become unmaintainable before they become useful.

Microsoft’s post is valuable because it treats MCP as infrastructure, not magic. The protocol gives agents a way to ask for capabilities. Azure App Service gives those capabilities somewhere sane to run. The missing work is organizational: define ownership, auth, logging, deployment, review, and lifecycle rules before the tool surface sprawls.

The editorial read: MCP is growing up. If agents depend on tools, MCP servers are production APIs. Production APIs need load balancing, identity, observability, staged deployments, and a strong allergy to sticky-session cleverness. Anything less is just a demo waiting to become someone’s incident.

Sources: Microsoft TechCommunity, sample repository, Model Context Protocol specification, Azure App Service docs

MCP’s first wave was artisanal. Production will not be.

The load balancer is the easy part. Identity is next.

Standardize the tool-server pattern before every team invents one

Sign up for more like this.