
Foundry Private Networking Is Not a Checkbox. It Is Part of the ML System.

Anatoliy Kolodkin

13 May 2026 • 5 min read

Private networking in Microsoft Foundry sounds like the sort of infrastructure checkbox that gets delegated to the network team after the AI demo works. That is backwards. Once a Foundry workload uses private endpoints, managed virtual networks, private inference, RAG dependencies, Key Vault, Azure AI Search, storage, registries, and controlled outbound access, the network is no longer a perimeter wrapper. It is part of the ML system.

Microsoft’s Foundry architecture post is useful because it says the quiet part out loud: isolation changes application behavior. It affects how developers build, how inference runs, how retrieval works, how evaluation jobs reach data, how package dependencies resolve, and how operators debug failure. “No public exposure” is not one requirement. It is a bundle of runtime dependencies with DNS in the middle, because of course it is DNS.

The post separates two operating models: bring-your-own VNet and managed VNet. BYO VNet is the enterprise-control option: hub-and-spoke or vWAN architectures, deterministic DNS, fine-grained routing, firewall inspection, private-service integration, and end-to-end network ownership. Managed VNet is the platform-managed option: faster onboarding, managed security defaults, simplified isolation, and less low-level networking work for teams that do not need full control.

Those are not two skins over the same checkbox. They imply different ownership models, cost models, incident paths, and governance obligations.

The architecture decision comes earlier than teams want

The uncomfortable part is that this choice needs to happen before the AI architecture hardens. Foundry private inference depends on private access to Azure AI Search, Storage, Key Vault, model endpoints, container registries, and sometimes package or model dependencies. If a team designs a RAG system assuming easy public egress and then later gets told to “make it private,” the fix may involve rebuilding assumptions across retrieval, evaluation, deployment, and operations.

Microsoft’s listed failure modes are familiar to anyone who has shipped private cloud services: private endpoints stuck pending approval, DNS resolving to public endpoints instead of private IPs, missing managed outbound rules for Storage or Container Registry, compute that can reach the workspace but not the data, and portal/API behavior differences under full isolation. None of these sound like AI problems. All of them can break an AI application.

That is the point. Production AI is a distributed system with model calls as one dependency among many. A failed retrieval step may look like a model-quality issue to the user. A missing outbound rule may look like an evaluation pipeline bug. A private endpoint approval delay may become a release blocker. If traces only show the model request and response, teams will debug the wrong layer.

The updated Microsoft Learn documentation makes the managed VNet story more concrete. Managed VNet now supports Prompt and Hosted Agent services with the new Responses API and the new Foundry Portal. Microsoft lists support across 18 regions: East US, East US 2, Japan East, France Central, UAE North, Brazil South, Spain Central, Germany West Central, Italy North, South Central US, Australia East, Sweden Central, Canada East, South Africa North, West US, West US 3, South India, and UK South.

Region support matters because architecture diagrams have a bad habit of being globally optimistic. If your data residency requirement points to a region outside the supported set, the managed VNet answer may not be available yet. If your model deployment, search service, storage, and agent runtime do not line up regionally, the security architecture may be theoretically clean and operationally useless.
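A region check is cheap enough to run before anyone draws the diagram. The sketch below is a minimal illustration, not an official tool: the region set is copied from the list in this article (re-check Microsoft Learn before trusting it), and the component names and the `check_regions` helper are hypothetical. Treat a multi-region finding as a prompt for review rather than an automatic failure.

```python
# Regions listed for managed VNet support in this article, normalized to
# lowercase without spaces. Assumption: this list may be out of date.
MANAGED_VNET_REGIONS = {
    "eastus", "eastus2", "japaneast", "francecentral", "uaenorth",
    "brazilsouth", "spaincentral", "germanywestcentral", "italynorth",
    "southcentralus", "australiaeast", "swedencentral", "canadaeast",
    "southafricanorth", "westus", "westus3", "southindia", "uksouth",
}


def normalize(region: str) -> str:
    """Turn 'East US 2' or 'eastus2' into 'eastus2'."""
    return region.replace(" ", "").lower()


def check_regions(components: dict) -> list:
    """Return human-readable findings; an empty list means nothing was flagged.

    `components` maps a (hypothetical) component name to its Azure region.
    """
    normalized = {name: normalize(region) for name, region in components.items()}
    findings = []
    for name, region in sorted(normalized.items()):
        if region not in MANAGED_VNET_REGIONS:
            findings.append(f"{name}: {region} is not in the listed managed VNet regions")
    distinct = sorted(set(normalized.values()))
    if len(distinct) > 1:
        findings.append("components span multiple regions: " + ", ".join(distinct))
    return findings


if __name__ == "__main__":
    for finding in check_regions({"foundry": "East US 2", "search": "North Europe"}):
        print(finding)
```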

Managed isolation has one-way doors

The sharpest details are in the managed VNet modes. Microsoft documents three outbound options: allow internet outbound, allow only approved outbound, and disabled. Once a mode is enabled, some changes are intentionally constrained. You cannot disable managed isolation after enabling it, and moving from approved-only outbound to internet-outbound is not supported.
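One way to keep those one-way doors visible is to encode them as a guard in whatever tooling proposes environment changes. The sketch below encodes only the two constraints named above. The enum values are illustrative labels, not Azure API strings, and any transition the function does not block is simply one this article does not rule out, not one the platform is guaranteed to accept.

```python
from enum import Enum
from typing import Optional


class IsolationMode(Enum):
    # Illustrative labels only; not the literal values used by Azure APIs.
    DISABLED = "disabled"
    ALLOW_INTERNET_OUTBOUND = "allow_internet_outbound"
    ALLOW_ONLY_APPROVED_OUTBOUND = "allow_only_approved_outbound"


def blocked_reason(current: IsolationMode, target: IsolationMode) -> Optional[str]:
    """Return why a mode change is blocked, or None if the text does not rule it out."""
    if current is not IsolationMode.DISABLED and target is IsolationMode.DISABLED:
        return "managed isolation cannot be disabled once enabled"
    if (current is IsolationMode.ALLOW_ONLY_APPROVED_OUTBOUND
            and target is IsolationMode.ALLOW_INTERNET_OUTBOUND):
        return "approved-only outbound cannot be changed to internet outbound"
    return None


if __name__ == "__main__":
    print(blocked_reason(IsolationMode.ALLOW_ONLY_APPROVED_OUTBOUND,
                         IsolationMode.ALLOW_INTERNET_OUTBOUND))
```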

That is exactly the kind of platform decision product teams miss when networking is treated as a late-stage hardening task. “Approved outbound only” may be the right security posture. It may also create managed Azure Firewall costs and operational work. FQDN rules are limited to ports 80 and 443. You cannot bring your own Azure Firewall into the managed VNet. You cannot reuse the same managed firewall across multiple Foundry accounts. There is no Azure Portal UI support yet to create managed networks; Bicep, Terraform, and Azure CLI/REST paths are the current route.
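The port limit on FQDN rules is the kind of constraint that is easy to check against a dependency inventory before committing to approved-only outbound. This is a small sketch under stated assumptions: the `OutboundDependency` type and the hostnames are hypothetical, and the only platform fact used is the 80/443 limit quoted above.

```python
from dataclasses import dataclass

# Limit stated in this article for FQDN outbound rules.
ALLOWED_FQDN_PORTS = {80, 443}


@dataclass(frozen=True)
class OutboundDependency:
    fqdn: str  # hypothetical dependency host
    port: int


def unsupported_fqdn_rules(dependencies):
    """Return dependencies that an FQDN rule could not cover because of the port."""
    return [dep for dep in dependencies if dep.port not in ALLOWED_FQDN_PORTS]


if __name__ == "__main__":
    inventory = [
        OutboundDependency("packages.example.com", 443),
        OutboundDependency("metadata-db.example.com", 5432),
    ]
    for dep in unsupported_fqdn_rules(inventory):
        print(f"{dep.fqdn}:{dep.port} needs a different kind of rule or a redesign")
```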

None of that makes the feature bad. It makes it real. Isolation always moves complexity somewhere. Managed VNet removes some of the low-level ownership from application teams, but it does not remove the need to design outbound dependencies, cost allocation, environment strategy, and failure response.

BYO VNet has the opposite tradeoff. It gives enterprise network teams the deterministic control they usually want: custom DNS, route tables, inspection, hub integration, and private connectivity to existing systems. It also means those teams own the integration surface. AI teams cannot simply say “the platform blocked us” when a retrieval dependency fails. The platform and application boundaries need to be explicit, documented, and tested.

Turn network assumptions into tests, not tribal knowledge

The practitioner move is straightforward: write pre-production checks for the private network path the same way you write tests for application behavior.

Can the agent reach Azure AI Search through private DNS from its actual runtime environment, not just from a developer machine? Can the evaluation pipeline read datasets from storage and write results while public network access is disabled? Can hosted agents resolve Key Vault, storage, model endpoints, internal APIs, and registries under approved-only outbound rules? What fails when a private endpoint is pending approval? Are those failures visible in telemetry with a useful error, or do they collapse into “agent unavailable”?
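The first of those questions can be turned into a check that runs inside the real runtime environment. The sketch below uses only the Python standard library: it resolves a hostname with the runtime's own resolver and flags any address that is not in a private range. The function names are hypothetical, and the resolver is injectable so the logic can be tested without a network. Note that `ipaddress`'s `is_private` also covers loopback and link-local ranges, so this is a coarse signal, not proof that traffic is on the intended private endpoint.

```python
import ipaddress
import socket


def resolve_ips(hostname: str) -> list:
    """Resolve a hostname using the resolver configured in this runtime."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    # Strip any IPv6 scope suffix such as '%eth0' before returning.
    return sorted({info[4][0].split("%")[0] for info in infos})


def public_addresses(ips) -> list:
    """Return the addresses that are not in a private range."""
    return [ip for ip in ips if not ipaddress.ip_address(ip).is_private]


def check_private_resolution(hostname: str, resolver=resolve_ips):
    """Return (ok, detail) describing how the hostname resolves from here."""
    try:
        ips = resolver(hostname)
    except socket.gaierror as exc:
        return False, f"{hostname}: DNS lookup failed ({exc})"
    if not ips:
        return False, f"{hostname}: DNS returned no addresses"
    leaked = public_addresses(ips)
    if leaked:
        return False, f"{hostname}: resolves to public address(es) {leaked}"
    return True, f"{hostname}: resolves privately to {ips}"


if __name__ == "__main__":
    # Hypothetical service name; replace with the hosts your workload uses.
    print(check_private_resolution("example-search.search.windows.net"))
```

Run from a developer laptop, a check like this mostly tells you about the laptop. The useful run is the one executed from the agent or job compute itself, as a gate in the deployment pipeline.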

RAG systems deserve special attention. Retrieval is often the first place private networking mistakes show up because the model endpoint may be reachable while the search index, blob store, embedding dependency, or metadata service is not. If retrieval silently degrades, the model may still produce fluent answers grounded in stale or incomplete context. That is worse than a hard failure because it looks like success until a user notices the answer is wrong.
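One defensive pattern is to make retrieval fail loudly instead of letting the model answer without context. The wrapper below is a minimal sketch, not a prescribed design: `search` is a hypothetical callable that returns a list of documents, and it is assumed to raise `OSError` subclasses on network failure. A real SDK client would raise its own exception types, which you would need to catch as well. Whether zero results is an error is a policy choice, so the threshold is a parameter.

```python
class RetrievalUnavailable(RuntimeError):
    """Raised when grounding context could not be fetched."""


def retrieve_or_fail(search, query, min_results=1):
    """Call `search(query)` and refuse to continue on errors or thin results."""
    try:
        docs = search(query)
    except OSError as exc:
        # OSError covers ConnectionError, TimeoutError and DNS failures
        # raised by the standard library socket layer.
        raise RetrievalUnavailable(f"retrieval failed: {exc!r}") from exc
    if len(docs) < min_results:
        raise RetrievalUnavailable(
            f"retrieval returned {len(docs)} result(s), expected at least {min_results}"
        )
    return docs


if __name__ == "__main__":
    def blocked_search(query):
        raise ConnectionRefusedError("private endpoint not reachable")

    try:
        retrieve_or_fail(blocked_search, "quarterly revenue")
    except RetrievalUnavailable as exc:
        print(f"refusing to answer without context: {exc}")
```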

Security and reliability are converging here. A misconfigured private endpoint is a security-control issue, but the symptom is often latency, timeout, missing context, or failed inference. A blocked outbound dependency is a governance choice, but the user experiences it as a broken assistant. Observability needs to cross model calls, tool calls, retrieval dependencies, DNS behavior, identity, and private-link state. Otherwise the app team blames the model while the network team blames the app, which is the cloud-native version of two pagers pointing at each other.
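A small step toward that kind of observability is tagging every dependency failure with the layer it most likely came from, so a trace says "dns" or "connect" rather than "agent unavailable". The sketch below classifies standard-library exceptions only. The layer names, the `failure_record` shape, and the dependency label are all hypothetical, and a real system would also map SDK-specific and HTTP-level errors such as authorization failures.

```python
import socket
import ssl


def failure_layer(exc: BaseException) -> str:
    """Map a standard-library network exception to a coarse layer label."""
    # Order matters: these exception types all derive from OSError.
    if isinstance(exc, socket.gaierror):
        return "dns"
    if isinstance(exc, ssl.SSLError):
        return "tls"
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "timeout"
    if isinstance(exc, ConnectionError):
        return "connect"
    return "unknown"


def failure_record(dependency: str, exc: BaseException) -> dict:
    """Build a structured record suitable for attaching to a trace or log line."""
    return {
        "dependency": dependency,
        "layer": failure_layer(exc),
        "error_type": type(exc).__name__,
        "message": str(exc),
    }


if __name__ == "__main__":
    print(failure_record("azure-ai-search", socket.gaierror("name not resolved")))
```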

Microsoft’s post is valuable because it drags private networking into the AI architecture conversation where it belongs. The headline is not “Foundry supports isolation.” The headline is that isolation changes the system. It changes how developers work, how agents reach knowledge, how evaluation runs, how costs appear, how outages manifest, and how much platform engineering is required before the first production user shows up.

The LGTM take: do not add private networking to Foundry after the demo. Design it with the demo. In production AI, the private endpoint is not a compliance sticker on the diagram; it is a dependency your model will trip over at 2 a.m. if you pretend otherwise.

Sources: Microsoft Azure AI Foundry Blog, Microsoft Learn managed virtual network documentation, Microsoft Learn Private Link documentation, Azure Private Endpoint troubleshooting documentation
