Foundry IQ's Knowledge Copilot Walkthrough Is the Best Architecture Guide Microsoft Has Published This Month

Anatoliy Kolodkin

30 Apr 2026 • 6 min read

Ask any field technician what a good knowledge retrieval system feels like, and they will probably describe something like this: they have a problem, they describe it once, and the system figures out which manuals, policies, vendor documentation, and institutional knowledge to pull from, without being told which source to check first. Traditional RAG cannot do this cleanly because it assumes a single index over a single source. Real enterprise knowledge does not live in one place. It lives in SharePoint, OneLake, blob storage, legacy databases, web resources, and vendor portals simultaneously. A retrieval system that requires you to know where the answer lives before you can ask the question has failed before it starts.

Microsoft published a detailed walkthrough this week for building exactly this kind of multi-source retrieval system using Foundry IQ knowledge bases and the agentic retrieval engine in Azure AI Search. The walkthrough is a full end-to-end implementation guide — C# code samples, architecture diagrams, the works — which makes it the most actionable single document Microsoft has published on enterprise RAG architecture in the Foundry context. It also includes an unusually honest accounting of tradeoffs, which is why it is worth reading even if you are not evaluating Foundry IQ specifically.

The Failure Mode Naive RAG Cannot Survive

The post names the problem with unusual directness. A field technician troubleshooting a piece of equipment might need to pull from a vendor manual in OneLake, a company repair policy on SharePoint, and a public electrical standard on the web. A single-index RAG pipeline would require either three separate retrievals (with the technician orchestrating which to call and how to merge results) or one bloated index that degrades retrieval quality trying to be everything at once. Neither approach is what the technician needs. What they need is a system that understands the question, decomposes it, routes it to the right sources in parallel, and synthesizes a coherent answer — without being explicitly programmed to handle each specific question type in advance.

The agentic retrieval loop described in the post has five stages: Plan (decompose the query into sub-queries and select knowledge sources), Search (execute sub-queries concurrently against the selected sources using keyword, vector, and hybrid search), Rank (semantic reranking of results), Reflect (iterative follow-up queries if the initial information is insufficient), and Synthesize (generate a unified natural-language answer with source citations). That reflect step — the ability to recognize that the answer is incomplete and ask a follow-up — is what separates agentic retrieval from conventional RAG at the architectural level. The system is not just retrieving. It is deciding whether to keep retrieving.
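The five stages can be sketched as a toy loop. This is only an illustration of the control flow, not Foundry IQ's implementation: the in-memory `SOURCES`, the substring matching, and the term-expansion rule inside `reflect` are all assumptions made up for the example, and a real system would use an LLM for planning and synthesis.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Hit:
    source: str
    text: str
    score: float


# Hypothetical stand-ins for OneLake, SharePoint, and a public web source.
SOURCES = {
    "onelake": ["vendor manual: reset the pump controller by holding the power button"],
    "sharepoint": ["repair policy: pump repairs require a second technician sign-off"],
    "web": ["electrical standard: isolate the supply before opening the controller"],
}


def plan(question, sources):
    """Plan: split the question into keyword sub-queries, one per source."""
    words = [w.strip("?.,").lower() for w in question.split()]
    words = [w for w in words if len(w) > 3]
    return [(src, word) for src in sources for word in words]


def search(source, term):
    """Search: substring match standing in for keyword/vector/hybrid search."""
    return [Hit(source, doc, float(doc.count(term)))
            for doc in SOURCES[source] if term in doc]


def rank(hits, top_k=5):
    """Rank: deduplicate by text and sort by score (stand-in for reranking)."""
    best = {}
    for hit in hits:
        if hit.text not in best or hit.score > best[hit.text].score:
            best[hit.text] = hit
    return sorted(best.values(), key=lambda h: h.score, reverse=True)[:top_k]


def reflect(hits, sources, asked):
    """Reflect: if a source is uncovered, build follow-ups from terms in current hits."""
    covered = {h.source for h in hits}
    missing = [s for s in sources if s not in covered]
    terms = {w for h in hits for w in h.text.split() if len(w) > 4} - asked
    return [(s, t) for s in missing for t in sorted(terms)]


def synthesize(hits):
    """Synthesize: join the results with source citations."""
    return " ".join(f"{h.text} [{h.source}]" for h in hits)


def answer(question, max_rounds=3):
    sources = list(SOURCES)
    sub_queries = plan(question, sources)
    asked, hits = set(), []
    for _ in range(max_rounds):
        asked |= {term for _, term in sub_queries}
        with ThreadPoolExecutor() as pool:  # sub-queries run concurrently
            for found in pool.map(lambda sq: search(*sq), sub_queries):
                hits.extend(found)
        hits = rank(hits)
        sub_queries = reflect(hits, sources, asked)
        if not sub_queries:  # nothing left to ask: stop retrieving
            break
    return synthesize(hits)
```

With the question "How do I reset the pump?", the first pass finds nothing in the web source. The reflect step then issues follow-up terms drawn from the first hits, and one of them ("controller") surfaces the electrical standard. Capping `max_rounds` at 1 shows what a single-pass pipeline would miss.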

AT&T's reported results give this pattern some empirical weight. The company cited a 33% reduction in customer resolution times, a roughly 10% cut in average handle time, and 71 AI solutions scaled to 100,000 employees. Those numbers come from AT&T's specific implementation, which was almost certainly customized and likely predates the current Foundry IQ preview, but the direction is consistent with what production agentic retrieval systems tend to produce when applied to high-volume, multi-source knowledge problems. The takeaway is not that Foundry IQ delivers a 33% reduction by default. It is that federated, agentic retrieval applied to a real enterprise knowledge problem produces measurable operational improvements. The exact number will not translate, but the pattern will.

The MCP Endpoint Strategy That Makes This Cross-Framework Usable

The most strategically important detail in the walkthrough is how knowledge bases are exposed to agents. Each Foundry IQ knowledge base exposes an MCP endpoint at a URL like https://<search-service>.search.windows.net/knowledgebases/<kb-name>/mcp?api-version=2025-11-01-preview. A Foundry Agent attaches this via MCPToolDefinition with AllowedTools.Add("knowledge_base_retrieve"). That is not just an implementation detail. It is the mechanism by which enterprise knowledge becomes framework-agnostic.
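As a small illustration of the URL pattern quoted above, the helper below fills in the two placeholders. The function name and the example service and knowledge base names are hypothetical; only the URL template, the API version string, and the tool name come from the walkthrough.

```python
from urllib.parse import quote

# Tool name quoted in the walkthrough's allowed-tools setting.
ALLOWED_TOOLS = ["knowledge_base_retrieve"]


def knowledge_base_mcp_url(search_service, kb_name,
                           api_version="2025-11-01-preview"):
    """Build the MCP endpoint URL for a knowledge base (hypothetical helper)."""
    return (
        f"https://{quote(search_service, safe='')}.search.windows.net"
        f"/knowledgebases/{quote(kb_name, safe='')}/mcp"
        f"?api-version={quote(api_version, safe='')}"
    )


# Example with made-up names:
url = knowledge_base_mcp_url("contoso-search", "field-manuals")
```

Because the API version carries a preview suffix, keeping it as a parameter rather than hard-coding it in every caller makes a later version bump a one-line change.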

Because the knowledge base is an MCP server, any MCP-compatible agent can consume it — not just Foundry's own Agent Service. LangChain agents, LangGraph workflows, Semantic Kernel agents, and even external MCP clients can all call the same knowledge base without requiring separate indexing pipelines for each framework. That is the right abstraction for an enterprise knowledge problem, because enterprise knowledge is not a Foundry-only problem. Forcing customers into one agent framework to get federated retrieval would be a self-defeating constraint that undermines the whole value proposition. Microsoft is correctly decoupling the knowledge layer from the agent framework layer.
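To make the framework-agnostic point concrete, here is roughly what any MCP client sends to invoke a tool: a JSON-RPC 2.0 request with the method `tools/call`. The envelope follows the MCP specification, but the `arguments` shape shown here (a single `query` field) is an assumption for illustration; the real input schema for `knowledge_base_retrieve` should be read from the server's tool listing. A real session also needs the MCP initialization handshake and authentication, both omitted from this sketch.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per session


def build_tool_call(tool_name, arguments):
    """Build an MCP tools/call request body (JSON-RPC 2.0 envelope)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# The "query" argument name is hypothetical, not confirmed by the walkthrough.
request = build_tool_call(
    "knowledge_base_retrieve",
    {"query": "How do I reset the pump controller?"},
)
body = json.dumps(request)
```

Nothing in this payload is specific to one agent framework, which is the property the paragraph above relies on.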

The practical implication for platform architects is that building knowledge bases as MCP endpoints today means your retrieval investment survives whatever framework changes come next. If a team decides to switch from Semantic Kernel to LangGraph six months from now, the knowledge base does not need to be re-indexed or re-wired. The new agent framework consumes the same MCP endpoint. That is the kind of architectural decision that compounds in value over time.

The Tradeoffs the Preview Label Is Hiding

The walkthrough includes a tradeoffs table that deserves attention precisely because vendors usually leave this kind of thing out of promotional documentation. Foundry IQ is in public preview. Microsoft explicitly says it is not recommended for production workloads without accepting preview SLA terms. That is not a technical limitation buried in the footnotes — it is a structural warning that deserves to be read before you start building. Preview means the API surface can change, billing behavior can change, and the SLA terms you budget against today may not match the GA terms six months from now.

Cost is the other dimension that requires explicit planning. Foundry IQ has two billing streams: Azure AI Search token billing plus Azure OpenAI billing for query planning and synthesis. Every agentic retrieval loop — every Reflect stage, every iterative sub-query — consumes both. The retrieval reasoning effort setting in the post controls how aggressive the agentic loop is: lower effort for fast lightweight lookups, higher effort for iterative multi-hop search across the full data estate. Higher effort produces better answers but more latency and more cost. Teams need to model this tradeoff explicitly before they deploy to production, not discover it in the first billing cycle.
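One way to start modeling the two billing streams is a back-of-the-envelope calculator like the one below. Every number in it is a placeholder: the effort profile names, iteration counts, token counts, and per-1,000-token prices are assumptions for illustration, not Microsoft's pricing or the walkthrough's figures. Substitute measured token usage and current list prices before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EffortProfile:
    iterations: int      # retrieval passes, including Reflect follow-ups
    search_tokens: int   # search-side tokens billed per pass (assumed)
    llm_tokens: int      # planning/synthesis tokens per pass (assumed)


# Hypothetical profiles; real usage must be measured.
PROFILES = {
    "low": EffortProfile(iterations=1, search_tokens=2000, llm_tokens=1500),
    "high": EffortProfile(iterations=3, search_tokens=4000, llm_tokens=3000),
}


def cost_per_query(profile, search_price_per_1k, llm_price_per_1k):
    """Sum both billing streams for one query."""
    search = profile.iterations * profile.search_tokens / 1000 * search_price_per_1k
    llm = profile.iterations * profile.llm_tokens / 1000 * llm_price_per_1k
    return search + llm


def monthly_cost(profile, queries_per_month, search_price_per_1k, llm_price_per_1k):
    """Scale the per-query cost to a monthly query volume."""
    return queries_per_month * cost_per_query(
        profile, search_price_per_1k, llm_price_per_1k
    )
```

Even with placeholder inputs, the structure shows why higher effort costs more: cost scales with the number of passes in both streams at once, so a profile that triples the passes and doubles the tokens per pass costs six times as much per query.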

The C# SDK coverage is also worth noting for .NET teams. The walkthrough provides C# code samples for the underlying agentic retrieval queries and general MCP tooling. However, the Foundry IQ-specific agent connection SDK supports Python and REST only, not C#. This is a meaningful gap for teams standardized on .NET, and the post is honest about it rather than papering over it with generic SDK language. If you are building on the C# SDK today and need the Foundry IQ-specific agent connection, you will need to work around that gap or wait for SDK coverage.

The ACL Enforcement Question Nobody Likes to Ask

Enterprise knowledge systems only matter if they respect access control. The post describes document-level ACLs enforced through Microsoft Purview sensitivity labels, which are respected through both indexing and retrieval. If a user does not have access to a document in SharePoint, the retrieval system should not return it. That is the right design. The caveat the post does not fully resolve, and which deserves explicit acknowledgment, is per-user authorization at query time.

The walkthrough notes that per-user authorization via per-request MCP headers is not yet supported in the current preview. In other words, the ACL enforcement during indexing is solid. The per-user enforcement at retrieval time — the part that matters when the same knowledge base is used by multiple teams with different access levels — is still maturing. For organizations with strict information barrier requirements, this gap is not theoretical. Teams should evaluate whether the current preview's ACL enforcement is sufficient for their compliance requirements before building production workflows on top of it.
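To illustrate what per-user enforcement at retrieval time means in principle, the sketch below trims a result set against the caller's group memberships. It is not a Foundry IQ feature or a workaround recommended by the walkthrough: the `Document` type, the group names, and the idea of filtering in the calling application are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    allowed_groups: frozenset  # groups permitted to read this document
    text: str


def trim_for_user(results, user_groups):
    """Keep only documents that share at least one group with the user."""
    groups = frozenset(user_groups)
    return [doc for doc in results if doc.allowed_groups & groups]


# Hypothetical retrieval results shared by two teams.
results = [
    Document("policy-1", frozenset({"field-ops"}), "Repair policy"),
    Document("hr-7", frozenset({"hr"}), "Compensation bands"),
]
visible = trim_for_user(results, ["field-ops", "contractors"])
```

The point of the sketch is where the check happens. When the same knowledge base serves teams with different access levels, the filter has to run per request with the caller's identity, which is the capability the preview does not yet expose through MCP headers.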

What to Steal and What to Wait For

The most durable insight in this walkthrough is architectural, not technical. The distinction between single-index RAG and federated agentic retrieval — with a Plan/Search/Rank/Reflect/Synthesize loop managing source selection, sub-query execution, and iterative refinement — is the right mental model for enterprise knowledge problems at scale. Teams building knowledge management systems today should design for this pattern even if they are not using Foundry IQ. The components exist independently in Azure AI Search and Azure OpenAI. Building the orchestration loop is the hard part, and Microsoft has done some of that thinking in public.

The Foundry IQ preview is worth evaluating in non-production environments now. The walkthrough gives you a real starting point with actual code and honest tradeoffs. The MCP endpoint approach means your indexing investment survives framework changes. But do not deploy it to production until the per-user authorization gap is resolved and the service reaches GA with production SLA terms. The pattern is right. The timing needs another quarter or two.

Sources: Microsoft TechCommunity Azure Developer Community Blog, Microsoft Learn — Foundry IQ
