Microsoft’s Agent Stack Is Becoming an Identity, Context, and Cost Story — Not a Model Leaderboard

Anatoliy Kolodkin

05 Jun 2026 • 5 min read

Microsoft’s enterprise AI story is becoming less about model spectacle and more about a stack that sounds suspiciously like normal software architecture: identity, context, observability, cost controls, runtime boundaries, and distribution. That is good news for anyone tired of AI strategy being measured in leaderboard screenshots.

A fresh VentureBeat interview with Marco Casalaina, Microsoft VP of Products for Core AI and an AI Futurist, is useful because it connects the Build 2026 announcements into one operating model. Core AI, as he describes it, spans Foundry, Visual Studio, VS Code, GitHub, and GitHub Copilot. The platform story underneath is straightforward: model choice at the bottom, hosted agents and the Foundry control plane above it, Microsoft IQ products as the context layer, Entra as agent identity, and Copilot/Teams as the place where users actually encounter the work.

That is not as catchy as “new frontier model beats old frontier model by 3% on a benchmark nobody can explain.” It is much more likely to matter inside companies.

The agent platform is the product

Casalaina says Foundry offers access to models from OpenAI, Anthropic, Mistral, Black Forest Labs, xAI, DeepSeek, Qwen, and Microsoft’s own MAI family. He also says Claude Opus 4.8 is available on Foundry and frames Microsoft’s MAI models around token efficiency, optimization, customization, fine-tuning, and continued pre-training. That breadth matters, but not because enterprises want a bigger dropdown. They want leverage: pick the right model for the workflow, route work by quality and cost, and avoid binding the whole AI program to one provider’s release calendar.
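To make "route work by quality and cost" concrete, here is a minimal sketch of one possible routing policy: pick the cheapest model that clears a quality floor for the workflow. Everything in it is an assumption for illustration. The model labels, quality scores, and prices are hypothetical, and nothing here reflects an actual Foundry routing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                  # hypothetical label, not a real deployment name
    quality: float             # 0..1 score from your own evals for this workflow
    cost_per_1k_tokens: float  # illustrative price, not a published rate


def route(options, min_quality):
    """Return the cheapest option that meets the quality floor.

    If nothing meets the floor, fall back to the highest-quality option.
    """
    eligible = [o for o in options if o.quality >= min_quality]
    if eligible:
        return min(eligible, key=lambda o: o.cost_per_1k_tokens)
    return max(options, key=lambda o: o.quality)


# Hypothetical catalog: the numbers are placeholders, not measurements.
OPTIONS = [
    ModelOption("small-model", 0.72, 0.2),
    ModelOption("mid-model", 0.85, 1.0),
    ModelOption("large-model", 0.95, 5.0),
]

if __name__ == "__main__":
    print(route(OPTIONS, min_quality=0.8).name)
```

The design choice worth noticing is that the quality score comes from workflow-specific evals, not a public leaderboard, which is the whole point of having more than one model on the shelf.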

The rest of the stack is where Microsoft’s advantage is easier to see. Hosted agents give teams a runtime. The Foundry control plane gives traces, evaluations, correctness signals, token trends, cost differences by model, and integration with Azure Cost Management for the surrounding storage, data, and compute spend. Copilot and Teams give distribution. Entra gives identity. Microsoft IQ products provide context. That is the platform bet: the winning enterprise agent will not be the cleverest standalone chatbot. It will be the one that can see the right information, act with the right authority, prove what happened, and appear where employees already work.

The numbers in the interview should be treated as vendor-attributed but still notable. Casalaina says Microsoft 365 Copilot has crossed 20 million users and that he personally uses Copilot roughly 50 times a day. He cites Bayer seeing a 6x increase in monthly active Copilot users over the last year, and says 20,000 Bayer employees use the company’s own agent system on Foundry. He also points to AEMO, the Australian Energy Market Operator, using agents to triage grid-operation alerts by surfacing severity and prior resolutions while keeping humans in the decision loop.

The useful pattern in those examples is not “AI everywhere.” It is bounded workflow design. Bayer’s internal agent system and AEMO’s alert triage are specific, measurable, and operationally grounded. That is what good pilots look like. Bad pilots start with “connect Copilot to everything and see what happens,” which is how you get a great demo followed by a security review that lasts longer than the product roadmap.

IQ is Microsoft’s context bet, and MCP is the interface clue

The most important claim in the interview is that the IQ family is headless and exposed as MCP servers. Foundry IQ handles unstructured enterprise knowledge. Fabric IQ handles structured business data in Microsoft Cloud, Fabric, and Power BI. Work IQ brings Outlook, Teams, Word, SharePoint, and broader Microsoft 365 context. Web IQ covers agent-facing web search, video search, and browsing-style tasks. Casalaina calls MCP “basically an agent-facing or self-describing API,” with authentication layers and capabilities.

That framing explains a lot of Microsoft’s Build-era moves. The company is trying to make enterprise context addressable by agents without every product team separately scraping SharePoint, Outlook, Power BI, Teams, docs, and the public web. If the IQ layer works, a builder can compose agents against governed context services instead of building fragile connectors for every data source. That is the right abstraction.

It is also the most sensitive part of the stack. “Agentic face of all Microsoft apps” is a polite way of saying there is a new route into the company’s nervous system. Permissions, provenance, filtering, and logs have to be excellent. An agent that summarizes Teams threads, reads SharePoint, reasons over Power BI semantics, and searches the web is valuable precisely because it crosses boundaries humans usually navigate with judgment. The platform has to preserve those boundaries rather than flattening them into one irresistible context blob.

Agent identity is the other key differentiator. Casalaina describes agents having their own Entra identity, Teams presence, email inbox, and documents, and using Work IQ to access their own work context. That moves the mental model away from “assistant UI” and toward “governed actor.” A bot that answers questions is one thing. An agent with an inbox, org presence, documents, and delegated context is closer to a service account with social affordances. Useful, yes. Also something that deserves lifecycle ownership, access review, and offboarding.

Cost visibility is not FinOps theater anymore

The cost discussion is where this gets practical for engineering teams. Casalaina talks about token usage by day, week, and month; cost differences by underlying model; trends; traces; evaluations; correctness; and Azure Cost Management integration. That is the minimum observability layer for agents. It is not sufficient forever, but it is the right foundation.

Agents do not spend like simple chatbots. They read documents, call tools, search, retry, invoke other agents, run evaluations, process traces, and sometimes keep long-running sessions alive. Tokens are only one part of the bill. Storage, vector indexes, web search, model routing, code execution, private networking, logs, and human review time all show up eventually. If a team measures only raw token usage, it will optimize the wrong thing.

The better metric is unit economics per workflow: cost per resolved support ticket, cost per triaged alert, cost per drafted response accepted by a human, cost per code-review assist, cost per avoided escalation, and cost per bad action caught by evals or approvals. Foundry’s control plane can help if teams design pilots with those measures from the beginning. If they wait until usage spreads, the first serious cost discussion will be political instead of technical.
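As a rough illustration of unit economics per workflow, the sketch below computes cost per resolved item from per-run records that include token spend, tool spend, and human review time. The record fields, the reviewer rate, and the sample figures are all assumptions made up for the example. They are not fields exported by Foundry or Azure Cost Management.

```python
from dataclasses import dataclass


@dataclass
class WorkflowRun:
    token_cost: float      # model spend for this run (hypothetical currency units)
    tool_cost: float       # search, code execution, storage, and similar spend
    review_minutes: float  # human review time attributed to this run
    resolved: bool         # did the run end in an accepted outcome?


def cost_per_resolved(runs, reviewer_rate_per_hour):
    """Total cost of all runs divided by the number of resolved runs.

    Failed runs still count toward cost, so retries and dead ends show up
    in the metric. Returns None when nothing was resolved.
    """
    total = sum(
        r.token_cost + r.tool_cost + r.review_minutes / 60 * reviewer_rate_per_hour
        for r in runs
    )
    resolved = sum(1 for r in runs if r.resolved)
    if resolved == 0:
        return None
    return total / resolved


if __name__ == "__main__":
    sample = [
        WorkflowRun(token_cost=0.10, tool_cost=0.05, review_minutes=3, resolved=True),
        WorkflowRun(token_cost=0.20, tool_cost=0.05, review_minutes=6, resolved=False),
    ]
    print(cost_per_resolved(sample, reviewer_rate_per_hour=60.0))
```

In this made-up sample, human review time dominates the total and token spend is a rounding error, which is exactly the kind of thing a tokens-only dashboard would hide.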

Rubric-based evaluation fits the same theme. Casalaina’s reservation-agent example is more useful than generic “groundedness”: did the agent ask for the missing time, check availability, and then book correctly? That is what real evals should look like. They should encode the workflow’s expected behavior, not merely ask a judge model whether the answer sounded plausible. For practitioners, the lesson is to write evals from process requirements, edge cases, and failure modes before celebrating adoption metrics.
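A rubric like the reservation example can be sketched as plain checks over a recorded trace of agent steps. The step names below are hypothetical, and the trace format (an ordered list of step labels) is an assumption. A real evaluation would read whatever trace schema your platform actually emits.

```python
def score_reservation_trace(trace):
    """Score an ordered list of step names against a three-part rubric.

    Rubric (taken from the reservation example): the agent should ask for
    the missing time, check availability, and then book, in that order.
    Step names are hypothetical labels chosen for this sketch.
    """
    def first_index(step):
        return trace.index(step) if step in trace else None

    ask = first_index("ask_missing_time")
    check = first_index("check_availability")
    book = first_index("book_table")

    all_present = None not in (ask, check, book)
    return {
        "asked_for_missing_time": ask is not None,
        "checked_availability": check is not None,
        "booked": book is not None,
        # Order only makes sense when every step actually happened.
        "correct_order": all_present and ask < check < book,
    }


if __name__ == "__main__":
    good = ["ask_missing_time", "check_availability", "book_table"]
    hasty = ["book_table"]
    print(score_reservation_trace(good))
    print(score_reservation_trace(hasty))
```

The second trace is the interesting one: the agent did book, so an outcome-only check would pass it, while the rubric flags that it skipped the steps that make the booking trustworthy.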

The tactical guidance is simple: pick one bounded workflow, define its data boundaries, assign an agent identity, choose a model routing policy, log traces, write rubric evals, measure unit economics, and publish through the surface users already use. Do not start with the model. Start with the job. The model is a dependency; the workflow is the product.

Microsoft’s agent stack is not guaranteed to win just because the pieces are familiar. The IQ/MCP layer has to be understandable to builders and auditable to security teams. Cost reporting has to connect to business outcomes, not just produce prettier charts. Agent identity has to come with real lifecycle controls. But the direction is right: the enterprise AI contest is moving away from isolated demos and toward operational systems. Less leaderboard, more control plane. Finally, some architecture in the architecture.

Sources: VentureBeat interview with Microsoft’s Marco Casalaina, Microsoft Build 2026, Microsoft 365 blog — Scout personal agent, Microsoft AI — seven new MAI models, Foundry MCP documentation
