
Business Observability Is the Missing Layer Between Agentic AI Spend and an Unreadable Cloud Bill

Anatoliy Kolodkin

26 May 2026 • 6 min read

Agentic AI has a cost problem, but not the one most teams are watching. The visible problem is token spend: a dashboard with model names, request counts, context lengths, and an increasingly rude invoice. The more important problem is attribution. If a company cannot tell which agent spent the money, which customer or product it served, which tools it touched, and what value it produced, the bill is not merely high. It is unreadable.

That is why Kin Lane’s argument for “business observability” in The New Stack is worth taking seriously. The phrase sounds consultant-adjacent, but the underlying point is practical: technical observability tells you whether systems are up, slow, broken, or expensive. It rarely tells you whether a given API call, MCP tool invocation, model inference, or token-heavy workflow mattered to the business. Agentic systems make that gap harder to ignore because they multiply calls across APIs, models, tools, SaaS apps, vector stores, and internal services while hiding the fan-out behind a friendly chat box.

Lane’s blunt line is the one platform teams should tape to the FinOps dashboard: “No one’s been calculating the cost.” Not the raw spend — everyone eventually notices that. The missing calculation is cost joined to purpose: customer sector, product domain, support workflow, sales motion, internal function, cost center, data sensitivity, and revenue line. Without that join, agent observability stops at “this run used 84,000 tokens and called nine tools.” Useful, but not enough to decide whether the system should exist.

Token counts are not accountability

The AI observability market has spent the last two years getting better at traces, prompts, spans, model parameters, latency, token usage, and evals. Good. Keep all of that. But traces without business context are like server metrics without service ownership. They help debug the machine, not govern the investment.

A support agent that spends $14,000 in a month may be a bargain if it deflects enterprise tickets, reduces time-to-resolution, and escalates correctly. A coding agent that spends $2,000 may be waste if it generates noisy pull requests nobody merges. A sales-research agent that looks cheap per run may become expensive when it calls enrichment APIs, vector search, browser automation, and CRM tools in loops. The model bill alone does not answer the real question: what outcome did this activity buy?
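One way to make that question concrete is to divide spend by a counted outcome. The sketch below is illustrative only: the `AgentMonth` type and the outcome counts are assumptions invented for the example, and only the two spend figures come from the paragraph above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentMonth:
    """One month of spend and outcomes for a single agent (hypothetical shape)."""
    name: str
    spend_usd: float
    outcomes: int  # e.g. tickets deflected or pull requests merged


def cost_per_outcome(month: AgentMonth) -> Optional[float]:
    """Return spend divided by outcomes, or None when nothing was delivered."""
    if month.outcomes == 0:
        return None
    return month.spend_usd / month.outcomes


# Spend figures echo the examples above; the outcome counts are invented.
support = AgentMonth("support-agent", 14_000.0, 3_500)
coding = AgentMonth("coding-agent", 2_000.0, 0)

print(cost_per_outcome(support))  # 4.0
print(cost_per_outcome(coding))   # None
```

Under these made-up numbers the larger bill works out to four dollars per deflected ticket, while the smaller one has no outcome to divide by. The hard part in practice is agreeing on what counts as an outcome, not the division.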

Lane’s proposed mechanism is not magic. It is structured metadata — business-context tagging propagated through HTTP headers, API gateways, model calls, tool invocations, and telemetry pipelines. He compares the need to UTM parameters in marketing: imperfect, sometimes abused, but valuable because they made attribution part of the operating system. AI platforms need a similar discipline, except the tags should describe product domain, customer tier, workflow, cost center, environment, data class, and business purpose rather than campaign names.
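A minimal sketch of what header-based tagging could look like, assuming a hypothetical `X-Biz-` header prefix and a `BusinessContext` type whose fields mirror the tags listed above. Neither the prefix nor the field names come from Lane or from any standard; they only show the shape of the idea.

```python
from dataclasses import asdict, dataclass, fields
from typing import Dict

HEADER_PREFIX = "X-Biz-"  # hypothetical prefix, not a standard


@dataclass(frozen=True)
class BusinessContext:
    """Business tags carried alongside a request (illustrative field set)."""
    product_domain: str
    customer_tier: str
    workflow: str
    cost_center: str
    environment: str
    data_class: str
    purpose: str


def to_headers(ctx: BusinessContext) -> Dict[str, str]:
    """Encode each field as a header such as X-Biz-Product-Domain."""
    headers = {}
    for field_name, value in asdict(ctx).items():
        suffix = "-".join(part.capitalize() for part in field_name.split("_"))
        headers[HEADER_PREFIX + suffix] = value
    return headers


def from_headers(headers: Dict[str, str]) -> BusinessContext:
    """Rebuild the context at a gateway; header names are matched case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}
    values = {}
    for field in fields(BusinessContext):
        key = (HEADER_PREFIX + field.name.replace("_", "-")).lower()
        if key not in lowered:
            raise ValueError(f"missing business-context header: {key}")
        values[field.name] = lowered[key]
    return BusinessContext(**values)


ctx = BusinessContext(
    product_domain="support",
    customer_tier="enterprise",
    workflow="ticket-triage",
    cost_center="cc-1234",
    environment="prod",
    data_class="internal",
    purpose="deflect-tier1-tickets",
)
outgoing = to_headers(ctx)
assert from_headers(outgoing) == ctx  # the tags survive a round trip
```

Rejecting requests with missing tags at the gateway, rather than defaulting them silently, is a design choice in this sketch. A real rollout would probably start in warn-only mode.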

This is where agentic AI collides with old API governance. Lane’s other useful line is that “MCP is just an API — a long-lived HTTP connection serving up JSON.” That is reductive in the right way. Model Context Protocol changes who consumes the interface and how dynamically it is discovered, but it does not repeal the need for ownership, authorization, documentation, versioning, rate limits, metadata, and cost attribution. If anything, MCP makes those requirements more urgent because the caller may be an agent that chains tool calls faster than a human developer ever would.

MCP sprawl is API sprawl with better branding

The current agent-stack pattern is familiar: teams expose internal capabilities as MCP servers, add them to a coding agent or workflow runner, and celebrate that the model can now “use tools.” That is useful. It is also how organizations create a second API estate before finishing governance of the first one. Lane warns that companies have “unleashed all these MCP servers” without a documentation solution, creating “a whole wave of API sprawl that we can’t see, but we have to support and sustain.” That is the part everyone should hear before the platform garden turns into a plugin swamp.

An MCP server without business metadata is another opaque integration. An MCP server with ownership, tool scopes, identity boundaries, cost labels, purpose fields, and data classification can become part of an auditable runtime. The distinction matters because agents do not merely retrieve context; they make decisions about which context and tools to use. If the control plane cannot answer who used which tool, under whose authority, for which workflow, and at what cost, “agent autonomy” becomes a euphemism for unbounded operational ambiguity.

FOCUS, the FinOps Open Cost & Usage Specification, gives this conversation a standards anchor. It describes itself as an open specification that normalizes billing datasets across AI, cloud, SaaS, data center, and other vendors so FinOps teams can reduce complexity. That is exactly the direction agentic spend needs to go. Token usage should not live in a boutique AI dashboard while cloud, SaaS, and data costs live somewhere else. The whole stack participates in the run: model inference, gateway routing, vector storage, SaaS APIs, browser sessions, observability, security scans, and human review queues.

OpenAPI is the other useful precedent. The OpenAPI Initiative frames API specifications as artifacts that support design, infrastructure configuration, developer experience, testing, and security tooling. Agent-readable APIs and MCP servers need comparable discipline. Descriptions are not just for human docs anymore; they shape what agents can discover, understand, and call. Bad descriptions create bad behavior. Missing metadata creates ungoverned behavior.

The vocabulary is the hard part

Adding a header is easy. Agreeing on what the header means is where organizations discover whether they actually understand themselves. Product domains, customer tiers, support categories, revenue attribution, legal constraints, and data-sensitivity classes are social contracts before they are telemetry fields. If engineering invents them alone, they become operational labels nobody in finance or product trusts. If business teams invent them without platform constraints, they become PowerPoint categories that do not map cleanly to systems.

The practical answer is shared ownership. Platform teams should define how metadata is propagated, validated, logged, redacted, and queried. Domain owners should define the business vocabulary for their bounded contexts. Security should define data sensitivity and authority boundaries. Finance should align cost-center and allocation fields with normalized billing exports, ideally in a FOCUS-compatible direction. Product and support should say which outcomes matter. None of this sounds like prompt engineering because it is not. It is operating-model engineering.

There is also a privacy trap. Business context is valuable for accounting and governance, but that does not mean every label belongs in the model prompt. Full customer attributes, revenue details, legal classifications, and internal cost-center data should live primarily in telemetry, policy, and routing systems. The model should receive only the minimum context it needs to perform the task. Otherwise “business observability” becomes another way to leak sensitive metadata into traces, prompts, tool calls, and vendor logs.
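That separation can be enforced with an allow-list rather than left to convention. The following is a sketch under stated assumptions: the `PROMPT_SAFE_FIELDS` set and the label names are hypothetical, and which fields are safe for a model to see is a policy decision this code does not make for you.

```python
from typing import Dict, Tuple

# Hypothetical allow-list: the only labels the model is permitted to see.
PROMPT_SAFE_FIELDS = frozenset({"workflow", "customer_tier"})


def split_context(labels: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (telemetry_labels, prompt_labels).

    Telemetry keeps every label for accounting and policy; the prompt
    receives only the allow-listed subset.
    """
    telemetry = dict(labels)
    prompt = {key: value for key, value in labels.items() if key in PROMPT_SAFE_FIELDS}
    return telemetry, prompt


labels = {
    "workflow": "ticket-triage",
    "customer_tier": "enterprise",
    "cost_center": "cc-1234",
    "revenue_line": "platform-subscriptions",
    "legal_class": "contract-restricted",
}
telemetry, prompt = split_context(labels)
print(sorted(prompt))  # ['customer_tier', 'workflow']
```

An allow-list fails closed: a label added later stays out of the prompt until someone decides it belongs there.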

What builders should implement before the bill gets weird

Start with inventory. List every agent-accessible API, MCP server, model gateway, workflow runner, and tool bridge. Assign an owner to each. Separate read and write capabilities. Require structured metadata for product domain, environment, cost center, workflow, user or service identity, data sensitivity, and intended purpose. Attach those labels at gateways and tool boundaries rather than trusting every agent prompt to remember them.
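The inventory step lends itself to a simple automated check. The sketch below assumes a hypothetical registry format, plain dictionaries with a `tools` map from tool name to `read` or `write`, and a `REQUIRED_LABELS` tuple taken from the list above. It is not tied to any real MCP registry or gateway product.

```python
from typing import Dict, List

# Labels every agent-accessible integration must declare (from the list above).
REQUIRED_LABELS = (
    "owner", "product_domain", "environment", "cost_center",
    "workflow", "identity", "data_sensitivity", "purpose",
)


def audit_entry(entry: dict) -> List[str]:
    """Return the problems found in one inventory entry (empty list means clean)."""
    problems = [
        f"missing label: {label}"
        for label in REQUIRED_LABELS
        if not str(entry.get(label) or "").strip()
    ]
    # Every tool must be explicitly classified as read or write.
    for tool, access in entry.get("tools", {}).items():
        if access not in ("read", "write"):
            problems.append(f"tool {tool} has no read/write classification")
    return problems


def audit_inventory(inventory: List[dict]) -> Dict[str, List[str]]:
    """Map each failing entry's name to its problems."""
    report = {}
    for entry in inventory:
        problems = audit_entry(entry)
        if problems:
            report[entry.get("name", "<unnamed>")] = problems
    return report


inventory = [
    {
        "name": "crm-mcp", "owner": "sales-platform", "product_domain": "sales",
        "environment": "prod", "cost_center": "cc-2001", "workflow": "lead-research",
        "identity": "svc-sales-agent", "data_sensitivity": "confidential",
        "purpose": "account enrichment",
        "tools": {"search_accounts": "read", "update_account": "write"},
    },
    {
        "name": "wiki-mcp", "owner": "", "environment": "prod",
        "tools": {"search_pages": "read", "edit_page": None},
    },
]
report = audit_inventory(inventory)
print(sorted(report))  # ['wiki-mcp']
```

Running a check like this in CI for the registry is one way to keep unlabeled servers from reaching agents in the first place.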

Then make the accounting observable. Log model usage, token counts, tool calls, API calls, latency, failures, approval events, and downstream SaaS/cloud consumption against the same run ID and business labels. Set budgets per workflow, not just per provider account. Alert on abnormal call-graph shapes: a support workflow suddenly calling sales-enrichment tools, a coding agent using external search during a private-repo task, or a low-value internal agent consuming premium models all afternoon. Cost control is much easier before the agent becomes a beloved mystery service nobody wants to shut down.
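A rough sketch of per-workflow budgets and unexpected-tool alerts, assuming events have already been logged with a run ID and a workflow label. The `UsageEvent` shape, the budget figures, and the `ALLOWED_TOOLS` map are all hypothetical; a real system would read them from telemetry and policy stores rather than constants.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class UsageEvent:
    """One billable step in an agent run (illustrative shape)."""
    run_id: str
    workflow: str
    kind: str  # "model", "tool", or "api"
    name: str
    cost_usd: float


# Hypothetical policy: budget per workflow and the tools each may call.
WORKFLOW_BUDGET_USD = {"support-triage": 50.0}
ALLOWED_TOOLS = {"support-triage": {"ticket-search", "kb-lookup"}}


def spend_by_workflow(events: List[UsageEvent]) -> Dict[str, float]:
    """Total cost per workflow label across all runs."""
    totals = defaultdict(float)
    for event in events:
        totals[event.workflow] += event.cost_usd
    return dict(totals)


def find_alerts(events: List[UsageEvent]) -> List[str]:
    """Flag workflows over budget and tool calls outside the allowed set."""
    alerts = []
    for workflow, total in sorted(spend_by_workflow(events).items()):
        budget = WORKFLOW_BUDGET_USD.get(workflow)
        if budget is not None and total > budget:
            alerts.append(f"{workflow}: spend {total:.2f} exceeds budget {budget:.2f}")
    for event in events:
        # A workflow with no ALLOWED_TOOLS entry has every tool call flagged.
        allowed = ALLOWED_TOOLS.get(event.workflow, set())
        if event.kind == "tool" and event.name not in allowed:
            alerts.append(f"{event.workflow}: unexpected tool {event.name} in run {event.run_id}")
    return alerts


events = [
    UsageEvent("run-1", "support-triage", "model", "premium-model", 30.0),
    UsageEvent("run-1", "support-triage", "tool", "kb-lookup", 0.5),
    UsageEvent("run-2", "support-triage", "tool", "sales-enrichment", 25.0),
]
for alert in find_alerts(events):
    print(alert)
```

With these invented events the support workflow trips both checks: it is over its budget, and one run called a sales-enrichment tool it has no declared reason to use.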

Finally, review MCP and API descriptions like production interfaces. Who owns the server? Which tools are write-capable? Which identities can call them? What data classes can flow through them? Are calls auditable? Are responses redacted before model context? Is there a deprecation path? Can finance attribute the spend? If those questions sound excessive for a demo, fine. But demos are not the problem. The problem is demos that grow roots.

The cloud analogy in Lane’s interview is deliberately sharp: everyone expected cloud bills to be cheaper, and fifteen years later many organizations are still explaining why the bill is ten times what they expected. AI could be worse because agentic systems can create demand on their own. The teams that survive that transition will not be the ones with the prettiest token dashboard. They will be the ones that can connect agent activity to business value before the invoice turns into archaeology.

Sources: The New Stack, FOCUS / FinOps Open Cost & Usage Specification, Model Context Protocol authorization specification, OpenAPI Initiative
