xAI’s Quiet API Docs Refresh Is About Cost, MCP, and Enterprise Controls

Anatoliy Kolodkin

24 May 2026 • 5 min read

The most important xAI developer news this week is not a benchmark, a mascot mode, or another screenshot of Grok being spicy. It is a docs refresh full of things that sound boring until you have to run an AI product in production: cost accounting, MCP allowlists, API-key ACLs, rate limits, and audit logs.

xAI’s developer sitemap shows a broad documentation update at 2026-05-24T23:22:43.035Z, covering cost tracking, Remote MCP tools, the Management API, pricing, and model pages. The pages themselves do not expose clean publish dates, but the sitemap lastmod is fresh and the source is primary xAI documentation. More importantly, the substance is exactly the kind of operational surface Grok needs if it wants to be evaluated beside OpenAI, Anthropic, and Google as infrastructure rather than as a chatbot with better distribution.

Per-request cost is a real developer primitive.

The standout field is cost_in_usd_ticks. xAI says every inference response includes it in the usage object across chat completions, the Responses API, image generation, and video generation. It is not an estimate bolted on later. It is the actual billed amount after discounts, including prompt caching reductions, token costs, and server-side tool invocation costs.

The unit is delightfully unglamorous: 1 USD = 10,000,000,000 ticks. xAI’s examples convert 37756000 ticks to about $0.0038 and 200000000 ticks to $0.02. Streaming responses carry a running cost total, with the final chunk reflecting the final request cost. Server-side tool requests roll model decodes and tool invocation costs into the returned value, with fields such as server_side_tool_usage and num_server_side_tools_used helping explain what happened.
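The conversion is simple enough to get wrong with floats, so here is a minimal sketch that keeps it exact. It assumes only the ratio quoted above (10,000,000,000 ticks per dollar) and that each streamed chunk can be read as a dict that may carry a usage object. The helper names and the chunk shape are illustrative, not taken from xAI's SDK.

```python
from decimal import Decimal

# Ratio quoted in the article: 1 USD = 10,000,000,000 ticks.
TICKS_PER_USD = 10_000_000_000


def ticks_to_usd(ticks: int) -> Decimal:
    """Convert integer ticks to an exact dollar amount (no float rounding)."""
    return Decimal(ticks) / Decimal(TICKS_PER_USD)


def final_stream_cost(chunks):
    """Return the cost carried by the last chunk that reports one.

    Assumption: each chunk is a dict shaped like
    {"usage": {"cost_in_usd_ticks": int}}; real chunk objects may differ.
    """
    cost = None
    for chunk in chunks:
        usage = chunk.get("usage") or {}
        if "cost_in_usd_ticks" in usage:
            cost = usage["cost_in_usd_ticks"]  # running total; last one wins
    return cost
```

Storing the raw integer ticks and converting only at display time avoids accumulating rounding error when thousands of small requests are summed.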

This is not just accounting trivia. Most LLM cost management begins as spreadsheet archaeology: estimate tokens, multiply by listed prices, reconcile later against a dashboard, then argue about why the app logs and the invoice do not match. Per-response actual cost lets teams attach spend to the unit that matters: a user action, a workspace, a customer workflow, an agent run, a web-search loop, an image render, or a video generation. That changes cost from a finance afterthought into product telemetry.

It matters even more for agents. A plain chat completion has a legible cost curve. An agent with web search, X search, code execution, MCP tools, long context, image understanding, and retries does not. Tool counts vary by prompt, model, user, and day. xAI’s returned cost field gives developers a clean logging pattern: request ID, tenant, user, task type, model, tool usage, latency, outcome, and cost_in_usd_ticks. Now you can ask the question that actually matters: did this agent run create enough value to justify what it burned?
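That logging pattern can be sketched as one structured record per agent run. Everything here except cost_in_usd_ticks and num_server_side_tools_used is a hypothetical field name chosen for illustration, and the record is assumed to be filled in by your own application code after each response.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AgentRunRecord:
    """One log row per request; field names are illustrative."""
    request_id: str
    tenant: str
    user: str
    task_type: str
    model: str
    num_server_side_tools_used: int
    latency_ms: int
    outcome: str
    cost_in_usd_ticks: int


def to_log_line(record: AgentRunRecord) -> str:
    """Serialize a record as a single JSON line for a log pipeline."""
    return json.dumps(asdict(record), sort_keys=True)


def cost_by_tenant(records) -> dict[str, int]:
    """Sum billed ticks per tenant, keeping integer precision."""
    totals: dict[str, int] = {}
    for record in records:
        totals[record.tenant] = totals.get(record.tenant, 0) + record.cost_in_usd_ticks
    return totals
```

Grouping by task_type or outcome instead of tenant is the same loop with a different key, which is what turns the cost field into product telemetry rather than a monthly surprise.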

The pricing page makes the economics concrete. grok-4.3 is listed with a 1M context, $1.25 per 1M input tokens, and $2.50 per 1M output tokens. grok-build-0.1 is listed at 256k context, $1.00 per 1M input, and $2.00 per 1M output. Tool calls are where agent budgets can quietly leak: Web Search, X Search, and Code Execution are each $5 per 1,000 calls; Collections/File Search is $2.50 per 1,000 calls; File Attachments are $10 per 1,000 calls. Grok Imagine image generation is listed at $0.02 per image, the quality image mode at $0.05 per image, and video at $0.050 per second.
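For budgeting before a request is sent, the list prices above can be turned into a rough estimator. This is a sketch under stated assumptions: the prices are copied from this article, the tool keys are hypothetical labels rather than xAI identifiers, and the result ignores caching discounts, so the returned cost_in_usd_ticks remains the authoritative number.

```python
from decimal import Decimal

# (input, output) USD per 1M tokens, as listed in the article.
PRICES_PER_MILLION_TOKENS = {
    "grok-4.3": (Decimal("1.25"), Decimal("2.50")),
    "grok-build-0.1": (Decimal("1.00"), Decimal("2.00")),
}

# USD per 1,000 tool calls; the keys are illustrative labels.
TOOL_PRICE_PER_1000_CALLS = {
    "web_search": Decimal("5"),
    "x_search": Decimal("5"),
    "code_execution": Decimal("5"),
    "file_search": Decimal("2.50"),
    "file_attachment": Decimal("10"),
}


def estimate_usd(model, input_tokens, output_tokens, tool_calls=None):
    """Estimate list-price cost in USD for one request, before any discounts."""
    in_price, out_price = PRICES_PER_MILLION_TOKENS[model]
    token_cost = (in_price * input_tokens + out_price * output_tokens) / Decimal(1_000_000)
    tool_cost = sum(
        (TOOL_PRICE_PER_1000_CALLS[name] * count / Decimal(1000)
         for name, count in (tool_calls or {}).items()),
        Decimal(0),
    )
    return token_cost + tool_cost
```

As a worked case, a grok-4.3 request with 200,000 input tokens, 4,000 output tokens, and six web searches comes to $0.25 + $0.01 + $0.03 = $0.29 at list price. The searches cost three times as much as the output tokens, which is the quiet leak in miniature.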

One implementation footnote is worth flagging: xAI says the Vercel AI SDK’s @ai-sdk/xai integration does not currently surface cost_in_usd_ticks. If cost observability matters, and it should, teams need to use the OpenAI SDK or the raw REST path until wrappers catch up. Abstractions are useful right up until they hide the field your CFO asks about.

MCP support needs permission boundaries, not vibes.

The Remote MCP docs are the sharper governance story. xAI supports Remote MCP Tools through the native SDK, the OpenAI-compatible Responses API, and the Voice Agent API, with Streamable HTTP and SSE transports. Configuration includes server_url, server_label, server_description, allowed_tools, authorization, and headers.

The important sentence is the warning: if allowed_tools is omitted, all tool definitions exposed by the MCP server are automatically injected into model context and available to the model. That is convenient during prototyping and irresponsible as a production default. Every MCP server that exposes write operations, admin actions, payment APIs, deploy hooks, customer-data access, or file mutation is a permission boundary. Letting the model see every tool because it was easier than writing an allowlist is how demos become incidents.
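One way to make the allowlist non-optional is to build the tool entry through a helper that refuses to omit it. The sketch below uses the configuration field names listed above; the "type": "mcp" key follows the OpenAI Responses API convention and is an assumption here, as are the server URL and the tool names.

```python
def build_mcp_tool(server_url, server_label, allowed_tools, server_description=None):
    """Build a Remote MCP tool entry, rejecting a missing or empty allowlist.

    Field names come from the article; "type": "mcp" is assumed from the
    OpenAI-compatible Responses API shape and should be checked against the docs.
    """
    if not allowed_tools:
        raise ValueError("allowed_tools must name at least one tool explicitly")
    tool = {
        "type": "mcp",
        "server_url": server_url,
        "server_label": server_label,
        "allowed_tools": sorted(set(allowed_tools)),  # dedupe for stable diffs
    }
    if server_description:
        tool["server_description"] = server_description
    return tool


# Hypothetical read-only configuration for a ticketing MCP server.
READ_ONLY_TICKETS = build_mcp_tool(
    "https://mcp.example.com/mcp",
    "tickets",
    ["search_tickets", "get_ticket"],
)
```

Putting the check in code rather than in a review checklist means a forgotten allowlist fails at startup instead of shipping every write operation the server happens to expose.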

xAI’s own docs recommend allowed_tools for better performance and reduced risk, including restricting access to read-only operations. Good. The missing piece is equally important: xAI says OpenAI Responses API parameters such as require_approval and connector_id are not currently supported for Remote MCP. That means teams cannot assume provider-side MCP equals provider-side approval safety. If a Grok-powered agent can perform high-risk tool calls, approvals need to live in the application’s own broker: intercept the action, classify the risk, require human confirmation for destructive operations, and default to read-only capabilities until proven otherwise.
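The decision logic of such a broker can be sketched in a few lines. The tool names and risk table are hypothetical, and where the check gets wired in depends on how tool calls are routed in your application; this shows only the policy: unknown tools are denied, read-only tools pass, destructive tools need a human.

```python
from enum import Enum
from typing import Callable


class Risk(Enum):
    READ_ONLY = "read_only"
    DESTRUCTIVE = "destructive"


# Hypothetical risk classification maintained by the application team.
TOOL_RISK = {
    "search_tickets": Risk.READ_ONLY,
    "get_ticket": Risk.READ_ONLY,
    "delete_ticket": Risk.DESTRUCTIVE,
}


def authorize(tool_name: str, confirm: Callable[[str], bool]) -> bool:
    """Decide whether a tool call may proceed.

    `confirm` is a callback that asks a human and returns True on approval.
    """
    risk = TOOL_RISK.get(tool_name)
    if risk is None:
        return False  # unclassified tools are denied by default
    if risk is Risk.READ_ONLY:
        return True
    return confirm(tool_name)  # destructive: require explicit approval
```

Denying unclassified tools is the important design choice: a tool added to the MCP server later stays blocked until someone deliberately assigns it a risk level.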

The Management API fills in the enterprise control plane. It uses a separate management key and base URL at https://management-api.x.ai, and supports API-key create/list/update/delete, ACLs by model and endpoint, QPS/QPM/TPM limits, propagation checks, model listing, endpoint ACL listing, management-key validation, and audit logs. ACL examples include api-key:model:*, api-key:endpoint:*, api-key:endpoint:chat, api-key:endpoint:image, and model-specific scopes such as api-key:model:grok-4.3. Audit logs expose team events with pagination and filters including userId, query, eventTimeFrom, and eventTimeTo.

That is the checklist platform teams ask for after the prototype phase. Issue narrow keys per service. Avoid wildcard ACLs outside sandboxes. Restrict each key to the models and endpoints it actually needs. Set QPS, QPM, and TPM caps. Validate propagation. Pull audit events into the same SIEM or internal security log stream used for other infrastructure. Treat image, video, and voice generation as separate budget and policy surfaces. Capture per-request cost from day one.
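Parts of that checklist can be automated as a lint over key definitions. The sketch below assumes each key is available as a plain dict with hypothetical fields (name, environment, acls, qps, qpm, tpm); it does not call the Management API, and the real response shape may differ. The ACL strings follow the examples quoted above.

```python
def audit_key(key: dict) -> list[str]:
    """Return policy findings for one API key definition.

    Assumed shape (illustrative only):
    {"name": str, "environment": str, "acls": [str], "qps": int, "qpm": int, "tpm": int}
    """
    findings = []
    is_sandbox = key.get("environment") == "sandbox"
    for acl in key.get("acls", []):
        # Wildcard scopes such as api-key:model:* are tolerated only in sandboxes.
        if acl.endswith(":*") and not is_sandbox:
            findings.append(f"wildcard ACL outside sandbox: {acl}")
    for limit in ("qps", "qpm", "tpm"):
        if key.get(limit) is None:
            findings.append(f"missing {limit} limit")
    return findings
```

Run against every key on a schedule, an empty findings list becomes a cheap invariant that platform teams can alert on.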

Compared with Anthropic and OpenAI, xAI is still catching up on some enterprise vocabulary. Anthropic’s Usage and Cost Admin API is stronger for historical organization-wide reporting by model, workspace, service tier, and time bucket. OpenAI’s production guidance has years of maturity around projects, key safety, usage tracking, rate limits, and spend limits. xAI’s differentiator in this refresh is the per-response actual cost field, especially if it consistently includes tool loops. That is genuinely developer-friendly.

The broader read is simple: enterprise AI competition is moving from “which model is clever?” to “which runtime can be budgeted, permissioned, audited, and revoked?” Grok still has to prove trust, adoption, and workload fit. But this docs refresh is the right kind of boring. Builders do not adopt platforms because the docs are exciting. They adopt them because the docs reveal the provider has thought about billing, permissions, limits, auditability, and failure modes before the customer’s first incident.

Sources: xAI developer documentation, xAI Remote MCP docs, xAI Management API guide, xAI pricing, xAI models, Anthropic Usage and Cost Admin API, OpenAI production best practices
