xAI Just Declared War on Its Own Legacy API
xAI did not send a press release. It did not post on X. It updated a docs page.
The page in question — docs.x.ai/developers/model-capabilities/text/comparison — carries a lastmod timestamp of April 27, 2026, and it says something that every xAI developer building on the Chat Completions API needs to hear: that API is deprecated. Not quietly discouraged. Not "we recommend the other one." Deprecated. The word is right there in the table comparing the two interfaces, in black text on a white page, between a column for "Responses API" and a column for "Chat Completions API."
This is the story that will matter most to xAI's developer base this week, and it shipped without a demo video, a launch post, or a single tweet from Elon. That is either intentional understatement or organizational happenstance. Either way, if you are running client.chat.completions.create in any xAI production system today, the migration clock is now running.
What the docs actually say
The comparison page is not subtle. It lists the Chat Completions API under a "Deprecated" header and describes the Responses API as the "recommended way to interact with xAI models." The deprecation is structural, not cosmetic. The parameter names are different. The mental model is different. The billing behavior is different.
Here is the concrete diff: messages becomes input. max_tokens becomes max_output_tokens. But those are the easy renames. The two new parameters — previous_response_id and store — have no equivalent in the old API at all. previous_response_id is how you resume a stateful conversation on the server side instead of repacking the full message history into every request. store is your opt-out switch if you do not want your conversation stored for 30 days.
The billing difference is where it gets expensive — literally. Chat Completions bills the full conversation history on every single turn. If you are running a 20-turn coding assistant interaction, you are paying for 210 tokens on turn 20 (the sum of all previous turns) plus your new output. The Responses API automatically caches conversation history server-side. Your turn-20 cost is just the new output tokens plus a small cache hit, not the full stack. For any production application with non-trivial session length, this is not a line-item optimization. It is an architecture change in your cost base. Teams running Chat Completions in production are almost certainly overpaying compared to what they would pay on the Responses API, and the old API never made that cost visible.
The reasoning content gap is equally stark. reasoning_content — the encrypted reasoning trace that makes xAI's reasoning models distinctive — is limited in the legacy API. Only grok-3-mini returns it. The Responses API provides full encrypted reasoning content support across all reasoning models, including grok-4.20, grok-4.20-reasoning, and grok-4.20-multi-agent. If you have been trying to get reasoning traces out of xAI's full-sized reasoning models and hitting a wall, the wall is the deprecated API surface, not a model limitation.
Then there are the agentic tools. Chat Completions supports basic function calling — you define a function schema, the model outputs a function name and arguments, you execute it, you send the result back. The Responses API supports server-side tools natively: web search, x search, code execution, and MCP integration, all configured server-side with no manual tool-calling loop on the client. That is not an incremental improvement. That is a different architecture for building autonomous agents.
The migration is not just a parameter rename
The xAI migration guide uses phrases that should get the attention of any engineering manager who has been through a deprecated-API transition before. It says xAI "will be phasing out older versions" and that deprecated models may be "transitioned to obsolete status and discontinued from serving." That is not "we might deprecate it someday." That is a 30-day eviction notice written in the passive voice.
The good new: the parameter mapping is genuinely well-documented. The comparison table maps every old parameter to its new equivalent. The quickstart page — also updated April 27 — now leads with grok-4.20-reasoning as the example model and shows code in Python (xAI SDK), Python (OpenAI SDK), JavaScript (Vercel AI SDK), JavaScript (OpenAI SDK), and cURL. That is a real quickstart, not a stub. The bad news: for anyone with a non-trivial Chat Completions integration, the migration requires rethinking how state is managed, how conversation context is preserved, and how billing is tracked. It is a surface-level API change that exposes a deeper architectural shift.
What OpenAI's migration actually taught the industry
OpenAI deprecated its Legacy Completion API — the one that took a text prompt and returned a text completion — years ago. The migration took longer than anyone expected, required extensive tooling from OpenAI, and was not fully complete for most large-scale users until the model distantly behind it was genuinely hard to access. The reason it dragged: the old API worked fine for most use cases, and the new one was incrementally better but not categorically different. The switching cost felt higher than the benefit for a long time.
xAI's situation is different in a critical way. The Responses API is not an incrementally better version of Chat Completions. It is a different product category. Stateful versus stateless. Tool-native versus function-calling manual loop. Full reasoning traces versus limited traces. The migration is not a find-and-replace on parameter names. For teams that have built production systems on Chat Completions, this is a rewrite of the conversation management layer, not a rename. That is a harder migration than OpenAI's, despite xAI's docs being clearer about what needs to change.
What practitioners should actually do
Start by auditing your current xAI integration. If you see client.chat.completions.create or /v1/chat/completions in your codebase, you are running on the deprecated API. Estimate your current session length and turn count. Run the numbers on what a stateful Responses API setup would cost on the same workload. The billing difference alone may justify the migration investment on pure ROI, before you even consider the feature gaps.
The migration path for simple use cases is straightforward: change messages to input, add model, switch your SDK endpoint. For anything with multi-turn conversation logic, you need to introduce previous_response_id handling — store the response ID from each turn, pass it to the next. That is new state management surface. The store: false parameter needs to be set if your use case has privacy constraints that prevent 30-day server-side storage. And if you need reasoning traces from anything other than grok-3-mini, you need to add include: ["reasoning.encrypted_content"] to your request — because the old reasoning_content parameter will not get you there on the bigger models.
The strategic context matters too. Deprecated APIs are maintenance burden, support burden, and security surface area. A company preparing for serious developer scale — and xAI has every incentive to do that, given the SpaceX ecosystem and the volume of inference it is running — cannot afford two divergent API surfaces indefinitely. The Responses API is where xAI's engineering investment is going. Chat Completions is where technical debt lives. That is not a guess. That is what the sitemap says when 106 pages of documentation gets rewritten to make one API the hero and one the legacy footnote.
Read the comparison page. Run your integration against the migration guide. The migration is not urgent yet, but it is not theoretical. And unlike most deprecated-API situations, this one comes with a billing incentive to switch faster rather than slower.
Sources: xAI Docs — Responses API vs Chat Completions Comparison, xAI Docs — Model Migration Guide, xAI Docs — Quickstart