
xAI’s May 15 Retirement Turns Grok 4.3 Into the Default — and Makes Silent Redirects a Cost Risk

Anatoliy Kolodkin

15 May 2026 • 5 min read

xAI’s May 15 retirement notice looks like routine API housekeeping until you read the fine print: the deprecated slugs do not just die. They keep resolving, silently redirect to grok-4.3, and get billed at Grok 4.3 rates.

That is a clever uptime move and a dangerous migration pattern. It means teams that miss the deadline may see green dashboards while their model behavior, reasoning settings, latency envelope, context assumptions, and unit economics have all changed underneath them. The HTTP response says “fine.” The production system may not agree.

Effective May 15, 2026, at 12:00 PM PT, xAI retired eight API model slugs: grok-4-1-fast-reasoning, grok-4-1-fast-non-reasoning, grok-4-fast-reasoning, grok-4-fast-non-reasoning, grok-4-0709, grok-code-fast-1, grok-3, and grok-imagine-image-pro. Requests to retired text models now route to grok-4.3. Requests to the retired image model route to grok-imagine-image-quality.

The mapping is explicit. Retired reasoning models are served by grok-4.3 with low reasoning effort. Retired non-reasoning models are served by grok-4.3 with reasoning effort set to none. grok-code-fast-1 also goes to grok-4.3, with xAI saying the newer model has “significantly improved agentic coding and web dev capabilities.”
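Writing the redirect rules down as a lookup table makes it easy to audit your own configuration against them. What follows is a minimal Python sketch of the mapping as summarized here. The names `Redirect`, `REDIRECTS`, and `resolve_legacy` are hypothetical. The effort level for grok-4-0709, grok-code-fast-1, and grok-3 is left as None because this article does not state it, so check xAI's migration guide before relying on those entries.

```python
from typing import NamedTuple, Optional


class Redirect(NamedTuple):
    """Where a retired slug is served after May 15, 2026."""
    model: str
    reasoning_effort: Optional[str]  # None = not stated in this article, or not applicable


# Hypothetical table built from the article's description of xAI's redirects.
REDIRECTS: dict[str, Redirect] = {
    "grok-4-1-fast-reasoning": Redirect("grok-4.3", "low"),
    "grok-4-fast-reasoning": Redirect("grok-4.3", "low"),
    "grok-4-1-fast-non-reasoning": Redirect("grok-4.3", "none"),
    "grok-4-fast-non-reasoning": Redirect("grok-4.3", "none"),
    "grok-4-0709": Redirect("grok-4.3", None),        # effort not stated here
    "grok-code-fast-1": Redirect("grok-4.3", None),   # effort not stated here
    "grok-3": Redirect("grok-4.3", None),             # effort not stated here
    "grok-imagine-image-pro": Redirect("grok-imagine-image-quality", None),
}


def resolve_legacy(slug: str) -> Optional[Redirect]:
    """Return the redirect target for a retired slug, or None if it is not retired."""
    return REDIRECTS.get(slug)


print(resolve_legacy("grok-4-fast-non-reasoning"))
```

Running every model ID in your configuration through a check like this at startup turns a silent redirect into a visible one.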

The compatibility layer is doing two jobs at once

From xAI’s side, this consolidation makes sense. Model fleets sprawl. Every old slug creates documentation drag, support burden, benchmark confusion, routing complexity, and product ambiguity. A platform that wants developers to think “use Grok 4.3 for chat and coding” will eventually prune the tree.

The uncomfortable part is that silent redirects blur the line between backwards compatibility and behavioral migration. If an old model slug returned an error, every team would know they had work to do. If it keeps returning successful responses, the migration can hide inside normal traffic until cost, latency, or answer quality drifts enough for someone to notice.

xAI is at least clear about the pricing impact. The migration guide says redirected calls are billed at Grok 4.3 pricing: $1.25 per million input tokens and $2.50 per million output tokens. The current model table lists Grok 4.3 with a 1-million-token context window at those rates. That may be a reasonable price for a capable frontier model. It is not necessarily reasonable for workloads that selected older fast slugs because they were cheap.
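The arithmetic behind a redirected bill is simple enough to check by hand. This minimal sketch uses the Grok 4.3 rates quoted above. The traffic volume in the example is hypothetical, and the function ignores any other billing dimensions (for example cached tokens or tool usage) that xAI may charge for.

```python
# Grok 4.3 rates as quoted in xAI's migration guide (USD per million tokens).
GROK_4_3_INPUT_PER_M = 1.25
GROK_4_3_OUTPUT_PER_M = 2.50


def token_cost(input_tokens: int, output_tokens: int,
               input_per_m: float = GROK_4_3_INPUT_PER_M,
               output_per_m: float = GROK_4_3_OUTPUT_PER_M) -> float:
    """Cost in USD for a token volume at per-million-token rates."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# Hypothetical daily volume for a classifier: 40M input and 4M output tokens.
daily = token_cost(40_000_000, 4_000_000)
print(f"${daily:.2f} per day")  # prints: $60.00 per day
```

Calling `token_cost` again with the rates your old slug was billed at, and the same token counts, gives the size of the change. Those legacy rates are not listed in this article.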

The most exposed systems are not glamorous coding agents. They are the boring high-volume paths: classifiers, routers, summarizers, customer-support triage, lightweight extraction jobs, synthetic data filters, and “cheap first pass” chains. Those workloads often tolerate a weaker model because the business metric is cost per acceptable decision, not maximum reasoning depth. Redirecting them to a better model can still be a regression if the economics change.
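For these paths the number to watch is cost per acceptable decision rather than cost per token. This is a minimal sketch of that calculation. The `RunStats` class, both configuration names, and every number in the example are hypothetical, chosen only to show how a more accurate model can still lose on economics.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Aggregate results from running one model configuration on a workload."""
    name: str
    total_cost_usd: float  # sum of token costs across all requests
    requests: int          # decisions attempted
    accepted: int          # decisions that passed your acceptance check

    @property
    def acceptance_rate(self) -> float:
        return self.accepted / self.requests

    @property
    def cost_per_accepted(self) -> float:
        # Rejected outputs still cost money, so total spend is divided by accepted only.
        if self.accepted == 0:
            return float("inf")
        return self.total_cost_usd / self.accepted


# Hypothetical numbers for a triage classifier over the same 10,000 tickets.
legacy = RunStats("legacy-fast-slug", total_cost_usd=4.00, requests=10_000, accepted=9_000)
redirected = RunStats("grok-4.3/none", total_cost_usd=12.00, requests=10_000, accepted=9_600)

for run in (legacy, redirected):
    print(f"{run.name}: {run.acceptance_rate:.0%} accepted, "
          f"${run.cost_per_accepted * 1000:.3f} per 1k accepted")
```

In this made-up case the redirected configuration accepts more tickets but costs nearly three times as much per accepted decision, which is the kind of regression a green dashboard would not show.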

Reasoning effort is now an application setting, not a model personality

The reasoning_effort mapping is the part teams should not hand-wave. xAI supports none, low, medium, and high. The migration guide maps retired reasoning workloads to low and non-reasoning workloads to none. That is a sensible default. It is not a substitute for evaluation.
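One way to avoid inheriting redirect defaults is to make reasoning effort a required, validated field in your own configuration. A minimal sketch: the four values come from this article, while `ReasoningEffort`, `ModelConfig`, and `request_fields` are hypothetical names, not part of any xAI SDK.

```python
from dataclasses import dataclass
from enum import Enum


class ReasoningEffort(str, Enum):
    """Reasoning effort levels the article lists for grok-4.3."""
    NONE = "none"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical per-workload model settings."""
    model: str
    reasoning_effort: ReasoningEffort  # deliberately no default: callers must choose

    def __post_init__(self) -> None:
        # Accept plain strings like "low"; raises ValueError for unknown values.
        object.__setattr__(self, "reasoning_effort",
                           ReasoningEffort(self.reasoning_effort))

    def request_fields(self) -> dict:
        """Fields to merge into a request body, using the reasoning_effort name from the article."""
        return {"model": self.model, "reasoning_effort": self.reasoning_effort.value}


triage = ModelConfig("grok-4.3", "none")
print(triage.request_fields())  # prints {'model': 'grok-4.3', 'reasoning_effort': 'none'}
```

Leaving out the default means a config that forgets the setting fails at construction time (a `TypeError`) instead of quietly running at whatever level the provider picks.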

Reasoning effort changes more than cost. It can change latency, tool-call behavior, intermediate planning, verbosity, JSON reliability, refusal patterns, and retry dynamics. A production agent is rarely “model in, answer out.” It is model plus tools, prompt contracts, schema validators, retry loops, rate limits, observability, and human review thresholds. Move one piece and the whole harness deserves a regression test.
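A first version of that regression test can compare a few harness-level metrics between the old configuration and the new one. This minimal sketch assumes you already collect such metrics per run. The metric names, tolerances, and numbers are hypothetical placeholders, not recommended thresholds.

```python
from typing import NamedTuple


class Metric(NamedTuple):
    baseline: float
    candidate: float
    higher_is_better: bool
    tolerance: float  # allowed movement in the bad direction, in the metric's own units


def regressions(metrics: dict[str, Metric]) -> list[str]:
    """Return names of metrics that moved in the bad direction by more than tolerance."""
    failed = []
    for name, m in metrics.items():
        delta = m.candidate - m.baseline
        worsening = -delta if m.higher_is_better else delta
        if worsening > m.tolerance:
            failed.append(name)
    return failed


# Hypothetical harness results: old slug (baseline) vs grok-4.3 (candidate).
results = {
    "json_valid_rate":   Metric(0.995, 0.990, higher_is_better=True,  tolerance=0.002),
    "tool_call_success": Metric(0.970, 0.975, higher_is_better=True,  tolerance=0.010),
    "p95_latency_s":     Metric(1.8,   2.6,   higher_is_better=False, tolerance=0.5),
    "retries_per_task":  Metric(0.12,  0.13,  higher_is_better=False, tolerance=0.05),
}
print(regressions(results))  # prints ['json_valid_rate', 'p95_latency_s']
```

Each metric carries its own direction and tolerance, so one table can mix rates, latencies, and counts without special cases.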

For teams that used grok-code-fast-1, the trap is especially obvious. “Improved agentic coding” sounds good, but coding agents are not judged by vibes. They are judged by accepted diffs, failed builds, test-pass rate, edit minimality, tool-call discipline, and whether they stop before rearranging half the repo because a prompt said “small fix.” If Grok 4.3 is better, great. Prove it against your actual task set.
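Those coding-agent criteria translate directly into a scorecard you can compute per configuration. A minimal sketch: the `AgentRun` record, its fields, and the per-task line budget used for "edit minimality" are hypothetical and would come from your own harness logs.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    """One coding-agent attempt on a task from your own benchmark set (hypothetical schema)."""
    build_ok: bool
    tests_passed: bool
    diff_accepted: bool      # a reviewer or rule accepted the change
    lines_changed: int
    expected_max_lines: int  # rough size budget for the task, e.g. for a "small fix"


def scorecard(runs: list[AgentRun]) -> dict[str, float]:
    """Aggregate rates across runs; raises ValueError on an empty list."""
    n = len(runs)
    if n == 0:
        raise ValueError("no runs to score")
    return {
        "build_failure_rate": sum(not r.build_ok for r in runs) / n,
        "test_pass_rate": sum(r.tests_passed for r in runs) / n,
        "accepted_diff_rate": sum(r.diff_accepted for r in runs) / n,
        # Share of runs that stayed within the task's size budget.
        "minimal_edit_rate": sum(r.lines_changed <= r.expected_max_lines for r in runs) / n,
    }


runs = [
    AgentRun(True, True, True, 8, 20),
    AgentRun(True, False, False, 45, 20),
    AgentRun(False, False, False, 300, 20),
    AgentRun(True, True, True, 15, 20),
]
print(scorecard(runs))
```

Because grok-code-fast-1 now resolves to grok-4.3, a fresh baseline run on the old slug is no longer possible; the comparison has to use results saved from before the retirement.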

A practical migration checklist is short and not optional. Replace deprecated slugs explicitly. Set reasoning_effort deliberately instead of inheriting redirect defaults. Run side-by-side evals on representative traffic. Track cost per successful outcome, not just token price. Log the resolved model and reasoning setting with every request. Alert when a provider alias resolves differently than expected. If your workflow needs reproducibility for audits, compliance, benchmarks, or customer commitments, pin dated model IDs where the provider offers them.
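Two of those checklist items, logging the resolved model and alerting on unexpected resolution, fit in a small wrapper around whatever client you use. This minimal sketch assumes an OpenAI-compatible chat completions reply whose JSON body has a top-level `model` field. Whether a redirected call reports grok-4.3 or the original slug in that field is an assumption you should verify against real responses. The HTTP call is left out, and `check_resolution` is a hypothetical helper.

```python
import json
import logging

logger = logging.getLogger("model_routing")


def check_resolution(requested: dict, response: dict, expected_model: str) -> bool:
    """Log what was requested and what answered; return False on a mismatch.

    requested: the request body you sent.
    response: the parsed JSON reply (assumed to carry a top-level "model" field).
    """
    resolved = response.get("model")
    record = {
        "requested_model": requested.get("model"),
        "reasoning_effort": requested.get("reasoning_effort"),
        "resolved_model": resolved,
    }
    logger.info(json.dumps(record))
    # Exact match; loosen this if the provider appends version suffixes.
    if resolved != expected_model:
        # Hook this into your alerting (pager, metrics counter, etc.).
        logger.warning("model resolved to %r, expected %r", resolved, expected_model)
        return False
    return True


body = {"model": "grok-4.3", "reasoning_effort": "low", "messages": []}
reply = {"model": "grok-4.3", "choices": []}
print(check_resolution(body, reply, expected_model="grok-4.3"))
```

Logging the requested effort next to the resolved model keeps both halves of the redirect visible in one record.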

The broader signal: aliases are product convenience, not engineering truth

xAI’s model docs describe aliases as a way to migrate users to newer versions automatically. That is useful for prototypes and consumer-grade surfaces. It is risky for systems where output stability matters. The industry keeps relearning this lesson because aliases are seductive: “latest” feels like a free improvement until your eval baseline moves, your safety case no longer matches production, or your invoice tells you the “same” model is not the same model.

This is not an xAI-only issue. Every AI platform wants fewer public surfaces and more traffic on its preferred default. OpenAI, Anthropic, Google, and xAI all have incentives to steer developers toward current models, current pricing, and current product narratives. Builders have a different incentive: predictable behavior under change. Those incentives are compatible only if the application treats model routing as configuration with tests, not as a magic string buried in code.
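Treating routing as configuration with tests can be as small as a per-workload table plus a check that fails the build when a retired slug reappears or a reasoning setting goes missing. A minimal sketch: the workload names, `ROUTES`, and `route_problems` are hypothetical, and the retired set is the eight slugs from xAI's notice.

```python
# The eight slugs xAI retired on May 15, 2026, as listed in the retirement notice.
RETIRED_SLUGS = frozenset({
    "grok-4-1-fast-reasoning",
    "grok-4-1-fast-non-reasoning",
    "grok-4-fast-reasoning",
    "grok-4-fast-non-reasoning",
    "grok-4-0709",
    "grok-code-fast-1",
    "grok-3",
    "grok-imagine-image-pro",
})

VALID_EFFORTS = {"none", "low", "medium", "high"}

# Hypothetical per-workload routing table; in practice, load it from a config file.
ROUTES = {
    "ticket_triage": {"model": "grok-4.3", "reasoning_effort": "none"},
    "code_review_agent": {"model": "grok-4.3", "reasoning_effort": "medium"},
    "product_images": {"model": "grok-imagine-image-quality"},
}


def route_problems(routes: dict) -> list[str]:
    """Return human-readable problems found in a routing table (empty list = OK)."""
    problems = []
    for workload, route in routes.items():
        model = route.get("model")
        if not model:
            problems.append(f"{workload}: no model set")
            continue
        if model in RETIRED_SLUGS:
            problems.append(f"{workload}: retired slug {model}")
        # Assumption: image models are identified by the "grok-imagine" prefix and
        # take no reasoning_effort; every other route must set one explicitly.
        if not model.startswith("grok-imagine") and route.get("reasoning_effort") not in VALID_EFFORTS:
            problems.append(f"{workload}: reasoning_effort not set explicitly")
    return problems


def test_routes() -> None:
    # pytest collects this when the file is named test_*.py; it fails CI on drift.
    assert route_problems(ROUTES) == []
```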

There is also a competitive read. Grok 4.3 is being positioned as the default for chat, coding, long-context, tool-calling, and agentic workflows. Microsoft Foundry’s Grok 4.3 preview note describes it as aimed at agent-based workflows, instruction following, multimodal analysis, web development, legal reasoning, finance agents, and enterprise productivity loops. That is the frontier-model playbook now: one capable general model, surfaced through many enterprise channels, with knobs for reasoning and context rather than separate branded models for every task.

That can be good for developers. Fewer models mean easier documentation, simpler routing, and less time spent deciphering whether “fast,” “mini,” “pro,” “reasoning,” and “latest” are marketing, architecture, or billing categories. But simplicity at the platform layer often moves complexity into the application layer. Your system still needs to decide when to spend on deeper reasoning, when to route to a cheaper model, when to fall back, and when to stop the agent from burning tokens on a task that is already good enough.
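Those application-layer decisions can live in one explicit policy function instead of being scattered across prompt code. A minimal sketch: the difficulty score, the thresholds, the token budget, and the `cheap_model` placeholder are all hypothetical. Which cheaper model to use, and whether it accepts a reasoning_effort parameter at all, is up to you.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    model: str
    reasoning_effort: Optional[str]  # None = omit the parameter for this model


def choose_route(difficulty: float, tokens_spent: int, token_budget: int,
                 primary_failed: bool, cheap_model: str) -> Optional[Route]:
    """Decide the next call for one task, or return None to stop.

    difficulty is a 0.0-1.0 score from your own heuristic or classifier;
    the thresholds below are illustrative, not tuned.
    """
    if tokens_spent >= token_budget:
        return None  # budget exhausted: keep the current result or escalate to a human
    if primary_failed or difficulty < 0.3:
        return Route(cheap_model, None)  # fallback path, or an easy task
    if difficulty < 0.6:
        return Route("grok-4.3", "low")
    if difficulty < 0.85:
        return Route("grok-4.3", "medium")
    return Route("grok-4.3", "high")


print(choose_route(0.7, tokens_spent=1_200, token_budget=20_000,
                   primary_failed=False, cheap_model="my-cheap-model"))
```

Checking the budget first means a stuck agent stops even when the task still looks hard, which is the "good enough" brake the paragraph describes.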

The right reaction to the May 15 retirement is not outrage. xAI gave notice and documented the redirects. The right reaction is to treat it as a production migration, because that is what it is. A successful API call is not proof of semantic compatibility. A model that keeps resolving is not a model that stayed the same.

Grok 4.3 may be xAI’s better default. The hidden cost is pretending default migration is free. In AI systems, compatibility is not just an endpoint contract. It is behavior, latency, tool use, price, safety posture, and reproducibility. Ship the migration like code, not like a newsletter footnote.

Sources: xAI model retirement guide, xAI model pricing docs, xAI reasoning docs, Microsoft Foundry

