Grok 4.3 Lands in the API With a 40% Price Cut and a Push Into AI Agents

xAI shipped Grok 4.3 into the API on April 30, and the company is making no bones about what this model is for: AI agents. The release bundles two things that matter to builders — a meaningful jump in agentic benchmark performance and a roughly 40% price cut that puts Grok 4.3 in genuine contention with GPT-4.5 and Claude 3.7 on cost. The xAI docs now lead with Grok 4.3 as the default recommendation, calling it "the most intelligent and fastest model we've built." For developers evaluating AI APIs this week, Grok 4.3 belongs on the shortlist. Whether it stays there depends on how it performs on your actual workflow, not the published numbers.

The benchmark that should actually get your attention

The headline number most coverage will lead with is Grok 4.3's Intelligence Index score of 53, four points ahead of Grok 4.20 and enough to place it above Muse Spark and Claude Sonnet 4.6. Fine. But the number that matters for production builders is the GDPval-AA agentic benchmark: 1500 Elo, up 321 points from Grok 4.20's 1179. That is not a marginal improvement. That is a step change in how well the model handles multi-step reasoning and tool use in real-world task chains.

To put that in context: Grok 4.3 now surpasses Gemini 3.1 Pro Preview, Muse Spark, GPT-5.4 mini (xhigh), and Kimi K2.5 on agentic performance. It is still 276 Elo points behind GPT-5.5 (xhigh), which implies an expected win rate of roughly 17% against GPT-5.5 in head-to-head agentic tasks. But overtaking GPT-5.4 mini and Kimi K2.5 while narrowing the gap to GPT-5.5 is a very different competitive position from the one Grok held a month ago. If you are building agentic workflows today and your benchmark list does not include Grok 4.3, you are not benchmarking seriously.
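The 17% figure matches what the standard Elo expected-score formula gives for a 276-point gap. Here is a quick check. It assumes GDPval-AA ratings follow the conventional Elo 400-point logistic scale, and the GPT-5.5 rating is simply inferred from the quoted gap.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (win probability, ignoring draws) of A against B
    under the conventional Elo logistic curve with a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

grok_43 = 1500
grok_420 = 1179
gpt_55 = grok_43 + 276  # inferred from the 276-point gap, not a published figure

print(f"Grok 4.3 vs GPT-5.5:   {elo_expected_score(grok_43, gpt_55):.1%}")
print(f"Grok 4.3 vs Grok 4.20: {elo_expected_score(grok_43, grok_420):.1%}")
```

The second line expresses the 321-point jump in more intuitive terms. Under this model, Grok 4.3 would be expected to win about 86% of head-to-head agentic tasks against its predecessor.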

The secondary scores are mixed. Grok 4.3 hits 98% on τ²-Bench Telecom, up 5 points from Grok 4.20 and in line with GLM-5.1, while IFBench, the instruction-following benchmark, stays flat at 81%. Neither of those numbers is the story. The story is the agentic jump, and the fact that xAI appears to have shipped a meaningful improvement in the capability that determines whether a model can reliably run your automation workflow without hallucinating the next step.

The price move is aggressive, but read the asterisk

At $1.25 per million input tokens and $2.50 per million output tokens, Grok 4.3 is meaningfully cheaper than GPT-4.5 and Claude 3.7 Sonnet on input-heavy workloads: long document analysis, code review, and research pipelines where prompts are large and outputs are moderate. The 40% input price reduction from Grok 4.20 is real. For applications that send substantial context on every call, the savings add up quickly.
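Why do the savings concentrate on input-heavy work? This minimal sketch computes the token-only cost of one call at the list prices quoted above. The 200,000-token prompt is a hypothetical code-review workload, and server-side tool fees are deliberately left out.

```python
# List prices quoted in this article (USD per million tokens); check xAI's
# current pricing before relying on them.
INPUT_USD_PER_M = 1.25
OUTPUT_USD_PER_M = 2.50

def token_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Token-only cost of a single call, excluding any server-side tool fees."""
    return (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1_000_000

def input_share(input_tokens: int, output_tokens: int) -> float:
    """Fraction of the token bill attributable to input tokens."""
    input_cost = input_tokens * INPUT_USD_PER_M / 1_000_000
    return input_cost / token_cost_usd(input_tokens, output_tokens)

# Hypothetical input-heavy call: a 200k-token code review with a 4k-token answer.
cost = token_cost_usd(200_000, 4_000)
share = input_share(200_000, 4_000)
print(f"${cost:.4f} per call, {share:.0%} of it from input tokens")
```

At these rates, a prompt approaching the 1M-token context limit costs roughly $1.25 in input alone. That is why an input price cut matters most for this class of workload.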

But the xAI docs also reveal a tool-use cost layer that does not appear in the headline price. Server-side tools (web search, X search, and code execution) cost $5 per 1,000 calls. File attachment search is $10 per 1,000 calls, and collections search is $2.50 per 1,000 calls. If you are building an agent that searches the web, runs code, and processes files, your per-query cost is the token cost plus the tool invocation fees. For simple single-turn requests, Grok 4.3 is competitively priced. For complex multi-tool agentic workflows, run the actual cost model before committing.
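A rough per-query cost model makes the tool layer visible. This sketch assumes the list prices quoted above. The dictionary keys are hypothetical labels chosen for this example, not xAI API identifiers. The call counts are invented to illustrate a small multi-tool agent run.

```python
# Prices quoted in this article (USD); verify against xAI's current pricing.
INPUT_USD_PER_M = 1.25
OUTPUT_USD_PER_M = 2.50

# Server-side tool fees in USD per 1,000 calls. Keys are illustrative labels,
# not the identifiers the xAI API uses.
TOOL_USD_PER_1K = {
    "web_search": 5.00,
    "x_search": 5.00,
    "code_execution": 5.00,
    "file_attachment_search": 10.00,
    "collections_search": 2.50,
}

def agent_query_cost(input_tokens: int, output_tokens: int,
                     tool_calls: dict[str, int]) -> tuple[float, float]:
    """Return (token_cost, tool_cost) in USD for one agent run.

    input_tokens should be summed over every model call in the run, since an
    agent loop typically resends its growing context on each turn.
    """
    token_cost = (input_tokens * INPUT_USD_PER_M
                  + output_tokens * OUTPUT_USD_PER_M) / 1_000_000
    tool_cost = sum(TOOL_USD_PER_1K[name] * n / 1_000
                    for name, n in tool_calls.items())
    return token_cost, tool_cost

# Hypothetical run: 30k input and 2k output tokens, plus a handful of tool calls.
tokens, tools = agent_query_cost(
    30_000, 2_000,
    {"web_search": 3, "code_execution": 2, "file_attachment_search": 1},
)
print(f"tokens ${tokens:.4f} + tools ${tools:.4f} = ${tokens + tools:.4f}")
```

In this made-up run, six tool calls account for roughly 45% of the total. That share is easy to miss if you only compare per-token prices.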

The Batch API — 20% to 50% off standard rates, most jobs completing within 24 hours — is worth knowing about if you have asynchronous workloads. That discount applies to all token types including reasoning tokens, which matters for longer reasoning traces. Not relevant for real-time applications, but a genuine cost advantage for batch processing pipelines.
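For asynchronous pipelines, the batch discount is easiest to reason about as a range. This sketch assumes the 20% to 50% discount applies to the whole token bill. Whether tool-call fees are also discounted isn't stated, so they are kept out of the calculation. The job size is hypothetical.

```python
def batch_cost_range(standard_token_cost: float,
                     min_discount: float = 0.20,
                     max_discount: float = 0.50) -> tuple[float, float]:
    """Return (best_case, worst_case) USD cost of a batch job's tokens,
    given what the same tokens would cost at standard (non-batch) rates."""
    if not 0 <= min_discount <= max_discount < 1:
        raise ValueError("discounts must satisfy 0 <= min <= max < 1")
    return (standard_token_cost * (1 - max_discount),
            standard_token_cost * (1 - min_discount))

# Hypothetical job: 10,000 requests of 20k input + 1k output tokens each,
# priced at $1.25 / $2.50 per million tokens.
per_request = (20_000 * 1.25 + 1_000 * 2.50) / 1_000_000   # $0.0275
standard = 10_000 * per_request                            # $275.00
low, high = batch_cost_range(standard)
print(f"standard ${standard:.2f}, batch ${low:.2f} to ${high:.2f}")
```

Under these assumptions, the same job lands between about $137.50 and $220 instead of $275 at standard rates. The 24-hour completion window is the price of that saving.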

The latency asterisk on "fastest"

xAI calls Grok 4.3 "the most intelligent and fastest model we've built." The Intelligence Index data shows Grok 4.3 ranks #1 in output tokens per second at 2.7 — that part of the "fastest" claim is legitimate. Output throughput for long-form generation is genuinely strong.

But time to first token (TTFT) on xAI's API is 12.65 seconds, which xAI's own documentation notes is "at the higher end" compared to other reasoning models in the same price tier (median: 2.82 seconds). That is a roughly 4.5x gap. For a coding assistant making quick suggestions, that lag will feel slow. For a batch agentic workflow where the model runs code, searches the web, and makes decisions over minutes, it is probably acceptable. For anything with real-time conversational expectations, benchmark TTFT carefully against your current model before switching. "Fastest" in xAI's marketing refers to output throughput, not time to first token. Know which one matters for your use case.
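"Fastest" and "slow to start" can both be true, so measure the two separately. The sketch below times any streaming iterator. Time to first chunk serves as a TTFT proxy, and the chunk rate after that serves as a throughput proxy. `fake_stream` is a hypothetical stand-in. In practice, `open_stream` would wrap whatever streaming call your SDK provides, and a chunk is not guaranteed to be exactly one token.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

@dataclass
class StreamTiming:
    ttft_s: float        # request start -> first streamed chunk
    total_s: float       # request start -> last streamed chunk
    chunks: int          # chunks received (a chunk may hold several tokens)
    chunks_per_s: float  # output rate measured after the first chunk

def time_stream(open_stream: Callable[[], Iterable[str]]) -> StreamTiming:
    """Time one streamed completion. open_stream should issue the request,
    so the clock starts before any network or queueing delay."""
    start = time.perf_counter()
    first = last = None
    count = 0
    for _chunk in open_stream():
        last = time.perf_counter()
        if first is None:
            first = last
        count += 1
    if first is None:
        raise ValueError("stream produced no output")
    gen_s = last - first
    rate = (count - 1) / gen_s if count > 1 and gen_s > 0 else 0.0
    return StreamTiming(first - start, last - start, count, rate)

def fake_stream(first_delay: float, n: int, gap: float) -> Iterator[str]:
    """Hypothetical stand-in for an SDK's streaming iterator."""
    time.sleep(first_delay)
    for i in range(n):
        if i:
            time.sleep(gap)
        yield f"tok{i} "

timing = time_stream(lambda: fake_stream(first_delay=0.2, n=20, gap=0.01))
print(f"TTFT {timing.ttft_s:.2f}s, {timing.chunks_per_s:.0f} chunks/s after that")
```

A single run tells you little. Repeat each representative prompt several times per model and compare medians for both numbers.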

What this means for your architecture decisions

Grok 4.3's agentic benchmark jump changes the calculus for teams building production AI agents. The numbers are not definitive (run your own benchmarks), but the trajectory matters: a 321-point Elo gain in a single release cycle suggests xAI is investing specifically in the multi-step reasoning stack. If that trajectory holds, Grok 4.x could become a leading option for agentic workflows within the next two to three release cycles.

The 1M-token context window is also worth noting for builders working with large codebases or long documents; it is competitive with what Claude and GPT offer at the high end. Combined with the server-side tool infrastructure (web search, code execution, file search), it shows xAI explicitly positioning Grok 4.3 as an agentic platform, not a chatbot. The tools are priced and documented, which signals that xAI expects them to be used in production.

For teams already on xAI: Grok 4.3 should be your default model choice going forward. The price is better, the benchmarks are better, and the tool integration is more mature. The only reason to stay on 4.20 is if you have extensive production tuning on that specific model version and cannot absorb the behavioral changes that come with a major model upgrade.

For teams evaluating xAI for the first time: Grok 4.3 is the most compelling API offering xAI has shipped. But treat it like any new model choice — benchmark it against your actual workflow, not the published numbers, before committing to a production migration. The TTFT latency issue in particular deserves real-world testing before you rebuild a latency-sensitive application around it.

xAI shipped Grok 4.3 on the same day Elon Musk was on a witness stand admitting that xAI trained Grok partly on OpenAI's models. One story is about building competitive products at competitive prices. The other is about how those products were built. The distillation question is legally unresolved and will be litigated. The pricing and benchmark numbers are available today. For practitioners, the product story is what matters right now. Watch the latency issue if you care about real-time performance. Watch the tool-use cost model if you care about agentic workflow economics. And run your own benchmarks before trusting anyone else's.

Sources: xAI Developer Documentation, Artificial Analysis, OpenRouter