xAI Just Signed Up for Government Preview — Before You Ship

Something unusual happened in Washington on May 5, and it did not come with a demo video or a press conference. The Center for AI Standards and Innovation (CAISI) at the Department of Commerce announced that xAI, alongside Google DeepMind and Microsoft, had signed agreements giving the federal government pre-release access to AI models for national security evaluation. xAI is now the third major AI lab (after OpenAI and Anthropic in 2024) to submit to what amounts to a government preview program before anything ships to the public.

The timing is not accidental. The agreements were accelerated by Anthropic's Mythos model, announced April 7, which discovers zero-day vulnerabilities in every major operating system and browser — finding, by one estimate, tens of thousands of vulnerabilities that remain unpatched. Dario Amodei described it as a "moment of danger": a narrow window for firms, governments, and banks to fix what Mythos found before malicious actors can exploit it. Mythos was so alarming that it prompted the US Treasury secretary to convene a meeting with Goldman, JPMorgan, and Citi. Anthropic has since limited Mythos to a small group of partner companies and launched Project Glasswing, a collaborative effort with Google and banks to test the model against critical infrastructure. The xAI deal is a downstream consequence of that alarm.

Here is what the announcement actually says, and what it means for anyone building on top of xAI's API.

What CAISI Actually Does With Your Model

The NIST press release contains a sentence that deserves more attention than it got: "To thoroughly evaluate national security-related capabilities and risks, developers frequently provide CAISI with models that have reduced or removed safeguards." This is not standard pre-release red-teaming, where a lab hands over its production model for testing. CAISI frequently gets stripped-down versions, with safety guardrails reduced or removed, specifically so evaluators can probe for national security risks that a normal model would refuse to touch.

CAISI has completed more than 40 such evaluations to date, including on models that have never been released publicly. The TRAINS Taskforce — an interagency group of government experts — participates in these evaluations and provides feedback. The agreements also support testing in classified environments. This is a meaningful expansion of scope from the 2024 Biden-era agreements, which focused more narrowly on developing AI tests, definitions, and voluntary safety standards.

Under the direction of Commerce Secretary Howard Lutnick, CAISI has been designated as "industry's primary point of contact within the U.S. government" for AI testing and security research. That is a more assertive posture than the previous administration took, and it reflects the urgency created by Mythos.

The Anthropic Exclusion Is the More Interesting Detail

The Pentagon announcement from the previous week provides crucial context that the May 5 CAISI story does not fully surface: Anthropic was explicitly excluded from the Defense Department's agreements to deploy AI on classified networks. The reason, per Reuters: an ongoing dispute over guardrails on military AI use. Anthropic has been more restrictive about how its models can be applied in defense contexts — it has policies about what Claude will and won't do in military applications. The Pentagon apparently found those policies incompatible with what it wants to do on classified infrastructure.

This creates an unusual competitive dynamic. Anthropic has the most concerning model from a national security standpoint — Mythos is the direct reason the CAISI expansion happened — but it is being locked out of certain government engagements because of its safety stance. xAI, which has had its own controversies over Grok's content guardrails (or lack thereof), is signing agreements that give the government more access. The trade is straightforward: more government access in exchange for being inside the tent when evaluations happen.

For practitioners, this raises a question that is not answered by any of the coverage: if CAISI is evaluating unreleased xAI models before they ship publicly, what does that mean for the API roadmap that developers plan against? If Grok 4.4 or Grok 5 goes through a six-week national security review before release, the gap between what is announced and what is available in the API could widen. That is a different kind of release risk than the industry has had to plan for.

xAI's Silence Says Something

The Reuters article notes that "xAI did not immediately respond to a request for comment." Google declined to comment. Microsoft provided a statement. xAI's non-response is notable: this is the third major lab to sign this type of agreement, and unlike OpenAI and Anthropic, which have been through the 2024 process, xAI did not appear to volunteer this as a proactive PR moment. The announcement came from NIST, not from xAI's communications team.

That could mean several things. Maybe xAI's communications team is smaller and slower. Maybe the deal was more politically negotiated than the other two. Or maybe (and this is speculative but worth considering) xAI's leadership did not want to draw attention to the fact that the federal government now has a formal window into what xAI ships before developers do. Whatever the explanation, the silence from xAI is a data point about how the company thinks about transparency with its developer community.

What This Means for Grok API Developers

The practical implication is concrete, even if it does not change anything about how you write code today. The federal government is now a stakeholder in xAI's release pipeline. That does not mean every Grok model update goes through a months-long review: CAISI has completed more than 40 evaluations, including on unreleased models, which suggests a parallel-track process rather than a gate that blocks shipping. But it does mean there is now a structured government interest in what xAI ships and when.

The more immediate consideration is the precedent. xAI has now acknowledged, through this agreement, that pre-release government review is a legitimate part of the AI development process. That is a position that frontier labs have been moving toward since the 2024 Biden-era agreements, but it is still not universal. If you are evaluating whether to build on Grok versus a competitor that has not signed a CAISI agreement, the calculus now includes a question you could not have asked six months ago: does that lab have any equivalent government review commitment?

The Mythos-catalyzed urgency also suggests this is not a one-time event. CAISI is actively expanding its evaluation capacity under Lutnick's direction. More labs will be asked to sign. The question for the industry — and for developers who depend on API stability — is whether this becomes a formal pre-market gate or remains an informal pre-release collaboration. The difference matters enormously for how predictable the release pipeline stays.
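
For developers who want to hedge against an unpredictable gap between announcement and availability, one option is to avoid hard-coding a single model name. The sketch below is a minimal illustration, not anything xAI documents: it keeps an ordered preference list and picks the first model the API actually reports as available. The function name, the model identifiers, and the `available` set are all hypothetical placeholders; in practice the set would come from whatever model-listing call your provider exposes.

```python
from typing import Iterable, Optional, Sequence


def pick_model(preferred: Sequence[str], available: Iterable[str]) -> Optional[str]:
    """Return the first preferred model ID that is actually available, else None."""
    available_set = set(available)
    for model_id in preferred:
        if model_id in available_set:
            return model_id
    return None


# Hypothetical identifiers, newest first, for illustration only.
PREFERRED = ["grok-5", "grok-4.4", "grok-4"]

# Hypothetical availability snapshot: the newest model is announced but not yet served.
chosen = pick_model(PREFERRED, {"grok-4.4", "grok-4"})
```

With this pattern, an announced model that has not yet reached the API is simply skipped, and the application picks it up automatically once it appears in the available set.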

The Irony of xAI Inside the Tent

xAI signing a government pre-review agreement while being sued by OpenAI, which is itself inside the CAISI tent, carries a structural irony that is hard to miss. Musk spent April 30 in a courtroom admitting that xAI distilled OpenAI's models to train Grok. Now both companies sit inside the same US government review process. The competitive dynamics of the frontier AI race are increasingly also regulatory dynamics, and the companies that can manage both simultaneously will have an advantage over those that cannot.

For builders, the message embedded in this announcement is simple: the federal government is now a participant in the AI release cycle, not a bystander. Whether you think that is good or bad for safety, it is real, and it is happening now. xAI is the latest lab to acknowledge it. Assume others will follow.

Sources: Reuters, NIST/CAISI, The Guardian, CNBC