GPT-5.3-Codex-Spark Is Real — And It's Built for Speed

OpenAI's new GPT-5.3-Codex-Spark research preview runs on specialized hardware, hitting 1,000+ tokens/sec. What it means for Copilot's dominance.

OpenAI has quietly made a new model available to ChatGPT Pro users: GPT-5.3-Codex-Spark, a research preview running on specialized low-latency hardware purpose-built for real-time code generation. Unlike the steady parade of API model releases that land in the documentation and stay there, this one arrived without a blog post and started showing up in token rate cards before anyone had written a proper explainer. The silence was telling.

The specialized hardware is the detail nobody should gloss over. Running a coding model fast isn't just a matter of throwing an existing GPT-5 variant onto a faster GPU cluster — the latency profile an interactive coding environment requires, where completions need to arrive in under a second to feel natural, demands a different architectural approach. Reports from early users suggest throughput north of 1,000 tokens per second in some scenarios. That's not a marginal improvement over what's already shipping; it's a different performance tier, competitive with the inference speeds Cursor has been advertising as its core differentiator.
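A rough back-of-envelope shows why throughput is the whole ballgame for interactivity. The numbers below (completion length, time-to-first-token, the slower comparison rates) are illustrative assumptions, not measured benchmarks:

```python
# Back-of-envelope: how generation throughput maps to end-to-end completion
# latency. All figures here are illustrative assumptions, not measurements.

def completion_latency(tokens: int, tokens_per_sec: float, ttft: float = 0.1) -> float:
    """Seconds to deliver a completion: time-to-first-token plus generation time."""
    return ttft + tokens / tokens_per_sec

# A typical multi-line code completion might run ~150 tokens.
for rate in (60, 250, 1000):
    t = completion_latency(150, rate)
    print(f"{rate:>5} tok/s -> {t:.2f}s for a 150-token completion")
# At 60 tok/s the suggestion takes ~2.6s — long enough that you've kept
# typing. At 1,000 tok/s it lands in ~0.25s, inside the window where a
# completion feels like part of the editor rather than a network call.
```

The point of the sketch: past roughly a few hundred tokens per second, generation time stops being the bottleneck and fixed costs like time-to-first-token dominate — which is exactly the regime where specialized inference hardware, rather than a faster GPU cluster, starts to matter.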

Codex-Spark isn't available through the standard API rate card yet — the research preview designation keeps it out of the published pricing table while OpenAI calibrates what to charge. That's a deliberate move. OpenAI is running a controlled experiment: get real users hammering on it, measure actual consumption patterns, and come out of preview with pricing that reflects how much compute it actually burns. The alternative — locking in rates before anyone knows how the hardware performs under production load — would be a recipe for either leaving money on the table or pricing it out of the market before it finds its audience.

For GitHub Copilot, this is an uncomfortable development. Microsoft's tool has held a strong position on inference speed for the past year, and its deep IDE integration (Visual Studio, VS Code, JetBrains) has given it structural advantages that raw model quality couldn't easily overcome. If Codex-Spark ships out of preview with both competitive speed and OpenAI's model development velocity behind it, the comparison between Copilot and Codex shifts from "which has the better default model" to "which stack do you trust more for your workflow." That's a harder question for Microsoft to answer with a changelog.

The timing matters too. OpenAI is in the middle of a push toward its unified superapp vision, with Codex as the coding layer of that story. A fast, specialized coding model that runs in the ChatGPT Pro environment — already the most widely subscribed AI coding tier — gives OpenAI a distribution advantage that Microsoft and GitHub can't easily match through third-party IDE integrations. The model comes to the user, rather than the user having to go to the model through a plugin installation.

The research preview label is worth taking seriously. Early access programs at OpenAI follow a pattern: the preview is genuine, the rates are real, and the model behaves well in most scenarios — but edge cases surface under production load that controlled testing misses. Anyone building critical infrastructure on top of Codex-Spark right now is an early adopter in the literal sense. That's fine for experimenting, prototyping, and getting a feel for the workflow. It's worth being deliberate before routing your sprint's entire refactor through it.

See current Codex pricing and model details at developers.openai.com →