NVIDIA’s New Ethernet Pitch Is Not ‘Faster Pipes.’ It Is Fewer Wasted GPUs.

Anatoliy Kolodkin

07 May 2026 • 4 min read

NVIDIA’s latest Ethernet announcement is easy to misread as another throughput flex. It is not. Multipath Reliable Connection, or MRC, is really about avoiding the most expensive failure mode in modern AI infrastructure: synchronized training jobs waiting while a network fabric tries to remember how to behave.

That sounds unglamorous until you price it. A frontier training cluster is not a web app where one slow request gets retried and everyone moves on. When thousands or tens of thousands of GPUs are marching through tightly coordinated collective operations, a tail-latency event in the network can become a utilization tax across the job. The business metric is not packets per second. It is wasted GPU-seconds.
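To put a rough number on that tax, here is a minimal sketch. It assumes a synchronous step finishes only when the slowest rank does, so every faster rank idles for the difference. The function name and the per-rank timings are invented for illustration; they are not measurements from any cluster.

```python
def wasted_gpu_seconds(rank_step_times):
    """Idle GPU-seconds in one synchronous step.

    Assumption: every rank waits at the collective until the slowest
    rank arrives, so the step lasts as long as the slowest rank.
    """
    step_time = max(rank_step_times)
    return sum(step_time - t for t in rank_step_times)


# Hypothetical step: 1,023 ranks finish in 1.0 s, one rank is delayed to 1.5 s.
times = [1.0] * 1023 + [1.5]
print(wasted_gpu_seconds(times))  # 511.5
```

One delayed participant costs half a second of wall-clock time but more than 500 GPU-seconds of idle capacity, and that is for a single step.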

NVIDIA says MRC is now part of Spectrum-X Ethernet and has been published as an open specification through the Open Compute Project. The spec was developed with OpenAI, Microsoft, AMD, Broadcom and Intel, which matters because “Ethernet for AI” only becomes credible if it can escape single-vendor slideware. OpenAI says the transport is already deployed across its largest NVIDIA GB200 supercomputers, including Oracle Cloud Infrastructure’s Abilene AI factory and Microsoft Fairwater, and has been used to train multiple OpenAI models.

The interesting part is not the port speed

The technical move behind MRC is packet spraying across many paths without forcing the system to pretend packets will arrive in order. MRC packets include their final memory address, which allows the transport to distribute traffic across hundreds of available paths and tolerate out-of-order delivery. That is the useful break from the older mental model: pick a path, preserve order, hope the fabric does not develop a hot spot exactly when the job needs predictability.
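The idea can be sketched in a few lines. This is a toy model, not the MRC wire format: the `Packet` fields, the round-robin spraying and the chunk size are all assumptions chosen to show why carrying a destination offset makes arrival order irrelevant.

```python
import random
from dataclasses import dataclass


@dataclass
class Packet:
    path: int       # which fabric path carried this packet (illustrative)
    offset: int     # destination offset in the receive buffer
    payload: bytes


def spray(message: bytes, num_paths: int, chunk: int = 4):
    """Split a message into packets and spread them round-robin across paths."""
    return [
        Packet(path=i % num_paths, offset=off, payload=message[off:off + chunk])
        for i, off in enumerate(range(0, len(message), chunk))
    ]


def receive(packets, total_len: int) -> bytes:
    """Place each packet by its own offset; arrival order does not matter."""
    buf = bytearray(total_len)
    received = 0
    for p in packets:
        buf[p.offset:p.offset + len(p.payload)] = p.payload
        received += len(p.payload)
    # The message is complete once every byte has landed.
    assert received == total_len
    return bytes(buf)


message = b"gradient shard bytes for one collective step"
packets = spray(message, num_paths=8)
random.Random(0).shuffle(packets)  # simulate out-of-order arrival
assert receive(packets, len(message)) == message
```

Because the receiver never consults a sequence position, no reorder buffer is needed and no single path has to preserve ordering for the whole flow.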

OpenAI’s numbers make the scale concrete. It says a 64-port 800Gb/s switch can be split into 512 100Gb/s plane connections, enabling roughly 131,000 GPUs to be fully connected with two tiers of switches instead of a conventional three- or four-tier design. Fewer tiers are not just cleaner diagrams. They mean fewer hops, fewer places for congestion to form, less cabling complexity, and a better chance that the fabric’s failure behavior can be understood by humans before something catches fire at 2 a.m.
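The 131,000 figure is consistent with textbook leaf-spine arithmetic. The sketch below assumes a non-blocking two-tier fabric in which each leaf switch uses half its ports for GPUs and half for uplinks; the source does not say this is exactly how OpenAI derived its number, and the function name is made up for illustration.

```python
def two_tier_endpoints(radix: int) -> int:
    """Endpoints in a non-blocking two-tier leaf-spine fabric.

    Assumption: each leaf uses radix/2 ports for endpoints and radix/2
    for uplinks, and each spine port connects one leaf, so there can be
    up to `radix` leaves.
    """
    return radix * (radix // 2)


physical_ports = 64
port_gbps = 800
plane_gbps = 100

# Splitting each 800Gb/s port into 100Gb/s connections multiplies the radix.
logical_ports = physical_ports * (port_gbps // plane_gbps)

print(logical_ports)                       # 512
print(two_tier_endpoints(logical_ports))   # 131072
print(two_tier_endpoints(physical_ports))  # 2048
```

Under the same assumption, the unsplit 64-port switch tops out at 2,048 endpoints in two tiers, which is why larger fabrics conventionally need a third or fourth tier.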

The failure story is the real pitch. NVIDIA says Spectrum-X failure bypass can detect a failed path and reroute traffic in microseconds in hardware. OpenAI reports seeing multiple tier-0 to tier-1 link flaps per minute during training with no measurable impact on synchronous pretraining jobs. More strikingly, OpenAI says it rebooted four tier-1 switches during a recent ChatGPT/Codex frontier-model training run without coordinating with the training team.

That sentence is doing a lot of work. In ordinary infrastructure, rebooting multiple switches during a critical production workload without coordination sounds like a résumé-generating event. In the environment MRC is trying to create, it becomes a maintenance exercise the workload can absorb. That is the difference between a network fabric that merely has redundancy and a training system that can actually use that redundancy under pressure.

AI networking is becoming workload-aware

Traditional dynamic routing can take seconds or tens of seconds to settle after failures. For many enterprise systems, that is annoying but survivable. For large synchronous training, it is an eternity. The GPUs do not politely find other useful work while the route converges. They wait, and the bill keeps running.
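A back-of-the-envelope sketch shows why the gap between seconds and microseconds matters. The cluster size, stall durations and hourly price below are hypothetical inputs, not figures from NVIDIA or OpenAI, and the model assumes every GPU in the job idles for the full stall.

```python
def idle_cost_usd(num_gpus: int, stall_seconds: float, usd_per_gpu_hour: float) -> float:
    """Cost of one stall, assuming the whole job idles for its duration."""
    gpu_hours = num_gpus * stall_seconds / 3600
    return gpu_hours * usd_per_gpu_hour


# Hypothetical inputs for illustration only.
NUM_GPUS = 100_000
USD_PER_GPU_HOUR = 3.0

slow_reroute = idle_cost_usd(NUM_GPUS, 10.0, USD_PER_GPU_HOUR)    # 10 s convergence
fast_reroute = idle_cost_usd(NUM_GPUS, 10e-6, USD_PER_GPU_HOUR)   # 10 microseconds

print(f"10 s stall:  ${slow_reroute:,.2f}")
print(f"10 us stall: ${fast_reroute:,.6f}")
```

The absolute dollar amounts depend entirely on the assumed inputs; the point is the six orders of magnitude between the two cases, multiplied by however many failures occur during a run.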

MRC changes the control boundary. Instead of leaving all route intelligence inside the fabric, it pushes more decision-making toward the host and NIC. SiliconANGLE’s Zeus Kerravala framed this as extending the routing “brain” toward the workload, letting sophisticated tenants influence routing and failure behavior even when they do not own the full data-center network. That is strategically important for AI clouds, where the customer may own the training job but not the physical fabric underneath it.
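As a rough illustration of host-side path choice, the sketch below keeps a set of paths the sender currently believes are healthy and rotates through them. It is not MRC's actual algorithm, which the sources here do not describe at this level; the class and method names are hypothetical.

```python
class PathSelector:
    """Round-robin over the paths the host currently believes are healthy."""

    def __init__(self, num_paths: int):
        self.num_paths = num_paths
        self.healthy = set(range(num_paths))
        self._next = 0

    def mark_down(self, path: int) -> None:
        # Called when the host learns (by whatever signal) that a path is bad.
        self.healthy.discard(path)

    def mark_up(self, path: int) -> None:
        self.healthy.add(path)

    def pick(self) -> int:
        if not self.healthy:
            raise RuntimeError("no healthy paths")
        # Scan at most one full rotation, skipping paths marked down.
        for _ in range(self.num_paths):
            candidate = self._next
            self._next = (self._next + 1) % self.num_paths
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy paths")


selector = PathSelector(num_paths=4)
selector.mark_down(2)
print([selector.pick() for _ in range(6)])  # [0, 1, 3, 0, 1, 3]
```

The design point is that the sender, not only the fabric, holds state about which paths to avoid, so traffic can shift without waiting for network-wide route convergence.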

There is an obvious NVIDIA angle here. Spectrum-X becomes more than “Ethernet, but fast.” It becomes an implementation path for a transport that understands AI training’s intolerance for stragglers. ServeTheHome’s coverage landed on the same practical point: this is more than a whitepaper because it is already running in OpenAI, Microsoft and Oracle-scale environments. That production proof matters more than another benchmark bar chart.

For practitioners, the takeaway is not that every cluster needs MRC tomorrow. Most teams are not operating 100,000-GPU fabrics. The useful lesson is which questions to ask when buying or designing AI infrastructure. Does the network support multipath RDMA in a way that handles reordering cleanly? Where do congestion decisions happen? Can the host see enough telemetry to make useful choices? How quickly does failure bypass occur in hardware? What happens during planned maintenance, not just synthetic failover tests? Has the vendor demonstrated the system under real link flaps and switch reboots while training jobs keep running?

Those questions are better than asking whether the fabric is “Ethernet” or “InfiniBand” as if the label settles the architecture. At gigascale, generic Ethernet is not the product. Workload-aware Ethernet is. The distinction matters because AI traffic is not generic data-center traffic. Collective communication patterns create synchronized bursts, recurring hot spots and painful sensitivity to the slowest participant. A network that looks fine under ordinary load can still be a budget leak during training.

The caveat is worth stating: an open spec does not automatically create a commodity market. NVIDIA gains ecosystem legitimacy by putting MRC through OCP with AMD, Broadcom and Intel involved, but Spectrum-X is still the most optimized path NVIDIA wants buyers to take. Cloud operators should welcome the openness while verifying implementation details. “Supports MRC” will not mean the same thing across NICs, switches, telemetry stacks and failure-handling policies.

The larger story is that NVIDIA is no longer selling the GPU as the center of the AI factory. It is selling the absence of idle GPUs. That requires memory, optics, switching, transport protocols, host telemetry and operational behavior that degrades gracefully when hardware fails. MRC is one piece of that stack, but it is a revealing one: the frontier infrastructure game has moved from raw speed to utilization preservation.

The headline, then, is not that NVIDIA made Ethernet faster. It is that AI networking is becoming workload-aware infrastructure, because at gigascale every network hiccup shows up as wasted GPU budget. Faster pipes are nice. Fewer stranded accelerators are what the CFO notices.

Sources: NVIDIA Blog, OpenAI, Open Compute Project, ServeTheHome, SiliconANGLE
