
NVIDIA’s AI Stack Is Splitting Between Token Factories and Physical AI

Anatoliy Kolodkin

22 May 2026 • 5 min read

Awards are usually the least interesting part of a technology announcement. They are badges on the box, not the architecture inside it. NVIDIA’s GTC Taipei and COMPUTEX update is worth reading past the award language because the winners tell a cleaner story than the press framing does: NVIDIA is pushing AI infrastructure outward in two opposite directions at once.

On one side is Vera Rubin NVL72, a rack-scale system designed for reasoning models, agentic AI, long-context inference, and token-factory economics. On the other side are Jetson Thor and Alpamayo, aimed at robots, autonomous vehicles, medical systems, industrial machines, and other places where intelligence has to operate near the physical world. That split is the real announcement. “AI workload” is no longer a useful singular noun.

The headline items are straightforward. NVIDIA says Vera Rubin NVL72 won a COMPUTEX Best Choice Golden Award and a Sustainable Tech Special Award. Jetson Thor also won a Golden Award. Alpamayo won in the vehicle technology and smart cockpit category. Fine. The useful question is why these three products belong in the same story at all.

The token factory is becoming a physical object

Vera Rubin NVL72 is NVIDIA’s next rack-scale argument for AI factories. The system connects 36 NVIDIA Vera CPUs and 72 NVIDIA Rubin GPUs using sixth-generation NVLink Switch, with ConnectX-9 SuperNICs, Spectrum-X Ethernet Photonics co-packaged optics switches, and BlueField-4 DPUs around the edges. That is not a “GPU server” in the way most developers still use the phrase. It is a tightly coupled inference and training appliance where CPU orchestration, networking, storage offload, cooling, power delivery, and serviceability are part of the product.

NVIDIA claims Vera Rubin NVL72 can deliver up to 10x higher inference performance per watt and 10x lower cost per token. It also says the system, paired with NVIDIA Groq 3 LPX, can reach up to 35x higher throughput per watt for trillion-parameter models. Treat vendor performance claims with the usual review posture: useful directionally, not a substitute for workload-specific benchmarks. But the metric choice matters. NVIDIA is not leading with raw FLOPS. It is leading with cost per token and throughput per watt because inference economics are now the bottleneck that model API users eventually feel as latency, rate limits, quotas, and price.
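To make the metric choice concrete, here is a minimal sketch of how throughput per watt relates to the energy share of cost per token. The function names and every number used with them are illustrative assumptions, not NVIDIA figures, and the cost covers electricity only (no hardware, cooling overhead, or utilization).

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def tokens_per_joule(tokens_per_second: float, power_watts: float) -> float:
    """Throughput per watt: (tokens/s) / (joules/s) = tokens per joule."""
    if tokens_per_second <= 0 or power_watts <= 0:
        raise ValueError("tokens_per_second and power_watts must be positive")
    return tokens_per_second / power_watts


def energy_cost_per_million_tokens(
    tokens_per_second: float, power_watts: float, usd_per_kwh: float
) -> float:
    """Electricity cost in USD to generate one million tokens.

    Hypothetical helper: deliberately ignores capex, cooling overhead,
    and idle time, so it is a lower bound on real cost per token.
    """
    joules_per_token = 1.0 / tokens_per_joule(tokens_per_second, power_watts)
    kwh_per_token = joules_per_token / JOULES_PER_KWH
    return kwh_per_token * usd_per_kwh * 1_000_000
```

With made-up inputs of 10,000 tokens per second at 100 kW and $0.10 per kWh, this works out to 0.1 tokens per joule and roughly $0.28 of electricity per million tokens. A 10x gain in throughput per watt would cut that energy share to roughly $0.028, which is why the per-watt figure feeds directly into per-token pricing.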

The engineering details around the rack are more interesting than the trophy. NVIDIA says Vera Rubin NVL72 uses a cable-free, hose-free, fanless modular tray design that cuts compute-tray assembly time from two hours to five minutes. Power shelves provide 6x more onboard energy storage for power smoothing. According to NVIDIA, the system is 100% liquid-cooled and operates at 45°C, which enables ambient-air dry-cooler designs in liquid-cooled data centers.

That is the unglamorous systems work behind “agentic AI.” Reasoning models and long-context agents are not just model problems. They are scheduling problems, memory problems, network problems, thermal problems, token-accounting problems, and operations problems. If a provider cannot keep the rack fed, cooled, repaired, routed, and utilized, the model’s benchmark score is academic. Builders consuming hosted models do not need to own those racks, but they should understand the constraint chain. It is why two providers running “the same model” can have very different behavior under load.

Jetson Thor is the reminder that edge AI is not small cloud

Jetson Thor points in the other direction. NVIDIA says the Blackwell-based module delivers up to 2,070 FP4 teraflops, with 7.5x the compute and 3.5x the energy efficiency of Jetson Orin, in a power envelope configurable between 40W and 130W. Those numbers are aimed at robots, medical devices, industrial systems, embedded autonomy, and developer kits where the deployment target is not a region, a Kubernetes cluster, or a model endpoint. It is a machine with a thermal envelope and consequences.

This is where a lot of cloud-native AI thinking breaks down. Edge AI is not “take the cloud model and make it smaller.” A warehouse robot, vehicle subsystem, inspection camera, or surgical-assist device has different failure modes than a chatbot. It may need offline behavior. It may need deterministic safety controls. It may need local perception because network latency is not a budget line item; it is a hazard. It may need updates that roll out like firmware, not a web deploy. It may need observability that survives bad connectivity and still gives developers enough signal to debug rare events.
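One way to picture "observability that survives bad connectivity" is a bounded store-and-forward buffer on the device. The sketch below is an assumption about how such a buffer might look, not anything from NVIDIA's Jetson software; `TelemetryBuffer` and the `send` callback are hypothetical names.

```python
from collections import deque
from typing import Callable


class TelemetryBuffer:
    """Keeps the most recent events in memory while the uplink is down."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._events: deque = deque(maxlen=capacity)
        self.dropped = 0  # events evicted because the buffer was full

    def record(self, event: dict) -> None:
        # A full deque with maxlen evicts its oldest item on append,
        # so count the loss first to keep it visible to developers.
        if len(self._events) == self._events.maxlen:
            self.dropped += 1
        self._events.append(event)

    def flush(self, send: Callable[[dict], bool]) -> int:
        """Send events oldest-first; stop at the first failure.

        `send` is a hypothetical uploader that returns True on success.
        Unsent events stay buffered for the next attempt.
        """
        sent = 0
        while self._events:
            if not send(self._events[0]):
                break
            self._events.popleft()
            sent += 1
        return sent
```

The design choice worth noting is the `dropped` counter: when storage is finite, losing old events is unavoidable, but losing them silently makes rare-event debugging much harder.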

For practitioners, Jetson Thor’s useful message is architectural: decide early which parts of your AI system must run locally and which parts can live in an AI factory. Local does not only mean cheaper or more private. It means lower latency, tighter control, and stricter operational discipline. Cloud does not only mean bigger models. It means easier iteration, central governance, and better amortization of expensive compute. The winning design is usually a loop, not a side. Edge systems collect signals and act under constraints; centralized systems train, simulate, evaluate, optimize, and push updates back out.
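The local-versus-factory decision can be written down as an explicit rule rather than left implicit. The following is a minimal sketch under assumed criteria (offline need, safety criticality, and a latency budget); the `Component` fields and the `place` function are hypothetical, and a real system would weigh more factors such as privacy, cost, and model size.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    EDGE = "edge"
    CLOUD = "cloud"


@dataclass(frozen=True)
class Component:
    name: str
    latency_budget_ms: float  # worst-case time allowed for a response
    must_work_offline: bool
    safety_critical: bool


def place(
    component: Component, network_rtt_p99_ms: float, cloud_inference_ms: float
) -> Placement:
    """Pick a placement for one component of an AI system."""
    # Anything that must keep working without a network stays local.
    if component.must_work_offline or component.safety_critical:
        return Placement.EDGE
    # If tail network latency plus remote inference exceeds the budget,
    # the cloud cannot meet the deadline reliably.
    if network_rtt_p99_ms + cloud_inference_ms > component.latency_budget_ms:
        return Placement.EDGE
    return Placement.CLOUD
```

Under these assumptions, an obstacle-avoidance loop with a tight budget lands on the edge, while a nightly fleet-analytics job with a budget measured in minutes lands in the cloud. Using p99 round-trip time rather than the average reflects the point above: the tail is where the hazard is.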

Alpamayo is NVIDIA’s bet that physical AI needs reasoning, not just perception

Alpamayo rounds out the story. NVIDIA describes Alpamayo 1.5 and Alpamayo 1 as 10-billion-parameter chain-of-thought reasoning vision-language-action models for autonomous-vehicle research, alongside AlpaSim simulation and NVIDIA Physical AI Open Datasets with more than 1,700 hours of driving data. The examples NVIDIA highlights are the hard ones: ambiguous pedestrian signals, contradictory traffic lights and road markings, emergency vehicles partially blocking lanes. In other words, the long tail where pure pattern matching and brittle rules get embarrassed.

There is an important caveat here. Chain-of-thought branding does not magically make an autonomous system safe, interpretable, or certifiable. Physical AI needs more than a model that can narrate plausible reasoning. It needs simulation coverage, dataset provenance, scenario replay, safety cases, monitoring, fallback behavior, and a way to prove that a change improved the system rather than merely changed its explanations. The cost of a bad answer is very different when software is connected to actuators.
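A scenario regression check is one way to tell "improved the system" apart from "changed its explanations". The sketch below compares pass/fail outcomes per scenario between a baseline and a candidate model and ignores any narrated reasoning. The names `compare_scenario_runs` and `RegressionReport`, and the idea that each scenario reduces to a single boolean, are simplifying assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegressionReport:
    fixed: list[str]      # failed on baseline, passes on candidate
    regressed: list[str]  # passed on baseline, fails on candidate
    missing: list[str]    # in baseline but not run on candidate

    @property
    def acceptable(self) -> bool:
        # Any regression or coverage gap blocks the change,
        # no matter how many scenarios it fixed.
        return not self.regressed and not self.missing


def compare_scenario_runs(
    baseline: dict[str, bool], candidate: dict[str, bool]
) -> RegressionReport:
    """Compare outcome-level results keyed by scenario ID."""
    missing = sorted(set(baseline) - set(candidate))
    shared = set(baseline) & set(candidate)
    fixed = sorted(s for s in shared if candidate[s] and not baseline[s])
    regressed = sorted(s for s in shared if baseline[s] and not candidate[s])
    return RegressionReport(fixed=fixed, regressed=regressed, missing=missing)
```

A candidate that fixes an ambiguous-pedestrian scenario but newly fails an emergency-vehicle scenario would be reported with one fix and one regression, and `acceptable` would be false.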

Still, the direction is right. The next phase of robotics and autonomous systems will not be solved by perception alone. These systems need to reason over context, intent, conflicting cues, and rare events. That pushes developers toward multi-layer architectures: perception models, planning logic, world models, simulation environments, policy constraints, and audit trails. Alpamayo is interesting less as a standalone model launch and more as evidence that NVIDIA wants the physical-AI stack to look like a full development platform: data, model, simulator, hardware, and deployment path.

The community reaction so far appears muted. Search results and social threads around the COMPUTEX update are still dominated by NVIDIA finance, earnings, and investor framing. That is predictable and mildly unfortunate. The practitioner story is not that NVIDIA won more awards. It is that the company’s product stack now assumes AI will be produced in centralized factories, deployed into edge machines, and improved through simulation and telemetry between the two.

If you are building with AI in 2026, the actionable takeaway is to stop treating deployment target as an implementation detail. For cloud inference, measure time-to-first-token, p95 and p99 latency, tokens per dollar, model-version stability, retry behavior, queueing, and governance hooks. For edge systems, measure power, thermals, offline behavior, model update safety, local observability, and fallback paths. For physical AI, add simulation coverage, scenario regression tests, data lineage, and human review of rare-event behavior. The architecture is the product.
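For the cloud inference list, a small summary over request logs is enough to get started. This is a sketch that assumes you already record time-to-first-token, total latency, output tokens, and billed cost per request; `RequestRecord` and `summarize` are hypothetical names, and the percentile uses the simple nearest-rank method.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestRecord:
    ttft_s: float           # time to first token, seconds
    total_latency_s: float  # full request latency, seconds
    output_tokens: int
    cost_usd: float


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(records: list[RequestRecord]) -> dict[str, float]:
    """Summarize tail latency and tokens per dollar for a batch of requests."""
    latencies = [r.total_latency_s for r in records]
    total_tokens = sum(r.output_tokens for r in records)
    total_cost = sum(r.cost_usd for r in records)
    return {
        "ttft_p50_s": percentile([r.ttft_s for r in records], 50),
        "latency_p95_s": percentile(latencies, 95),
        "latency_p99_s": percentile(latencies, 99),
        "tokens_per_dollar": (
            total_tokens / total_cost if total_cost > 0 else float("inf")
        ),
    }
```

Running the same summary against two providers serving "the same model" under comparable load is a quick way to see the behavioral differences described earlier.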

My read: Vera Rubin, Jetson Thor, and Alpamayo are not three separate COMPUTEX bullet points. They are NVIDIA’s two-front AI strategy in miniature. Vera Rubin targets the token factory. Jetson Thor and Alpamayo target intelligence at the edge, where models meet machines and the world refuses to behave like a benchmark. The common denominator is not “more GPUs.” It is designing the whole system around where intelligence is produced, served, acted on, measured, and corrected.

Sources: NVIDIA Blog, NVIDIA Vera Rubin NVL72, NVIDIA Jetson Thor, NVIDIA Alpamayo, NVIDIA Physical AI Open Datasets
