Dell and NVIDIA Are Turning the AI Factory Into an Agent Runtime, Not Just a GPU Rack

The useful part of Dell and NVIDIA’s latest AI Factory pitch is not the rack glamour. It is the admission hiding under the keynote language: enterprise agents are becoming a runtime problem.

For the last two years, “AI infrastructure” mostly meant “more GPUs, preferably yesterday.” That was not wrong, exactly, but it was incomplete. A real agent does not just ask a model for a paragraph and exit. It reads files, calls tools, queries databases, runs code, routes secrets, waits on APIs, writes logs, and sometimes tries to do all of that behind a firewall where the legal department can still breathe normally. Dell and NVIDIA are now packaging that mess as an architecture: Vera CPUs, Vera Rubin systems, Dell AI Factory, OpenShell, NeMoClaw, Nemotron, Confidential Computing, and AI-Q as one enterprise agent stack.

That is more interesting than another “demand is parabolic” headline. Michael Dell said worldwide AI infrastructure spending could hit $3 trillion to $4 trillion by 2030, with token consumption rising 3,400% over the same period. Jensen Huang added the expected keynote acceleration quote: “What took months now takes weeks. What took weeks now takes days. And what takes days now takes hours.” Fine. But the practitioner story is less about a bigger market and more about a changed workload. Agents turn AI from a stateless API call into a distributed system with permissions.

The rack is only the visible part of the runtime

Dell’s new PowerEdge XE9812 is built on NVIDIA Vera Rubin NVL72, with NVIDIA claiming up to 10x lower cost per token than Blackwell for massive-scale agentic inference. Dell is also lining up Rubin systems on NVIDIA HGX Rubin NVL8, including the PowerEdge XE9880L, XE9885L, and XE9882L. The claimed figures are up to 144 GPUs per rack, fully direct liquid-cooled compute nodes, and up to 5.5x the performance of HGX B200.

Those numbers will get the procurement slides. The CPU numbers deserve more attention. Dell PowerEdge M9822 and R9822 systems bring NVIDIA Vera CPUs into the AI Factory, with NVIDIA citing 1.2 TB/s memory bandwidth and agentic workloads completing 50% faster than on x86 processors. Huang put the point bluntly: “Vera CPU has the highest single-threaded performance of any CPU in the world. It has three times the memory bandwidth; as a result, Starburst, DuckDB, all these databases run incredibly fast, because the agents are pounding on the databases, so the CPU had better be super fast.”

That line is the whole story. If agents are pounding on databases, the GPU rack is no longer the system. It is one hot path inside a larger runtime. Tool calls, SQL analytics, code execution, sandbox supervision, retrieval, and policy checks all become part of the latency budget. A faster decoder does not fix an agent loop that spends half its time waiting on glue code and data movement.

This is where many teams still fool themselves. They benchmark tokens per second, then ship an agent that feels slow because every step outside the model is cold, serialized, and under-instrumented. The model responds quickly; the task completes late. Users do not care which subsystem owned the delay. They just see the spinner.

On-prem agents are a security architecture, not nostalgia for data centers

Dell’s survey claims 67% of AI workloads now run outside the cloud and 88% of respondents run at least one AI workload on premises. Treat the exact percentages with the usual vendor-survey caution, but the direction is real. Enterprises want frontier models and open models close to internal data, codebases, operational systems, and compliance boundaries. That does not mean “cloud is over.” It means cloud-only agent architectures are a poor default for regulated or data-heavy work.

The announced stack leans into that. Google Distributed Cloud with Gemini 3.0 is previewing on Dell PowerEdge XE9780 servers with NVIDIA Blackwell and NVIDIA Confidential Computing. SpaceXAI models are slated for on-prem Dell AI Factory deployment with confidential computing. NVIDIA says Nemotron, Reflection models, MiniMax-M2.7, DeepSeek Pro, DeepSeek-V4, GLM 5.1, Kimi K2.6 with NVIDIA NVFP4 optimization, Gemma 4, Mistral Small 4, and Arcee Trinity-Large-Thinking are part of the open/proprietary model mix around the Dell Enterprise Hub on Hugging Face.

The model catalog matters less than the deployment constraint: enterprises want model choice without handing every internal artifact to a remote SaaS endpoint. That is especially true for coding agents. A useful coding agent needs source, docs, issue history, build logs, credentials, and often the ability to run commands. That is also a perfect recipe for data leakage if “agent governance” is just a policy PDF stapled to a prompt.

OpenAI Codex connecting with the Dell AI Data Platform is a clean example. The pitch is obvious: bring Codex closer to the internal context that makes it useful. The risk is also obvious: once a coding agent can see more, run more, and remember more, access control stops being a checkbox. You need runtime boundaries that operate below the model’s persuasive little text box.

OpenShell is the part builders should actually inspect

The most practical piece of the announcement may be NVIDIA OpenShell, not because it is finished, but because it names the right layer. The GitHub README describes OpenShell as a “safe, private runtime for autonomous AI agents” with sandboxed execution, declarative YAML policies, file/network/process controls, credential injection, policy-enforced egress routing, and support for Docker, Podman, MicroVM, and Kubernetes drivers. It is Apache 2.0 alpha software, and the README says the quiet part clearly: “proof-of-life,” “single-player mode,” “expect rough edges.” Good. Honest alpha beats fake enterprise maturity.

The interesting design detail is method- and path-level network enforcement. In the README example, a sandbox starts with minimal outbound access; a read-only GitHub API policy allows GET requests but blocks POST requests to GitHub issues. That is exactly the kind of boring control agent systems need. The model should not be trusted to remember that it may read from GitHub but not write to GitHub. The runtime should enforce it, log it, and make the denial boring.
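
To make the shape of that control concrete, here is a minimal Python sketch of a default-deny egress check keyed on host, HTTP method, and path. It is not OpenShell's policy format or API, which is declarative YAML enforced by the runtime. The EgressRule class, the is_allowed function, and the /repos/* pattern are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


@dataclass(frozen=True)
class EgressRule:
    """One allow rule. Hypothetical structure, not OpenShell's schema."""
    host: str             # exact hostname, lower-case
    methods: frozenset    # allowed HTTP methods, upper-case
    path_glob: str        # fnmatch-style pattern for the URL path


# Assumed example policy: read-only access to the GitHub REST API.
RULES = (
    EgressRule("api.github.com", frozenset({"GET"}), "/repos/*"),
)


def is_allowed(method: str, url: str, rules=RULES) -> bool:
    """Return True only if some rule matches host, method, and path."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    for rule in rules:
        if (
            host == rule.host
            and method.upper() in rule.methods
            and fnmatchcase(parts.path, rule.path_glob)
        ):
            return True
    return False  # default deny: anything not listed is blocked


def audit_line(method: str, url: str) -> str:
    """Format a decision so denials are logged, not silently dropped."""
    decision = "allow" if is_allowed(method, url) else "deny"
    return f"{decision} {method.upper()} {url}"


issues_url = "https://api.github.com/repos/acme/app/issues"
print(audit_line("GET", issues_url))   # reading issues is permitted
print(audit_line("POST", issues_url))  # writing an issue is denied
```

The point of the sketch is where the decision lives: in code that runs whether or not the model remembers its instructions. A real enforcement layer would sit in a proxy or sandbox boundary rather than in the agent process itself.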

OpenShell also exposes the uncomfortable truth about local and on-prem agents: locality increases both usefulness and blast radius. A deskside Dell Pro Max with GB10 or GB300 Grace Blackwell, NeMoClaw, OpenShell, and Nemotron could be a powerful local-agent appliance for a team. It can also become a very efficient way to let an agent touch repos, credentials, tickets, internal APIs, and local files unless policies are explicit. The workstation is not magically safer because it is physically nearby. Sometimes it is more dangerous because everyone treats it like a dev box.

That makes the Dell Deskside Agentic AI angle worth watching. It is not just “AI PC, but heavier.” It is a bet that agent development and deployment will happen across a continuum: RTX workstations for developers and small teams, deskside Grace Blackwell systems for heavier local inference, and Vera Rubin racks for frontier-scale workloads. The architectural question is whether the same policy, audit, credential, and data controls can follow the agent across that continuum. If they cannot, the stack becomes three islands and a compliance incident.

What practitioners should do before buying the factory

Most engineering teams should not read this announcement and immediately design around Vera Rubin. The useful move is to steal the shape of the architecture and apply it to current systems.

First, trace the agent loop end to end. Measure model prefill and decode, tool execution, database queries, sandbox startup, dependency installs, filesystem search, retrieval, network calls, context compaction, and final synthesis. If you cannot say where time goes, you do not have an AI infrastructure strategy. You have a GPU invoice and hope.
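
A rough way to start, before adopting a full tracing system, is to wrap each phase of the loop in a timer and look at where the wall-clock time actually goes. The sketch below is one possible shape, assuming a single-process agent loop. LoopTrace and the phase names are hypothetical, and the sleep calls stand in for real model and tool work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class LoopTrace:
    """Accumulates wall-clock time per named phase of an agent loop."""

    def __init__(self):
        self.totals = defaultdict(float)  # phase name -> seconds
        self.counts = defaultdict(int)    # phase name -> number of calls

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the phase raised an exception.
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        """Return (phase, seconds, share of total), slowest phase first."""
        total = sum(self.totals.values()) or 1.0
        rows = sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)
        return [(name, secs, secs / total) for name, secs in rows]


trace = LoopTrace()
with trace.phase("model_decode"):
    time.sleep(0.005)   # placeholder for a model call
with trace.phase("tool_call"):
    time.sleep(0.05)    # placeholder for a slow tool or database query

for name, secs, share in trace.report():
    print(f"{name:14s} {secs * 1000:7.1f} ms  {share:6.1%}")
```

If the report shows the model at a small share of the total, a faster decoder is not the next purchase.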

Second, define permissions outside the prompt. Agents need scoped filesystem access, allowlisted network destinations, method-level API controls, credential injection that does not spray secrets into logs, and durable audit trails for tool calls. If your governance model is “the system prompt says don’t exfiltrate data,” request changes.
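
Two of those controls are small enough to sketch in Python: a filesystem scope check that refuses paths escaping a workspace root, and an audit record that redacts secret-looking arguments before they reach a log. Everything named here (in_scope, redact, audit_record, SECRET_KEYS, the workspace path) is a hypothetical illustration, not part of any vendor runtime, and a key-name denylist is only a first line of defense.

```python
import json
from pathlib import Path

# Assumed set of argument names that must never be logged in clear text.
SECRET_KEYS = {"token", "api_key", "password", "authorization"}


def in_scope(candidate: str, root: str) -> bool:
    """True if candidate, resolved against root, stays inside root."""
    root_path = Path(root).resolve()
    # Joining with an absolute candidate discards root, so "/etc/passwd"
    # resolves outside the workspace and is rejected below.
    target = (root_path / candidate).resolve()
    return target == root_path or root_path in target.parents


def redact(args: dict) -> dict:
    """Replace values of secret-looking keys with a fixed marker."""
    return {
        key: "[REDACTED]" if key.lower() in SECRET_KEYS else value
        for key, value in args.items()
    }


def audit_record(tool: str, args: dict, allowed: bool) -> str:
    """Serialize one tool call as a JSON line safe to write to a log."""
    return json.dumps(
        {"tool": tool, "args": redact(args), "allowed": allowed},
        sort_keys=True,
    )


workspace = "/srv/agent/workspace"  # hypothetical sandbox root
for path in ("src/main.py", "../../etc/passwd"):
    print(audit_record("read_file", {"path": path}, in_scope(path, workspace)))
print(audit_record("http_get", {"url": "https://example.com", "token": "abc123"}, True))
```

Neither function consults the prompt. That is the property worth copying, whatever runtime ends up enforcing it.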

Third, measure cost per successful task, not cost per token. Token economics matter, but agent workloads fail in more interesting ways: repeated tool loops, cache misses, flaky tests, blocked egress, bad retrieval, and human review time. A cheap token stream that produces unusable work is not cheap. A more expensive stack that reliably completes the task with auditable steps may be the better system.
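
The arithmetic is simple enough to sketch. The Python below divides total spend, including human review time, by the number of tasks that actually succeeded. The TaskRun fields, the prices, and the two example stacks are invented for illustration and are not figures from Dell, NVIDIA, or any benchmark.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    tokens: int                  # total tokens consumed by the run
    succeeded: bool              # did the run produce usable work?
    review_minutes: float = 0.0  # human time spent checking the output


def cost_per_successful_task(runs, usd_per_million_tokens, usd_per_review_hour):
    """Total cost of all runs, failures included, per successful run."""
    token_cost = sum(r.tokens for r in runs) / 1_000_000 * usd_per_million_tokens
    review_cost = sum(r.review_minutes for r in runs) / 60 * usd_per_review_hour
    successes = sum(1 for r in runs if r.succeeded)
    if successes == 0:
        return float("inf")  # nothing usable was produced at any price
    return (token_cost + review_cost) / successes


# Hypothetical stack A: cheap tokens, 4 of 10 runs succeed, heavy review.
stack_a = [TaskRun(200_000, i < 4, review_minutes=30) for i in range(10)]
# Hypothetical stack B: 5x the token price, 9 of 10 succeed, light review.
stack_b = [TaskRun(200_000, i < 9, review_minutes=10) for i in range(10)]

cost_a = cost_per_successful_task(stack_a, usd_per_million_tokens=1.0, usd_per_review_hour=60.0)
cost_b = cost_per_successful_task(stack_b, usd_per_million_tokens=5.0, usd_per_review_hour=60.0)
print(f"stack A: ${cost_a:.2f} per successful task")
print(f"stack B: ${cost_b:.2f} per successful task")
```

With these made-up inputs the stack with pricier tokens comes out far cheaper per completed task, because failed runs and review time dominate the bill. Real numbers will differ, which is the reason to measure them.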

Fourth, resist vendor-shaped abstraction boundaries. NVIDIA wants the AI Factory to run through NVIDIA hardware, NVIDIA runtimes, NVIDIA model tooling, and NVIDIA security layers. Some of that may be excellent. Some of it may be too early. The durable lesson is not “buy every SKU.” It is that agent infrastructure needs phase-aware compute, on-prem deployment options, confidential model/data handling, explicit runtime policy, and observability that treats agents like long-running distributed processes.

The best reading of Dell and NVIDIA’s announcement is that enterprise AI is graduating from model demos into operating systems for agents. That is the right direction. It is also where the hard work starts. A GPU rack can make inference faster; it cannot decide who an agent is allowed to call, which database it may query, whether a credential should exist inside a sandbox, or why a task failed after 37 tool calls. The companies that answer those questions before the purchase order will get an AI factory. The ones that do not will get a very expensive haunted house.

Sources: NVIDIA Blog, NVIDIA Vera CPU, NVIDIA Vera Rubin NVL72, NVIDIA/OpenShell GitHub, NVIDIA AI-Q Blueprint