nvidia - The LGTM (Page 2)

nvidia

A collection of 139 posts

Agentic RAG Does Not Need Another Framework. It Needs Fewer PCIe Road Trips

Agentic RAG has spent the last year collecting abstractions. Planners, routers, memory layers, tool registries, tracing dashboards, evaluation harnesses — useful pieces, mostly. But Anubhab Banerjee’s CUDA Top-K retrieval experiment points at a less glamorous problem that many teams still under-measure: the retrieval step keeps crossing the PCIe border like

Cannes Lions Shows NVIDIA Turning Marketing AI Into a Systems Problem

Cannes Lions is not where engineers usually go looking for serious systems architecture. That is fair. The conference packaging is glossy enough to make any infrastructure person reach for the mute button. But NVIDIA’s advertising and marketing AI roundup is more interesting than the setting suggests because adtech has

NVIDIA’s FERC Post Is a Reminder That AI Factories Now Have a Power-Queue Roadmap

The newest bottleneck in AI infrastructure is not a GPU, a compiler flag or a clever quantization recipe. It is a queue. Specifically: the interconnection queue between giant new AI loads and an electrical grid that was not designed to treat gigawatt-class compute campuses as a normal Tuesday. NVIDIA’s

Coherent’s Texas Expansion Is the Unsexy Part of AI Scaling That Actually Decides the Rack

AI infrastructure usually gets sold as a GPU story because GPUs are the expensive thing everyone can point at. Coherent’s Sherman, Texas expansion is a useful correction: at rack scale, the expensive thing is increasingly the stuff that lets the GPUs behave like one machine instead of thousands of

ENPIRE Shows Robot Training Becoming an Agent-Orchestrated Research Loop

ENPIRE looks, at first glance, like another impressive robotics demo: robot arms pushing objects, inserting pins, cutting zip ties, and seating a GPU into a motherboard. That undersells it. The important part is not that a robot learned a task. The important part is that NVIDIA GEAR, Carnegie Mellon, and

The Real Numbers Behind Low-Precision Training: A Practical GEMM-Level Guide for Transformer Engineers

If you have been building AI infrastructure long enough, you have learned the hard way that benchmark numbers are not product numbers. The spec sheet says FP8 delivers 2x throughput over BF16. The real training run delivers something different, and the gap is usually not in NVIDIA's favor.

NVIDIA's Transaction Foundation Model Blueprint Shows Where GPU-Accelerated Fraud Detection Actually Wins

There's a quiet revolution happening in fraud detection, and it's not about a new model architecture everyone is talking about. It's about what happens when you stop treating transaction history as a table of independent events and start treating it as a sequence that