The LGTM (Page 32)

On-Policy Distillation Looks Dense on Paper, but the Parameter Updates Are Sparse Where It Counts

On-policy distillation sounds like it should rewrite models broadly. The student generates its own trajectories, the teacher provides dense token-level feedback, and the optimizer gets far more signal than it does in sparse-reward RL. The surprise in this paper is that the final parameter updates still look selective. “Dense Supervision, Sparse

EvoArena Shows Why Agent Memory Needs Git-Style History, Not Just a Bigger Scratchpad

Agent memory keeps getting described as a bigger notebook. EvoArena makes the better argument: it should look more like git history. The benchmark suite evaluates agents in environments that change over time: terminal workflows, software repositories, and user preferences. Its paired method, EvoMem, wraps ordinary memory systems with append-only patch

RA-RFT Says Retrieval for Reasoning Should Find Analogies, Not Keyword Neighbors

Retrieval-augmented generation has a bad habit: it assumes the useful thing is the thing with the most similar words. RA-RFT is a reminder that reasoning does not work that way. The paper, from researchers affiliated with Meta Superintelligence Labs and Rice University, proposes Retrieval-Augmented Reinforcement Fine-Tuning: a post-training method that

OpenClaw’s Default-Model Doctor PR Is the Gemini Migration Story in Miniature

The ugliest AI-tooling failure is not “the model is unavailable.” It is “the tool says this dead model is your default, then tells you the model does not exist, then fails your agent run anyway.” That is not an error message. That is a trust withdrawal. OpenClaw PR #92292 addresses

OpenClaw’s CoreWeave Provider PR Turns Open-Model Inference Into a First-Class Agent Runtime Choice

Provider support usually looks like plumbing until the first time an agent run dies because the model name was right, the base URL was wrong, the context window metadata was missing, and nobody remembered which custom header the hosted inference service wanted. OpenClaw PR #92243 is interesting because it takes

Apple Putting Private Cloud Compute on NVIDIA GPUs Is the Privacy Story Builders Should Study

Apple did something unusual this week: it made NVIDIA GPUs part of a privacy story instead of a performance story. That is not the usual role for NVIDIA in AI infrastructure coverage. The default script is simple enough: larger model, faster chip, bigger cluster, lower latency, higher throughput. This announcement

Grok Build Gets a Plugin Marketplace, Which Is xAI's Bid for Agent Distribution

xAI just made Grok Build more interesting for the least glamorous reason in developer tooling: distribution. The company has launched a built-in Plugin Marketplace for Grok Build, its terminal coding agent, with an official catalog hosted on GitHub and six launch integrations: MongoDB, Vercel, Sentry, Chrome DevTools, Cloudflare, and Superpowers.