The LGTM (Page 66)

Grok Build’s Memory and Command Surface Turns the CLI Into a Stateful Agent Runtime

Persistent memory is where coding agents stop feeling like clever terminals and start behaving like stateful runtimes. That is useful. It is also where old assumptions, stale decisions, local hacks, and occasionally secrets can acquire a longer half-life than anyone intended. xAI’s refreshed Grok Build documentation is interesting not

xAI’s Management API Gives Grok the Admin Surface Teams Actually Need

xAI is doing the least glamorous work in the AI platform stack, which is exactly why this update matters. The company’s refreshed Management API documentation is not a model launch, not a benchmark chart, and not another demo of Grok writing a React component. It is the admin surface

Goose 1.36 Turns Code Review, Hooks, Skills, and Goals Into the Agent Runtime

Goose 1.36.0 is not a “new buttons in the chat window” release. It is more interesting than that, which is inconvenient for anyone trying to reduce coding-agent progress to model benchmarks. The release turns several things teams have been improvising around — code review, permission hooks, reusable skills, goal

SkillGrad Treats Agent Skills Like Code That Needs an Optimizer, Not a Pep Talk

Agent skills are being treated too much like helpful markdown and not enough like dependencies. That is the mistake SkillGrad is trying to correct. A skill can change how an agent edits a spreadsheet, reads a table, calls a tool, writes code, or follows a procedure. If that artifact can

MemTrace Turns Agent Memory Bugs Into Something You Can Actually Debug

Memory is where agent demos go to become production incidents. In a demo, the assistant remembers the user’s preference and everyone nods. In production, it stores the wrong preference, retrieves the stale one, overwrites the useful one, cites irrelevant history, and then produces an answer that looks confident enough

LearnWeak Makes Small Computer-Use Agents Better by Training on Their Actual Mistakes

LearnWeak is a useful reminder that small agents do not need motivational posters. They need failure-specific training. The framework takes computer-use agents that are weak in particular desktop domains, finds where a stronger teacher succeeds and the smaller student fails, generates new tasks around those weaknesses, and trains the student

AXPO Shows Why Tool-Using Models Need Different RL Than Chat Models

The interesting thing about AXPO is not that it makes Qwen3-VL-Thinking score a little better on multimodal benchmarks. The interesting thing is the failure mode it catches: models that know a tool exists, talk about using it, and then retreat back into pure text because acting has become too expensive

Codex 0.135 Alpha Is Turning Agent Memory, Search, and Goal Accounting Into Runtime Primitives

Codex 0.135.0-alpha.2 is not the kind of release that gets a product-launch video. Good. The interesting part of coding agents right now is not whether they can write another demo todo app; it is whether their runtime has enough explicit state, accounting, and diagnostics that a serious