The LGTM
LLM Self-Evaluation Looks Less Like Magic and More Like a Capability Waiting to Be Elicited
ai-models

Self-evaluation in language models has always had a suspicious glow around it. Ask a model whether its own answer is good and you get something that can look like judgment, confidence, compliance theater, or all three in a trench coat. The useful question is not whether the model is “aware”
04 Jun 2026 4 min read
Copilot Chat’s PR Context Upgrade Is GitHub Admitting Review Needs a Workspace, Not a Chat Drawer
codex

GitHub’s latest Copilot Chat update looks small if you read it as a UI change. A button at the top of a diff. Chat beside code. Faster answers. Fine. Read it as a review-system change and it gets more interesting. GitHub is making richer pull-request and diff context generally
04 Jun 2026 6 min read
Anthropic Says Claude Now Writes More Than 80% of Its Code — The Real Story Is the Review Bottleneck
claude-code

If Claude now writes more than 80% of Anthropic’s merged code, the interesting question is not whether AI can generate code. That question has been downgraded from debate topic to operational fact. The useful question is what breaks when code becomes cheap and review becomes scarce. Anthropic’s Institute
04 Jun 2026 5 min read
Claude Code 2.1.163 Turns Version Policy, Plugins, Hooks, and MCP Session Continuity Into Runtime Governance
claude-code

Claude Code v2.1.163 is not trying to win a demo. It is trying to answer the question every platform team eventually asks about coding agents: who is allowed to run what version, with which plugins, under which hooks, against which external tools, and how do we prove it
04 Jun 2026 5 min read
Qwen3.7-Plus Makes Alibaba’s Agent Bet Multimodal — and Less Open
qwen

Qwen3.7-Plus is not Alibaba rediscovering multimodal AI. It is Alibaba drawing a much cleaner product boundary: open-weight Qwen for builders who want control, proprietary Qwen for teams that want hosted agent capacity at an aggressively low price. That split is the story. The model’s image and video inputs
04 Jun 2026 6 min read
OpenClaw's Slack allowBotsFrom Is a Small Config Key With a Big Multi-Agent Governance Point
openclaw

Multi-agent collaboration has a funny way of starting as a demo and ending as an access-control problem. Put four agents in Slack, give them names, let them post into the same channels, and the surface looks like teamwork. Underneath, the transport is deciding which messages exist, which bots are allowed
04 Jun 2026 4 min read
OpenClaw's macOS Memory Fix Restores the Part of Agent Memory That Actually Makes Retrieval Useful
openclaw

Agent memory has a product problem disguised as an infrastructure problem: users judge the assistant by what it recalls, but the quality of recall depends on native database flags most users will never see. PR #90323 is one of those fixes that looks like dependency plumbing until you follow the
04 Jun 2026 4 min read
OpenClaw's xAI/Venice Tool-Call Fix Is a Byte-Level Reminder That Agent Safety Starts Before the Tool Runs
openclaw

Tool-call security usually gets discussed at the dramatic boundary: the moment an agent wants to run a command, edit a file, send a message, or touch production state. That is the visible checkpoint, so it gets the approval dialog, the audit event, and the argument about whether humans should stay
04 Jun 2026 4 min read
OpenClaw's Trajectory-Capture Hardening Treats Tool Schemas Like Untrusted Supply Chain
openclaw

Trajectory capture sounds like an internal logging feature until the day an agent run fails and the only useful question is, "What did the runtime think was true when it made that decision?" At that point it becomes the black box. PR #90268, paired with the Codex app-server
04 Jun 2026 4 min read
MoE Inference Is Becoming a Rack-Scale Systems Problem, Not Architecture Trivia
nvidia

Mixture-of-experts models used to be a model-architecture detail. Now they are an infrastructure procurement strategy. NVIDIA’s latest Blackwell NVL72 pitch is nominally about MoE models running “10x faster” at “one-tenth the token cost.” Fine. Vendor math belongs in the same drawer as benchmark charts until proven otherwise. But the
04 Jun 2026 5 min read
Microsoft Security Is Starting to Treat Local Coding Agents, MCP Servers, and Models as First-Class Attack Surface
azure-ai

Microsoft’s Build security announcement is not important because of one product name. It is important because the company is finally treating local coding agents, MCP servers, model artifacts, prompts, and agent runtimes as first-class attack surface. That sounds obvious only after the industry has spent two years handing repo
04 Jun 2026 4 min read
Toolboxes and Routines Are Microsoft’s Answer to Agent Tool Sprawl Before It Becomes the New Microservices Mess
azure-ai

Microsoft’s Foundry Toolboxes update looks like a grab bag if you read it as a feature list: Skills, Work IQ, Fabric IQ, Browser Automation, managed MCP servers, Tool Search, guardrails, and Routines. Read it as an operations story and it becomes much cleaner. Microsoft is trying to stop agent
04 Jun 2026 4 min read
The LGTM © 2026