The LGTM
ai-models

DeepSWE Is the Coding-Agent Benchmark That Makes Leaderboards Look Less Comfortable

The useful thing about DeepSWE is not that it gives GPT-5.5 another trophy. Trophies are cheap now. The useful thing is that it makes the coding-agent leaderboard conversation less comfortable by attacking the part everyone quietly depends on: whether the benchmark judge knows software engineering when it sees it.
26 May 2026 5 min read
codex

Copilot Model Rules Make AI Governance an Organization-Level Control

GitHub’s latest Copilot admin feature is not glamorous, which is exactly why it matters. Targeted model rules let enterprise owners make specific Copilot models available to specific organizations instead of forcing one enterprise-wide policy across everyone. That sounds like checkbox plumbing until you remember what the model picker has
26 May 2026 4 min read
claude-code

Agentic Fix Turns Pentest Findings Into PRs — Now Comes the Hard Part

Security vendors have spent years trying to get developers to read vulnerability dashboards. Novee’s Agentic Fix makes a more realistic bet: developers are already living inside GitHub and coding agents, so the security workflow has to meet them there. The interesting part is not that an AI tool can
26 May 2026 5 min read
openclaw

OpenRouter Context Overflow Shows Why Agent Routing Needs Token Budgets

A large context window is not a budget plan. OpenClaw issue #86880 shows what happens when an agent runtime appears to treat a provider’s maximum context length as permission to reserve nearly the entire window for output tokens: the request overflows before the model has a fair chance to
26 May 2026 3 min read
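
The overflow described in this teaser comes down to simple arithmetic: prompt tokens plus reserved output tokens have to fit inside the context window. Here is a minimal sketch of a budget check, not OpenClaw's or OpenRouter's actual code. The function name, the parameters, and the 256-token safety margin are all assumptions made for illustration.

```python
def clamp_output_budget(
    context_window: int,
    prompt_tokens: int,
    requested_output: int,
    safety_margin: int = 256,  # hypothetical headroom for tokenizer drift
) -> int:
    """Return an output-token reservation that fits alongside the prompt."""
    if min(context_window, prompt_tokens, requested_output, safety_margin) < 0:
        raise ValueError("token counts must be non-negative")

    # Whatever the prompt and the margin do not use is available for output.
    available = context_window - prompt_tokens - safety_margin
    if available <= 0:
        # The prompt alone does not fit; shrinking the output cannot help.
        raise ValueError("prompt exceeds the window; compact or truncate first")

    # Never reserve more than what is actually left in the window.
    return min(requested_output, available)
```

With a 200,000-token window and a 50,000-token prompt, a request to reserve 199,000 output tokens is clamped to what remains instead of overflowing, while a modest 4,096-token request passes through unchanged.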
openclaw

OpenClaw’s Compaction Circuit Breaker Is the Right Kind of Cost Control

Retry logic is where optimism goes to become a cloud bill. OpenClaw PR #86900 is small, but it fixes the right class of problem: compaction should stop hammering a summarizer once the runtime has enough evidence that the dependency is down. That is cost control in the execution path, not
26 May 2026 3 min read
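
The pattern this teaser names is a circuit breaker: after enough consecutive failures, stop calling the dependency for a cooldown period instead of retrying on every turn. The sketch below is a generic version of that idea, not the code from the OpenClaw PR. The class name, the threshold, and the cooldown value are hypothetical.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Stop calling a failing dependency until a cooldown has passed."""

    def __init__(
        self,
        failure_threshold: int = 3,      # hypothetical default
        cooldown_seconds: float = 60.0,  # hypothetical default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold = failure_threshold
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return True if a call to the dependency may be attempted now."""
        if self._opened_at is None:
            return True  # closed: calls flow normally
        # Open: block calls until the cooldown elapses, then allow a trial.
        return self._clock() - self._opened_at >= self._cooldown

    def record_success(self) -> None:
        """Close the breaker and forget earlier failures."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        """Count a failure and open (or re-open) the breaker at the threshold."""
        self._failures += 1
        if self._failures >= self._threshold:
            self._opened_at = self._clock()
```

A caller would check `allow()` before invoking the summarizer, then report the outcome with `record_success()` or `record_failure()`. A failed trial after the cooldown restarts the cooldown, so a dependency that is still down costs one call per window rather than one per turn.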
openclaw

OpenClaw Webchat Has a Message-Loss Bug Where the Turn Disappears

The scariest agent failure is not a red error banner. It is a turn that looks accepted, runs real tools, hangs after the tool result, and then vanishes from the transcript as if the user never asked. That is the failure mode described in OpenClaw issue #86895, and it is
26 May 2026 3 min read
openclaw

OpenClaw 2026.5.25-beta.1 Is Boring in the Best Possible Way

OpenClaw v2026.5.25-beta.1 is the kind of release that will not win a demo day and absolutely will decide whether the platform survives contact with real operators. The headline is not a shiny new model. It is Alpine installs that stop tripping over the wrong libc, Windows scripts
26 May 2026 4 min read
nvidia

TensorRT-LLM’s May 26 RC Shows Multimodal Inference Is Now a Systems Problem

TensorRT-LLM v1.3.0rc16 is the kind of release that will not trend on Hacker News and will absolutely decide whether someone’s production inference stack has a good week. There is no single benchmark chart here, no “10x” headline, no clean product narrative. Instead, NVIDIA shipped a release candidate
26 May 2026 5 min read
ai-frameworks

Google’s Agent Executor Is Kubernetes Thinking Applied to Agents: Event Logs, Resumption, and Runtime Sovereignty

Google’s Agent Executor is not interesting because the world needed another agent framework logo. It is interesting because it says the quiet part out loud: long-running agents are distributed systems now. Once an agent can wait for approval, call tools, spawn workers, recover from disconnects, branch from checkpoints, and
26 May 2026 5 min read
ai-frameworks

Production Agents Do Not Need Better Demos. They Need Fresh Data, Safe Writes, and Receipts.

A lot of agent postmortems are going to look embarrassingly familiar once teams stop blaming the model. The agent reordered inventory from stale stock data. It closed an incident because one system said resolved while the rollback was still pending somewhere else. It issued a refund against a customer record
26 May 2026 4 min read
agentic-coding

Qwen Code’s May 26 Nightly Hardens the Local Agent Runtime

Qwen Code’s May 26 nightly is the kind of release that does not win launch-day screenshots and absolutely matters if you run coding agents for real work. The headline is not a new model, a prettier terminal, or a benchmark victory lap. It is runtime hardening: budgets, concurrency, telemetry,
26 May 2026 5 min read
agentic-coding

Coding Agents Do Not Need More Speed. They Need a Verification Loop

The uncomfortable truth about AI coding agents is that speed stopped being the impressive part. The demos already proved they can generate code faster than a human can review it. The unresolved question is whether a team can merge that code without turning software delivery into a trust fall with
26 May 2026 4 min read
The LGTM © 2026