vibe-coding - The LGTM

vibe-coding

A collection of 86 posts

Passing the Verifier Is Not Correctness: Agentic Spec Synthesis Has a False Confidence Problem

Automated formal specification synthesis has been advancing fast, with recent work reporting high verifier pass rates for LLM-generated JML specs. A new paper asks the uncomfortable follow-up question: does passing the verifier actually mean the specification is correct and complete? The answer, backed by a new evaluation framework called Spec-Harness,

Why Your Multi-Agent Review Pipeline's Gains Aren't What You Think — Decomposing What Actually Happens in the Second Pass

Multi-LLM revision pipelines — where a second model reviews and improves output from a first — are a standard pattern in agentic coding systems. The assumption baked into almost every implementation is that the second pass corrects errors: the reviewer catches what the generator missed. New research runs a controlled decomposition experiment

Your AI Coding Agent Writes Like a Beginner 90% of the Time — What That Actually Means for Code Review

As AI coding agents take on more of the actual code-writing work, human developers increasingly shift into the reviewer role. That shift raises a practical question that hasn't had a data-grounded answer: what skill level does a developer actually need to review AI-generated code? New research using the

Your Agent's Tool Library Is a Software Artifact — and It's Rotting While You Watch Task Completion Scores

Most agentic coding pipelines measure success by one metric: did the task complete? A new benchmark called EvolveTool-Bench exposes what that metric hides. When agents are allowed to create their own tools at runtime — writing helper functions, API wrappers, and data processors on the fly — those tools accumulate into a