vibe-coding

A collection of 86 posts
ProdCodeBench: The First Benchmark Built from Real Production Coding Sessions — and What It Reveals About Agents in Monorepos

Most coding-agent benchmarks diverge from real-world usage: they rely on different programming-language distributions, simplified prompt styles, and isolated toy codebases instead of the complex monorepos that teams actually work with. ProdCodeBench is instead built from real production sessions, curated from verbatim…
1 min read
Why Your Multi-Agent Review Pipeline's Gains Aren't What You Think — Decomposing What Actually Happens in the Second Pass

Multi-LLM revision pipelines — where a second model reviews and improves output from a first — are a standard pattern in agentic coding systems. The assumption baked into almost every implementation is that the second pass corrects errors: the reviewer catches what the generator missed. New research runs a controlled decomposition experiment
1 min read
Meta's Prompt Template That Lets Agents Review Code at 93% Accuracy — Without Running It

Meta researchers have developed a structured prompting technique — called semi-formal reasoning — that enables LLMs to verify code patches without executing them, reaching up to 93% accuracy on real-world agent-generated patches. The technique fills a practical gap between free-form chain-of-thought reasoning (flexible but prone to hallucinations) and rigid formal verification (precise
1 min read