vibe-coding

A collection of 86 posts
MobileDev-Bench: The First Production-Grade Coding Agent Benchmark on Mobile Apps — All Top Models Fail on Real Android, React Native, and Flutter Issues
vibe-coding

Mobile development has always been a harder target for coding agents than web or general-purpose library work — and now there's benchmark data to prove it. MobileDev-Bench is the first production-grade evaluation covering all three major mobile stacks: Android Native (Java/Kotlin), React Native (TypeScript), and Flutter (Dart). Its
1 min read
The Kitchen Loop: A Production-Tested Framework Where an LLM Agent Stress-Tests Your Product at 1,000× Human Speed — 1,094 PRs, Zero Regressions
vibe-coding

A new arXiv paper gives engineering teams a rare thing: a production-validated blueprint for self-improving software that has actually shipped. The Kitchen Loop is a four-primitive framework tested across two live systems over 285 iterations, producing more than 1,094 merged pull requests with zero regressions detected by the
1 min read