The 5 Myths of the Agentic Coding Apocalypse

David Gewirtz has been vibe coding for months, publishing first-person accounts of shipping real products with AI tools, including the striking result of getting four years of development work done in four days for two hundred dollars. He's also been public about what went wrong. That combination — genuine success and honest failure reporting — is why his latest piece, "The 5 Myths of the Agentic Coding Apocalypse," is worth reading carefully instead of dismissing as fear-mongering.

The core framing is a contractor analogy that deserves to become standard vocabulary: treat AI coding agents as contractors, not employees. Employees get autonomy; contractors get scopes of work. The distinction sounds obvious, but the industry spent two years selling "just describe what you want and the AI will build it," which is how you treat an employee, not a contractor. The mismatch between that sales pitch and the actual failure modes of AI coding tools is exactly what Gewirtz's five myths expose.

The contractor test nobody runs

The first myth — lost control — isn't actually about losing control in the way the term usually implies. Gewirtz's argument is more nuanced: the problem isn't that the AI does things you didn't authorize. It's that the AI does exactly what you asked, which may not be what you needed. A contractor who follows instructions precisely but misinterprets the brief is a management failure, not a contractor failure. The fix isn't tighter technical constraints — it's tighter scope definition. One simple task at a time, not a deep requirements document that can be misinterpreted in any one element and cascade into a deliverable that's technically correct and completely wrong.

This is sound engineering discipline that happens to apply with special force to AI workflows. Human engineers ask clarifying questions and course-correct. An AI that misinterprets one element of a complex spec goes off the rails in ways that are hard to trace back to the original misunderstanding. Small, verified deliverables are the right unit of work regardless of who's writing the code, but the AI makes the verification step non-negotiable rather than optional.

The real-world readiness problem is a blind spot squared

Myth two, real-world readiness, is the one that should keep engineering managers up at night. Automated tests inherit the blind spots of whoever wrote them, and AI-written unit tests are no exception: a model that never considered an edge case while writing the code is unlikely to consider it while writing the tests, so you can't rely on AI-generated tests to catch AI-generated blind spots. You need adversarial testing, edge-case prompting, and explicit failure-mode requirements built into the spec. The models learned from enormous amounts of public code, including the bugs and the insecure patterns. They reproduce those patterns unless you specifically say otherwise.
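To make "adversarial testing" concrete, here is a minimal sketch in Python. Everything in it is hypothetical and chosen for illustration: mean stands in for any function that passes its happy-path test, and probe is a tiny harness that feeds it hostile inputs and records what breaks. None of this comes from Gewirtz's piece.

```python
import math


def mean(values):
    """Deliberately naive: the kind of code a happy-path test waves through."""
    return sum(values) / len(values)


# Hostile inputs a happy-path test suite tends to skip (illustrative, not exhaustive).
ADVERSARIAL_CASES = [
    ("empty list", []),
    ("contains NaN", [1.0, float("nan")]),
    ("contains infinity", [1.0, float("inf")]),
]


def probe(fn, cases):
    """Run fn on each hostile input and record crashes and non-finite results."""
    findings = {}
    for label, arg in cases:
        try:
            result = fn(arg)
        except Exception as exc:
            findings[label] = f"raised {type(exc).__name__}"
        else:
            if isinstance(result, float) and not math.isfinite(result):
                findings[label] = f"returned non-finite {result}"
    return findings
```

The happy path, mean([2, 4]), passes. Calling probe(mean, ADVERSARIAL_CASES) flags all three hostile cases, which is the kind of gap a test suite written by the code's own author tends to miss.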

The example Gewirtz cites is telling: an AI working on a security product did "absolutely zero input verification" until explicitly instructed to check inputs. This wasn't a capability failure; the model knew how to validate input. It just didn't, because it hadn't been told to. The training data distribution favored the code that exists (often buggy) over the code that should exist (secure, validated, correct). Gaps in input validation and sanitization open exploit vectors that a human security engineer would have closed as a matter of course. AI agents also pull in libraries without checking the supply chain for known vulnerabilities. These aren't hypothetical risks. They're documented outcomes from real development sessions.
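For a sense of what "check inputs" means in practice, here is a small sketch of allow-list validation in Python. The function names and the choice of a TCP port as the input are assumptions made for illustration; the code Gewirtz was working on is not shown in his piece.

```python
def parse_port(raw: str) -> int:
    """Validate untrusted text as a TCP port and return it as an int.

    Hypothetical helper for illustration; raises ValueError on anything suspicious.
    """
    if not isinstance(raw, str):
        raise ValueError("port must be provided as text")
    text = raw.strip()
    # Allow-list: one to five ASCII digits and nothing else.
    if not (1 <= len(text) <= 5 and text.isascii() and text.isdigit()):
        raise ValueError(f"not a valid port: {raw!r}")
    port = int(text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


def is_valid_port(raw: str) -> bool:
    """Convenience wrapper that reports validity instead of raising."""
    try:
        parse_port(raw)
    except ValueError:
        return False
    return True
```

The design choice worth noting is the allow-list: the function states what a valid value looks like and rejects everything else, rather than trying to enumerate bad inputs. A string such as "80; rm -rf /" fails the digit check before it gets anywhere near a shell or a socket.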

Maintenance debt compounds invisibly

The maintenance-debt myth is where the piece connects most directly to what the broader "vibe coding backlash" has been documenting: the gap between "it works when the AI builds it" and "it works when you're maintaining it six months later." After a few days with Claude Code on an iPhone app, Gewirtz describes a file structure that was "completely incoherent" — files placed arbitrarily, named arbitrarily, nothing grouped. The AI optimized for completing the immediate task without any concept of what a maintainable codebase looks like.

The fix (explicit instructions to clean up, iteration to find a working pattern, immortalization of that pattern in startup instructions) is essentially what the agentic engineering advocates have been calling "spec-first development" or "context engineering." The vocabulary differs; the practice is the same. One distinction matters, though: the spec and the behavioral configuration are different things. CLAUDE.md tells the agent how to act; a spec tells it what to build. Most teams have only the former and wonder why the output is inconsistent across sessions.

This is the specific insight that makes Gewirtz's piece valuable beyond the myth-busting frame. The maintenance debt isn't an AI problem — it's a context management problem. The AI doesn't know what you consider maintainable unless you tell it, and "tell it once in a prompt" doesn't survive the `/clear` command. The teams that have solved this have built living spec documents that evolve with the codebase, not static prompts that reset with every session.

The dual-AI review pattern

The practical technique Gewirtz proposes for working around individual model blind spots is worth wider adoption: use Claude Code and OpenAI Codex to check each other's work. One model codes, the other reviews. Different providers have different training data distributions and different failure modes. "Both models missed the same issue" is rarer than either model missing it alone. Running a Codex review pass on Claude Code output — or vice versa — is a two-command workflow that adds maybe fifteen minutes to a session and catches a meaningful fraction of the issues that would otherwise ship.
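The shape of that workflow can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Gewirtz's setup: author and reviewer are hypothetical callables standing in for whatever two tools you use, and the stubs below return canned text rather than calling any real model or CLI.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ReviewResult:
    code: str
    findings: List[str]
    approved: bool  # flipped only by a human, never by either model


def cross_review(
    task: str,
    author: Callable[[str], str],
    reviewer: Callable[[str], List[str]],
) -> ReviewResult:
    """One model writes, a different model reviews, a human decides."""
    code = author(task)
    findings = reviewer(code)
    return ReviewResult(code=code, findings=findings, approved=False)


# Stand-in stubs so the sketch runs without any external tool.
def stub_author(task: str) -> str:
    return "def divide(a, b):\n    return a / b\n"


def stub_reviewer(code: str) -> List[str]:
    if "ZeroDivisionError" not in code and "b == 0" not in code:
        return ["no handling for b == 0"]
    return []
```

Calling cross_review("write divide", stub_author, stub_reviewer) returns the generated code plus one finding, with approved still False. The result is a list of findings for a person to act on, which keeps the human as the integrator.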

This isn't a silver bullet. It doesn't catch architectural mistakes, design pattern problems, or issues that require deep domain knowledge. But for the class of failures that are specifically AI-shaped — copy-paste artifacts, inconsistent naming conventions, missing error handling on obvious edge cases — a second model from a different provider is a cheap, effective check. The key qualifier is "with careful coordination on my part," as Gewirtz puts it. The human is still the integrator of last resort.

What the contractor model actually requires

The contractor framework isn't just a metaphor — it has specific operational implications that most teams haven't internalized. Contractors get clear scopes of work before they start. They get checkpoints during execution, not just at the end. Their deliverables get reviewed by someone who understands the full picture, not just the piece they built. The AI coding equivalent: a living spec, frequent verification of intermediate outputs, and human code review that goes beyond "does this compile" to "does this fit the architecture."
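As a rough sketch of what checkpoints during execution could look like, the Python below runs scoped tasks one at a time and halts at the first deliverable that fails verification. ScopedTask, produce, and verify are hypothetical names invented for this example; in a real workflow produce would hand one task to the agent and verify would be a test run or a human review.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ScopedTask:
    description: str
    produce: Callable[[], str]     # hypothetical: requests one deliverable from the agent
    verify: Callable[[str], bool]  # checkpoint owned by the reviewer


def run_with_checkpoints(tasks: List[ScopedTask]) -> Tuple[List[str], Optional[str]]:
    """Run tasks in order and stop at the first failing checkpoint.

    Returns the accepted deliverables and the description of the task
    that failed verification, or None if every task passed.
    """
    accepted: List[str] = []
    for task in tasks:
        output = task.produce()
        if not task.verify(output):
            return accepted, task.description
        accepted.append(output)
    return accepted, None


# Canned example: the second deliverable fails its checkpoint.
EXAMPLE_TASKS = [
    ScopedTask("add parser", lambda: "parser v1", lambda out: "parser" in out),
    ScopedTask("add validation", lambda: "TODO", lambda out: out != "TODO"),
    ScopedTask("add logging", lambda: "logging v1", lambda out: True),
]
```

Running run_with_checkpoints(EXAMPLE_TASKS) accepts the parser, stops at "add validation", and never starts the logging task. Stopping early is the point: a misread scope gets caught one task in, not at final delivery.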

The myth that gets named but not fully explored in Gewirtz's piece is the one the industry keeps bumping into: accountability. The AI may write the code, but the engineering team is responsible for what that code does in production. That accountability gap doesn't disappear because the code was generated by an AI rather than written by a contractor. It just looks different: harder to trace back to a specific decision, easier to rationalize away because the code "looks fine." The teams that will win on agentic coding are the ones that treat AI-generated code with the same rigorous review process they'd apply to a junior contractor they just met, not the same trust they'd extend to a senior engineer they've worked with for five years.

The five myths aren't reasons to avoid agentic coding tools. They're reasons to approach them with the same engineering discipline you'd apply to any other technical decision: define the requirements, verify the outputs, plan for maintenance, and never ship what you haven't reviewed. The contractor model makes that discipline concrete and actionable. Whether teams actually implement it is a different question — one the next eighteen months of production incidents will answer.

Sources: ZDNET, ZDNET, Port.io