
The 5 Myths of the Agentic Coding Apocalypse

Anatoliy Kolodkin

04 May 2026 • 6 min read

There's a specific kind of pain that hits when you've spent three days shipping features with an AI coding tool, then open the file browser and realize you have no idea where anything actually lives.

David Gewirtz has been writing about this for months — first-person dispatches from the frontier of vibe coding, including the uncomfortable admission that after a few days of Claude Code building his iPhone app, the file structure was, in his words, "completely incoherent." Files placed arbitrarily. Named arbitrarily. Nothing grouped. The model produced exactly what he asked for, which turned out to not be what he needed.

His latest piece at ZDNET — "The 5 Myths of the Agentic Coding Apocalypse" — is the distillation of that experience into a framework. Five myths, five real failure modes, and five practical disciplines for avoiding them. It's not anti-AI coding. It's not hype. It's what you get when someone actually does the experiments and publishes the results, including the parts that didn't work.

The Contractor, Not the Employee

The organizing metaphor Gewirtz uses is the contractor. Engineering managers have always managed outside contributors — agencies, freelancers, offshore teams. The discipline is the same for AI: checkpoints, integration testing, clear task boundaries. You don't hand a contractor a 40-page requirements doc and hope for the best. You break the work into discrete deliverables, review each one, and only proceed when what you received matches what you actually needed.

This is sound engineering advice that happens to apply with special force to AI workflows, for a specific reason: a human contractor will ask questions when the spec is ambiguous. An AI will make an assumption and proceed. That assumption might be correct. It might not. You won't know until you review the output — and if you've handed the model a large, complex task and walked away, you might be reviewing code that's structurally wrong in ways that require a full rewrite to fix.

Gewirtz's specific tactic: give the AI one simple task at a time, not a deep requirements document. This isn't a limitation of current models — it's sound engineering discipline. Deep, rich requirements documents are great for human engineers who can ask clarifying questions and course-correct mid-stream. An AI that misinterprets one element of a complex spec can go off the rails in ways that are hard to trace back to the original misunderstanding. Small, verified deliverables are the right unit of work regardless of who's writing the code.

What the Five Myths Actually Mean

The first myth — Lost Control — is the one that generates the most industry hand-wringing and the least useful guidance. Gewirtz's response is practical: you never had total control. Engineers have always worked with contractors, offshore teams, and library dependencies maintained by strangers. The discipline is the same. Checkpoints. Verification. Clear scope. The role of the engineering manager doesn't change; the contractor changes.

The second — Real-World Readiness — is the "curse of knowledge" problem. Developers know what the code should do; real users don't. Automated tests inherit developer blind spots. AI-written unit tests often inherit the same blind spots as human-written ones because the training data includes those blind spots. The fix: adversarial testing, edge-case prompting, explicit failure-mode requirements. You have to test what the code does, not just that it does what you asked it to do.
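To make "test what the code does" concrete, here is a minimal sketch of an adversarial test pass. The `parse_quantity` function, its 1000-item cap, and the list of hostile inputs are all hypothetical, invented for illustration rather than taken from Gewirtz's article. The point is the shape of the test: one happy-path check, then a batch of inputs a developer rarely thinks to try.

```python
def parse_quantity(raw: str, max_quantity: int = 1000) -> int:
    """Hypothetical order-quantity parser, used only for illustration."""
    text = raw.strip()
    # isascii() plus isdigit() rejects signs, decimals, exponents,
    # and non-ASCII digit characters that int() would otherwise accept.
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"not a whole number: {raw!r}")
    value = int(text)
    if not 1 <= value <= max_quantity:
        raise ValueError(f"out of range: {value}")
    return value


def rejected(raw: str) -> bool:
    """Return True if parse_quantity refuses the input."""
    try:
        parse_quantity(raw)
    except ValueError:
        return True
    return False


# Happy path: confirms the code does what was asked.
assert parse_quantity(" 3 ") == 3

# Adversarial pass: inputs real users (or attackers) actually send.
HOSTILE_INPUTS = [
    "", "   ", "-1", "0", "1e3", "3.5", "٣",
    "1001", "1; DROP TABLE orders", "9" * 50,
]
unrejected = [raw for raw in HOSTILE_INPUTS if not rejected(raw)]
assert unrejected == [], f"accepted hostile input: {unrejected}"
```

A list like `HOSTILE_INPUTS` is also a reasonable thing to ask the second model to extend, since its blind spots are unlikely to match yours exactly.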

The third — Inherited Code — is the one that rarely shows up in AI coding tool marketing. Every AI coding experience is, by definition, an IP acquisition. The code is not written by developers who understand the full context — it's generated from patterns in training data applied to your requirements. The maintenance challenge is identical to acquiring a software product from a third party: you have to reverse-engineer the architecture, absorb the undocumented decisions, and build a mental model from scratch. This is not a small undertaking, and it's not one that most vibe coding guides prepare you for.

The fourth — Maintenance Debt — is where the file structure problem lives. AI-generated code often lacks consistent intent, structure, and architectural coherence. Naming conventions vary. Patterns across components are inconsistent. Changes cascade into unexpected bugs because the original generation didn't have a coherent design vision — it had a prompt and a context window. Gewirtz's specific example — the incoherent file structure after a few days with Claude Code — is the concrete manifestation of this. The fix requires iteration: explicit instructions to clean up, testing to find what pattern actually works, then immortalizing that pattern in startup instructions so the next session starts from a known state rather than repeating the drift.

The fifth, Vulnerability-Free Output, should be alarming to anyone shipping AI-generated code to production. The training data draws heavily on public code such as GitHub repositories, bugs and vulnerabilities included. Models reproduce insecure patterns. Gewirtz's specific example: the AI working on his security product did "absolutely zero input verification" until explicitly instructed to check inputs. AIs also incorporate libraries without checking the supply chain for downstream vulnerabilities. Input validation and sanitization, the bread and butter of web application security, are not defaults in AI-generated code. They're explicit requirements. Teams that treat AI coding tools as "set and forget" are shipping that assumption to production.

The Dual-AI Review Pattern Is Worth Taking Seriously

One of the more practically useful ideas in Gewirtz's piece is the dual-AI review pattern: use Claude Code and OpenAI Codex to check each other's work. One model codes, the other reviews. "With careful coordination on my part, they keep each other fairly honest," he writes.

This is not a perfect solution, but it's cheap to run and it catches real problems, for a reason worth understanding. Different models have different blind spots: not just different capabilities, but different failure modes. A model from a different provider is more likely to catch those blind spots than the same model reviewing its own output, because the training data and the reasoning patterns that emerge from it differ across providers. Running a Codex review pass on Claude Code output (or vice versa) is a two-command workflow. The result is not perfect, but both models missing the same issue is rarer than either model missing it alone.
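The arithmetic behind that last claim is easy to sketch. The miss rates below are hypothetical numbers chosen only to make the calculation visible, and the multiplication assumes the two models miss defects independently, which real models only approximate.

```python
def joint_miss_rate(miss_a: float, miss_b: float) -> float:
    """Probability that both reviewers miss a defect, ASSUMING independent misses."""
    return miss_a * miss_b


# Hypothetical miss rates, not measurements of any real model.
coder_miss = 0.20     # first model overlooks 20% of defects
reviewer_miss = 0.25  # second model overlooks 25% of defects

independent = joint_miss_rate(coder_miss, reviewer_miss)

# Worst case: the blind spots overlap completely, so the pair is only
# as good as the better of the two reviewers.
fully_correlated = min(coder_miss, reviewer_miss)

print(f"independent misses:      {independent:.2%}")
print(f"fully correlated misses: {fully_correlated:.2%}")
```

Under these assumptions the combined miss rate falls between 5% and 20%, depending on how much the blind spots overlap. That overlap is the argument for picking a reviewer from a different provider: the less the two models share, the closer you get to the lower figure.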

The practical implication: if you're using AI coding tools in production, your review process should include a second model from a different provider. Not as a replacement for human code review — as a complement to it. The second model catches things the first model missed. Human review catches things both models missed. The stack is: AI generate, AI review, human review. Each layer adds coverage the others don't have.

What This Connects To

The maintenance debt myth is where Gewirtz's piece connects most directly to the broader agentic engineering conversation that's been building since Andrej Karpathy's Sequoia AI Ascent talk. Karpathy's framing — vibe coding raises the floor for non-developers; agentic engineering preserves the quality ceiling for professional software — is the intellectual context for understanding why the file structure problem happens and how to prevent it.

The agentic engineering answer to incoherent file structures is the spec layer: a detailed, living document that tells the agent what to build, how to organize it, and what conventions to follow. Not a requirements doc — a spec that evolves with the codebase, capturing decisions made across sessions so that each new AI interaction starts from accumulated context rather than a blank slate. Gewirtz's fix — explicit instructions, iteration, immortalization in startup instructions — is a manual version of what the spec layer does automatically.

The vulnerability myth connects to a larger concern that's been surfacing in the practitioner press. AI coding tools don't produce insecure code because the models are malicious. They produce insecure code because the training data includes insecure code, and security best practices aren't defaults; they're explicit requirements. Teams that add explicit security requirements to every prompt and every spec give themselves a far better chance of shipping AI-generated code to production without incidents. Teams that don't are shipping their assumptions.

The contractor framework Gewirtz articulates is the right mental model for a reason that goes beyond convenience: it correctly identifies the allocation of responsibility. The AI produces what you asked for. You are responsible for what you needed. That gap — between what you asked for and what you needed — is where the engineering discipline lives, and it's not a gap that gets smaller by ignoring it.

The Thing to Do With This

If you're using AI coding tools without a spec discipline, start one. It doesn't need to be elaborate: a single file that captures what this project is, how it's organized, what conventions it follows, and what the agent should and shouldn't do. Update it whenever the agent does something unexpected and the correction is worth preserving. Over time, it becomes the institutional memory that lets each new session start from a known state rather than inventing the architecture fresh each time.
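One way to keep that file honest is a tiny check that fails when a section goes missing. The sketch below is an assumption-heavy illustration: the section names, the `#`-style headings, and the example spec are all hypothetical, and nothing here is prescribed by Gewirtz or by any particular tool.

```python
# Hypothetical section names, mapped to the four things the spec should capture.
REQUIRED_SECTIONS = (
    "Project",      # what this project is
    "Layout",       # how it is organized
    "Conventions",  # what conventions it follows
    "Agent rules",  # what the agent should and shouldn't do
)


def missing_sections(spec_text: str) -> list[str]:
    """Return the required section headings that the spec text lacks."""
    headings = {
        line.lstrip("#").strip().lower()
        for line in spec_text.splitlines()
        if line.startswith("#")
    }
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]


# Hypothetical spec content, for illustration only.
EXAMPLE_SPEC = """\
# Project
A small inventory-tracking app.
# Layout
Views in Views/, models in Models/, one type per file.
# Conventions
Types in UpperCamelCase; no abbreviations in file names.
"""

print(missing_sections(EXAMPLE_SPEC))  # ['Agent rules']
```

Run against a real spec file at the start of a session, a check like this turns "we have a spec" into something you can verify rather than assume.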

If you're shipping AI-generated code without a second-model review pass, add one. Two commands. Different provider. The coverage gain is real and the cost is low.

If you're not explicitly requiring security properties in your prompts — input validation, output sanitization, dependency checking — add those requirements. Not as generic security guidance, but as specific, testable requirements. "Validate all user inputs before processing" is a prompt. "Ensure all user-provided strings are validated against an allowlist of permitted characters and length-limited before database insertion" is an engineering requirement. The specificity is the difference between code that might be secure and code you can verify is secure.
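For a sense of what that engineering requirement looks like once it's implemented, here is a minimal sketch for a single field. The `validate_username` name, the 32-character limit, and the permitted character set are assumptions made for illustration; a real project would pick values that match its own schema.

```python
import re

MAX_USERNAME_LENGTH = 32  # assumed limit; match it to your column definition
# Allowlist: ASCII letters, digits, underscore, and hyphen only.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def validate_username(raw: str) -> str:
    """Return the username unchanged if it passes the allowlist, else raise ValueError."""
    if not isinstance(raw, str):
        raise ValueError("username must be a string")
    if not 1 <= len(raw) <= MAX_USERNAME_LENGTH:
        raise ValueError("username length out of bounds")
    # fullmatch requires the entire string to match, not just a prefix.
    if USERNAME_PATTERN.fullmatch(raw) is None:
        raise ValueError("username contains characters outside the allowlist")
    return raw


print(validate_username("alice_01"))  # alice_01

try:
    validate_username("alice'; DROP TABLE users;--")
except ValueError as error:
    print(f"rejected: {error}")
```

Because the requirement is specific, it is also testable: you can hand the reviewer model, or a human, the exact rule and a list of inputs that must fail. Validation of this kind complements parameterized queries at the database layer; it does not replace them.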

The AI coding tools are not going away. The question is whether you're using them as a sophisticated autocomplete or as a force multiplier for actual engineering. The difference is the discipline you bring to the interaction. Gewirtz's five myths are a map of where that discipline needs to go.

Sources: ZDNET, "The 5 Myths of the Agentic Coding Apocalypse"; ZDNET, "I got 4 years of product development done in 4 days for $200"; Port.io, "The Hidden Technical Debt of Agentic Engineering"
