Vibe Coding Did Not Break Review Culture, It Exposed How Little Review Many Teams Were Doing

Anatoliy Kolodkin

24 Apr 2026 • 5 min read

The most useful line in the current vibe-coding backlash is not that AI-generated software is risky. We already knew that. The useful line is that AI did not break code review culture. It exposed how shallow a lot of code review culture already was.

Kristopher Fraser's new essay lands on exactly that point. His argument is not that vibe coding is harmless, and it is not another breathless defense of letting non-engineers ship mystery code into production. It is narrower and sharper: if maintainers are merging changes they do not really understand, the core failure is the approvals process. AI just made that weakness scale faster.

That diagnosis holds up, and not just because the examples are getting harder to ignore. Tenzai's recent evaluation of five major coding agents, covered by CSO, found 69 vulnerabilities across 15 generated applications. Roughly 45 landed in the low-to-medium range, but several were rated high and around half a dozen were rated critical. The most serious failures were not the old demo-day classics like exploitable SQL injection (SQLi) or cross-site scripting (XSS). In fact, the researchers said they did not find a single exploitable SQLi or XSS issue in the sample apps. The real trouble showed up in API authorization and business logic, exactly where safe and unsafe behavior depends on context, policy, and intent rather than memorized secure-coding patterns.

The merge button is the real bottleneck now

That distinction matters because it tells you what kind of review process no longer works. A surface-level scan for familiar anti-patterns was already an imperfect way to review software. In an AI-heavy workflow, it gets even weaker. Generated code often looks tidy enough to pass a quick smell test. It compiles. It runs. It has tests. It may even come with a confident explanation in the pull request body. None of that proves the logic is right.

Fraser's answer is procedural and, importantly, practical. Ask contributors to explain their changes in plain language. Do not accept "this adds a cache" as a sufficient description. Ask what happens on a cold cache, what happens on overflow, what the eviction strategy is, and what breaks when the dependency is unavailable. Force the code author to cash out the abstraction in operational terms. If they cannot do that, they should not get the merge.
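One way to make those questions concrete is to turn each into an executable check. The sketch below is an illustration, not something from Fraser's essay: the `LRUCache` class, the `loader` callback, the capacity of 2, and the choice of least-recently-used eviction are all assumptions made for the example. Each entry in the returned dictionary corresponds to one of the questions above.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny read-through cache with LRU eviction (hypothetical example)."""

    def __init__(self, loader, capacity=2):
        self.loader = loader        # callable that fetches from the dependency
        self.capacity = capacity    # assumed limit, chosen only for the demo
        self.items = OrderedDict()

    def get(self, key):
        if key in self.items:
            self.items.move_to_end(key)      # mark as most recently used
            return self.items[key]
        value = self.loader(key)             # miss: a loader error propagates
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # overflow: drop least recently used
        return value


def run_review_questions():
    """Answer the reviewer's questions with checks instead of prose."""
    calls = []

    def loader(key):
        calls.append(key)
        if key == "down":
            raise ConnectionError("dependency unavailable")
        return key.upper()

    cache = LRUCache(loader, capacity=2)
    answers = {}

    # Cold cache: the first read must reach the dependency.
    cache.get("a")
    answers["cold_miss_hits_loader"] = calls == ["a"]

    # Warm cache: the second read must not.
    cache.get("a")
    answers["warm_hit_skips_loader"] = calls == ["a"]

    # Overflow: "b" is the least recently used entry when "c" arrives.
    cache.get("b")
    cache.get("a")
    cache.get("c")
    answers["evicts_least_recent"] = list(cache.items) == ["a", "c"]

    # Dependency unavailable: the error surfaces and nothing is cached.
    try:
        cache.get("down")
        answers["failure_propagates"] = False
    except ConnectionError:
        answers["failure_propagates"] = "down" not in cache.items

    return answers


if __name__ == "__main__":
    for question, ok in run_review_questions().items():
        print(question, "ok" if ok else "FAILED")
```

An author who can write checks like these, or at least say what each one should assert, has cashed out "this adds a cache" in operational terms. Whether failures should propagate or fall back to stale data is a policy choice that the sketch makes arbitrarily.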

That is good advice, but it only works if teams are willing to admit what changed. The old social contract of code review assumed that the author broadly understood what they wrote. Reviewers could sanity-check the diff, run tests, and trust that the author was filling in the semantic gaps. Vibe coding weakens that assumption. You can now have large diffs assembled by someone who can operate the tool well enough to produce software without being able to reason through every branch, permission check, or failure mode. The gap between authorship and understanding has widened. Review culture has not caught up.

This is where the essay is strongest. It refuses the lazy binary that dominates public discourse. On one side, you get the techno-utopian line that AI democratizes software and critics are just gatekeepers. On the other, you get the purist line that any AI-assisted contribution is contaminated by definition. Both are bad operating models. One assumes output quality follows from access. The other assumes tooling choice tells you all you need to know about risk. Neither helps a maintainer decide whether a patch should ship.

Small PRs are now a security control

Fraser also makes a point that deserves more attention in engineering orgs: PR size is no longer just a readability issue. It is a security and reliability issue. A forty-line change that one reviewer can actually reason about is far more defensible than a four-hundred-line blob that nobody truly understands but everyone assumes someone else checked. AI systems make it trivial to manufacture sprawling diffs with plausible structure. That means PR size limits are becoming a form of governance, not just etiquette.
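A size limit only works as governance if something enforces it. As a minimal sketch, assuming the team pipes the output of `git diff --numstat` into a pre-merge check, the following counts changed lines and compares them to a limit. The function names and the default limit of 400 are hypothetical; the essay does not prescribe a number, and the right threshold depends on the codebase.

```python
def changed_lines(numstat_text):
    """Sum added and deleted lines from `git diff --numstat` output.

    Each line has the form "<added>\t<deleted>\t<path>". Binary files
    report "-" for both counts and are skipped here.
    """
    total = 0
    for line in numstat_text.splitlines():
        parts = line.split("\t", 2)
        if len(parts) < 3:
            continue                      # ignore blank or malformed lines
        added, deleted = parts[0], parts[1]
        if added == "-" or deleted == "-":
            continue                      # binary file, no line counts
        total += int(added) + int(deleted)
    return total


def pr_within_limit(numstat_text, limit=400):
    """Return True when the diff is small enough to review (assumed limit)."""
    return changed_lines(numstat_text) <= limit


if __name__ == "__main__":
    sample = "30\t10\tapp/cache.py\n-\t-\tdocs/logo.png\n"
    print(changed_lines(sample), pr_within_limit(sample, limit=40))
```

Line count is a crude proxy for reviewability, so a gate like this is better treated as a prompt to split the change than as proof that a small diff is safe.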

There is an important second-order effect here. When teams allow giant AI-assisted diffs, they are not just making review harder. They are pushing reviewers toward heuristic approval. Does the code look organized? Are the tests green? Does the explanation sound competent? Those shortcuts were already common under deadline pressure. AI raises the danger because it produces exactly the kind of polished, plausible output that makes heuristic approval tempting.

The BBC's reporting on the Orchids platform adds a different layer of urgency. In that case, the issue was not only insecure output but the surrounding environment itself. A researcher demonstrated a compromise path on the vibe-coding platform that allowed hostile modification and system-level impact on a user's machine. That is a reminder that teams are now reviewing two trust surfaces at once: the code being generated and the tooling environment generating it. The approvals process cannot stop at the diff if the platform underneath the workflow is also part of the attack surface.

This is why the common advice to simply "debug more" feels insufficient. As CSO's coverage noted, even security experts are split on whether downstream debugging can keep up with AI-scale output. One camp says rigorous secure-development lifecycle controls, static analysis, dynamic testing, and code review can manage the risk. Another camp argues that the velocity of generated code turns after-the-fact review into a losing game unless security moves directly into the act of creation. The honest answer is probably both. You need stronger generation-time guardrails and a review process designed for higher throughput, because one without the other leaves a hole.

For practitioners, the takeaway is blunt. If your team is using AI to produce code, you need a new review contract. Require AI-use disclosure, not to shame people, but to surface how the change was produced and what was manually validated. Require plain-language architectural explanations. Require smaller PRs. Add targeted adversarial tests for authorization, workflow edge cases, and business-logic abuse, not just generic injection flaws. Treat a green unit test suite as table stakes, not evidence of correctness. And if a change is too large or too opaque for one reviewer to reason through, split it before merging it.
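To show what a targeted adversarial test for authorization and business-logic abuse might look like, here is a small sketch. Everything in it is hypothetical: the `User` and `Invoice` types, the two policy functions, and the rules they encode are invented for illustration and would need to be replaced by the team's real policies. Each case states who is trying what and whether it should be allowed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    id: int
    role: str      # "member" or "admin" (assumed roles)


@dataclass(frozen=True)
class Invoice:
    id: int
    owner_id: int
    status: str    # "open" or "paid" (assumed states)


def can_view(user, invoice):
    """Assumed policy: owners and admins may view an invoice."""
    return user.role == "admin" or user.id == invoice.owner_id


def can_refund(user, invoice):
    """Assumed policy: only admins may refund, and only paid invoices."""
    return user.role == "admin" and invoice.status == "paid"


def adversarial_cases():
    """Return (description, actual, expected) triples for abuse scenarios."""
    alice = User(1, "member")
    mallory = User(2, "member")
    root = User(3, "admin")
    paid = Invoice(10, owner_id=1, status="paid")
    unpaid = Invoice(11, owner_id=1, status="open")
    return [
        ("another member cannot view the invoice", can_view(mallory, paid), False),
        ("the owner can view the invoice", can_view(alice, paid), True),
        ("the owner cannot refund their own invoice", can_refund(alice, paid), False),
        ("an admin cannot refund an unpaid invoice", can_refund(root, unpaid), False),
        ("an admin can refund a paid invoice", can_refund(root, paid), True),
    ]


# Collect the descriptions of any cases where behavior and policy disagree.
failures = [name for name, got, want in adversarial_cases() if got != want]

if __name__ == "__main__":
    print("failures:", failures)
```

None of these cases would be caught by a scan for injection patterns, because the code can be perfectly clean and still grant the wrong person the wrong action. Writing the expected column forces the author to state the policy, which is the explanation a reviewer needs.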

The market likes to talk about AI coding as a speed story. It is becoming a filter story. The scarce resource is no longer code generation. It is judgment. Which teams can reliably distinguish code that merely looks right from code that is right? Which teams have approval systems capable of handling machine-accelerated output without devolving into rubber-stamping? That is where competitive advantage is starting to move.

My view is that vibe coding will survive, and probably grow, not because it is safe by default but because the time-to-value benefits are too real to ignore. That means teams waiting for the trend to disappear are wasting time. The smarter move is to harden the human systems around it. The merge button is now one of the most important control points in modern software delivery. Treating it like a courtesy instead of a gate is how you end up with AI-scale mistakes.

Vibe coding did not invent sloppy approvals. It industrialized the consequences. The teams that adapt fastest will not be the ones with the most aggressive AI adoption. They will be the ones with the clearest standards for what evidence earns trust.

Sources: Kristopher Leads, CSO Online, BBC News
