Your AI Coding Agent Writes Like a Beginner 90% of the Time — What That Actually Means for Code Review
As AI coding agents take on more of the actual code-writing work, human developers increasingly shift into the reviewer role. That shift raises a practical question that hasn't had a data-grounded answer: what skill level does a developer actually need to review AI-generated code? New research using the AIDev dataset — 591 real pull requests containing 5,027 Python files from three distinct AI agents — gives a clear one.

The study applied pycefr, a static analysis tool that maps Python constructs to six proficiency levels modeled on the CEFR language scale (A1 through C2), to the full dataset. The result: over 90% of constructs in AI-generated code fall at the Basic level (A1/A2), with less than 1% reaching Mastery (C2). The distribution broadly mirrors human-authored code, but AI agents skew toward the simpler end. Advanced constructs do appear, but they cluster around specific task types — feature additions and bug fixes — while scaffolding and maintenance code stays consistently basic.
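To make the measurement concrete, here is a minimal sketch of the idea behind construct-level profiling: walk a file's AST and tally node types against a proficiency table. The level mapping below is a hypothetical illustration, not pycefr's actual table, and `profile` is an invented helper name.

```python
import ast

# Hypothetical mapping from AST node types to CEFR-style levels,
# loosely inspired by pycefr's approach (not its real table).
LEVELS = {
    ast.Assign: "A1",
    ast.For: "A1",
    ast.If: "A1",
    ast.FunctionDef: "A2",
    ast.ListComp: "B1",
    ast.Lambda: "B1",
    ast.ClassDef: "B2",
    ast.AsyncFunctionDef: "C1",
    ast.Yield: "C1",
}

def profile(source: str) -> dict:
    """Count recognized constructs per level in a Python snippet."""
    counts = {}
    for node in ast.walk(ast.parse(source)):
        level = LEVELS.get(type(node))
        if level:
            counts[level] = counts.get(level, 0) + 1
    return counts

# A typical "agent-style" snippet: one function, one loop, one assignment.
snippet = """
def total(items):
    s = 0
    for x in items:
        s += x
    return s
"""
print(profile(snippet))  # every construct lands at A1/A2
```

Run over thousands of files, a tally like this is what produces the 90%-Basic distribution the paper reports.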

The practical implication is a recalibration of where review effort should go. If the vast majority of agent-generated Python is structurally simple, the review bottleneck isn't catching subtle advanced-construct misuse — it's catching architectural drift, logical errors in straightforward loops, and semantic misalignment with the codebase's established conventions. Teams optimizing their review process for edge-case complexity are spending attention in the wrong place.

This also reframes the skill requirements for developers reviewing agent output. A mid-level engineer who understands the codebase architecture and conventions is often better positioned to catch what matters than a senior engineer who reviews for advanced code patterns that rarely appear.

Read the full paper on arXiv →