304,362 AI Commits, One Uncomfortable Finding: AI-Generated Code Accumulates Technical Debt Faster Than It's Cleaned Up

Most research on AI-generated code quality operates under controlled conditions — short-lived experiments, synthetic benchmarks, carefully curated prompts. What happens in production repositories over time is a different question, and until now it's been largely unanswered. A new large-scale empirical study changes that by tracking 304,362 verified AI-authored commits from 6,275 GitHub repositories, spanning five widely used AI coding assistants, and measuring what actually happens to the issues those commits introduce.

The methodology works in two steps. First, static analysis runs before and after each AI commit to attribute exactly which code smells, bugs, and security issues that commit introduced. Second, each issue is tracked from its introducing commit to the latest repository revision and classified as quickly remediated, dormant debt, or propagating, meaning it spreads as other code builds on top of the AI-generated foundation.

The findings are uncomfortable for teams relying on code review as their quality gate. AI-introduced issues accumulate differently from human-introduced issues, with a systematic pattern of persistence that depends on which AI assistant was used, whether the task was a bug fix or a feature addition, and the repository's existing review culture. Technical debt from AI coding assistants is not self-healing.
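The before/after attribution and the three lifecycle categories described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `Issue` fields, the `quick_fix_window` threshold, and the `dependents_added` signal for propagation are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Lifecycle(Enum):
    REMEDIATED = "quickly remediated"
    DORMANT = "dormant debt"
    PROPAGATING = "propagating"


@dataclass(frozen=True)
class Issue:
    # Hypothetical issue identity; a real analyzer would supply its own fields.
    rule: str         # analyzer rule id
    path: str         # file the issue was reported in
    fingerprint: str  # stable hash of the offending code, so line shifts do not matter


def introduced_issues(before: set[Issue], after: set[Issue]) -> set[Issue]:
    """Attribute to a commit the issues present after it but not before it."""
    return after - before


def classify(
    fixed_after_commits: Optional[int],
    dependents_added: int,
    quick_fix_window: int = 10,  # assumed threshold, not taken from the paper
) -> Lifecycle:
    """Classify one issue by what happened to it after its introducing commit.

    fixed_after_commits: commits until the issue disappeared, or None if it
        is still present at the latest revision.
    dependents_added: count of later changes that build on the affected code
        (an assumed proxy for propagation).
    """
    if fixed_after_commits is not None and fixed_after_commits <= quick_fix_window:
        return Lifecycle.REMEDIATED
    if dependents_added > 0:
        return Lifecycle.PROPAGATING
    return Lifecycle.DORMANT
```

In an internal audit, `before` and `after` would come from running the same static analyzer on the parent and child revisions of each AI-authored commit.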

The practical implication is that review rate alone doesn't prevent debt accumulation. The key risks are persistence and propagation, and both vary significantly by task type and tool. For any team running AI coding agents at scale, the lifecycle tracking methodology documented in the paper can be adapted as an internal audit approach, and the per-assistant breakdown makes it possible to calibrate how much extra scrutiny different tools warrant on different task types.
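One way a team might turn tracked issues into a per-tool, per-task view is a simple grouped persistence rate. The sketch below assumes each tracked issue has already been reduced to a hypothetical `(assistant, task_type, still_open)` record; the record shape and function name are illustrative, not from the paper.

```python
from collections import defaultdict
from typing import Iterable


def persistence_by_group(
    records: Iterable[tuple[str, str, bool]],
) -> dict[tuple[str, str], float]:
    """Return the share of introduced issues still open, per (assistant, task_type)."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    still_open_counts: dict[tuple[str, str], int] = defaultdict(int)
    for assistant, task_type, still_open in records:
        key = (assistant, task_type)
        totals[key] += 1
        if still_open:
            still_open_counts[key] += 1
    # Every key in totals has at least one record, so division is safe.
    return {key: still_open_counts[key] / totals[key] for key in totals}
```

Groups with a high persistence rate are candidates for extra review scrutiny; where to set the cutoff is a local judgment call.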

Read the full paper on arXiv →