Agents Write Safer New Code Than Humans — But Maintaining Existing Code Is Where They Break Things

Coding agents are often described as risky for production codebases, but the risk is not uniformly distributed. A new comparative study of 7,191 agent-generated pull requests and 1,402 human-authored PRs from Python repositories in the AIDev dataset finds a counter-intuitive split: coding agents introduce fewer breaking changes than humans when building new code, but show a systematically higher rate of breaking changes on maintenance tasks — bug fixes, refactoring, and API evolution.

The mechanism is structural. When agents build from scratch, they control the entire interface surface and can enforce consistency throughout. When agents modify existing code, they make local changes that satisfy the immediate task without modeling downstream dependencies — and that's where things break. The pattern holds across agent families and task types, confirmed through AST-based change analysis that can attribute exactly which modifications violate backward compatibility. A second finding adds practical leverage: the task context framing in the PR description significantly predicts breaking change rate. Agents given maintenance-framed prompts break things more than agents given feature-framed prompts, even for identical underlying changes.
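The kind of AST-based analysis described above can be illustrated with a minimal sketch using Python's standard `ast` module. This is a simplified, hypothetical example of the general technique, not the study's actual tooling: it compares top-level function signatures between two versions of a module and flags removals or positional-parameter changes as backward-incompatible.

```python
import ast

def public_signatures(source: str) -> dict:
    """Map each top-level public function name to its positional parameter names."""
    tree = ast.parse(source)
    sigs = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and not node.name.startswith("_"):
            sigs[node.name] = [a.arg for a in node.args.args]
    return sigs

def breaking_changes(old_src: str, new_src: str) -> list:
    """Flag removed functions and incompatible positional-signature changes."""
    old, new = public_signatures(old_src), public_signatures(new_src)
    issues = []
    for name, params in old.items():
        if name not in new:
            issues.append(f"removed: {name}")
        elif new[name][: len(params)] != params:
            # Existing positional parameters were renamed, reordered, or
            # interleaved with new ones -- existing call sites may break.
            issues.append(f"signature changed: {name}")
    return issues

old = "def fetch(url, timeout):\n    ...\n"
new = "def fetch(url, retries, timeout):\n    ...\n"
print(breaking_changes(old, new))  # → ['signature changed: fetch']
```

A production analysis would also need to handle keyword-only arguments, defaults, classes and methods, return types, and re-exports, but the core idea is the same: diff the interface surface, not the text.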

The immediate takeaway for teams running coding agents on production codebases is concrete: use agents more confidently for greenfield work and new feature additions, and apply heavier review scrutiny to agent PRs on existing APIs, refactoring tasks, and bug fixes. The task-framing effect is also directly usable — how you write the prompt for a maintenance task measurably affects how carefully the agent models backward compatibility. This is one of the cleaner examples of empirical research that translates directly into process changes.

Read the full paper on arXiv →