Less Training Data, Better Agents: The "Less-Is-More" Principle Reaches Software Engineering

Training effective software engineering agents typically relies on large volumes of high-quality task trajectories, and generating them at scale is expensive. A new paper extends the "Less-Is-More" (LIMO) hypothesis to agentic software engineering. In mathematical reasoning, LIMO showed that a small number of excellent examples dramatically outperforms large volumes of mediocre ones. The new work adds a practical twist that changes what teams should optimize when building coding agents.

The key finding: for agentic scenarios, the quality criterion that matters most for trajectory selection is not whether the agent arrived at the correct final answer, but how it got there. Trajectories where the agent plans explicitly, uses tools purposefully, and backtracks when stuck outperform trajectories where the agent stumbles onto a correct answer through unstructured exploration, even when both produce the same final output. The paper applies a scoring function based on these structural properties as a filter on the full trajectory set and reports that training on the top 15–20% of curated trajectories matches or exceeds training on the complete dataset on SWE-bench and other coding agent benchmarks.
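To make the filtering step concrete, here is a minimal Python sketch of structure-based trajectory selection. It is not the paper's scoring function: the `Step` and `Trajectory` types, the step labels ("plan", "tool_call", "backtrack"), the `useful` flag, and the 0.4/0.4/0.2 weights are all hypothetical choices made for illustration. The only detail taken from the summary above is the idea of ranking trajectories by structural properties and keeping the top 15–20%.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    # Hypothetical step labels: "plan", "tool_call", "backtrack", "other".
    kind: str
    # Hypothetical flag: whether a tool call advanced the task.
    useful: bool = True


@dataclass
class Trajectory:
    task_id: str
    steps: List[Step] = field(default_factory=list)


def structure_score(traj: Trajectory) -> float:
    """Score a trajectory in [0, 1] from structural signals only.

    The weights are illustrative assumptions, not values from the paper.
    """
    if not traj.steps:
        return 0.0
    has_plan = any(s.kind == "plan" for s in traj.steps)
    has_backtrack = any(s.kind == "backtrack" for s in traj.steps)
    tool_calls = [s for s in traj.steps if s.kind == "tool_call"]
    # Fraction of tool calls marked as useful; 0.0 if no tools were used.
    purposeful = (
        sum(s.useful for s in tool_calls) / len(tool_calls) if tool_calls else 0.0
    )
    return 0.4 * has_plan + 0.4 * purposeful + 0.2 * has_backtrack


def select_top(trajs: List[Trajectory], fraction: float = 0.2) -> List[Trajectory]:
    """Keep the highest-scoring `fraction` of trajectories (at least one)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not trajs:
        return []
    k = max(1, math.ceil(len(trajs) * fraction))
    # sorted() is stable, so ties keep their original relative order.
    return sorted(trajs, key=structure_score, reverse=True)[:k]
```

In this sketch, correctness of the final answer does not enter the score at all, which mirrors the claim that how the agent reached its answer matters more for selection than whether the answer was right. A real pipeline would need some way to derive the step labels, for example from tool-call logs or a separate annotation pass.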

The compute implications are significant: better coding agents require more selective curation, not more rollouts. For teams fine-tuning their own models or evaluating off-the-shelf agents, this shifts the priority from maximizing rollout volume to identifying trajectories that exhibit the right agent behavior and filtering out everything else.

Read the full article at arXiv →