Strong Manager + Cheap Worker = Top-Model Performance at a Fraction of the Cost

What happens when you pair an expensive reasoning model with a cheap execution model to solve software engineering tasks? A new paper studying the ManagerWorker architecture provides the clearest empirical answer yet, and the headline finding is immediately actionable. A strong manager directing a weak worker achieves a 62% resolve rate on SWE-bench Lite, on par with a strong solo agent at roughly 60%, while using the expensive model for only a fraction of the tokens, because execution is handed to the cheap model. The expensive reasoning goes where it matters most: analysis, planning, and reviewing the worker's direction. The cheap model handles the mechanical work of reading files and writing diffs.
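To make the division of labor concrete, here is a minimal sketch of a manager-worker loop. It is an illustration, not the paper's implementation: the `Model` type, the `PLAN`/`EXECUTE`/`REVIEW` prompt prefixes, the `DONE` verdict, and the stub models are all assumptions introduced for this example. In practice `manager` would wrap a strong model and `worker` a cheap one.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical interface: any function mapping a prompt string to a reply string.
Model = Callable[[str], str]


@dataclass
class RunLog:
    """Counts calls per role so the split of work is visible."""
    manager_calls: int = 0
    worker_calls: int = 0
    transcript: List[str] = field(default_factory=list)


def solve(task: str, manager: Model, worker: Model,
          max_rounds: int = 3) -> Tuple[str, RunLog]:
    """Manager plans and reviews; worker does the mechanical execution."""
    log = RunLog()

    # The manager analyzes the task and produces the first instruction.
    instruction = manager(f"PLAN\n{task}")
    log.manager_calls += 1
    log.transcript.append(f"manager: {instruction}")

    result = ""
    for _ in range(max_rounds):
        # The worker carries out the current instruction (e.g. writes a diff).
        result = worker(f"EXECUTE\n{instruction}")
        log.worker_calls += 1
        log.transcript.append(f"worker: {result}")

        # The manager reviews and either accepts or issues a new instruction.
        verdict = manager(f"REVIEW\n{task}\n{result}")
        log.manager_calls += 1
        log.transcript.append(f"manager: {verdict}")
        if verdict.strip() == "DONE":
            break
        instruction = verdict

    return result, log


# Stub models so the sketch runs without any API; replace with real clients.
def stub_manager(prompt: str) -> str:
    if prompt.startswith("PLAN"):
        return "edit foo.py: fix off-by-one in loop bound"
    return "DONE"


def stub_worker(prompt: str) -> str:
    instruction = prompt.split("\n", 1)[1]
    return f"diff for: {instruction}"
```

Calling `solve("tests fail in foo.py", stub_manager, stub_worker)` runs one plan, one execution, and one review. The `max_rounds` cap reflects the idea that coordination should stay tight; the specific default of 3 is arbitrary.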

The corollary finding is equally important and worth printing on every multi-agent architecture diagram: a weak manager directing a weak worker — 42% — underperforms a single weak agent running alone at 44%. The paper's framing for this is exact: "structure without substance is pure overhead." An orchestration layer that doesn't add genuine planning capability doesn't just fail to help; it actively gets in the way. The study also tests pipeline complexity directly, finding that adding more coordination steps past a certain threshold reduces performance by fragmenting coherent analysis across too many round trips. The manager's value is in directing — not merely reviewing after the fact — and a minimal review-only configuration adds nothing measurable.

For engineering teams deciding where to allocate expensive model tokens, this paper provides direct evidence for spending them on planning rather than execution. The manager-worker pattern only pays off when the capability gap between the two roles is genuine and the coordination overhead is kept tight. Teams spinning up multi-agent frameworks because the architecture sounds sophisticated should read this paper first: the 42% vs 44% result is a concrete warning that wrapping an agent in an under-powered orchestration layer can cost you performance rather than gain it.
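The token-allocation argument comes down to simple arithmetic, sketched below. Every number here is a placeholder chosen for illustration: the prices, the token counts, and the model labels `strong` and `cheap` are assumptions, not figures from the paper. The sketch also assumes both setups consume the same total tokens, which ignores any extra tokens spent on coordination.

```python
from typing import Dict


def run_cost(tokens_by_model: Dict[str, int],
             price_per_million: Dict[str, float]) -> float:
    """Total cost of a run: tokens used on each model times that model's price."""
    return sum(
        tokens * price_per_million[model] / 1_000_000
        for model, tokens in tokens_by_model.items()
    )


# Placeholder prices in dollars per million tokens (hypothetical).
PRICES = {"strong": 10.0, "cheap": 1.0}

# Solo agent: every token goes through the strong model (hypothetical counts).
solo_tokens = {"strong": 1_000_000}

# Manager-worker: the strong model only plans and reviews,
# while the cheap model handles file reads and diffs (hypothetical split).
split_tokens = {"strong": 200_000, "cheap": 800_000}

solo_cost = run_cost(solo_tokens, PRICES)
split_cost = run_cost(split_tokens, PRICES)
```

Under these placeholder numbers the split run costs well under half of the solo run. The real ratio depends on your own price gap and on how many tokens the manager actually needs, so it is worth plugging in measured counts before drawing conclusions.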

Read the full article at arXiv →