The Evolution of Tool Use in LLM Agents: From Single-Tool Call to Multi-Tool Orchestration

A comprehensive new survey traces the full arc of how LLM agents have learned to use tools — from early experiments asking "did the model pick the right function?" all the way to today's multi-agent systems orchestrating parallel tool calls across long, stateful trajectories. The paper establishes a five-level taxonomy of tool use complexity: single-tool selection, sequential multi-tool calling, parallel execution, nested hierarchical orchestration with sub-agents, and adaptive orchestration that revises its plan based on live tool outputs and environmental changes.
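The five levels can be sketched as an ordered enum. The names below are my own shorthand for the survey's categories, not identifiers taken from the paper:

```python
from enum import IntEnum

class OrchestrationLevel(IntEnum):
    """Hypothetical labels for the survey's five levels of tool-use complexity."""
    SINGLE_TOOL = 1   # pick and invoke one function correctly
    SEQUENTIAL = 2    # chain multiple calls, each depending on the previous result
    PARALLEL = 3      # fan out independent calls concurrently
    NESTED = 4        # delegate sub-tasks to sub-agents with their own tools
    ADAPTIVE = 5      # revise the plan based on live tool outputs and environment changes
```

Because `IntEnum` values are ordered, a system can gate features by level, e.g. `if level >= OrchestrationLevel.NESTED: enable_sandboxing()`.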

The central insight is that the field's core problem has fundamentally shifted. Early work focused on invocation correctness — whether the model called the right tool at all. Today's challenge is orchestration correctness: whether the agent reasons soundly across a long trajectory with incomplete information, side effects, and compounding decisions. Familiar architectural patterns like ReAct, Plan-and-Execute, and Reflection are situated within this taxonomy, giving practitioners a clearer sense of which pattern addresses which class of problem.
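To make the ReAct pattern concrete, here is a minimal thought-act-observe loop. The "model" is a scripted stub and the tool registry is hypothetical; a real agent would prompt an LLM with the accumulated history at each step:

```python
from typing import Callable

# Hypothetical tool registry -- names and behavior are illustrative only.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda q: f"top result for {q!r}",
}

def stub_model(history: list[str]) -> tuple[str, str, str]:
    """Stand-in for an LLM: returns (thought, action, argument).
    A real agent would generate these by prompting the model with `history`."""
    if not history:
        return ("I need a fact first.", "search", "population of Mars")
    return ("I have what I need.", "FINISH", history[-1])

def react_loop(max_steps: int = 5) -> str:
    history: list[str] = []
    for _ in range(max_steps):
        thought, action, arg = stub_model(history)
        if action == "FINISH":
            return arg                    # final answer
        observation = TOOLS[action](arg)  # act...
        history.append(observation)       # ...then feed the observation back
    return "step budget exhausted"
```

The loop makes the survey's distinction visible: invocation correctness lives in the single `TOOLS[action](arg)` call, while orchestration correctness lives in how observations accumulate in `history` and steer the next decision.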

The survey also treats safety, cost, and verifiability as first-class design constraints — concerns that only become urgent at levels four and five, but which need to be designed in from the start. For teams hitting reliability walls with their coding agents, the taxonomy offers something genuinely useful: a shared vocabulary for diagnosing which level of orchestration complexity you're operating at, and what the known failure modes look like at that level.

Read the full article at arXiv (cs.SE + cs.CL) →