Your Coding Agent Is Reading Files When It Should Be Querying a Graph — Codebase-Memory Changes the Equation

The default pattern for coding agents exploring an unfamiliar codebase is expensive and surprisingly shallow. When an agent needs to understand what a function calls, where a class is used, or which modules depend on each other, it falls back on repeated file reads and grep searches — accumulating thousands of tokens per query without ever building a structural model of the codebase. It's the equivalent of re-reading a textbook from the beginning every time you need to look something up.

A new paper from the Codebase-Memory project proposes a different architecture. Instead of file exploration on demand, the system builds a persistent, Tree-Sitter-based knowledge graph from the repository and exposes it via MCP (Model Context Protocol), making it queryable from any MCP-compatible coding agent. The pipeline parses 66 programming languages through parallel worker pools, constructs call graphs, runs impact analysis, and identifies module communities — then serves the result as a structured, navigable graph rather than a flat file tree.
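To make the idea of a queryable call graph concrete, here is a minimal sketch of the two steps named above: building a call graph and running impact analysis over it. It is not the project's implementation. It swaps Tree-sitter for Python's standard-library `ast` module so that it runs without dependencies, handles a single Python source string only, and uses hypothetical names throughout (`SOURCE`, `build_call_graph`, `impacted_by`).

```python
import ast
from collections import defaultdict, deque

# Hypothetical toy "repository": one Python module held in a string.
SOURCE = '''
def load(path):
    return parse(read(path))

def read(path):
    return open(path).read()

def parse(text):
    return text.split()

def main():
    print(load("data.txt"))
'''

def build_call_graph(source):
    """Map each top-level function to the locally defined functions it calls."""
    tree = ast.parse(source)
    defined = {n.name for n in tree.body if isinstance(n, ast.FunctionDef)}
    graph = {name: set() for name in defined}
    for fn in tree.body:
        if not isinstance(fn, ast.FunctionDef):
            continue
        for node in ast.walk(fn):
            # Only direct calls by plain name, e.g. parse(...); builtins are skipped.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                if node.func.id in defined:
                    graph[fn.name].add(node.func.id)
    return graph

def impacted_by(graph, changed):
    """Impact analysis: every function that directly or transitively calls `changed`."""
    callers = defaultdict(set)
    for caller, callees in graph.items():
        for callee in callees:
            callers[callee].add(caller)
    seen, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for caller in callers[current]:
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen

graph = build_call_graph(SOURCE)
print(sorted(impacted_by(graph, "read")))  # ['load', 'main']
```

Once the graph exists, "what breaks if I change read?" becomes a single traversal over a small in-memory structure rather than a series of file reads and greps, which is where the token savings would come from.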

The evaluation across 31 real-world repositories shows a clear trade-off: Codebase-Memory achieves 83% answer quality compared to 92% for a traditional file-exploration agent, but does so with ten times fewer tokens and 2.1 times fewer tool calls. For graph-native queries like hub detection, caller ranking, and dependency tracing, it matches or exceeds the file explorer on 19 of 31 languages. The nine-point quality gap is the cost of the efficiency gain, and for agents running long-horizon tasks or operating in cost-constrained environments, that trade-off is often worth making. Token costs and context window exhaustion are among the primary practical bottlenecks in agentic coding today, which makes this the most actionable context-management architecture paper of the week.
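As an illustration of why graph-native queries suit this design, the sketch below ranks functions by their number of distinct callers and flags the most-called ones as hubs. Treating in-degree as the hub metric is an assumption made for this example; the summary above does not say how the paper defines hubs. The edge list and the names `CALL_EDGES`, `rank_by_callers`, and `hubs` are hypothetical.

```python
from collections import Counter

# Hypothetical call edges as (caller, callee) pairs.
CALL_EDGES = [
    ("main", "load"), ("main", "log"),
    ("load", "read"), ("load", "parse"), ("load", "log"),
    ("parse", "log"),
]

def rank_by_callers(edges):
    """Rank callees by number of distinct callers, highest first; ties broken by name."""
    counts = Counter(callee for _caller, callee in set(edges))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

def hubs(edges, min_callers=2):
    """Functions with at least `min_callers` distinct callers (in-degree threshold)."""
    return [name for name, n in rank_by_callers(edges) if n >= min_callers]

ranking = rank_by_callers(CALL_EDGES)
print(ranking)           # [('log', 3), ('load', 1), ('parse', 1), ('read', 1)]
print(hubs(CALL_EDGES))  # ['log']
```

A file-exploration agent would have to grep for every call site to approximate the same ranking, while a prebuilt graph can answer it with one aggregate query.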

Read the full paper on arXiv →