Gortex v0.30.0 Turns Agent Code Search Into Something You Can Benchmark


Agentic coding has a context problem, and the lazy answer has been to make the prompt window larger. Gortex v0.30.0 argues for a better answer: treat code search like production infrastructure, publish benchmarks, and make the retrieval layer prove that it found the right code before the model starts editing.

That sounds less cinematic than a new model launch. Good. The expensive part of coding agents is not always the generation step; it is the wandering. Agents grep for a symbol, read the wrong file, pull half the repository into context, miss the caller that matters, and then produce a plausible patch against an incomplete map. Bigger context windows reduce the immediate pain, but they also make it easier to hide bad retrieval inside more tokens. Gortex is useful because it puts numbers next to the map.

Context reduction needs a correctness denominator

Gortex indexes repositories into an in-memory knowledge graph and exposes that graph through CLI, MCP, HTTP, and a web UI. The project claims support for 15 coding-agent environments, including Claude Code, Cursor, Codex CLI, Gemini CLI, OpenCode, Aider, Kilo Code, Copilot/VS Code, and OpenClaw. Its README advertises 256 languages across tree-sitter, regex, and forest-backed tiers; 88 MCP tools; multi-repo workspaces; per-session workspace isolation; hybrid BM25/vector search; session memories; feedback-aware reranking; and graph-backed analysis for call chains, blast radius, unsafe patterns, routes, models, Kubernetes resources, and more.

That surface area is large enough to trigger the healthy kind of suspicion. A young project with 43 stars, 6 forks, 2 open issues, and 88 MCP tools is either moving fast or writing checks its maintainers will spend the next year covering. The release’s best defense is not the feature list. It is the benchmark work in v0.30.0: reference-repo performance, token-efficiency comparisons against ripgrep+read, retrieval-baseline NDCG@10 harnesses, SWE-bench result templates, a user-facing gortex bench suite, and a gortex gain command for projecting token savings in dollars.

The benchmark document reports a useful paired metric: tokens spent and recall achieved. On identifier queries in the Gortex repo, rows like AddObservation, IsSymbolQuery, and FileCoherenceSignal show Gortex returning the ground-truth target at recall@2k = 1.00 while the ripgrep+full (rg+full) and ripgrep+context (rg+ctx) baselines often score 0.00. The token deltas are not subtle: AddObservation is listed at 972 tokens for Gortex versus 31,530 for rg+full and 9,020 for rg+ctx; IsSymbolQuery is 577 versus 23,027 and 7,388; FileCoherenceSignal is 151 versus 14,268 and 6,290.
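
To put those deltas on one scale, the quoted rows can be turned into reduction factors. The sketch below uses only the token counts cited above; the dictionary layout and function name are illustrative choices, not Gortex's output format.

```python
# Token counts as quoted in this article: (gortex, rg+full, rg+ctx).
REPORTED = {
    "AddObservation": (972, 31_530, 9_020),
    "IsSymbolQuery": (577, 23_027, 7_388),
    "FileCoherenceSignal": (151, 14_268, 6_290),
}


def reduction_factors(rows):
    """Return {query: (rg_full / gortex, rg_ctx / gortex)}, rounded to one decimal."""
    return {
        name: (round(full / gortex, 1), round(ctx / gortex, 1))
        for name, (gortex, full, ctx) in rows.items()
    }


if __name__ == "__main__":
    for name, (vs_full, vs_ctx) in reduction_factors(REPORTED).items():
        print(f"{name}: {vs_full}x fewer tokens than rg+full, {vs_ctx}x fewer than rg+ctx")
```

On these three rows the factors come out to roughly 32x, 40x, and 94x against rg+full, and roughly 9x, 13x, and 42x against rg+ctx. Three rows are not a distribution, so treat the spread as a reason to rerun the comparison on your own queries rather than as a headline number.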

The important part is not that Gortex wins its own benchmark. Vendors and open-source projects tend to do that. The important part is the shape of the claim. “Fewer tokens” by itself is a marketing number; returning nothing uses zero tokens. “Fewer tokens while still retrieving the expected target under a fixed budget” is the beginning of an engineering conversation. Teams can reproduce it, disagree with the ground truth, add their own queries, and wire the command into CI. That is how agent tooling graduates from demo to dependency.
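
One way to make "recall under a fixed budget" concrete is sketched below. This is an assumed formulation, not the harness Gortex ships: the Hit type, the rank-order cutoff rule, and the gate threshold are all hypothetical names and choices introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One retrieved item, in the order the tool returned it (hypothetical shape)."""
    symbol: str  # identifier of the retrieved code unit
    tokens: int  # tokens this item would add to the prompt


def recall_at_budget(hits, targets, budget):
    """Fraction of ground-truth targets retrieved before the token budget runs out.

    Hits are consumed in rank order. The first hit that would exceed the
    budget ends the scan, so anything ranked after it is not counted.
    """
    if not targets:
        raise ValueError("targets must not be empty")
    spent = 0
    found = set()
    for hit in hits:
        if spent + hit.tokens > budget:
            break
        spent += hit.tokens
        if hit.symbol in targets:
            found.add(hit.symbol)
    return len(found) / len(targets)


def passes_gate(per_query_recall, minimum_mean):
    """CI-style check: mean recall across queries must reach the threshold."""
    if not per_query_recall:
        raise ValueError("need at least one query result")
    return sum(per_query_recall) / len(per_query_recall) >= minimum_mean
```

Under this definition an empty result list scores 0.0 while spending zero tokens, which is exactly why token counts alone say nothing. A tool that ranks a large irrelevant file ahead of the target also scores 0.0 once that file eats the budget, so the metric penalizes bad ordering as well as missing results.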

The model is not the only thing consuming your budget

Gortex also ships a GCX1 wire-format scorecard claiming a median 27.4% token saving versus JSON, with 20/20 round-trip integrity across representative tool responses. This is the kind of detail that sounds like over-optimization until you run many agents all day. MCP tools are chatty. JSON field names, repeated keys, and verbose nested structures become prompt tokens. If a retrieval tool sits on the hot path for every investigation, the wire format is part of the product economics.
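
The effect of repeated keys is easy to see with a toy encoding. The sketch below is not GCX1, whose actual layout is not described here; it is a generic header-plus-rows scheme, and it measures characters as a rough stand-in for tokens, since real token counts depend on the tokenizer.

```python
import json

COMPACT = {"separators": (",", ":")}  # strip optional whitespace from JSON output


def encode_compact(records):
    """Encode flat dicts that share the same keys as one key header plus value rows."""
    if not records:
        return json.dumps({"k": [], "r": []}, **COMPACT)
    keys = list(records[0])
    for rec in records:
        if list(rec) != keys:
            raise ValueError("all records must share the same keys in the same order")
    rows = [[rec[k] for k in keys] for rec in records]
    return json.dumps({"k": keys, "r": rows}, **COMPACT)


def decode_compact(payload):
    """Invert encode_compact back into a list of dicts."""
    data = json.loads(payload)
    return [dict(zip(data["k"], row)) for row in data["r"]]


def char_saving(records):
    """Fractional character saving of the compact form versus whitespace-free JSON."""
    plain = json.dumps(records, **COMPACT)
    return 1 - len(encode_compact(records)) / len(plain)


if __name__ == "__main__":
    # Hypothetical tool response: a list of caller locations.
    sample = [
        {"symbol": "AddObservation", "file": "a.go", "line": 10},
        {"symbol": "IsSymbolQuery", "file": "b.go", "line": 42},
        {"symbol": "FileCoherenceSignal", "file": "c.go", "line": 7},
    ]
    assert decode_compact(encode_compact(sample)) == sample  # round-trip integrity
    print(f"character saving: {char_saving(sample):.1%}")
```

The round-trip assertion is the part worth copying: a saving only counts if decoding returns exactly what was encoded. The saving itself grows with the number of records, because the key header is paid once while plain JSON pays for every key in every record.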

The daemon latency numbers tell the same story from the other side of the stack. The benchmark doc lists median p95 across tools at 5.5ms and median p99 at 5.9ms, with lightweight calls like get_callers and find_usages effectively sub-millisecond, while heavier calls such as get_repo_outline stretch into hundreds of milliseconds. That split matters. Agents can tolerate occasional heavy context-building, but they punish tools that make every turn feel like waiting on a remote database. Fast, scoped calls encourage agents to ask better questions instead of dumping files into context.
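
Tail percentiles like these are simple to reproduce against any tool. The sketch below assumes a zero-argument callable that wraps one tool invocation (a hypothetical stand-in; no real Gortex API is used) and computes nearest-rank percentiles, which is one common definition among several.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def measure_ms(call, runs=200):
    """Invoke `call` repeatedly and return per-call latencies in milliseconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies):
    """Report the median and tail percentiles for one tool."""
    return {name: percentile(latencies, pct) for name, pct in (("p50", 50), ("p95", 95), ("p99", 99))}
```

Measuring per tool rather than pooling all calls matters here: a pooled p95 would be dominated by whichever tool is called most often and would hide the split between sub-millisecond lookups and outline-style calls that take hundreds of milliseconds.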

For practitioners, the takeaway is straightforward: evaluate agent code-search tools like databases, not like editor plugins. Run them on your real repository. Measure cold index time, warm query latency, incremental reindexing, disk footprint, recall at 2k and 10k token budgets, and behavior when a branch or worktree moves underneath the index. If the tool claims multi-repo intelligence, test same-named symbols across repositories. If it claims provenance, make the agent cite the source file, symbol, edge type, and confidence. If it exposes MCP, test response size and failure envelopes under concurrent agent sessions.

There is also a governance angle. Gortex emphasizes per-session workspace isolation in the daemon, confidence-gated cross-repo edges, signed releases, SHA256 verification, optional cosign, an SLSA Level 3 badge, and a VirusTotal 0/91 badge. None of that guarantees safety, but it points at the right threat model. A code-search layer for agents is not passive documentation. It can steer edits, expand blast radius, and connect repositories that humans expected to stay mentally separate. Bad edges become bad diffs.

The open question is maintenance discipline. Eighty-eight MCP tools across 256 languages create a lot of compatibility surface. Tree-sitter grammars drift. LSP behavior varies. Language-specific resolution is full of edge cases. Natural-language search can look impressive while missing the one boring internal wrapper that matters. Gortex’s answer should continue to be benchmarks, not vibes: reproducible corpora, curated ground truth, budget gates, false-positive reporting, and public regressions when the numbers move.

That is why this release matters despite the small repo footprint. The next useful coding-agent upgrade may not be a model that writes prettier code. It may be a retrieval system that can prove it found the right code, spent fewer tokens doing it, and gave the reviewer enough provenance to trust the route. Agents do not need a bigger haystack. They need a better map, with measurements printed on it.

Sources: GitHub — Gortex v0.30.0, Gortex README, Gortex benchmarks, GCX1 wire-format docs, HN discussion on agent code search, GitHub Copilot agent skills docs