Doby Is a Small, Sharp Rebuttal to the Idea That AI Coding Needs More Context Instead of Better Navigation

Anatoliy Kolodkin

24 Apr 2026 • 4 min read

There is a lazy answer to most AI-coding quality problems right now: give the model more context. Bigger windows, more retrieval, more MCP tools, more documents, more screenshots, more everything. Sometimes that works. Sometimes it just creates a larger room for the model to get lost in before it edits the wrong files with confidence. doby, a GitHub project launched the same day this was written, is interesting because it argues for the opposite instinct. Maybe the problem is not that coding agents see too little. Maybe it is that they navigate badly and change code before intent is pinned down.

The repo’s pitch is compact and sharp. Build a lightweight structured index that maps keyword to plan doc to code file to symbol, then force code changes through a spec-first workflow. The README claims ordinary LLM-driven modification work can burn 2,000 to 5,000 tokens just locating the right files, while doby compresses the common path to two grep calls and roughly 100 tokens. More importantly, it claims the larger savings come from reducing rework. If the model updates the spec before touching the implementation, verifies architecture first, and only then edits code, you stop paying for a lot of expensive wandering.
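
To make the keyword to plan doc to code file to symbol chain concrete, here is a minimal sketch of a flat-line index and a grep-style lookup over it. The pipe-separated line format, the file paths, and the names `IndexEntry` and `resolve` are all assumptions made for illustration; none of them come from the doby repository.

```python
from typing import NamedTuple

# Hypothetical flat-line index: one mapping per line, pipe-separated.
# This layout is an assumption for illustration, not doby's actual format.
INDEX_TEXT = """\
login|docs/plans/auth.md|src/auth/session.py|create_session
logout|docs/plans/auth.md|src/auth/session.py|destroy_session
invoice|docs/plans/billing.md|src/billing/invoice.py|render_invoice
"""


class IndexEntry(NamedTuple):
    keyword: str
    plan_doc: str
    code_file: str
    symbol: str


def resolve(keyword: str, index_text: str = INDEX_TEXT) -> list[IndexEntry]:
    """Return every entry whose keyword field matches exactly.

    This is a plain line scan, the same work a single grep call would do,
    so the caller never has to load or reason over the whole index.
    """
    prefix = keyword + "|"
    return [
        IndexEntry(*line.split("|"))
        for line in index_text.splitlines()
        if line.startswith(prefix)
    ]
```

Calling `resolve("login")` returns a single entry that names the plan doc, the code file, and the symbol in one line. The point of the shape is that the answer fits in one short line of output rather than a directory listing.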

There is plenty of repository bravado in the numbers, including a stated 90 to 95 percent token reduction on code changes in the happy path. Those percentages should be treated as marketing until more teams validate them. But the architecture itself is the part worth taking seriously. doby divides retrieval into layers: an L1 flat-line index for the common path at about 100 tokens, L2 wiki pages for broader explanation, L3 semantic retrieval for misses, and L4 auto-compile for expensive synthesis. The key design choice is that most work is supposed to stay in L1. That is a useful correction to a market that keeps assuming the answer is to invoke the most expensive reasoning path by default.
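
The layering is easier to reason about as a fall-through chain: try the cheapest layer first and only escalate on a miss. The sketch below assumes each layer can be modeled as a function that returns an answer or `None`; the layer contents, the `resolve_tiered` name, and the placeholder lambdas are hypothetical stand-ins, not doby's implementation.

```python
from typing import Callable, Optional

# A layer is a (name, lookup) pair; lookup returns None on a miss.
Layer = tuple[str, Callable[[str], Optional[str]]]


def resolve_tiered(query: str, layers: list[Layer]) -> tuple[str, str]:
    """Try each layer in order and return (layer_name, answer) from the first hit."""
    for name, lookup in layers:
        answer = lookup(query)
        if answer is not None:
            return name, answer
    raise LookupError(f"no layer could resolve {query!r}")


# Stand-in data for the two cheap layers (hypothetical contents).
l1_index = {"login": "src/auth/session.py:create_session"}
l2_wiki = {"auth": "docs/wiki/auth-overview.md"}

LAYERS: list[Layer] = [
    ("L1", l1_index.get),                            # flat-line index, the common path
    ("L2", l2_wiki.get),                             # wiki pages for broader explanation
    ("L3", lambda q: None),                          # semantic retrieval placeholder: always misses here
    ("L4", lambda q: f"compiled summary for {q}"),   # expensive synthesis, last resort
]
```

Recording which layer answered, as the return value does here, is also what would let a team check whether most work really does stay in L1.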

Over-editing is usually a navigation failure wearing a code-quality costume

Developers have been describing recent AI-coding frustration with terms like over-editing, wandering, touching too much surface area, and fixing the wrong layer. Those complaints often get framed as pure model weakness. Sometimes they are. But a lot of bad edits are just downstream of poor navigation. If the system cannot get to the right doc, file, symbol, or ownership boundary quickly, then it starts approximating. Approximation is how one bug fix turns into six plausible changes across unrelated files, followed by a reviewer asking why the model apparently got bored and refactored half the module.

doby is compelling because it treats that as an information-architecture problem rather than a mystical intelligence problem. Keyword to plan doc to code file to symbol is not glamorous. It is also exactly the kind of boring scaffolding that keeps work narrow. The project’s “read 0 principle” is particularly telling. Resolve and update operations are meant to work from grep output, not from repeatedly rereading giant index files. That sounds like token optimization, but the bigger win is behavioral. The tighter the route to the relevant locus of change, the less temptation there is for the model to improvise its way through the repo.

This lines up with Martin Fowler’s recent writing on harness engineering for coding-agent users. Fowler’s core claim is that a good outer harness should increase first-pass correctness and create feedback loops that catch issues before humans have to do the cleanup. In that framing, doby is a local, repo-centered attempt to move quality left. Spec-first rules are feedforward control. Indexes, mappings, and later sync checks are feedback control. That is a more serious idea than “here is another helper around Claude Code.”

The spec-first part matters more than the token-savings pitch

The repo’s strongest product decision is the workflow itself. When a change request comes in, it resolves the relevant docs and code, asks for the spec to be reviewed and updated first, loops through feasibility and architecture verification, implements, runs integration checks, then syncs the indexes. That sounds almost old-fashioned next to today’s “just vibe it” rhetoric. Good. Most of the current pain in AI coding is not that agents cannot type code. It is that they can type code before anyone has agreed what “correct” means.
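
One way to picture the workflow is as an ordered gate: a change request cannot reach implementation until the earlier stages have run. The sketch below is a minimal illustration under that assumption. The stage names paraphrase the steps described above, the verification loop is collapsed into a single stage for brevity, and `Stage` and `ChangeRequest` are hypothetical names rather than anything in doby.

```python
from enum import IntEnum


class Stage(IntEnum):
    # Stage names paraphrase the workflow in the text; they are not doby identifiers.
    RESOLVE = 1
    SPEC_UPDATE = 2
    VERIFY = 3            # feasibility and architecture checks, collapsed into one step
    IMPLEMENT = 4
    INTEGRATION_CHECK = 5
    SYNC_INDEX = 6


class ChangeRequest:
    def __init__(self, title: str) -> None:
        self.title = title
        self.completed: list[Stage] = []

    def advance(self, stage: Stage) -> None:
        """Record a stage, refusing anything that is out of order."""
        if len(self.completed) == len(Stage):
            raise RuntimeError("change request is already complete")
        expected = Stage(len(self.completed) + 1)
        if stage != expected:
            raise RuntimeError(f"cannot run {stage.name} before {expected.name}")
        self.completed.append(stage)
```

With this gate in place, calling `advance(Stage.IMPLEMENT)` right after `RESOLVE` raises an error instead of quietly letting code get written against an unreviewed spec.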

For practitioners, this is the useful lesson. If you want narrower, more reliable AI-assisted changes, the highest-leverage intervention may not be a better frontier model. It may be forcing the system to bind itself to the intended spec before implementation begins. That is also why this repo belongs in the over-editing conversation even though it never uses the term. Over-editing is often a symptom of missing or weak intent control. A spec-first workflow attacks that root cause instead of merely cleaning up after it.

There is a second-order industry point here too. The coding-agent market is starting to split into two philosophical camps. One camp wants maximal autonomy, larger context, more tools, and longer runs. The other wants tighter harnesses, stronger constraints, and more disciplined search. Those are not mutually exclusive, but they do point to different products. The first camp produces better demos. The second may produce better day-to-day engineering outcomes.

What teams should do next

You do not need to adopt doby specifically to steal the right ideas from it. Start by measuring where your AI coding sessions are actually wasting time. Is it generation, or is it navigation? How often does the system open the wrong files before it finds the right ones? How often do edits expand because the original request was not grounded in an explicit spec? How often does review pain come from incorrect logic versus excessive surface area touched?
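
Those questions can be turned into rough numbers if you have any kind of session log. The sketch below assumes a simplified, hypothetical event log of file opens and edits plus a hand-labeled set of files the change was supposed to touch; the `Event` shape and the metric names are illustrative, not a standard.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str  # "open" or "edit" (hypothetical log vocabulary)
    path: str


def navigation_stats(events: list[Event], intended_files: set[str]) -> dict[str, int]:
    """Summarize navigation waste and edit surface area for one session."""
    # Collect every file opened before the first edit happens.
    opened_before_edit = []
    for event in events:
        if event.kind == "edit":
            break
        if event.kind == "open":
            opened_before_edit.append(event.path)

    wrong_opens = [p for p in opened_before_edit if p not in intended_files]
    edited = {e.path for e in events if e.kind == "edit"}
    return {
        "wrong_opens_before_first_edit": len(wrong_opens),
        "files_edited": len(edited),
        "unintended_files_edited": len(edited - intended_files),
    }
```

The first number approximates navigation cost, and the last approximates over-editing. Tracking them separately keeps the two problems from being blamed on each other.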

Then tighten the workflow. Maintain a lightweight map from features to docs to code ownership. Require intent to be expressed in one canonical place before implementation. Make “which symbol should change?” a first-class question, not an incidental discovery process buried in the session transcript. If you have a repo where architecture matters, treat retrieval discipline as part of your engineering system, not as optional prompt seasoning.
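
A feature-to-code map only helps if it stays true, so it is worth checking mechanically. The sketch below assumes a Python codebase and a hypothetical map of `(feature, code_file, symbol)` tuples, and uses the standard-library `ast` module to flag entries whose symbol no longer exists. The function names and the in-memory `sources` dictionary are illustrative choices; a real check would read files from disk.

```python
import ast


def defined_symbols(source: str) -> set[str]:
    """Return the names of all functions and classes defined in a Python source string."""
    tree = ast.parse(source)
    return {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }


def stale_entries(
    index: list[tuple[str, str, str]],
    sources: dict[str, str],
) -> list[tuple[str, str, str]]:
    """Return index entries whose file is missing or whose symbol is no longer defined.

    index: (feature, code_file, symbol) tuples, a hypothetical map format.
    sources: mapping of file path to source text, standing in for the repo on disk.
    """
    stale = []
    for feature, code_file, symbol in index:
        source = sources.get(code_file)
        if source is None or symbol not in defined_symbols(source):
            stale.append((feature, code_file, symbol))
    return stale
```

Run as part of CI, a check like this fails the build when a rename or deletion leaves the map pointing at nothing, which is the kind of drift that would otherwise send an agent back to wandering.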

The more opinionated read is this: the next quality jump in AI coding may come from less ambition, not more. Smaller search spaces. Better indexing. Stronger specification discipline. Cleaner routes to the exact file and symbol that matter. That is a much less cinematic story than autonomous software engineering conquering the monorepo. It is also much closer to how reliable software gets built. doby deserves attention because it sees the same thing many teams are starting to relearn the hard way: better navigation is often more valuable than more context.

Sources: doby on GitHub, Martin Fowler on harness engineering
