Every Layer in Your AI Coding Stack Has a Maintenance Cost, and Teams Are Finally Saying It Out Loud
The AI coding market spent the last year rewarding the most elaborate demo on stage. One model plans, another reviews, a third agent fans out work, and somewhere off-camera there is a folder full of prompt files that only one staff engineer understands. Jan Hegewald's new essay is useful precisely because it pokes at the part of this ecosystem people usually skip in public: every extra layer in the stack has a carrying cost, and eventually someone besides the person who built it has to carry it.
That sounds obvious. It is not how the market behaves. Tool vendors sell the dream of accumulation. Add custom instructions. Add specialized agents. Add skills. Add orchestration. Add subagents. Add review loops. Add more context. Add a memory layer. Add a browser agent. Add another model for second opinions. Each move is defensible in isolation. That is the trap.
Hegewald walks through that progression with unusual honesty. His team started with reusable instruction files because explaining the codebase from scratch in every session was wasteful. Then those files became bloated. By his telling, what began as always-relevant guardrails gradually turned into mini onboarding manuals, loading thousands of words of context even for simple tasks. That is a pattern nearly every serious coding-agent user has seen by now: yesterday's helpful context becomes today's always-on noise.
The market loves sophistication. Teams have to live with it.
From there, the story keeps escalating in familiar ways. Prompt files gave way to custom agents with bounded responsibilities. Skills emerged as a better place to store knowledge that mattered sometimes, but not always. The result, in Hegewald's framework, is a stack where instructions are always loaded, agents run against a goal, and skills load just in time when the work warrants them. On paper, that is clean. In practice, it means yet another conceptual layer, another set of files, another thing a colleague has to understand before they can trust the workflow.
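To make the three-layer split concrete, here is a minimal sketch of how always-loaded instructions and just-in-time skills might be assembled into a prompt. Everything in it is hypothetical: the `Harness` and `Skill` names, and especially the keyword-trigger matching, are illustrative assumptions rather than how Hegewald's setup or any particular tool decides when a skill loads.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Knowledge that matters sometimes, but not always (hypothetical shape)."""
    name: str
    triggers: tuple[str, ...]  # assumption: naive keyword triggers, for illustration only
    body: str


@dataclass
class Harness:
    """Always-on instructions plus conditionally loaded skills."""
    instructions: str
    skills: list[Skill] = field(default_factory=list)

    def build_context(self, task: str) -> str:
        # Instructions are loaded for every task, relevant or not.
        parts = [self.instructions]
        lowered = task.lower()
        # A skill is loaded only when the task text mentions one of its triggers.
        for skill in self.skills:
            if any(trigger in lowered for trigger in skill.triggers):
                parts.append(skill.body)
        parts.append(task)
        return "\n\n".join(parts)


harness = Harness(
    instructions="Run the test suite before proposing a change.",
    skills=[
        Skill(
            name="db-migrations",
            triggers=("migration", "schema"),
            body="Migrations must be reversible and live under migrations/.",
        )
    ],
)

print(harness.build_context("Fix a typo in the README"))
print("---")
print(harness.build_context("Write a migration for the users table"))
```

Even in this toy form, the carrying cost is visible: a colleague has to know that the skill exists, what triggers it, and why the README task never saw it.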
This is why the essay matters beyond one consultant's setup. It captures a phase change now happening across agentic coding. The first year of these tools was about proving capability. The second year is about maintainability. Once a workflow moves from solo experimentation to team habit, elegance stops being the main question. Survivability becomes the question. If the staff engineer who wired the whole thing together goes on vacation, switches projects, or leaves the company, does the system still make sense?
That is not a soft management concern. It is an engineering concern. A brittle internal agent harness is still a brittle internal system. Martin Fowler's recent writing on harness engineering points at the same truth from a different angle. A good outer harness should do two things: improve the odds that the agent gets the change right on the first pass, and provide feedback loops that help it self-correct before a human even has to look. The operative phrase is "reduce review toil." If your elaborate setup increases cognitive overhead for everyone except its author, it is not really a productivity system. It is bespoke infrastructure with better branding.
The subagent boom makes this especially relevant. Gemini CLI's subagent model is a good example of the promise. In the product pitch, specialists keep the main session clean, operate with restricted tools, and use separate context windows so the primary agent does not drown in irrelevant detail. That is real value. Focused context and bounded tool access are exactly the kinds of controls serious teams should want. But every specialist also adds a new operational concept. Someone has to know when it fires, what it can touch, how it fails, and whether its output is trustworthy. Multiply that by five or ten specialists and you have quietly created an internal platform team problem.
Context is cheap until it becomes policy
The most useful correction in Hegewald's piece is that complexity usually arrives wearing the costume of obvious progress. Nobody wakes up and decides to over-engineer their AI workflow. They patch a real pain point. Then tooling improves, so they replace the patch with an official mechanism. Then they discover another gap and fill that too. Six months later, they have a multi-layer choreography whose original rationale is scattered across Markdown files, system prompts, agent definitions, and half-remembered conventions.
That matters because the coding-agent industry is starting to confuse optional sophistication with maturity. The mature workflow is not the one with the most moving parts. It is the one that makes tradeoffs explicit. Hegewald's three-question test should probably become standard operating procedure for teams evaluating new AI workflow features: what problem does this solve today, what do we lose if we wait six months, and who else must understand it for the capability to survive? That is the sort of spec sheet the current market tends to skip because it is much less cinematic than a multi-agent demo.
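One way to turn the three questions into an actual gate is to refuse a new layer until each has a written answer. The sketch below is an assumption about how a team might record that, not something the essay prescribes; `LayerProposal` and its field names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LayerProposal:
    """A proposed workflow layer and its answers to the three questions (hypothetical record)."""
    name: str
    problem_today: str = ""
    cost_of_waiting_six_months: str = ""
    must_understand: list[str] = field(default_factory=list)

    def unanswered(self) -> list[str]:
        # Return the questions that still lack a written answer.
        missing = []
        if not self.problem_today.strip():
            missing.append("What problem does this solve today?")
        if not self.cost_of_waiting_six_months.strip():
            missing.append("What do we lose if we wait six months?")
        if not self.must_understand:
            missing.append("Who else must understand it for the capability to survive?")
        return missing


proposal = LayerProposal(
    name="review-subagent",
    problem_today="Reviewers keep catching the same missing-test mistakes.",
)
for question in proposal.unanswered():
    print("Unanswered:", question)
```

The value is less in the code than in the forcing function: an empty `must_understand` list is the "only one staff engineer understands it" problem written down before it ships.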
There is also a token-economics angle here that gets underrated. Always-on context, oversized instruction files, and ritualized planning steps are not just maintainability costs. They are direct costs. They increase latency, inflate usage, and sometimes degrade output by making the model reason through irrelevant material. The industry's reflex has often been to solve this by buying a larger context window. That can help, but it is not the same thing as disciplined context design. Bigger context is not free intelligence. Sometimes it is just a bigger room to get lost in.
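A back-of-the-envelope calculation shows how quickly always-on context adds up. The numbers here are illustrative assumptions, not measurements: roughly 1.3 tokens per English word is a common rule of thumb, and the word counts and request volume are made up for the example.

```python
def always_on_tokens(instruction_words: int, requests: int, tokens_per_word: float = 1.3) -> int:
    """Estimate tokens spent re-sending always-loaded instructions across many requests.

    tokens_per_word is an assumed rule-of-thumb ratio; real tokenizers vary.
    """
    return round(instruction_words * tokens_per_word * requests)


# Hypothetical team: 200 agent requests a day.
bloated = always_on_tokens(instruction_words=3000, requests=200)
lean = always_on_tokens(instruction_words=300, requests=200)

print(f"bloated instructions: {bloated:,} tokens/day")  # 780,000
print(f"lean instructions:    {lean:,} tokens/day")     # 78,000
print(f"overhead avoided:     {bloated - lean:,} tokens/day")
```

Under these assumptions, trimming the instruction file saves about 700,000 input tokens a day before anyone has asked the model to do anything useful, and that is separate from any latency or quality effect.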
For practitioners, the practical takeaway is not to avoid advanced setups. It is to earn them. Start with the minimum harness that meaningfully improves output quality. Keep instructions lean and always relevant. Create a specialist agent only when a task pattern shows up often enough, and distinctly enough, to justify one. Create a skill only when the knowledge truly benefits from conditional loading. Add orchestration only when the coordination cost of manual delegation is clearly higher than the management cost of the orchestrator. In other words, build AI workflow layers the same way competent teams build software abstractions: reluctantly, with receipts.
The larger editorial point is that agentic coding is finally entering the phase where boring engineering questions are more important than breathtaking demos. Who maintains the harness? Who audits it? Which layers are actually reducing review toil? Which ones are just moving the toil into more confusing forms? Those are healthier questions than whether the agent can open more tabs or spawn more helpers.
My take is simple. The next real productivity win in AI coding probably comes from subtraction, not addition. Teams that can say no to unnecessary workflow machinery will move faster than teams that keep treating every new capability as mandatory modernization. A coding stack that one normal engineer can understand is not less advanced than a stack that requires an internal priesthood. In practice, it is more likely to ship.
Sources: Machine Thoughts, Martin Fowler, Gemini CLI documentation