Agentic Engineering Isn't Vibe Coding — And Pretending Otherwise Wrecks Brownfield Codebases

Anatoliy Kolodkin

02 May 2026 • 6 min read

There's a particular kind of engineering debt that doesn't show up in your code review metrics. It's the debt that accumulates when your agent generates code that looks right in isolation and is quietly wrong in context — the abstraction that breaks three layers away, the API contract that doesn't match the one your team actually built, the edge case that a human would have caught because they understood the system and the agent didn't. This is the debt that vibe coding leaves behind, and it's why the conversation Andrej Karpathy started at Sequoia AI Ascent 2026 matters more than any single model release or benchmark update.

The talk's core argument — that vibe coding raised the floor for non-developers while agentic engineering is about preserving the quality ceiling for professionals — is being absorbed by the industry in pieces. But the piece that keeps getting left out is the second half of that argument: what agentic engineering actually requires of the people doing it. It's not prompting fluency. It's not model selection. It's context discipline.

The Vocabulary We Needed

A post from Agentic Insights this week does the job of translating Karpathy's framework into something engineers can actually use day-to-day. The key distinction the author draws: vibe coding and agentic engineering aren't two points on a spectrum — they're different jobs that require different skills, and conflating them is how teams end up with brownfield codebases that are harder to maintain than before they brought in the AI tools.

Vibe coding, properly understood, is a legitimate workflow. You describe what you want, the model builds it, you get something working. For non-developers building prototypes, for developers building greenfield experiments, for anyone who needs to go fast on something that doesn't need to last — it's genuinely powerful. The floor went up. That's real.

Agentic engineering is something else. You have a 100,000-line codebase with five years of accumulated decisions, three API surface versions, and institutional knowledge that lives in the heads of engineers who've since left. The agent doesn't replace your judgment — it amplifies it. But amplification only works if there's something to amplify. The spec that tells the agent what to build. The context that survives a /clear. The taste to read 4,000 lines of generated code and spot the awkward abstraction before it ships.

Karpathy put it bluntly around the 28-minute mark: "You can outsource your thinking, but you can't outsource your understanding." That line is doing more work than it looks like. It's not a warning against AI coding tools. It's a description of the job that remains when you've automated everything that can be automated. The agent handles implementation. The engineer handles context architecture, decision tracking, and quality verification. That's not less work than before — it's different work, and it's harder to fake.

Where Brownfield Breaks

The brownfield problem is the one the hype cycle keeps skipping over. Vibe coding works great on greenfield projects. You describe a feature, the model builds it, you ship it. No accumulated decisions to navigate, no implicit contracts to respect, no technical debt to work around. The model sees a clean surface and produces clean output.

Brownfield is different. A model can generate code that is technically correct (it compiles, it passes its unit tests, it does what the docstring says it does) and is still wrong in the context of the system it lives in. It calls an API endpoint the team plans to deprecate next quarter. It assumes a data structure that the next refactor will change. It uses an authentication pattern that doesn't match the one the security team standardized last month. These are not bugs the model can catch by reading more code, because the relevant facts (roadmaps, planned refactors, recent policy decisions) often aren't in the code at all. They are assumptions that can only be verified by someone who knows the system.
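One partial mitigation is to write down what the team knows in a form a script can check. The Python sketch below is an illustration, not something from the talk or the Agentic Insights post: the `Constraint` type, the `flag_violations` helper, and both example patterns are hypothetical. It only catches what someone has already written down, so it supplements the engineer who knows the system rather than replacing them.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    pattern: str  # regular expression matched against each line of code
    reason: str   # why someone who knows the system would object


# Hypothetical entries; a real list would come from the team's own decisions.
KNOWN_CONSTRAINTS = [
    Constraint(r"/api/v1/", "v1 endpoints are scheduled for deprecation"),
    Constraint(r"\bbasic_auth\(", "security standardized on token auth"),
]


def flag_violations(code, constraints=KNOWN_CONSTRAINTS):
    """Return (line_number, reason) pairs for lines that match a constraint."""
    findings = []
    for lineno, line in enumerate(code.splitlines(), start=1):
        for constraint in constraints:
            if re.search(constraint.pattern, line):
                findings.append((lineno, constraint.reason))
    return findings


# Made-up generated code that is fine in isolation and wrong in context.
generated = '''resp = client.get("/api/v1/orders")
session = basic_auth(user, password)
total = sum(o.amount for o in orders)
'''

for lineno, reason in flag_violations(generated):
    print(f"line {lineno}: {reason}")
```

The third line passes untouched, which is the point: the check knows nothing about correctness, only about decisions the team has recorded.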

The practical consequence: teams that went all-in on vibe coding for greenfield work and then tried to extend it to their existing codebases are discovering that the generated code doesn't integrate cleanly. Correcting the integration failures can take longer than writing the code by hand would have. The productivity promise of building faster with AI inverts into a maintenance burden that didn't exist before.

This is the pattern showing up in Hacker News threads with 800+ points. "After two years of vibecoding, I'm back to writing by hand." "The cult of vibe coding is dogfooding run amok." The common thread isn't that AI coding tools are bad. It's that treating AI as a replacement for context knowledge — rather than an amplifier of it — produces codebases that are harder to maintain than the ones they replaced.

The Context Infrastructure That Makes Agentic Engineering Work

The Agentic Insights piece points to a specific set of practices that separate teams doing agentic engineering from teams that are vibe coding with better tools. They're not glamorous. They don't generate conference talks or Twitter threads. But they're what makes the difference between an agent that produces durable code and one that produces code that looks good for a week.

First: the spec. Not a README, not a collection of inline comments — a structured specification that tells the agent what to build, not just how to behave. Karpathy's framing from the talk is worth quoting directly: "People have to be in charge of this spec, this plan. Work with your agent to design a spec that is very detailed." The spec isn't handed down from on high. It's built collaboratively, and the agent's questions about ambiguities are inputs to it, not annoyances to work around.
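To make "structured" concrete, here is a minimal Python sketch of what such a spec could look like as data. The `Spec` class, its field names, and the rate-limiting example are all assumptions made for illustration; neither the talk nor the Agentic Insights post prescribes a format. The idea it tries to capture is that the agent's questions are recorded as inputs, and implementation waits until a human has answered them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Spec:
    """A structured spec the agent reads before writing code (hypothetical shape)."""
    goal: str
    constraints: list = field(default_factory=list)
    acceptance_criteria: list = field(default_factory=list)
    # Ambiguities raised by the agent, mapped to the human's answer (None = open).
    open_questions: dict = field(default_factory=dict)

    def record_question(self, question: str) -> None:
        """Log an ambiguity raised by the agent as an input to the spec."""
        self.open_questions.setdefault(question, None)

    def answer(self, question: str, decision: str) -> None:
        """Record the human's decision for a previously raised question."""
        if question not in self.open_questions:
            raise KeyError(f"unknown question: {question}")
        self.open_questions[question] = decision

    def unresolved(self) -> list:
        """Questions that still have no human answer."""
        return [q for q, a in self.open_questions.items() if a is None]

    def ready_for_implementation(self) -> bool:
        """Ready only with acceptance criteria and no unanswered questions."""
        return bool(self.acceptance_criteria) and not self.unresolved()


spec = Spec(
    goal="Add rate limiting to the public orders endpoint",
    constraints=["Reuse the existing auth middleware"],
    acceptance_criteria=["Requests over the limit receive HTTP 429"],
)
spec.record_question("Is the limit per user or per API key?")
print(spec.ready_for_implementation())  # False: one question is still open
spec.answer("Is the limit per user or per API key?", "Per API key")
print(spec.ready_for_implementation())  # True
```

In practice the same structure could live in a Markdown or YAML file in the repository; the dataclass just makes the "no open questions" rule explicit.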

Second: context that survives the session. A normal chat session — whether it's Claude Code, Copilot, or anything else — resets when you /clear or start a new thread. The knowledge of what the agent built last week, what decisions were made, what the system actually looks like — all of that is gone. Teams doing agentic engineering maintain externalized context: a CLAUDE.md, a FEEDBACK.md, a context repository that the agent reads at the start of each session. This is what the Codebase Context Specification (codebasecontext.org) was trying to formalize a year before Karpathy gave the role a name.
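As a rough sketch of the mechanics, the Python below assembles a session preamble from context files and appends new decisions so they outlive the session. The file names come from the examples above; the helper names `build_session_preamble` and `append_decision` and the flat repo-root layout are assumptions. Some tools pick up files like these on their own, so treat this as an illustration of the idea rather than a description of how any particular tool works.

```python
from pathlib import Path

# File names from the examples in the text; the flat layout is an assumption.
CONTEXT_FILES = ("CLAUDE.md", "FEEDBACK.md")


def build_session_preamble(repo_root: Path, names=CONTEXT_FILES) -> str:
    """Concatenate whichever context files exist into one preamble string."""
    sections = []
    for name in names:
        path = repo_root / name
        if path.is_file():
            body = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {name}\n{body}")
    return "\n\n".join(sections)


def append_decision(repo_root: Path, note: str, name: str = "FEEDBACK.md") -> None:
    """Append a decision as a bullet so the next session can read it."""
    with (repo_root / name).open("a", encoding="utf-8") as fh:
        fh.write(f"- {note}\n")
```

A typical loop would call `append_decision` whenever a design choice is settled during a session, then feed the output of `build_session_preamble` to the agent at the start of the next one.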

Third: the discipline to review generated code, not just apply it. Karpathy admitted in the talk that code quality still gives him "a heart attack": the output is bloated, full of copy-paste, and prone to awkward, brittle abstractions. This is the part of the job nobody posts about on Twitter. The agent generates; the engineer verifies. That verification step is not optional, and running the tests does not fulfill it. What fulfills it is reading the code with the same rigor you'd apply to a code review from a colleague you didn't entirely trust. Which is to say: carefully.
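Tooling can point a reviewer at where to look first, though it cannot do the reading. As one small example, this Python sketch flags runs of consecutive lines that appear more than once in a generated file, a crude proxy for the copy-paste problem. The `repeated_blocks` helper and its `window` parameter are hypothetical, and the exact-match comparison is deliberately naive: it finds candidates for a human to inspect, nothing more.

```python
from collections import defaultdict


def repeated_blocks(code: str, window: int = 3) -> dict:
    """Map each run of `window` consecutive non-blank lines that occurs more
    than once to the 1-based line numbers where each occurrence starts."""
    lines = [
        (number, text.strip())
        for number, text in enumerate(code.splitlines(), start=1)
        if text.strip()
    ]
    seen = defaultdict(list)
    for k in range(len(lines) - window + 1):
        chunk = tuple(text for _, text in lines[k:k + window])
        seen[chunk].append(lines[k][0])
    return {chunk: starts for chunk, starts in seen.items() if len(starts) > 1}


# Made-up generated code with the same two-line body pasted twice.
sample = """
def load_users(path):
    with open(path) as fh:
        data = json.load(fh)
    return data["users"]

def load_orders(path):
    with open(path) as fh:
        data = json.load(fh)
    return data["orders"]
"""

for chunk, starts in repeated_blocks(sample, window=2).items():
    print(f"repeated at lines {starts}: {chunk}")
```

Whether the duplication deserves a shared helper is still a judgment call, which is exactly the part that stays with the engineer.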

The Hiring Signal Karpathy Dropped

Around the 18-minute mark of the Sequoia talk, Karpathy offered a piece of hiring advice that deserves more attention than it's getting: stop giving puzzles. The interview should be: write a Twitter clone for agents and make it really good.

This is a test of the actual skills that matter in agentic engineering. Context architecture — structuring a codebase so an agent can navigate it. Taste — recognizing when an abstraction is awkward before it ships. Verification rigor — catching the generated code that looks right but is subtly wrong. Judgment — knowing when to trust the agent versus when to dig in and read the implementation.

These are not skills that whiteboard algorithms test. They're not skills that LeetCode hard problems test. They're skills that emerge from having maintained a complex codebase, having shipped features that had to integrate with existing systems, having debugged production issues that came from integration failures no unit test would have caught. The teams that figure out how to hire for these skills before their competitors will have a meaningful productivity and quality advantage, not because they have better agents, but because they have engineers who know how to use them.

The Take

The shift from vibe coding to agentic engineering isn't a product cycle update. It's a discipline shift that requires a different mental model for how humans and AI agents should interact on complex projects. Vibe coding — describe what you want, let the model handle implementation — works for greenfield and prototypes. It falls apart on brownfield codebases where context is the constraint, not intelligence.

The actionable insight for builders: if you're still treating AI coding tools as "describe and trust," start building the context infrastructure that lets you describe less and verify more. A spec that the agent can reference. Context that survives sessions. The habit of reading generated code with the same rigor you'd apply to a junior engineer's first pull request. The gap between vibe coders and agentic engineers is widening, and the brownfield maintenance problem is where the difference becomes visible. Build the infrastructure now, before the debt accumulates past the point where it's fixable.

Sources: Agentic Insights, Karpathy Sequoia AI Ascent 2026, CodebaseContext.org
