Reversa Treats Legacy Codebases as a Spec Reconstruction Problem, Which Is Closer to Reality Than Most Agent Demos

Anatoliy Kolodkin

26 Apr 2026 • 5 min read

The clean greenfield demo has done real damage to how people think about AI coding. Vendors keep showing the same happy path: fresh repo, crisp prompt, tidy task, model writes code, everyone claps. Real software work is usually the opposite. You inherit a codebase with missing docs, stray business logic, half-remembered architecture decisions, and a team that knows just enough to be worried. Before an agent can improve that system, it has to understand what it is looking at. That understanding step is where most of the market is still embarrassingly weak.

Reversa, a GitHub project created on April 26, is worth paying attention to because it starts from that uncomfortable truth instead of pretending legacy code is a rounding error. The repo describes itself as a reverse-engineering framework for specifications. Install it into an existing project, invoke a central Maestro agent, and it coordinates a five-phase pipeline to analyze the codebase and produce structured output under _reversa_sdd/. The artifact list is unapologetically serious: architecture docs, C4 diagrams, ERDs, data dictionaries, state machines, permission matrices, ADRs, flowcharts, and traceability documents. That is not another “ship faster” pitch. It is a spec-recovery harness.

The project’s core thesis is exactly right. AI agents are good at creating software from specifications, but most production systems do not come with reliable specifications. The real knowledge lives in conditionals, migration files, old commits, environment assumptions, and the scar tissue in people’s heads. If you skip that reality and point a coding agent directly at the code, you are not automating engineering. You are accelerating guesswork.

Reversa is explicit about the workflow. The Maestro orchestrates five phases: Reconhecimento (reconnaissance), Escavação (excavation), Interpretação (interpretation), Geração (generation), and Revisão (review). Specialized roles include Scout, Arqueólogo (archaeologist), Detetive (detective), Arquiteto (architect), Redator (writer), and a Devil’s Reviewer, with optional agents like Tracer, Visor, Data Master, and Design System. That could sound theatrical, but the underlying decomposition is sensible. Someone has to map the repo surface, someone has to dig into modules, someone has to infer business rules, someone has to synthesize architecture, and someone has to stress-test the generated story. The more interesting design choice is not the anthropomorphic naming. It is the disciplined split between discovery and synthesis.
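
To make that split concrete, here is a minimal Python sketch of a phased run in which synthesis can read what discovery found but cannot rewrite it. Only the phase names come from the repo. The grouping of phases into discovery and synthesis, the run_pipeline function, and the handler signatures are all assumptions made for illustration, not Reversa's actual implementation.

```python
from types import MappingProxyType
from typing import Callable, Mapping

# Phase names are the ones the repo lists. Grouping them into "discovery" and
# "synthesis" is this article's reading, not something the project specifies.
DISCOVERY_PHASES = ("Reconhecimento", "Escavação")
SYNTHESIS_PHASES = ("Interpretação", "Geração", "Revisão")


def run_pipeline(
    discover: Mapping[str, Callable[[], list]],
    synthesize: Mapping[str, Callable[[Mapping, dict], object]],
):
    """Run discovery first, freeze its findings, then run synthesis on top."""
    findings = {}
    for phase in DISCOVERY_PHASES:
        findings[phase] = discover[phase]()

    # Synthesis receives a read-only view of the top-level mapping, so it
    # cannot add, replace, or delete a phase's findings after the fact.
    evidence = MappingProxyType(findings)

    outputs = {}
    for phase in SYNTHESIS_PHASES:
        # Each synthesis phase also sees a copy of earlier synthesis outputs.
        outputs[phase] = synthesize[phase](evidence, dict(outputs))
    return evidence, outputs
```

The point of the read-only view is modest: a later phase that wants to contradict the evidence has to say so in its own output, where a reviewer can see it.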

The confidence labels are the smartest part

Plenty of tools can generate beautiful documentation that says false things confidently. Reversa’s most mature idea may be its confidence contract. Generated statements are labeled as confirmed, inferred, or gap. That is a small feature with large consequences. Teams do not just need more docs. They need docs that tell them what the code proves, what the model merely deduced, and what still needs a human answer. Without that distinction, auto-generated specs become polished fiction, which is worse than missing documentation because it carries false authority.
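
A confidence contract like this is easy to picture as a data structure. The Python sketch below uses the three labels named in the repo; the Statement class, its evidence field, and the rule that a confirmed statement must cite at least one source are hypothetical choices made for illustration, not Reversa's schema.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    CONFIRMED = "confirmed"  # the code or another artifact proves it
    INFERRED = "inferred"    # the model deduced it
    GAP = "gap"              # still needs a human answer


@dataclass(frozen=True)
class Statement:
    text: str
    confidence: Confidence
    evidence: tuple[str, ...] = ()  # hypothetical: file paths, commits, etc.

    def __post_init__(self):
        # Assumed rule: "confirmed" without a cited source is not allowed.
        if self.confidence is Confidence.CONFIRMED and not self.evidence:
            raise ValueError("confirmed statements must cite evidence")


def render(statement: Statement) -> str:
    """Prefix the statement with its label so readers see it first."""
    return f"[{statement.confidence.value}] {statement.text}"


def summarize(statements) -> dict:
    """Count statements per label, e.g. to track how much is still a gap."""
    return dict(Counter(s.confidence.value for s in statements))
```

Putting the label in the type rather than in the prose means a document generator cannot emit a claim without deciding which of the three buckets it belongs in.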

This is where Reversa feels more grounded than a lot of agent tooling. It recognizes that reverse engineering is epistemic work. The challenge is not simply summarizing files. It is reconstructing intent from artifacts that may be inconsistent, outdated, or incomplete. A permission matrix, for example, is not just a list of roles. It is an operational claim about who can do what under which conditions. If the system behavior is split across controllers, feature flags, middleware, and third-party integrations, the only honest output is one that marks uncertainty clearly.

The repo also claims strict write boundaries, restricting agent output to .reversa/ and _reversa_sdd/ rather than mutating the existing project. That matters. Legacy-system analysis is one of the easiest places for “helpful” agents to become destructive. A reverse-engineering harness that edits the production code while it is still trying to understand it is confusing the map with the territory. Keeping analysis artifacts separate is not just safer. It makes the entire process auditable.
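
A write boundary of this kind is simple to sketch. The Python below checks whether a target path falls under one of the two directories the repo names. The is_write_allowed function and the idea of checking resolved paths are assumptions for illustration; how Reversa actually enforces its boundary is not something this article has verified.

```python
from pathlib import Path

# Directory names come from the repo's description; the check is a sketch.
ALLOWED_DIRS = (".reversa", "_reversa_sdd")


def is_write_allowed(project_root: str, target: str) -> bool:
    """Return True only if target resolves inside an allowed output directory."""
    root = Path(project_root).resolve()
    # Resolving first collapses ".." segments, so a path such as
    # "_reversa_sdd/../src/app.py" is judged by where it really lands.
    resolved = (root / target).resolve()
    return any(resolved.is_relative_to(root / name) for name in ALLOWED_DIRS)
```

Under this sketch, is_write_allowed("/tmp/proj", "_reversa_sdd/architecture.md") is accepted, while "src/app.py" and "_reversa_sdd/../src/app.py" are both rejected. Path.is_relative_to requires Python 3.9 or later.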

There is a larger category shift hiding here. The first year of coding agents was obsessed with code generation. The more economically important next phase may be code comprehension at organizational scale. Enterprises are not sitting on infinite greenfield opportunity. They are sitting on piles of systems that nobody wants to touch because the cost of understanding them is high and the cost of breaking them is higher. A tool that reduces that understanding tax can unlock work that otherwise stalls for quarters.

Legacy modernization starts with recovered intent

This is also why Reversa lines up with the best recent thinking around harness engineering. Martin Fowler’s argument has been that the outer system around the model should improve correctness, constrain behavior, and create useful feedback loops. Specification recovery is a perfect use case for that philosophy. You do not want raw autonomy. You want systematic exploration, explicit outputs, and human-review checkpoints where uncertainty is visible instead of hidden behind nice prose.

There is strong practitioner value here if the project matures. Teams considering AI-assisted modernization should stop asking only “which model writes the best patch?” and start asking “how will we reconstruct the system’s actual behavior before we ask a model to change it?” That means building inventories, surfacing data models, documenting hidden assumptions, and preserving traceability from code to spec. Reversa’s output tree, including traceability matrices and gap reports, is directionally correct because it turns unknowns into explicit work items rather than silent failure modes.
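
As a rough illustration of what "unknowns as explicit work items" could look like, the Python sketch below builds a source-to-spec traceability index and pulls every gap into a list of open questions. The entry fields (spec_id, claim, label, sources) are hypothetical and do not describe Reversa's real output format.

```python
from collections import defaultdict


def build_traceability(entries):
    """Map each source file to the spec items that cite it."""
    matrix = defaultdict(list)
    for entry in entries:
        for source in entry["sources"]:
            matrix[source].append(entry["spec_id"])
    return dict(matrix)


def open_questions(entries):
    """Turn every entry labeled 'gap' into an explicit work item."""
    return [
        f"{entry['spec_id']}: {entry['claim']}"
        for entry in entries
        if entry["label"] == "gap"
    ]


# Hypothetical sample data, invented for this example.
entries = [
    {"spec_id": "AUTH-1", "claim": "Admins can delete invoices",
     "label": "confirmed", "sources": ["billing/policy.py"]},
    {"spec_id": "AUTH-2", "claim": "Who approves write-offs?",
     "label": "gap", "sources": []},
]
```

With the sample data, open_questions(entries) yields a single item for AUTH-2, and build_traceability(entries) shows that billing/policy.py backs AUTH-1. A change to that file then points directly at the spec items that need a second look.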

The caution is obvious. Reverse engineering is hard, and multi-agent pageantry can make hard things look solved before they are. Large legacy systems are full of runtime behavior the code alone does not reveal: environment coupling, bad but important data, cron weirdness, undocumented integrations, and user workflows that exist only because nobody has killed them yet. Reversa partly acknowledges this with its Tracer role and gap labeling, but teams should still treat generated specs as a starting point for recovery, not as a final source of truth.

There is also a productization challenge. A framework like this has to earn trust across languages, repo sizes, and organizational habits. It has to handle partial runs, resumability, documentation drift, and the subtle distinction between “we found the rule” and “we found one implementation of the rule.” If it cannot maintain that discipline, it risks becoming another impressive but brittle analysis tool.

Still, the market needs more projects pointed in this direction. The path to useful agentic coding in enterprises probably does not run through ever more theatrical demos of autonomous greenfield generation. It runs through better ways to recover system intent from the software companies already depend on. That is where budget, risk, and engineering pain actually live.

Practically, engineering teams should take three lessons from this. First, do not let an agent make large changes to a legacy system before you have some formalized understanding layer, whether that is Reversa or a homegrown equivalent. Second, require uncertainty labeling in generated docs. If a tool cannot tell you what it inferred versus what it verified, it is asking for trust it has not earned. Third, treat recovered specifications as infrastructure for future agent runs. The best documentation output is not the PDF nobody reads. It is the structured context that makes the next change safer.

The smartest thing about Reversa is not that it promises to make old systems legible. It is that it admits legibility itself is the hard part. That is a much more believable story than most of what agentic coding has been selling lately.

Sources: sandeco/reversa, Martin Fowler on harness engineering, Anthropic Managed Agents, SWE-bench
