Meta's Prompt Template That Lets Agents Review Code at 93% Accuracy — Without Running It
Meta researchers have developed a structured prompting technique — called semi-formal reasoning — that enables LLMs to verify code patches without executing them, reaching up to 93% accuracy on real-world agent-generated patches. The technique fills a practical gap between free-form chain-of-thought reasoning (flexible but prone to hallucinations) and rigid formal verification (precise but requiring domain-specific tooling that most teams don't have). It works by forcing the model through a structured template that requires explicit premise enumeration, traced code path analysis, and conclusions derived only from that evidence — functioning as a "reasoning certificate" rather than a freeform answer.
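To make the template concrete, here is a minimal sketch in Python of what such a prompt could look like. The three section names and the exact wording are illustrative assumptions based on the description above, not Meta's actual template.

```python
# Hypothetical sketch of a semi-formal reasoning prompt for patch
# equivalence verification. Section names (PREMISES, TRACED PATHS,
# CONCLUSION) are assumed from the article's description, not taken
# from Meta's published template.

TEMPLATE = """You are verifying whether two code patches are behaviorally equivalent.

Original patch:
{original}

Candidate patch:
{candidate}

Answer using EXACTLY this structure:

PREMISES:
- Enumerate every fact you rely on, quoting the relevant code lines.
- Number each premise (P1, P2, ...).

TRACED PATHS:
- Walk each affected code path step by step.
- Justify each step only by citing premise numbers.

CONCLUSION:
- State EQUIVALENT or NOT_EQUIVALENT.
- Justify the verdict only from the premises and traced paths above;
  do not introduce new facts in this section.
"""


def build_prompt(original: str, candidate: str) -> str:
    """Fill the template with the two patches under review."""
    return TEMPLATE.format(original=original, candidate=candidate)
```

The point of the fixed structure is that the conclusion section is explicitly forbidden from introducing evidence, which is what makes the output auditable as a "reasoning certificate" rather than a freeform answer.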
The technique was evaluated across three tasks: patch equivalence verification (88% on curated examples, 93% on real-world agent-generated patches), fault localization (Top-5 accuracy up 5 percentage points), and code question answering (87%, up 9 percentage points). The 93% figure on agent-generated patches is especially relevant because agent-produced code is messier and less predictable than curated test cases, and the technique holds up on that harder distribution. The broader implication is a potential cost-model shift for code review loops: execution-free verification via structured prompting could reduce or replace expensive sandbox execution in many validation scenarios.
The structured template approach is directly adoptable today — unlike techniques requiring fine-tuning or new model training, this is a prompt design change any team can test against their existing code review agent. The core pattern: explicit premises, traced code paths, conclusions derived only from enumerated evidence.
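A team testing this pattern would also want to reject model replies that break the structure, since an answer that skips the premises or buries the verdict is no longer a usable certificate. Below is a hypothetical consumer-side check, not part of Meta's method: it assumes the same illustrative section names (PREMISES, TRACED PATHS, CONCLUSION) and a binary EQUIVALENT / NOT_EQUIVALENT verdict.

```python
import re

# Assumed section headers; adapt these to whatever template you use.
REQUIRED_SECTIONS = ("PREMISES:", "TRACED PATHS:", "CONCLUSION:")


def check_certificate(reply: str) -> bool:
    """Accept a model reply only if it follows the structured template:
    all three sections present, in order, ending in a definite verdict."""
    positions = [reply.find(section) for section in REQUIRED_SECTIONS]
    # Reject if any section is missing or the sections are out of order.
    if any(p == -1 for p in positions) or positions != sorted(positions):
        return False
    # The verdict must appear in the conclusion section.
    conclusion = reply[positions[-1]:]
    return bool(re.search(r"\b(NOT_EQUIVALENT|EQUIVALENT)\b", conclusion))
```

Malformed replies would then be retried or escalated rather than trusted, so the structure itself becomes the acceptance criterion for the review loop.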