
Agentic Fix Turns Pentest Findings Into PRs — Now Comes the Hard Part

Anatoliy Kolodkin

26 May 2026 • 5 min read

Security vendors have spent years trying to get developers to read vulnerability dashboards. Novee’s Agentic Fix makes a more realistic bet: developers are already living inside GitHub and coding agents, so the security workflow has to meet them there. The interesting part is not that an AI tool can write a patch. We have enough evidence that models can produce plausible diffs. The useful part is the loop around the patch: a validated exploit becomes a concrete issue, the coding assistant gets the exploit-path context, the fix becomes a pull request, and the original asset is re-tested after merge.

That is a much better shape than the usual “AI fixes vulnerabilities” pitch. Security remediation has never been blocked only by code generation. It is blocked by lost context, ambiguous tickets, ownership gaps, review risk, and the tedious retest cycle that proves whether the change fixed the bug rather than merely silenced the report. Agentic Fix is worth watching because it treats the coding agent as one component in a remediation pipeline, not as the judge, jury, and merge button.

The ticket is where security context usually goes to die

According to SiliconANGLE and Novee’s own launch post, Agentic Fix starts from Novee’s autonomous pentesting platform. Once Novee validates a finding, the user can click “Fix with” and choose Claude, Codex, Copilot, Cursor, or Devin. Novee then creates a GitHub issue containing remediation guidance, including the entry point, affected code paths, attack vector, and specific details derived from the exploit path. The selected coding assistant uses that context to generate a patch and open a PR. After merge, Novee reassesses the affected asset to check whether the original vulnerability is actually gone.
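To make the handoff concrete, here is a minimal Python sketch of how a validated finding could be rendered into an issue body that carries the entry point, affected code paths, attack vector, and exploit path. The `ValidatedFinding` class, its field names, and `render_issue_body` are hypothetical illustrations, not Novee's schema or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ValidatedFinding:
    """Hypothetical shape of a validated pentest finding (not Novee's schema)."""

    title: str
    entry_point: str
    attack_vector: str
    affected_paths: list[str] = field(default_factory=list)
    exploit_steps: list[str] = field(default_factory=list)


def render_issue_body(finding: ValidatedFinding) -> str:
    """Render the finding as a plain-text issue body a coding agent can act on."""
    lines = [
        f"Finding: {finding.title}",
        f"Entry point: {finding.entry_point}",
        f"Attack vector: {finding.attack_vector}",
        "Affected code paths:",
    ]
    lines += [f"  - {path}" for path in finding.affected_paths]
    lines.append("Exploit path:")
    # Number the steps so the reviewer and the agent see the same sequence.
    lines += [f"  {i}. {step}" for i, step in enumerate(finding.exploit_steps, 1)]
    return "\n".join(lines)
```

The point of the structure is that nothing gets shortened on the way to the tracker: every field the validator knew about is still attached to the issue the agent reads.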

That handoff is the product. Most AppSec workflows degrade between discovery and fix. The scanner or pentester understands the exploit. The issue tracker gets a shortened version. The engineer receives something like “authorization bypass on account endpoint” and spends half the remediation time reconstructing what the report meant. By the time a patch lands, the original proof may be detached from the diff that claims to fix it. That is how teams end up closing tickets while leaving the root cause intact.

Ido Geffen, Novee’s CEO, framed the move plainly: “AI coding agents are already helping engineering teams write and refactor production code daily. Pointing those tools at the remediation queue is the obvious next step. What has been missing is validated security context and orchestration.” That is the right diagnosis. The problem is not that Claude or Codex lacks a vulnerability-fixing prompt. The problem is that the agent needs evidence precise enough to act on, and the organization needs a control loop precise enough to verify the result.

A patch is not proof

This matters because security bugs are not ordinary lint errors. A broken import has a narrow failure mode. A missing authorization check may involve routing, tenancy assumptions, middleware order, object identifiers, API clients, background jobs, and cached permissions. A model can produce a neat patch that looks reasonable in review while missing the systemic failure that made the exploit possible. In security, “the diff looks good” is not enough. The original exploit needs to fail, adjacent paths need to be considered, and regression coverage should exist wherever the bug class allows it.

That is why the reassessment step is the strongest part of Novee’s announcement. The agent can write the candidate fix, but the system that validated the exploit should independently test whether the fix holds. This is the same architecture principle showing up in Detectify’s new MCP server, which exposes scanning and validation to agentic workflows: probabilistic code generation is useful, but deterministic or evidence-based validation should remain outside the model’s self-assessment loop. Do not let the agent grade its own security homework. It will be charming, confident, and occasionally wrong in production.
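A small sketch of that principle, under the assumption that the retest reports one of three outcomes: the finding's status is derived only from the independent retest, and the agent's own claim is recorded but never consulted. `RetestResult` and `finding_status` are hypothetical names for illustration.

```python
from enum import Enum


class RetestResult(Enum):
    """Hypothetical outcomes of re-running the original exploit after merge."""

    EXPLOIT_BLOCKED = "exploit_blocked"
    EXPLOIT_STILL_WORKS = "exploit_still_works"
    NOT_RUN = "not_run"


def finding_status(agent_says_fixed: bool, retest: RetestResult) -> str:
    """Decide the finding's status from the retest alone.

    agent_says_fixed is accepted for logging purposes but is deliberately
    ignored in the decision: the agent does not grade its own fix.
    """
    if retest is RetestResult.EXPLOIT_BLOCKED:
        return "closed"
    if retest is RetestResult.EXPLOIT_STILL_WORKS:
        return "reopened"
    return "awaiting_retest"
```

A confident agent with a failing retest still yields "reopened", which is exactly the behavior a reviewer wants.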

For engineering teams already using Claude Code, Copilot, Cursor, or Codex, the practical governance question is not “should we let agents help with security fixes?” The answer will increasingly be yes, because the remediation queue is too large and developer attention is too scarce. The better question is what authority the agent gets. Can it open issues? Can it create branches? Can it read exploit details? Are those details sent to a third-party model provider? Can it touch authentication code without extra review? Does the PR include the retest result, or only the agent’s explanation?

Those are not bureaucratic questions. They are the difference between accelerated remediation and accelerated incident generation.

What teams should actually do with this

The safe rollout pattern is boring, which is usually a good sign. Start with findings where validation is deterministic and blast radius is contained: missing headers, straightforward input handling, low-to-medium severity path issues, dependency remediations with clear tests. Require human review for auth, tenant isolation, secrets, payment flows, file handling, SSRF, deserialization, and any code path where the fix could quietly change the trust boundary. Attach the exploit summary and the post-fix reassessment to the pull request so reviewers can evaluate evidence, not just prose.
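That rollout policy can be sketched as a routing function. The category labels, the `route_finding` name, and the decision to apply one severity cap to every agent-eligible category are assumptions made for illustration; a real policy would use whatever taxonomy your scanner or pentest platform emits.

```python
# Hypothetical category labels; map them to your own finding taxonomy.
HUMAN_REVIEW_CATEGORIES = {
    "auth", "tenant_isolation", "secrets", "payments",
    "file_handling", "ssrf", "deserialization",
}
AGENT_ELIGIBLE_CATEGORIES = {
    "missing_headers", "input_handling", "path_issue", "dependency",
}
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}
AGENT_MAX_SEVERITY = "medium"  # assumption: one cap for all eligible categories


def route_finding(category: str, severity: str) -> str:
    """Return who leads remediation for a finding.

    Anything that is not explicitly agent-eligible, or whose severity is
    unknown or above the cap, falls back to mandatory human review.
    """
    if category in HUMAN_REVIEW_CATEGORIES:
        return "human_review_required"
    rank = SEVERITY_RANK.get(severity)
    if (
        category in AGENT_ELIGIBLE_CATEGORIES
        and rank is not None
        and rank <= SEVERITY_RANK[AGENT_MAX_SEVERITY]
    ):
        return "agent_first"
    return "human_review_required"
```

The default matters more than the lists: unknown categories and unknown severities land in human review, so a new bug class never slips into the automated lane by accident.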

Teams should also track agent remediation separately from human remediation. Measure time to issue, time to PR, time to retest, fix acceptance rate, reopen rate, reviewer intervention rate, and regressions introduced by agent-generated patches. If the numbers show that agents are great at small deterministic fixes and mediocre at authorization boundary work, that is not failure. That is the operating model telling you where automation belongs.
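As a rough sketch of that separation, the rate metrics can be computed per author type from simple remediation records. The `Remediation` record and `remediation_rates` function are hypothetical; the timing metrics (time to issue, PR, and retest) are left out for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Remediation:
    """Hypothetical record of one remediation attempt."""

    author: str  # "agent" or "human"
    accepted: bool  # the fix was merged
    reopened: bool  # the finding came back after closure
    reviewer_intervened: bool  # a reviewer had to change the patch


def remediation_rates(records: list[Remediation], author: str) -> dict[str, float]:
    """Compute acceptance, reopen, and intervention rates for one author type."""
    subset = [r for r in records if r.author == author]
    if not subset:
        return {}  # no data for this author type, so no rates to report
    total = len(subset)
    return {
        "fix_acceptance_rate": sum(r.accepted for r in subset) / total,
        "reopen_rate": sum(r.reopened for r in subset) / total,
        "reviewer_intervention_rate": sum(r.reviewer_intervened for r in subset) / total,
    }
```

Splitting the same records further by finding category would show where agents hold up and where they do not.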

There is a data-handling angle that deserves more attention than launch posts usually give it. Validated exploit context can contain sensitive URLs, account identifiers, endpoint shapes, internal architecture, and sometimes example payloads. Pushing that into GitHub issues and coding-agent prompts may be appropriate, but it needs policy: private repos, least-privilege agent tokens, redaction rules, model-provider review, audit logs, and limits on where vulnerability details appear in notifications. The vulnerability report is itself sensitive infrastructure.
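A redaction rule can be as simple as a list of patterns applied before exploit context leaves the security tool. The sketch below is illustrative only: the patterns are deliberately crude, and the `acct_` identifier format is an invented example, not something from Novee or GitHub.

```python
import re

# Each rule is (pattern, replacement). Order matters: URLs are replaced first
# so that identifiers embedded in a URL disappear with it.
REDACTION_RULES = [
    (re.compile(r"https?://[^\s\"'<>]+"), "[REDACTED_URL]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), "[REDACTED_EMAIL]"),
    # Hypothetical account-id format, used here purely as an example.
    (re.compile(r"\bacct_[A-Za-z0-9]+\b"), "[REDACTED_ACCOUNT_ID]"),
]


def redact(text: str) -> str:
    """Apply every redaction rule to text bound for an issue or agent prompt."""
    for pattern, replacement in REDACTION_RULES:
        text = pattern.sub(replacement, text)
    return text
```

Pattern-based redaction is a floor, not a guarantee. It catches the obvious leaks and should sit alongside private repos, scoped tokens, and audit logs rather than replace them.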

The broader market signal is clear. AppSec vendors are trying to become the orchestration and evidence layer for agent-written code. Novee is doing it through a pentest-to-PR workflow. Detectify is doing it through an MCP-accessible scanner. Open-source tools like Snyk Agent Scan, which had about 2,475 GitHub stars when this piece was researched, are emerging to inspect the agent and MCP surface itself for prompt injection, tool poisoning, toxic flows, and credential mistakes. The category is moving from “security dashboard humans ignore” toward “security tools agents can call and humans can audit.” That is progress, provided the audit part survives the demo.

My take: Agentic Fix is directionally right because it preserves exploit context and keeps validation in the loop. But the win is not “AI fixes pentest findings.” The win is a traceable chain from validated exploit to issue, from issue to diff, from diff to retest, and from retest to review. If a workflow cannot answer “what exact exploit did this PR prove fixed?” it is not mature remediation automation. It is just a faster way to produce security-flavored pull requests.

Sources: SiliconANGLE, Novee, Detectify, Snyk Agent Scan, Model Context Protocol docs
