AI Security Agents Just Crossed the Line From Triage Assistant to Vulnerability Factory

Anatoliy Kolodkin

07 May 2026 • 5 min read

Firefox just gave the AI-security debate a more useful benchmark than another leaderboard screenshot: 423 security fixes in a single month, April 2026. Mozilla says 271 bugs in Firefox 150 were attributed to Claude Mythos Preview, including 180 high-severity issues, 80 moderate issues, and 11 low-severity issues. That is not a chatbot finding a missing bounds check after being handed a suspicious function. That is a production security organization turning an AI model into a vulnerability-discovery pipeline and then surviving the consequences.

The important word there is pipeline. TechCrunch’s report on Mozilla’s work with Anthropic’s Mythos model could easily be read as “AI finds bugs now,” which is true but too shallow to be useful. The real story is that Mozilla built the unglamorous machinery around the model: target selection, ephemeral virtual machines, reproducible test cases, deduplication, triage, severity handling, bug lifecycle management, and human patch review. In other words, the model was not trusted because it sounded confident. It earned attention by producing artifacts engineers could run, inspect, and fix.

The model got better, but the harness made it operational

Mozilla’s own writeup is blunt about how quickly the situation changed. “Just a few months ago,” the team wrote, AI-generated security reports to open-source projects were mostly known as “unwanted slop.” Maintainers were paying the tax: cheap plausible reports on one side, expensive human verification on the other. Then two things changed at once. The models became much more capable, and Mozilla got better at “steering them, scaling them, and stacking them to generate large amounts of signal and filter out the noise.”

That second half is the part engineering teams should underline. Mozilla began with Opus 4.6 experiments, supervised the process in a terminal, tuned prompts and harness logic, then parallelized jobs across multiple ephemeral VMs. Each VM hunted within specific target files and wrote findings back to a bucket. The harness could create and run reproducible test cases, which is the line between “interesting static analysis” and “bug a human should spend time on.”
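To make the fan-out concrete, here is a minimal sketch of how a list of target files might be split into per-VM jobs. It is not Mozilla's harness: the names `HuntJob` and `plan_jobs`, the `findings-bucket` prefix, and the job ID format are all hypothetical, and launching the VMs themselves is left out.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class HuntJob:
    """One unit of work for one ephemeral VM (hypothetical shape)."""
    job_id: str
    target_files: Tuple[str, ...]
    output_prefix: str  # assumed bucket path where this job writes findings


def plan_jobs(target_files: Iterable[str], files_per_job: int,
              bucket: str = "findings-bucket") -> List[HuntJob]:
    """Split target files into fixed-size, non-overlapping jobs."""
    if files_per_job < 1:
        raise ValueError("files_per_job must be at least 1")
    # Deduplicate and sort so the same input always yields the same plan.
    ordered = sorted(set(target_files))
    jobs = []
    for start in range(0, len(ordered), files_per_job):
        chunk = tuple(ordered[start:start + files_per_job])
        job_id = f"job-{start // files_per_job:04d}"
        jobs.append(HuntJob(job_id, chunk, f"{bucket}/{job_id}/"))
    return jobs
```

Keeping each job's scope small and its output location separate is what makes the later steps tractable: a finding can be traced back to the exact files the agent was pointed at.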

The results were not cosmetic. Mozilla published representative reports including a WebAssembly GC JIT issue that could create a fake-object primitive, a 15-year-old <legend> parsing bug, a 20-year-old XSLT bug, an RLBox sandbox escape, multiple IPC and sandbox escape paths, and a rowspan=0 table-layout overflow involving more than 65,535 rows. These are not beginner mistakes. They are the kind of old, cross-subsystem bugs that survive because they require patient reasoning over weird interactions.

Anthropic’s Mythos disclosure raises the stakes further. The company says Mythos can identify and exploit zero-days across major operating systems and browsers, including a now-patched 27-year-old OpenBSD SACK bug. In one benchmark based on Firefox JavaScript engine vulnerabilities previously found by Opus 4.6, Opus 4.6 produced working exploits only twice across several hundred attempts; Mythos produced working exploits 181 times and achieved register control in 29 more. That is not a marginal improvement. That is a phase change.

Security capability and coding capability are now the same surface

For developers using coding agents, the uncomfortable lesson is that “good at code” increasingly means “good at security research.” Anthropic says Mythos was not explicitly trained to become an exploit machine; the capability emerged from general improvements in code, reasoning, and autonomy. The same model behaviors that help an agent understand a large codebase, hypothesize a fix, run tests, and iterate also help it reason about attacker-controlled input, race conditions, serialization boundaries, and sandbox escapes.

That should change how teams think about agent permissions. A coding agent with shell access, repository access, secrets in the environment, CI mutation rights, deployment context, and production logs is not just a productivity tool. It is a powerful program-analysis actor operating inside your engineering perimeter. If it can reason about how to patch a vulnerability, it can also reason about how that vulnerability might be exploited. Your governance model should assume those are adjacent capabilities, not separate product categories.
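One small, concrete piece of that governance is refusing to hand the agent your environment. The sketch below assumes an allowlist approach: commands launched on the agent's behalf see only a few named variables, so tokens and keys in the parent process are not inherited. The names `ALLOWED_ENV`, `scrubbed_env`, and `run_agent_command` are hypothetical, and this is not a sandbox; it does nothing about filesystem or network access.

```python
import os
import subprocess
from typing import Dict, Mapping, Sequence

# Assumption: the only variables an agent-run command needs.
ALLOWED_ENV = ("PATH", "HOME", "LANG")


def scrubbed_env(source: Mapping[str, str],
                 allowed: Sequence[str] = ALLOWED_ENV) -> Dict[str, str]:
    """Copy only allowlisted variables; everything else is dropped."""
    return {name: source[name] for name in allowed if name in source}


def run_agent_command(argv: Sequence[str], workdir: str,
                      timeout_s: int = 60) -> subprocess.CompletedProcess:
    """Run a command with a minimal environment and a hard timeout."""
    return subprocess.run(
        list(argv),
        cwd=workdir,
        env=scrubbed_env(os.environ),  # replaces the inherited environment
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,
    )
```

An allowlist fails closed: a newly added secret stays invisible to the agent until someone deliberately exposes it, which is the opposite of what a denylist gives you.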

Mozilla’s restraint is therefore the strongest practitioner signal in the story. Brian Grinstead told TechCrunch that for the bugs in the post, “every single one is one engineer writing a patch and one engineer reviewing it. We have not found it to be automatable.” That is exactly the right line. Let the agent generate hypotheses, reduce reproductions, create failing tests, and suggest patches. Do not let it silently land security-critical code because the demo looked clean.

Teams that want to copy the useful part of Mozilla’s approach should not start with “scan the whole repo for vulnerabilities.” That prompt produces noise. Start with narrow threat models and high-risk surfaces: parsers, auth middleware, plugin systems, sandbox boundaries, payment flows, deserializers, file upload paths, database permission layers, and anything accepting attacker-controlled input. Give the agent an executable harness. Run it in an isolated environment. Require reproducible artifacts. Feed findings into the same tracker, severity policy, and review process humans use. The model is a discovery engine, not a replacement for engineering accountability.
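“Require reproducible artifacts” can be enforced mechanically before a finding reaches a human. A minimal sketch, assuming the reproduction is a command that exits nonzero when the bug triggers (as a sanitizer abort typically does); the function name `reproduces` and the choice to treat timeouts as unconfirmed are assumptions of this example, not part of any described harness.

```python
import subprocess
import sys
from typing import Sequence


def reproduces(argv: Sequence[str], runs: int = 3, timeout_s: int = 30) -> bool:
    """Return True only if the command fails (nonzero exit) on every run."""
    for _ in range(runs):
        try:
            result = subprocess.run(
                list(argv), capture_output=True, timeout=timeout_s, check=False
            )
        except subprocess.TimeoutExpired:
            # Assumption: a hang is not counted as a confirmed reproduction.
            return False
        if result.returncode == 0:
            return False  # one clean run is enough to reject as flaky
    return True


if __name__ == "__main__":
    # Stand-in for a real test case: a child process that always exits 1.
    always_fails = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(reproduces(always_fails))
```

Requiring every run to fail is deliberately strict. Intermittent bugs such as race conditions would need a different policy, for example a minimum failure rate over many runs.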

The bug queue is about to get bigger

The optimistic read is that defenders finally get a tool that can search the boring dark corners attackers already love. Firefox shipping 423 security fixes in April 2026 versus 31 a year earlier is the kind of delta that matters if it holds beyond one project. Mozilla also noted that previous hardening work paid off: the model repeatedly pursued prototype-pollution escape routes that Firefox’s architectural changes had already blocked. That is a useful reminder that AI does not make fundamentals obsolete. It rewards them. The better your boundaries, invariants, tests, and isolation are, the more often an agent runs into walls instead of finding paths through them.

The pessimistic read is that everyone’s triage load is about to spike. Open-source maintainers already know what low-quality AI reports feel like. Now imagine a world where model-assisted researchers can generate more plausible, reproducible, partly weaponized reports at scale. The limiting factor becomes not discovery but responsible handling: who deduplicates, who validates severity, who writes the patch, who reviews the patch, who coordinates disclosure, and who pays for the long days in between.

That is why the engineering answer is boring and correct: build the process before you celebrate the model. Security agents should run behind access boundaries. Findings should carry reproduction steps, confidence, affected versions, and suggested severity. Fixes should land through normal review. Agent output should be auditable. If a report cannot be reproduced, it should not consume the same attention as one with a minimized test case and sanitizer output.
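Those requirements translate naturally into a record type and a sort order. The sketch below is one possible shape, not a standard schema: the field names, the `is_reproducible` rule, and the ordering (reproducible first, then severity, then confidence) are assumptions chosen to mirror the paragraph above. The severity labels reuse the high, moderate, and low buckets from Mozilla's numbers.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Lower rank sorts first; unknown labels sort after "low".
SEVERITY_RANK = {"high": 0, "moderate": 1, "low": 2}


@dataclass
class Finding:
    title: str
    suggested_severity: str
    confidence: float  # 0.0 to 1.0, as reported by the agent
    affected_versions: List[str]
    repro_steps: List[str] = field(default_factory=list)
    minimized_test_case: Optional[str] = None
    sanitizer_output: Optional[str] = None

    def is_reproducible(self) -> bool:
        """Assumed rule: needs both written steps and a minimized test case."""
        return bool(self.repro_steps) and self.minimized_test_case is not None


def triage_order(findings: List[Finding]) -> List[Finding]:
    """Reproducible findings first, then by severity, then by confidence."""
    def key(f: Finding):
        severity = SEVERITY_RANK.get(f.suggested_severity, len(SEVERITY_RANK))
        return (not f.is_reproducible(), severity, -f.confidence)
    return sorted(findings, key=key)
```

Note that an unreproducible report sorts below every reproducible one regardless of its claimed severity or confidence, which is the point: the agent's self-assessment does not buy attention, artifacts do.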

The headline is that AI security agents crossed from triage assistant to vulnerability factory. The useful take is sharper: the factory only helps if your organization can inspect the parts. Mozilla’s work is impressive because it treats AI as an accelerator for disciplined security engineering, not as a magic auditor. If agents can find 15- and 20-year-old bugs in Firefox, they can probably find the embarrassing bug in your auth flow too. But only if you give them a harness, a sandbox, and a human who still owns the merge button.

Sources: TechCrunch, Mozilla Hacks, Anthropic Red Team
