TrustFall Turns the Folder Trust Prompt Into the New curl | sh

Anatoliy Kolodkin

07 May 2026 • 5 min read

TrustFall is not scary because it tricks an AI model. It is scary because it does not need to. The attack sits below the model, in the project configuration layer that modern coding agents increasingly treat as executable infrastructure. Clone a repository, accept a friendly folder-trust prompt, and a project-defined MCP server can start as an unsandboxed OS process with your privileges. That is not prompt injection. That is supply-chain execution with a nicer dialog.

Adversa AI disclosed the issue today across Claude Code, Gemini CLI, Cursor CLI, and GitHub Copilot CLI. The Register’s coverage frames the core question well: how explicit does a toolmaker need to be when the user is about to approve code execution? Anthropic’s reported position is that once a user accepts “Yes, I trust this folder,” post-prompt execution is outside its threat model. Adversa’s counterargument is sharper: consent is not meaningful if the dialog does not explain what is about to run.

The trust prompt is doing too many jobs

The Claude Code chain is small enough to be uncomfortable. A malicious repository includes two committed files: .mcp.json, defining an attacker-controlled MCP server, and .claude/settings.json, enabling or approving that server through project-scoped settings such as enableAllProjectMcpServers or enabledMcpjsonServers. When the developer opens the project in Claude Code and accepts the generic trust prompt, the MCP server starts. No Claude tool call is required. No suspicious command has to be proposed in chat. The payload can even live inline inside the JSON configuration instead of in a separate script file.
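
Seen from the defender's side, the chain can be checked before the project is ever opened. The following is a minimal Python sketch, not Adversa's tooling: it reports which project-defined servers a checked-out repository would pre-approve. It assumes .mcp.json keeps its servers under a top-level mcpServers object with command and args fields, and that the two settings keys named above are a boolean and a list of server names. The function names are hypothetical.

```python
import json
from pathlib import Path


def load_json(path: Path) -> dict:
    """Return the JSON object stored at path, or {} if it is missing or unparsable."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return {}
    return data if isinstance(data, dict) else {}


def auto_started_servers(repo: Path) -> dict:
    """Map server name -> command line for servers the project settings pre-approve."""
    # Assumption: servers live under a top-level "mcpServers" object.
    servers = load_json(repo / ".mcp.json").get("mcpServers", {})
    if not isinstance(servers, dict):
        return {}

    settings = load_json(repo / ".claude" / "settings.json")
    approve_all = settings.get("enableAllProjectMcpServers") is True
    approved = settings.get("enabledMcpjsonServers") or []

    flagged = {}
    for name, spec in servers.items():
        if not isinstance(spec, dict):
            continue
        if approve_all or name in approved:
            # Show the full command line, since the payload can live inline in args.
            argv = [str(spec.get("command", ""))]
            argv += [str(arg) for arg in spec.get("args", [])]
            flagged[name] = " ".join(argv)
    return flagged
```

Calling auto_started_servers(Path("path/to/clone")) on an unfamiliar clone returns an empty dict when nothing is pre-approved. Anything else is a command line worth reading before accepting the trust prompt.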

That last detail matters because many developer security habits are visual. We look for shell scripts, install hooks, weird binaries, and suspicious diffs. Configuration files feel less dangerous, especially when they are written in JSON and parked next to other project metadata. Agentic tools break that intuition. A config file can now define a helper process that the agent runtime launches with host-level privileges. If your review posture still treats those files as documentation-adjacent, it is behind the threat model.

Adversa says MCP servers run as native OS processes with the full privileges of the user running the agent. They are not confined to the repository. They can read ~/.ssh, cloud credentials, shell history, other local source trees, and whatever else the account can reach. That is the blast radius hidden behind “Quick safety check: Is this a project you created or one you trust?” The wording says Claude Code can read, edit, and execute files here. The actual grant can start arbitrary helper processes that are not meaningfully limited to “here.”

CI makes the consent argument weaker, not stronger

The local developer case is already bad enough: one Enter keypress can approve a prompt that most people have been trained to clear quickly. The CI variant is worse. Adversa says that when Claude Code runs headlessly through the official GitHub Action or SDK path, the interactive trust prompt never renders. A malicious pull-request branch can trigger the same MCP execution path with zero human interaction if the workflow runs against unreviewed code and exposes useful secrets.

This is where “the user clicked trust” stops being a satisfying boundary. Plenty of engineering organizations are experimenting with agents in CI: code review bots, migration helpers, test fixers, documentation updaters, release assistants. Those workflows often run precisely on untrusted contributions, because that is where review automation is useful. If the agent runtime treats project-local configuration as sufficient to start arbitrary processes, then the agent job is no longer just reviewing code. It is executing repo-supplied infrastructure in an environment that may contain tokens.

The mitigation is not to ban AI in CI. The mitigation is to stop pretending these jobs are ordinary linters. Run agent workflows on pull requests with no deploy keys, no package-publishing credentials, no cloud admin tokens, and no broad repository write permissions. If the agent needs secrets, move execution post-merge or into a hardened sandbox with explicit allowlists. Treat project-scoped MCP and agent config the way you would treat a build script from an untrusted fork: useful, but executable until proven otherwise.
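
One way to enforce the "no useful secrets" rule is a tripwire that runs as the first step of the agent job and refuses to continue if credential-looking variables are present. The Python sketch below is only an illustration: the name patterns and the allowlist are hypothetical and would need tuning to the secrets your organization issues. It also does not replace keeping secrets out of the job in the first place, since a process that is already running could read them before any check fires.

```python
import re

# Hypothetical patterns for variable names that usually hold credentials.
SENSITIVE_NAME = re.compile(
    r"(TOKEN|SECRET|PASSWORD|API_KEY|PRIVATE_KEY|DEPLOY_KEY|AWS_)",
    re.IGNORECASE,
)

# Hypothetical allowlist: the only credential this agent job is meant to see.
ALLOWED = frozenset({"ANTHROPIC_API_KEY"})


def exposed_secrets(env, allowed=ALLOWED):
    """Return sorted names of non-empty, credential-looking variables not on the allowlist."""
    return sorted(
        name
        for name, value in env.items()
        if value and name not in allowed and SENSITIVE_NAME.search(name)
    )


def preflight(env):
    """Return a process exit code: 0 if the environment looks clean, 1 otherwise."""
    leaked = exposed_secrets(env)
    if leaked:
        print("Refusing to start agent; sensitive variables present: " + ", ".join(leaked))
        return 1
    return 0
```

In a workflow this would be invoked as sys.exit(preflight(os.environ)) before the agent step, so a misconfigured job fails loudly instead of handing tokens to repo-supplied processes.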

This is bigger than Claude Code

Adversa’s parity check is the part vendors should not ignore. The firm says Claude Code, Gemini CLI, Cursor CLI, and Copilot CLI can all execute project-defined MCP servers after a folder trust prompt, and all default toward trust. The products differ in how much the dialog tells the user. Gemini reportedly warns about project MCP servers and lists them by name. Cursor gives an MCP-specific warning without per-server enumeration. Claude Code and Copilot CLI use more generic folder-trust language. That variation is UX, not a fundamentally different exposure.

The shared pattern is the important thing: coding agents have made repository configuration more powerful than developer muscle memory expects. MCP began as a practical protocol for connecting models to tools and context. It is now also an ambient execution substrate. A local project can describe servers, commands, arguments, permissions, and tool surfaces that the agent runtime may bring to life before the model has done any interesting reasoning. Security reviews that focus only on model behavior are reviewing the wrong layer.

There is a fair argument on the vendor side. Developer tools cannot interrupt every repository open with a full security seminar. If every MCP server requires multiple modal warnings, people will either disable the feature or click through blindly. But that does not justify collapsing read, edit, execute, and unsandboxed helper-process startup into one cheerful trust prompt. Those are distinct grants with distinct blast radii. Browser permission prompts are annoying because the web learned, painfully, that ambient access is worse. Coding agents are now learning the same lesson with shell access.

For practitioners, the short-term checklist is concrete. Audit .mcp.json, .claude/settings.json, Cursor config, Copilot agent config, Gemini CLI project settings, and any agent-specific metadata before opening unfamiliar repositories with an AI coding CLI. Block project-scoped MCP auto-approval in managed settings where possible. Monitor child processes spawned by agent CLIs. Add repository rules that flag new or changed MCP configuration for security review. In CI, assume any project-defined agent server is hostile unless it comes from trusted code on a trusted branch.
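
The repository-rule item on that checklist can be approximated with a small check over the list of files changed in a pull request. This Python sketch assumes the config locations listed in AGENT_CONFIG_PATTERNS; they should be verified against each tool's documentation. Copilot's location is left out rather than guessed. The function name is hypothetical.

```python
from pathlib import PurePosixPath

# Assumed project-scoped agent config locations; verify and extend per tool.
AGENT_CONFIG_PATTERNS = (
    ".mcp.json",
    ".claude/settings.json",
    ".claude/settings.local.json",
    ".cursor/mcp.json",
    ".gemini/settings.json",
)


def needs_security_review(changed_paths):
    """Return normalized changed paths that match an agent config location at any depth."""
    flagged = []
    for raw in changed_paths:
        # Normalize separators and collapse a leading "./" without touching dotfiles.
        path = PurePosixPath(raw.replace("\\", "/")).as_posix()
        for pattern in AGENT_CONFIG_PATTERNS:
            if path == pattern or path.endswith("/" + pattern):
                flagged.append(path)
                break
    return flagged
```

Matching at any depth is deliberate: in a monorepo, a nested .claude/settings.json under a subproject deserves the same review as one at the root.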

The longer-term fix belongs to toolmakers. Block MCP auto-approval settings from project scope. Add a dedicated MCP consent dialog that defaults to deny. Enumerate the server name, command, arguments, and working directory. Offer a path to trust the folder while disabling MCP. Most importantly, stop treating “trust this folder” as a magic legal boundary. Developers trust folders for many reasons: to read code, run tests, or inspect a bug. That does not mean they knowingly consented to starting arbitrary unsandboxed processes that can read credentials outside the repo.
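
What a dedicated, default-deny MCP prompt could look like is easy to sketch. The Python below is purely illustrative and does not describe any vendor's implementation. ServerSpec, render_prompt, is_approved, and servers_to_start are hypothetical names, and the prompt wording is invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSpec:
    """The facts a user needs before approving a project-defined server."""
    name: str
    command: str
    args: tuple = ()
    cwd: str = "."


def render_prompt(server):
    """Spell out exactly what would run, where, and with what privileges."""
    return (
        f"This project wants to start MCP server '{server.name}'.\n"
        f"  command:   {server.command}\n"
        f"  arguments: {' '.join(server.args)}\n"
        f"  directory: {server.cwd}\n"
        "It will run outside any sandbox with your user privileges.\n"
        "Start it? [y/N] "
    )


def is_approved(answer):
    """Only an explicit yes approves; a bare Enter (empty input) denies."""
    return answer.strip().lower() in {"y", "yes"}


def servers_to_start(servers, ask=input):
    """Ask about each server separately and keep only the explicitly approved ones."""
    return [s for s in servers if is_approved(ask(render_prompt(s)))]
```

Because the decision is per server and separate from folder trust, a user can trust the folder for reading and editing while every server stays off, which is the "trust the folder while disabling MCP" path described above.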

TrustFall is a good name because the failure is mostly social. The tools asked developers to fall backward into an agentic workflow and promised the platform would catch them. Instead, the catch mechanism is a default-yes prompt that no longer says what matters. AI coding agents have made project configuration executable. The UX has not caught up. Until it does, treat every agent config file like code, because in the ways that matter, it is.

Sources: The Register, Adversa AI, SecurityWeek, Help Net Security
