Microsoft Shows the CI/CD Agent Problem: Secrets, Tools, and Untrusted Prose in One Runner

Microsoft Shows the CI/CD Agent Problem: Secrets, Tools, and Untrusted Prose in One Runner

Microsoft’s Claude Code GitHub Action case study is the clearest current reminder that “AI in CI” is not a chatbot feature. It is an LLM-driven process with tools running inside your supply chain.

On June 5, Microsoft Threat Intelligence published research showing how Claude Code GitHub Action could expose CI/CD workflow secrets when agents processed untrusted GitHub content such as issue bodies, pull request descriptions, and comments. The affected secret named in the writeup was ANTHROPIC_API_KEY, but Microsoft noted the same path could expose other credentials available to the runner. Anthropic mitigated the specific issue in Claude Code 2.1.128 by blocking access to sensitive files under /proc/.

The disclosure timeline was fast enough to credit: Microsoft reported the issue to Anthropic through HackerOne on April 29, and Anthropic mitigated it on May 5. But the durable lesson is bigger than one fixed file path. The bug class is what happens when public collaboration text, model instructions, file-read tools, secrets, and outbound communication channels all meet in one automated workflow.

The sandbox was real. The boundary was incomplete.

The interesting detail in Microsoft’s writeup is the mismatch between tool paths. Claude Code GitHub Action used Bubblewrap sandboxing and environment scrubbing for Bash subprocesses in risky contexts. The environment variable CLAUDE_CODE_SUBPROCESS_ENV_SCRUB was auto-enabled for actions triggered by non-write users, which is a sensible mitigation. A shell command should not casually inherit secrets when the prompt may have been written by a stranger.

But the Read tool was a direct in-process capability. Microsoft showed that it could be steered to read /proc/self/environ, where the process environment contained ANTHROPIC_API_KEY. In other words: Bash saw the scrubbed world, while Read saw the real one.

That is the architectural mistake to remember. Users do not experience agent security by implementation path. They experience capabilities. If the policy is “this workflow must not expose secrets,” then every capability that can touch secrets has to enforce that policy. Shell sandboxing is not enough if a file-read tool can bypass it. Log redaction is not enough if the agent can transform the secret before writing it. A restricted GitHub token is not enough if an MCP tool or network path gives the agent a different exfiltration route.

Traditional CI already has plenty of sharp edges: YAML injection, overbroad tokens, secrets in forked PRs, unreviewed third-party actions, and shell commands composed from user input. Agentic CI adds a new interface: natural language that can influence tool use. That means an issue body is not just content anymore. It can be an attempt to operate the runner.

Prompt injection gets worse when the model can use tools

Microsoft’s exploit chain included two details that should make platform teams uncomfortable. First, the prompt framed the secret access as a “compliance review,” nudging the model toward reading sensitive state as if it were part of the task. Second, it instructed the model to remove the first seven characters of the secret so GitHub Secret Scanner would not match the obvious sk-ant- pattern.

That is crude. It is also enough to make the point. Secret scanners are useful, but they are not omniscient data-loss-prevention systems. If a model can reformat, split, encode, summarize, truncate, or otherwise launder a secret before it writes to logs or comments, pattern matching becomes a speed bump. Defenders should assume transformed data can leave through any channel the agent can write to.

The HTML-comment example is just as important. Microsoft observed prompt injection attempts hidden in HTML comments inside public GitHub issues. Maintainers viewing the rendered issue would not see them. The model consuming raw Markdown would. Any difference between human-visible context and model-visible context is an attack surface. If your review process says “the issue looked harmless,” but the model saw hidden instructions, the human review did not actually inspect the prompt.

This is not a Claude-only problem. It is the generic shape of tool-using agents in public developer workflows. GitHub issues, PR descriptions, comments, docs, dependency files, generated artifacts, and web pages can all contain instructions. The model’s job is to follow instructions. The system’s job is to decide which instructions are allowed to matter. If that distinction is left to vibes, attackers will supply the vibes.

Design AI workflows like hostile-input systems

The practical mitigation is not “never use AI in CI.” That is lazy advice. The useful advice is to stop treating these workflows like internal assistants when they are triggered by public input.

Start by splitting jobs by trust level. A workflow that reads public issues and suggests labels should not have the same permissions as one that opens pull requests or edits code. A workflow that can see secrets should not be triggerable by non-writers. A workflow triggered from forks should not get write tokens by default. If a job needs credentials, make the human approval boundary explicit and late.

Next, scope capabilities by data class. File-read tools should not be able to read runner internals, process environments, SSH keys, cloud credentials, package-manager tokens, or unrelated repository paths. Shell commands should inherit a scrubbed environment. MCP servers should be treated as both tool providers and possible exfiltration channels. WebFetch should not be casually enabled in a job that also has secrets. Logs, workflow summaries, issue comments, pull-request bodies, and generated artifacts should be considered write channels, not passive output.

Then make policy observable. Record which tools were enabled, which inputs were consumed, which files were read, which outbound requests happened, and which model-visible raw content differed from human-rendered content. Add review gates before anything that changes code, writes to GitHub, posts external data, or touches credentials. If the only audit trail is “Claude said it did the thing,” you do not have an audit trail.

Teams using Claude Code Action should update past the fixed versions, review whether they are on v1 moving tags or pinned releases, remove secrets from non-writer-triggered workflows, restrict allowedTools, scope GITHUB_TOKEN aggressively, disable public full-output logs for sensitive jobs, and search prior workflow logs and comments for transformed secret leakage. Settings like allowed_non_write_users: "*" deserve special scrutiny. Convenience settings are where supply-chain bugs like to rent space.

The broader industry lesson is that agentic CI/CD needs its own threat model. Public GitHub text is adversarial input. Tool calls are privileged operations. Secrets are not context. Logs are output channels. MCP servers are part of the trust boundary. And “the model probably won’t do that” is not a control.

Microsoft’s post matters because it turns a fuzzy warning into a concrete failure mode: prompt injection steers a tool-using CI agent toward runner secrets, transforms the secret to dodge scanning, and exfiltrates it through a channel the workflow allowed. That is not science fiction. That is Tuesday, if the YAML is permissive enough.

The take: agentic CI is useful, but only when designed like hostile automation with narrow capabilities. If your mental model is “Claude reviews my PR,” you will under-secure it. The accurate model is “an LLM-driven process with tools is running in my supply chain.” Build accordingly.

Sources: Microsoft Threat Intelligence, Anthropic Claude Code Action repository, Claude Code releases, GMO Flatt Security, Claude Code settings and permissions docs