
Copilot CLI’s preToolUse Fix Makes Hook Failures a Security Boundary

Anatoliy Kolodkin

31 May 2026 • 5 min read

Copilot CLI’s most important May 31 change is not the mouse support in diff view. It is one sentence buried in a pre-release changelog: preToolUse hook errors now deny the tool call instead of silently allowing execution.

That sounds like plumbing until you remember what a coding agent actually is. It is not a chatbot with a nicer prompt box. It is a runtime that can inspect a repository, call tools, launch MCP servers, run shell commands, install plugins, route models, and keep working after the human has moved on. In that world, a pre-tool hook is not decoration. It is where policy has a chance to stop text from becoming side effect.

GitHub’s Copilot CLI v1.0.57-4, published May 31, changes that failure mode in the right direction. Hook failures now fail closed. The same release fixes Ctrl+C and modified keys inside tmux, makes @-mention file search case-insensitive, honors repo-level extraKnownMarketplaces from .github/copilot/settings.json, stops installed plugins from carrying their source .git directory, corrects policy blocking for MCP servers configured with npx --registry, and prevents sessions from hanging indefinitely after internal event-processing errors.

None of that will trend on Hacker News. It is also exactly the kind of release note that decides whether agentic coding becomes infrastructure or just a very expensive way to scare your security team.

Fail-open was the wrong default for agent policy

The phrase “silently allowing execution” should make platform engineers uncomfortable. If a preToolUse hook exists, teams are going to use it for the work that matters: allowlists, approval routing, destructive-command detection, secret checks, sandbox selection, cost tagging, audit enrichment, and policy exceptions. If that hook throws, times out, returns malformed output, or crashes because an internal dependency changed, the safe behavior is to deny the call and ask for human attention.
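The fail-closed rule is easy to state and easy to get subtly wrong. Here is a minimal Python sketch of a policy wrapper. It assumes a hypothetical in-process hook that receives a tool-call dict and returns a dict such as {"decision": "allow"}. This is not Copilot CLI's hook protocol. Every failure path, whether an exception, a timeout, or output that is not a recognized decision, collapses to deny.

```python
import concurrent.futures
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical hook signature: takes a tool-call description, returns anything.
Hook = Callable[[dict[str, Any]], Any]


@dataclass(frozen=True)
class Decision:
    allow: bool
    reason: str


def evaluate_fail_closed(hook: Hook, tool_call: dict[str, Any], timeout_s: float = 2.0) -> Decision:
    """Run a pre-tool hook; anything other than an explicit, well-formed allow is a deny."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = executor.submit(hook, tool_call)
        result = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return Decision(False, f"hook timed out after {timeout_s}s")
    except Exception as exc:  # thrown error, failed import inside the hook, etc.
        return Decision(False, f"hook raised {type(exc).__name__}: {exc}")
    finally:
        # Do not wait on a hung hook; its worker thread is abandoned.
        executor.shutdown(wait=False)

    # Malformed output is treated exactly like a failure.
    if not isinstance(result, dict):
        return Decision(False, "hook returned non-object output")
    decision = result.get("decision")
    if decision == "allow":
        return Decision(True, str(result.get("reason", "allowed by policy")))
    if decision == "deny":
        return Decision(False, str(result.get("reason", "denied by policy")))
    return Decision(False, f"unrecognized decision: {decision!r}")
```

The only path to allow is an explicit "allow" inside a dict. A hook that returns None, a bare string, or {"decision": "maybe"} denies, which is the asymmetry the release note describes.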

Failing open turns the policy layer into an optimistic suggestion. It says the tool is allowed to proceed precisely when the system least understands what is happening. That is survivable for low-risk read operations. It is not survivable for shell commands, package installation, networked MCP servers, repo writes, migrations, or cloud-agent workflows that can spend both money and trust while nobody is looking.

This is the category error many agent demos make. They treat “the model proposed a command” and “the runtime executed a command” as adjacent UI states. They are not. Between those two events lives the security boundary. A good coding-agent runtime makes that boundary boring, explicit, and conservative. Copilot CLI’s change moves in that direction: when the guardrail breaks, the agent stops.

Repo-local marketplaces are policy, not personalization

The marketplace fix matters for the same reason. copilot plugin marketplace list now honors repo-level extraKnownMarketplaces settings from .github/copilot/settings.json. That lets a project define additional plugin sources in the repository itself, not just through user-global configuration.

Used well, that is a serious governance feature. A company can point Copilot toward an internal marketplace of reviewed plugins. A team can standardize project-specific capabilities. A regulated repo can make the approved supply chain visible to reviewers instead of depending on whatever a developer installed last month.

Used casually, it is another supply-chain surface with better branding. A repo-local setting that expands where the agent finds plugins deserves the same review posture as CI workflows, package-manager registries, editor extensions, and MCP server definitions. If a pull request changes .github/copilot/settings.json to add a marketplace, the review question is not “does the JSON parse?” It is “who controls that marketplace, what can those plugins instruct the agent to do, and how will we know if the source changes?”
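One way to make that review question mechanical is a CI check. It diffs the marketplace entries between the base and head versions of .github/copilot/settings.json and fails on any entry whose source is not on a reviewed allowlist. This is a Python sketch. It assumes extraKnownMarketplaces is a JSON object keyed by marketplace name, which may not match Copilot CLI's actual schema. The allowlist and marketplace names are hypothetical.

```python
import json
from typing import Any


def marketplaces(settings_text: str) -> dict[str, Any]:
    """Return the (assumed) name -> source mapping; an empty file or missing key means none."""
    settings = json.loads(settings_text) if settings_text.strip() else {}
    extra = settings.get("extraKnownMarketplaces", {})
    if not isinstance(extra, dict):
        # Schema differs from this sketch's assumption: force a human review.
        raise ValueError("extraKnownMarketplaces is not an object; review by hand")
    return extra


def review_marketplaces(base_text: str, head_text: str, approved: dict[str, Any]) -> list[str]:
    """List marketplaces that a change adds or repoints to a source nobody has reviewed."""
    base, head = marketplaces(base_text), marketplaces(head_text)
    findings = []
    for name in sorted(head):
        source = head[name]
        if name in base and base[name] == source:
            continue  # unchanged since the base branch
        if approved.get(name) != source:
            findings.append(f"marketplace {name!r} points to an unreviewed source: {source!r}")
    return findings
```

The allowlist is keyed by name and holds the full reviewed source. A familiar name that quietly moves to a new source is therefore flagged like a brand-new supplier, which answers "how will we know if the source changes?"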

The plugin packaging fix points at the same discipline. Installed plugins no longer include the source repository’s .git directory. That is good hygiene. A .git directory can contain remotes, history, refs, hooks, abandoned files, and metadata that has no business traveling with an installed agent plugin. Even when there is no secret in it, shipping VCS internals into a runtime extension is sloppy. Agent plugins should be artifacts, not someone’s working tree in a trench coat.
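If you build or mirror plugins yourself, the same hygiene takes one argument. This minimal Python sketch assumes a plugin is just a directory tree, and the paths are hypothetical. It copies the tree into an install location while excluding VCS internals at every depth, then verifies that nothing slipped through.

```python
import shutil
from pathlib import Path

# VCS internals that have no business in an installed artifact.
# Patterns match names exactly, so .gitignore is kept while .git is dropped.
EXCLUDED = shutil.ignore_patterns(".git", ".hg", ".svn")


def package_plugin(src: Path, dest: Path) -> Path:
    """Copy a plugin source tree to dest (which must not exist), dropping VCS metadata."""
    shutil.copytree(src, dest, ignore=EXCLUDED)
    # Belt and braces: fail loudly if any .git file or directory survived.
    leftovers = list(dest.rglob(".git"))
    if leftovers:
        raise RuntimeError(f"VCS metadata leaked into artifact: {leftovers}")
    return dest
```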

`npx --registry` is where convenience meets execution risk

The MCP fix is subtler. The release says MCP servers configured with npx --registry are no longer incorrectly blocked by policy. On the surface, that sounds like a false-positive fix: teams using a private registry, mirror, or controlled package source should not have their MCP server blocked just because the command includes a registry argument.

But the larger lesson is that MCP policy has to understand command shape. An MCP server launched with npx is executable code entering the agent’s tool surface. The registry matters. The package name matters. The version pin matters. The working directory, environment, credentials, and network access all matter. Blocking every --registry is too blunt; allowing arbitrary registry-backed execution is too loose. The useful middle is policy that can distinguish “approved internal registry with pinned package” from “download and run whatever this prompt happened to suggest.”
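Here is a Python sketch of that middle ground. It parses an npx launch command into a registry and a package spec, then allows it only when the registry is on an allowlist and the package is pinned to an exact version. The registry URL, package names, and allowlist are hypothetical. The parser understands only the `--registry URL` and `--registry=URL` forms and skips other npm options rather than interpreting them.

```python
import re

APPROVED_REGISTRIES = {"https://npm.internal.example.com"}  # hypothetical internal mirror
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+$")


def check_npx_mcp(argv: list[str]) -> tuple[bool, str]:
    """Allow an npx-launched MCP server only from an approved registry at a pinned version."""
    if not argv or argv[0] != "npx":
        return False, "not an npx launch"
    registry, package = None, None
    args = iter(argv[1:])
    for arg in args:
        if arg == "--registry":
            registry = next(args, None)
        elif arg.startswith("--registry="):
            registry = arg.split("=", 1)[1]
        elif arg.startswith("-"):
            continue  # e.g. -y / --yes; other flags are not interpreted in this sketch
        else:
            package = arg
            break  # everything after the package belongs to the server itself
    if registry is None:
        return False, "no explicit registry; the default public registry is not approved"
    if registry.rstrip("/") not in APPROVED_REGISTRIES:
        return False, f"registry {registry} is not approved"
    if package is None:
        return False, "no package specified"
    # rpartition keeps scoped names intact: "@acme/x@1.2.3" -> ("@acme/x", "@", "1.2.3")
    name, sep, version = package.rpartition("@")
    if not sep or not name or not EXACT_VERSION.match(version):
        return False, f"package {package} is not pinned to an exact version"
    return True, f"{name}@{version} from {registry}"
```

Neither check blocks `--registry` as such. The flag only becomes a problem when it points somewhere unreviewed or pulls an unpinned package.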

That is where serious agentic-coding setups are headed. The policy unit cannot just be “MCP allowed” or “MCP blocked.” It has to be “this MCP server, at this version, from this registry, with these environment variables, in this workspace, for this risk tier.” Anything less becomes theater once agents can assemble tools dynamically.
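That multi-part policy unit maps naturally onto a record type. In this Python sketch, all field names and tier levels are hypothetical. A launch request is permitted only when a single grant matches server, version, registry, and workspace exactly. The request's environment variables must also be a subset of what that grant allows, and its risk tier must not exceed the grant's ceiling.

```python
from dataclasses import dataclass
from enum import IntEnum


class RiskTier(IntEnum):
    READ_ONLY = 1
    WRITES_REPO = 2
    NETWORK_OR_SHELL = 3


@dataclass(frozen=True)
class McpGrant:
    server: str
    version: str
    registry: str
    workspace: str
    allowed_env: frozenset[str]
    max_tier: RiskTier


@dataclass(frozen=True)
class McpLaunch:
    server: str
    version: str
    registry: str
    workspace: str
    env_keys: frozenset[str]
    tier: RiskTier


def is_permitted(launch: McpLaunch, grants: list[McpGrant]) -> bool:
    """True only if one grant covers every dimension of the launch; no partial matches."""
    return any(
        g.server == launch.server
        and g.version == launch.version
        and g.registry == launch.registry
        and g.workspace == launch.workspace
        and launch.env_keys <= g.allowed_env  # subset: no extra credentials smuggled in
        and launch.tier <= g.max_tier
        for g in grants
    )
```

An empty grant list denies everything, so the default posture is the same as the fail-closed hook: nothing runs until someone has written down exactly what may.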

Copilot CLI’s surrounding changes make that direction more obvious. In v1.0.56, GitHub added durable context-window tier persistence in session events, accurate context-window size per pricing tier in the model picker, BYOK provider configuration for ACP sessions, MCP content plus structuredContent handling, GitHub MCP tool pruning when gh is on PATH, and code-review agent model inheritance from the current session. Those changes do not form a single feature cluster. Together they outline a runtime: policy, provider routing, context budgets, tool surfaces, session replay, and review defaults all becoming state the system can preserve and reason about.

The emergency brake has to work in tmux

The tmux fix is easy to underrate. Ctrl+C and modified keys now work correctly inside tmux, which matters because tmux is where many long-running developer sessions live. If a coding agent starts a bad command, hangs, or begins an expensive loop, Ctrl+C is not a nicety. It is the operator’s emergency brake.

The same release fixes sessions hanging indefinitely after an internal event-processing error, while v1.0.57-3 fixed resume after a crash that left partial data in the session log. These are not glamorous UI details. They are audit and recovery mechanics. If an agent run touches a real repository and then crashes, the team needs to know what happened, which tool calls completed, what policy state applied, and whether resuming will replay, skip, or corrupt the work. A transcript that looks complete but has lost its policy context is worse than no transcript, because it invites false confidence.
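What "inspect the resumed session" means can be made concrete. This Python sketch assumes a hypothetical JSON Lines session log with tool_start and tool_end events; that is not Copilot CLI's actual log format. It tolerates a truncated final record left by a crash. It refuses corruption anywhere else. It reports tool calls that started but never finished, which are exactly the calls a resume must neither silently replay nor silently skip. It covers only tool-call completeness; context tier, provider route, and policy state would need their own events.

```python
import json


def recover_session(log_text: str) -> dict:
    """Parse a JSONL session log, dropping a torn final line, and classify tool calls."""
    lines = log_text.splitlines()
    events, torn_tail = [], False
    for i, line in enumerate(lines):
        if not line.strip():
            continue
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError:
            if i == len(lines) - 1:
                torn_tail = True  # crash mid-write: expected, safe to drop
                continue
            raise  # corruption in the middle of the log is not a crash artifact
    started = {e["call_id"] for e in events if e.get("type") == "tool_start"}
    finished = {e["call_id"] for e in events if e.get("type") == "tool_end"}
    return {
        "events": len(events),
        "torn_tail": torn_tail,
        "completed": sorted(started & finished),
        "in_doubt": sorted(started - finished),  # needs a human decision before resume
    }
```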

There is also now a direct cost angle. Copilot’s June 1 usage-based billing shift means a hung session, runaway retry loop, oversized tool surface, or failed-open hook can spend money as well as time. GitHub’s docs define one GitHub AI Credit as $0.01, and model costs vary by input, output, cached tokens, and model. Agentic surfaces such as Copilot CLI, cloud agents, code review, Spaces, Spark, and third-party agents are exactly where repeated calls and large context windows accumulate. Runtime correctness is FinOps now. The bill is just the telemetry arriving late.
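The arithmetic is worth doing before the bill arrives. The only grounded number in this Python sketch is GitHub's 1 AI Credit = $0.01. The per-million-token rates are placeholders, not GitHub's published prices, so substitute the figures from the models and pricing docs. The sketch also assumes cached tokens are billed at their own rate, separately from fresh input. It shows how a retry loop that resends a large context multiplies cost linearly with each attempt.

```python
from decimal import Decimal

USD_PER_CREDIT = Decimal("0.01")  # 1 GitHub AI Credit = $0.01

# Placeholder rates in AI Credits per million tokens; NOT real model pricing.
RATES = {"input": Decimal("300"), "cached_input": Decimal("30"), "output": Decimal("1500")}


def call_credits(input_tokens: int, cached_tokens: int, output_tokens: int) -> Decimal:
    """Credits for one model call under the placeholder rates above."""
    per_token = {kind: rate / Decimal(1_000_000) for kind, rate in RATES.items()}
    return (input_tokens * per_token["input"]
            + cached_tokens * per_token["cached_input"]
            + output_tokens * per_token["output"])


def loop_cost_usd(attempts: int, input_tokens: int, cached_tokens: int, output_tokens: int) -> Decimal:
    """Dollar cost of a loop that repeats the same call `attempts` times."""
    return attempts * call_credits(input_tokens, cached_tokens, output_tokens) * USD_PER_CREDIT
```

With these placeholder rates, one call with 100,000 fresh input tokens and 2,000 output tokens costs 33 credits ($0.33). Fifty identical retries cost $16.50. Decimal keeps the credit math exact instead of accumulating float rounding.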

For engineering teams, the practical checklist is short and non-negotiable. If you use preToolUse, test the ugly paths: thrown exception, timeout, malformed response, missing dependency, permission-store outage. Confirm every one denies the tool call. Review changes to .github/copilot/settings.json like infrastructure changes. Pin MCP server packages and registries. Prefer narrow local tools over broad dynamic servers. Verify Ctrl+C works where your developers actually run agents, including tmux and SSH. After a crash, inspect whether the resumed session preserved transcript, tool history, context tier, provider route, and policy state.
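The first checklist item can be automated. This Python sketch assumes a hypothetical runner that invokes a hook as an external command, passes the tool call as JSON on stdin, and expects a JSON object on stdout. That contract is illustrative, not Copilot CLI's documented hook interface. Each fault is injected with a tiny stand-in command, and the check passes only if every one of them denies.

```python
import json
import subprocess
import sys


def run_hook(cmd: list[str], tool_call: dict, timeout_s: float = 2.0) -> bool:
    """Return True only for an explicit allow; every failure mode denies."""
    try:
        proc = subprocess.run(cmd, input=json.dumps(tool_call), capture_output=True,
                              text=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        return False  # hung hook (killed by run), or hook binary missing / not executable
    if proc.returncode != 0:
        return False  # nonzero exit: e.g. an uncaught exception or a failed import
    try:
        out = json.loads(proc.stdout)
    except json.JSONDecodeError:
        return False  # malformed response
    return isinstance(out, dict) and out.get("decision") == "allow"


PY = sys.executable
UGLY_PATHS = {
    "thrown exception": [PY, "-c", "raise RuntimeError('boom')"],
    "timeout": [PY, "-c", "import time; time.sleep(10)"],
    "malformed response": [PY, "-c", "print('{not json')"],
    "missing dependency": [PY, "-c", "import policy_engine_that_is_not_installed"],
    "missing hook binary": ["/nonexistent/pre-tool-hook"],
}


def check_fails_closed() -> dict[str, bool]:
    """Map each injected fault to whether the runner denied it (True = denied, as required)."""
    call = {"tool": "shell", "command": "rm -rf build/"}
    return {name: not run_hook(cmd, call) for name, cmd in UGLY_PATHS.items()}
```

Pair it with one positive control, a stand-in hook that prints a valid allow. A runner that denies everything would otherwise pass this check too.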

The editorial take is simple: the model is no longer the most interesting security boundary in coding agents. The runtime is. Copilot CLI v1.0.57-4 is a small pre-release, but it fixes a big rule: if the policy code breaks, the agent should not get the benefit of the doubt. LGTM.

Sources: GitHub Copilot CLI v1.0.57-4 release, GitHub Copilot CLI v1.0.57-3 release, GitHub Copilot CLI v1.0.56 release, GitHub Copilot models and pricing docs, GitHub Copilot usage-based billing announcement