Copilot CLI’s June 1 Prerelease Makes Model Choice, Context Size, and Tool Policy a Terminal Problem

Anatoliy Kolodkin

01 Jun 2026 • 6 min read

Copilot CLI 1.0.57-5 landed on June 1 with a release note that says almost nothing and a timing signal that says plenty. This is the same day GitHub’s AI Credits model becomes harder for developers to ignore, and the CLI line is moving exactly where you would expect: model choice, context-window size, reasoning controls, hook policy, MCP behavior, and token overhead. The terminal is no longer just where Copilot answers questions. It is where cost, capability, and tool authority collide.

GitHub published v1.0.57-5 at 04:02 UTC, with npm showing @github/copilot@1.0.57-5 at 03:59 UTC. The stable latest tag still points to 1.0.56, so this is prerelease territory. But the 1.0.57 line is worth reading because it shows the product surface GitHub is preparing for a world where agentic coding is metered, model-routed, and policy-constrained rather than bundled into the old “Copilot is just autocomplete” mental model.

The release assets are not trivial either. During research, the Linux x64 tarball was about 95.8 MB and had over a thousand downloads; the Darwin arm64 tarball was about 91.2 MB and had a similar download count. This is not a theoretical CLI used by three people in a lab. It is a distribution channel for the way developers will increasingly experience Copilot outside the editor.

Model choice is now a budget decision

The changelog around the 1.0.57 line says Free and Student users can select models other than Auto in the model picker, and the picker now shows accurate total context-window size per pricing tier. That sounds like UI polish. It is not. Once Copilot interactions consume credits, model selection becomes a financial and operational control.

GitHub’s usage-based billing docs define AI Credits as a token-metered system: input, output, and cached tokens are converted into credits at 1 AI credit = $0.01. Paid individual plans include monthly allowances — 1,500 credits for Pro, 7,000 for Pro+, and 20,000 for Max — while completions and next-edit suggestions remain outside AI-credit billing for paid plans. The painful part is not the existence of a meter. The painful part is that agent sessions can hide the meter behind context, tool schemas, retries, compaction, and model routing.
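To make the allowance figures concrete, here is a minimal sketch that converts the plan allowances quoted above into dollars at the stated 1 credit = $0.01 rate. The function and constant names are illustrative, not part of any GitHub tooling.

```python
# Sketch only: plan names and allowances are the figures quoted in the article.
CREDITS_PER_USD = 100  # 1 AI credit = $0.01

MONTHLY_ALLOWANCE_CREDITS = {"Pro": 1_500, "Pro+": 7_000, "Max": 20_000}


def allowance_usd(plan: str) -> float:
    """Dollar value of a plan's included monthly AI Credits."""
    return MONTHLY_ALLOWANCE_CREDITS[plan] / CREDITS_PER_USD


def credits_for_usd(usd: float) -> float:
    """How many AI Credits a given dollar amount represents."""
    return usd * CREDITS_PER_USD


if __name__ == "__main__":
    for plan in MONTHLY_ALLOWANCE_CREDITS:
        print(plan, allowance_usd(plan))
```

Seen this way, the Pro allowance is a $15 monthly token budget, which is the number to keep in mind when an agent session runs long.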

That is why context-window display matters. A developer deciding between a small cheap context and a large expensive one is not making an abstract preference call. They are choosing how much state the agent can see, how often it may need to compact, how much historical context gets preserved, and how large the billable prompt can become. Showing the actual total context window per pricing tier is the minimum viable honesty for this product category.

The same logic applies to durable context-tier persistence. The changelog notes that context-window tier selection persists in session events, and tier-derived limits are reapplied to request, compaction, and truncation logic even on SDK-only resume paths. That is the kind of fix only operators appreciate until it breaks. If a resumed session silently loses its context limit, the developer may get different truncation behavior, different quality, and different cost for what appears to be the same work. Reproducibility in agent systems includes budget state, not just prompt text.
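The persistence idea can be sketched as replaying a session's event log on resume and re-deriving limits from the last recorded tier. Everything named here is hypothetical: the `context_tier_selected` event type, the tier names, the window sizes, and the 80% compaction threshold are assumptions for illustration, not Copilot CLI's actual event schema.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

# Hypothetical tier table; the article does not give real tier names or sizes.
TIER_WINDOWS: Dict[str, int] = {"standard": 128_000, "extended": 1_000_000}


@dataclass(frozen=True)
class Limits:
    tier: str
    max_request_tokens: int  # cap applied when building a request
    compact_at_tokens: int   # threshold that triggers compaction


def limits_for(tier: str) -> Limits:
    window = TIER_WINDOWS[tier]
    # Assumed policy: compact once 80% of the window is in use.
    return Limits(tier, window, window * 8 // 10)


def resume(events: Iterable[dict], default_tier: str = "standard") -> Limits:
    """Rebuild limits from the event log; the last tier selection wins."""
    tier = default_tier
    for event in events:
        if event.get("type") == "context_tier_selected":
            tier = event["tier"]
    return limits_for(tier)
```

The design point is that limits are derived from recorded state rather than from whatever defaults the resuming process happens to start with.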

The review agent inheriting the session model is a governance footgun and a feature

One sleeper change: code review uses the same model as the current session instead of a fixed default. This is more predictable from the user’s point of view, but it shifts responsibility back to teams. If a developer is in an expensive frontier-model session, review may inherit that cost profile. If they are in a cheap model because the task looked routine, review quality may inherit that too.

That does not make the change wrong. It makes it honest. A CLI-driven code review is part of the current working context, and using the current model can reduce “why did Copilot behave differently here?” confusion. But organizations should not treat this as a hidden implementation detail. They need model-routing guidance: cheap models for mechanical edits, stronger models for architecture and debugging, explicit review models for security-sensitive repos, and budget ceilings for long-running sessions. If teams do not write the policy, the model picker becomes the policy.
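One way to write that guidance down is as a small routing table that a wrapper script or a review checklist can consult. This is a sketch under stated assumptions: the task categories mirror the paragraph above, while the model labels and credit ceilings are placeholders a team would replace with its own choices.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Route:
    model: str        # placeholder label, not a real model identifier
    max_credits: int  # assumed per-session ceiling in AI Credits


# Hypothetical policy table; every value here is an example.
POLICY: Dict[str, Route] = {
    "mechanical_edit": Route("cheap-model", 50),
    "debugging": Route("strong-model", 400),
    "architecture": Route("strong-model", 400),
    "security_review": Route("explicit-review-model", 600),
}


def route(task_kind: str) -> Route:
    """Look up the route for a task, refusing unknown categories."""
    if task_kind not in POLICY:
        raise ValueError(f"no routing policy for task kind: {task_kind!r}")
    return POLICY[task_kind]


def should_stop(task_kind: str, credits_spent: float) -> bool:
    """True once a session has reached its budget ceiling."""
    return credits_spent >= route(task_kind).max_credits
```

Refusing unknown task kinds is deliberate: a missing entry should prompt a policy decision, not a silent default.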

GitHub’s own pricing docs make the spread visible. Among the listed per-million-token rates: GPT-5 mini at $0.25 input and $2 output, GPT-5.3-Codex at $1.75 input and $14 output, GPT-5.5 at $5 input and $30 output, Claude Opus 4.8 at $5 input and $25 output plus cache-write cost, and Gemini 3.1 Pro preview at $2 input and $12 output for prompts up to 200K tokens. The exact menu will keep changing. The structural point will not: “use the best model” is not a strategy when agents can run multi-step loops with tools attached.
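A rough estimator shows how quickly the spread compounds. The sketch below uses three of the rates quoted above and the 1 credit = $0.01 conversion from the billing docs; it assumes a straight tokens-times-rate calculation and ignores cached tokens and cache-write charges, so treat its output as an order-of-magnitude guide rather than GitHub's billing formula.

```python
from typing import Dict, Tuple

# (input, output) USD per million tokens, as quoted in the article.
RATES_USD_PER_MTOK: Dict[str, Tuple[float, float]] = {
    "GPT-5 mini": (0.25, 2.00),
    "GPT-5.3-Codex": (1.75, 14.00),
    "GPT-5.5": (5.00, 30.00),
}


def session_credits(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated AI Credits for a session (simplified: no cached tokens)."""
    rate_in, rate_out = RATES_USD_PER_MTOK[model]
    # tokens * (USD per million tokens) is USD scaled by 1e6;
    # dividing by 10_000 converts to credits at 100 credits per USD.
    scaled_usd = input_tokens * rate_in + output_tokens * rate_out
    return scaled_usd / 10_000


if __name__ == "__main__":
    for name in RATES_USD_PER_MTOK:
        print(name, session_credits(name, 2_000_000, 100_000))
```

Under those assumptions, a session that sends 2M input tokens and receives 100K output tokens comes to about 70 credits on GPT-5 mini and about 1,300 on GPT-5.5, which is most of a Pro plan's monthly allowance.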

Tool policy is becoming the real CLI interface

The 1.0.57 line also includes a set of fixes that look small until you squint at them as a runtime. In v1.0.57-4, preToolUse hook errors now deny tool calls instead of silently allowing them. That is the correct failure mode. If a policy hook fails, execution should stop. Anything else turns governance into stage dressing.
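The fail-closed behavior is easy to state in code. This is a minimal sketch of the pattern, not Copilot CLI's hook mechanism: the hook here is a hypothetical in-process callable, and the only rule illustrated is that an error or an ambiguous answer counts as a denial.

```python
from typing import Callable

# Hypothetical hook signature: (tool_name, arguments) -> allow?
Hook = Callable[[str, dict], bool]


def is_tool_call_allowed(hook: Hook, tool_name: str, args: dict) -> bool:
    """Fail closed: only an explicit True from the hook permits the call."""
    try:
        return hook(tool_name, args) is True
    except Exception:
        # A crashing policy hook must not become an implicit approval.
        return False


def broken_hook(tool_name: str, args: dict) -> bool:
    raise RuntimeError("policy service unreachable")


if __name__ == "__main__":
    print(is_tool_call_allowed(broken_hook, "shell", {"cmd": "rm -rf build"}))
```

Checking for an explicit `True` also covers hooks that return `None` by accident, which is a common way for a permissive bug to slip in.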

The MCP policy fix is also instructive: configured npx --registry servers are no longer incorrectly blocked. Good policy has to distinguish risky behavior from legitimate infrastructure. False positives do not make systems safer; they train developers to bypass the guardrail. If a company uses a private npm registry for MCP servers, the CLI should understand that configuration rather than treating the command shape as suspicious by default.
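As an illustration of judging configuration rather than command shape, the sketch below inspects an `npx` argument list and checks the `--registry` host against an allowlist. The allowlist, the internal hostname, and the function names are assumptions for the example; the article does not describe how Copilot CLI implements its own check.

```python
from typing import List, Optional
from urllib.parse import urlparse

# Hypothetical allowlist; the internal host is a made-up example.
ALLOWED_REGISTRY_HOSTS = {"registry.npmjs.org", "npm.internal.example.com"}


def registry_from_args(args: List[str]) -> Optional[str]:
    """Return the --registry value from npx arguments, if present."""
    for i, arg in enumerate(args):
        if arg.startswith("--registry="):
            return arg.split("=", 1)[1]
        if arg == "--registry" and i + 1 < len(args):
            return args[i + 1]
    return None


def is_registry_allowed(args: List[str]) -> bool:
    """Allow the default registry or an explicitly allowlisted host."""
    registry = registry_from_args(args)
    if registry is None:
        return True  # no override, so npx uses its configured default
    return urlparse(registry).hostname in ALLOWED_REGISTRY_HOSTS
```

The presence of `--registry` is not the signal; where it points is.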

Copilot CLI also now surfaces both human-readable MCP content and machine-readable structuredContent payloads without duplicating JSON serialization. This matters more than the average release-note reader will think. Agents need text for reasoning and structured fields for reliable tool chaining. If the structured payload is missing, the next tool call may not get the ID, status, or selector it needs. If the text is missing, the human reviewing the run may lose the narrative. Runtime ergonomics is not just pretty output; it is preserving the right representation for the next actor in the chain.
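A small sketch of keeping both representations: the result shape below follows the MCP convention of a `content` list of typed blocks plus an optional `structuredContent` object, and the helper simply hands each to the right consumer without re-serializing one into the other. The helper name and the sample payload are illustrative.

```python
from typing import Optional, Tuple


def split_tool_result(result: dict) -> Tuple[str, Optional[dict]]:
    """Return (text for humans and the model, structured data for chaining)."""
    text = "\n".join(
        block["text"]
        for block in result.get("content", [])
        if block.get("type") == "text"
    )
    # Pass the structured payload through untouched instead of dumping it
    # into the text channel as a second copy of the same JSON.
    return text, result.get("structuredContent")


if __name__ == "__main__":
    sample = {
        "content": [{"type": "text", "text": "Opened issue 42"}],
        "structuredContent": {"id": 42, "status": "open"},
    }
    narrative, data = split_tool_result(sample)
    print(narrative)
    print(data["id"] if data else None)
```

The next tool call reads `data["id"]`; the person auditing the run reads the narrative. Neither has to parse the other's format.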

The GitHub MCP token-saving default may be the most practical line in the batch. If gh is already on PATH, Copilot omits redundant gh-replaceable tools, reducing prompt/tool overhead. This is the grown-up version of agent cost control. Teams love to talk about model prices, but tool inventories are part of the prompt. Every redundant tool schema is context tax. Every overlapping capability asks the model to choose between equivalent doors. The cheapest token is still the one you never send.
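The same trimming logic can be approximated in a few lines. This sketch assumes a hypothetical set of tool names that `gh` could replace; the real list Copilot CLI uses is not given in the article. Only `shutil.which` is a real standard-library call.

```python
import shutil
from typing import Iterable, List, Optional

# Hypothetical tool names; stand-ins for whatever gh can already do.
GH_REPLACEABLE = {"list_issues", "create_pull_request", "get_pull_request"}


def exposed_tools(all_tools: Iterable[str], gh_available: Optional[bool] = None) -> List[str]:
    """Drop tools that duplicate gh when gh is on PATH."""
    if gh_available is None:
        gh_available = shutil.which("gh") is not None
    if not gh_available:
        return list(all_tools)
    return [tool for tool in all_tools if tool not in GH_REPLACEABLE]


if __name__ == "__main__":
    tools = ["list_issues", "create_pull_request", "search_code"]
    print(exposed_tools(tools))
```

Every name removed from that list is a schema the model never has to read or choose between.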

That has direct rollout implications. Before buying a larger Copilot plan because agent sessions feel expensive, inventory the tools you expose. Remove duplicates. Prefer deterministic CLIs for deterministic operations. Keep MCP tools for capabilities that actually need tool abstraction, not because registering everything feels extensible. A smaller tool surface is cheaper, easier to reason about, and usually safer.

There are also quality-of-life fixes that point to real usage: clickable diff-line selection in diff mode, Ctrl-C and modified keys working inside tmux, case-insensitive file @-mention search, installed plugins no longer dragging along the source .git directory, and a configurable rubber-duck built-in agent. None of these are grand strategy. All of them suggest GitHub is smoothing the CLI for developers who keep it open long enough to hit terminal edge cases.

The practitioner playbook is straightforward. Test the prerelease on a non-critical machine. Run the same task under two models and two context tiers while recording AI Credits. Resume the session and confirm the tier persists. Trigger a failing preToolUse hook and verify the call is denied. Test MCP tools that return both text and structuredContent. Check whether the GitHub MCP tool list shrinks when gh is available. Then write the one-page policy your team will actually follow: which models for which tasks, which context tiers for which repos, which tools are allowed, and when a developer should stop an agent run instead of letting it spend its way through confusion.
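For the two-models-by-two-tiers comparison, a tiny harness keeps the bookkeeping honest. This is a sketch of the recording side only: the model and tier labels are placeholders, and the credit figures are entered by hand from whatever usage view you rely on, since the article does not describe a programmatic way to read them.

```python
from itertools import product
from typing import Dict, List, Optional


def build_matrix(models: List[str], tiers: List[str]) -> List[Dict]:
    """One row per (model, tier) pair, with credits left to fill in."""
    return [{"model": m, "tier": t, "credits": None} for m, t in product(models, tiers)]


def record(rows: List[Dict], model: str, tier: str, credits: float) -> None:
    """Store the observed credit spend for one run."""
    for row in rows:
        if row["model"] == model and row["tier"] == tier:
            row["credits"] = credits
            return
    raise KeyError((model, tier))


def cheapest(rows: List[Dict]) -> Optional[Dict]:
    """Cheapest completed run, or None if nothing has been recorded yet."""
    done = [row for row in rows if row["credits"] is not None]
    return min(done, key=lambda row: row["credits"]) if done else None
```

Cost alone does not pick the winner, so pair each row with a note on whether the run actually finished the task.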

Copilot CLI 1.0.57-5 is not important because the prerelease note is rich. It is important because the surrounding diff is honest about where the product category is going. Agentic coding in the terminal now has a bill, a context budget, model capability constraints, hook failure semantics, MCP governance, and tool-schema overhead. The teams that treat those as first-class engineering controls will do fine. The teams that treat them as settings will eventually discover finance, security, and developer experience all opened the same ticket.

Sources: GitHub Copilot CLI 1.0.57-5 release, GitHub Copilot CLI 1.0.57-4 release, GitHub compare range, npm @github/copilot, GitHub usage-based billing docs, GitHub Copilot models and pricing docs.
