
VS Code’s Copilot Agent Window Turns the IDE Into an Agent Operations Console

Anatoliy Kolodkin

03 Jun 2026 • 6 min read

GitHub’s latest Copilot-in-VS-Code roundup is easy to misread as another bag of editor features. It is not. The interesting thing is that VS Code is being reshaped around agent work that is long-running, remote, visual, stateful, metered, and occasionally dangerous. That is a very different product from autocomplete with a chat sidebar.

The June 3 changelog covers VS Code releases v1.120 through v1.123, with VS Code 1.123 itself dated June 3, 2026. The feature list is broad: an Agents window in Stable as a preview, remote sessions over SSH and Dev Tunnels, session sync to a GitHub account, Chronicle queries over past work, BYOK improvements for air-gapped environments, terminal output compression, secret-handling protections, experimental command-risk explanations, sandbox retries, integrated browser screenshots, and even a two-hour delay before newly published extensions auto-update. That sounds scattered until you group it by job: GitHub is turning the IDE into an operations console for coding agents.

That phrase is intentionally less cute than “AI pair programmer.” Pair programming implies a tight loop: human thinks, assistant suggests, human accepts. Agent operations is different. It involves multiple sessions, background work, remote machines, branches, terminals, browsers, screenshots, audit trails, token budgets, and policy decisions about what the agent may do while the developer is not staring at the prompt. VS Code is starting to admit that reality in the UI.

The chat panel was too small for the job

The new Agents window is available in VS Code Stable as a preview and supports multiple sessions side by side, pinned session views, opening sessions by drag-and-drop or Alt-click, and separate active-session state for the Terminal, Files, and Changes views. That is not polish. It is a product correction. Once an agent can create branches, touch files, run tasks, and keep working remotely after a client disconnects, a sidebar transcript is not enough surface area.

Remote agents are especially important. GitHub says sessions can run on remote machines over SSH or Dev Tunnels, and continue after the client disconnects. Microsoft is also investing in the Agent Host Protocol for synchronizing agent session state across clients. That is a serious architectural bet: the agent is no longer just something happening inside your local editor process. It is work with location, lifecycle, continuity, and handoff.

For developers, this is useful because real engineering work does not fit neatly inside one local prompt. Long test runs, migration attempts, multi-repo changes, dependency upgrades, UI verification, and CI debugging all benefit from persistent state. But persistence changes the risk model. A disconnected session that continues running is productivity when scoped well and a governance problem when scoped poorly. Teams should know where sessions run, which credentials they inherit, how logs are retained, how remote terminals are cleaned up, and who owns the result if the human disappears for lunch.
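Those questions translate naturally into a checklist that runs over an inventory of remote sessions. The sketch below assumes a team keeps such an inventory itself. `RemoteSession`, `audit_session`, the approved hosts, and the four-hour threshold are all hypothetical, and none of this is a VS Code or Agent Host Protocol structure.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional

@dataclass
class RemoteSession:
    # Hypothetical inventory record kept by the team; not a VS Code data structure.
    session_id: str
    host: str                       # SSH host or Dev Tunnel name where the agent runs
    owner: Optional[str]            # human accountable for the result
    credentials: list[str] = field(default_factory=list)  # secrets the session inherits
    log_retention_days: Optional[int] = None
    open_terminals: int = 0
    detached_since: Optional[datetime] = None  # set when the client disconnected

# Hypothetical policy values.
APPROVED_HOSTS = {"build-01.internal", "ci-tunnel"}
FORBIDDEN_CREDENTIALS = {"prod-deploy-token"}
MAX_DETACHED = timedelta(hours=4)

def audit_session(s: RemoteSession, now: datetime) -> list[str]:
    """Return policy findings for one session; an empty list means it passed."""
    findings = []
    if s.host not in APPROVED_HOSTS:
        findings.append(f"{s.session_id}: unapproved host {s.host}")
    if s.owner is None:
        findings.append(f"{s.session_id}: nobody owns the result")
    if s.log_retention_days is None:
        findings.append(f"{s.session_id}: no log retention policy")
    if FORBIDDEN_CREDENTIALS & set(s.credentials):
        findings.append(f"{s.session_id}: inherits production credentials")
    detached_too_long = s.detached_since is not None and now - s.detached_since > MAX_DETACHED
    if detached_too_long and s.open_terminals > 0:
        findings.append(f"{s.session_id}: {s.open_terminals} terminal(s) still open while detached")
    return findings

# A session whose human went to lunch six hours ago.
now = datetime(2026, 6, 3, 18, 0, tzinfo=timezone.utc)
lunch_session = RemoteSession("s1", "laptop-tunnel", owner=None, open_terminals=2,
                              detached_since=now - timedelta(hours=6))
for finding in audit_session(lunch_session, now):
    print(finding)
```

The point of keeping this deterministic and boring is that it can run on a schedule and page someone, which a transcript in a sidebar cannot.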

Session memory is useful until it becomes sensitive infrastructure

Session sync and Chronicle may become some of the most loved features in the release. Sync can automatically store chat sessions to a GitHub account, including conversation, touched files, repository context, branch, timestamps, and referenced PRs, issues, and commits. Chronicle commands can query past sessions, generate standup reports, search coding history by topic, file, or PR, and produce personalized productivity tips.

There is a lot to like here. Engineering memory is usually scattered across Slack threads, local terminals, abandoned branches, half-written notes, and “I think we tried that last week.” A searchable history of agent work could answer practical questions: why did we reject this approach, which files did the agent modify during the failed refactor, what PR came out of that debugging session, and what should I mention at standup? That is valuable, especially as agent-assisted work creates more intermediate reasoning and attempted changes than traditional commits capture.
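To make the "which files did the agent modify" question concrete, here is a minimal sketch of filtering synced session records by file, PR, or topic. The record shape loosely mirrors the fields GitHub lists for session sync (touched files, branch, timestamps, referenced PRs), but `SessionRecord`, `find_sessions`, and the sample records are made up for illustration; this is not the Chronicle API or its storage format.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class SessionRecord:
    # Hypothetical shape that loosely follows the fields session sync is said to store.
    title: str
    repo: str
    branch: str
    day: date
    touched_files: list[str] = field(default_factory=list)
    referenced_prs: list[int] = field(default_factory=list)

def find_sessions(history: list[SessionRecord], *, file: Optional[str] = None,
                  pr: Optional[int] = None, topic: Optional[str] = None) -> list[SessionRecord]:
    """Return sessions matching every filter that was given, oldest first."""
    hits = []
    for rec in history:
        if file is not None and file not in rec.touched_files:
            continue
        if pr is not None and pr not in rec.referenced_prs:
            continue
        if topic is not None and topic.lower() not in rec.title.lower():
            continue
        hits.append(rec)
    return sorted(hits, key=lambda r: r.day)

# Made-up sample history.
history = [
    SessionRecord("Refactor auth middleware", "api", "refactor/auth", date(2026, 5, 27),
                  ["src/auth.py", "src/session.py"], [412]),
    SessionRecord("Debug flaky checkout test", "web", "fix/checkout", date(2026, 5, 29),
                  ["tests/test_checkout.py"], [418]),
    SessionRecord("Retry auth refactor", "api", "refactor/auth-2", date(2026, 5, 30),
                  ["src/auth.py"], [421]),
]

# "Which sessions touched src/auth.py during the refactor?"
for rec in find_sessions(history, file="src/auth.py"):
    print(rec.day, rec.branch, rec.referenced_prs)
```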

It is also sensitive operational data. “Touched files, repository context, branch, timestamps, referenced issues” is not just productivity metadata. It can reveal roadmap work, incident response, customer names embedded in branches, security-sensitive files, vulnerability triage, and internal architecture. The right question is not whether session sync is good. It is which repos are allowed to sync it, what retention policy applies, who can search it, and whether the destination account/backend is approved for source-adjacent material. The productivity feature and the audit log are the same object wearing different hats.
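One way to turn those questions into a default is a deterministic gate evaluated before any session is synced. This is a minimal sketch assuming the organization classifies its own repositories and approves sync destinations; the repo classes, `SyncPolicy`, `may_sync`, and the destination names are hypothetical, not a VS Code or GitHub setting.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SyncPolicy:
    allow_sync: bool
    retention_days: int
    approved_destinations: frozenset[str]

# Hypothetical policy table keyed by repository class.
POLICIES = {
    "public": SyncPolicy(True, 365, frozenset({"github-personal", "github-enterprise"})),
    "internal": SyncPolicy(True, 90, frozenset({"github-enterprise"})),
    "restricted": SyncPolicy(False, 0, frozenset()),  # incident response, vulnerability triage
}

# Hypothetical repository classification.
REPO_CLASS = {"docs-site": "public", "billing-api": "internal", "vuln-triage": "restricted"}

def may_sync(repo: str, destination: str) -> tuple[bool, str]:
    """Decide whether a session for `repo` may sync to `destination`, with a reason."""
    repo_class = REPO_CLASS.get(repo)
    if repo_class is None:
        return False, f"{repo}: unclassified repo, sync denied by default"
    policy = POLICIES[repo_class]
    if not policy.allow_sync:
        return False, f"{repo}: {repo_class} repos never sync"
    if destination not in policy.approved_destinations:
        return False, f"{repo}: {destination} is not approved for {repo_class} repos"
    return True, f"{repo}: sync allowed, retain {policy.retention_days} days"

print(may_sync("billing-api", "github-personal"))
print(may_sync("vuln-triage", "github-enterprise"))
```

Denying unclassified repositories by default is the important design choice: a new repo should have to opt in to producing searchable history, not opt out.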

BYOK is becoming table stakes, not a checkbox

The BYOK changes are the most enterprise-shaped part of the update. GitHub says VS Code now supports air-gapped environments without GitHub authentication, custom endpoint providers compatible with the Chat Completions, Responses, and Messages API styles, provider-based model picking, real token usage visibility for BYOK models, reasoning-effort controls, and configurable utility models for titles, summaries, rename suggestions, commit messages, and intent detection.

This is where the coding-agent market is quietly moving. Teams are no longer only comparing answer quality. They are asking which models can run inside their procurement path, their network boundary, their cost model, and their identity story. BYOK without real token visibility is barely BYOK; it is just outsourced surprise billing. Reasoning-effort controls matter because teams need intentional tradeoffs: use more reasoning for architecture work, less for commit messages, route cheap utility tasks to cheap models, and do not burn premium context on a branch-title suggestion.
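The "intentional tradeoffs" above amount to a routing table from task to model tier and reasoning effort. The sketch below is one way to write that down; the tier names are placeholders rather than real products, and the `low`/`medium`/`high` effort labels are an assumption, since providers name these settings differently.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    model: str             # placeholder tier identifier, not a real model name
    reasoning_effort: str  # "low" | "medium" | "high" (labels assumed; check your provider)

# Hypothetical tiers.
PREMIUM = "premium-model"
STANDARD = "standard-model"
UTILITY = "utility-model"

ROUTES = {
    "architecture": Route(PREMIUM, "high"),
    "code_review": Route(PREMIUM, "medium"),
    "refactor": Route(STANDARD, "medium"),
    "test_generation": Route(STANDARD, "low"),
    "commit_message": Route(UTILITY, "low"),
    "branch_title": Route(UTILITY, "low"),
    "summary": Route(UTILITY, "low"),
}

def route(task: str) -> Route:
    """Pick a model tier and reasoning effort; unknown tasks fall back to the standard tier."""
    return ROUTES.get(task, Route(STANDARD, "medium"))

print(route("branch_title"))
print(route("architecture"))
```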

Practitioners should treat these controls like infrastructure defaults, not individual preference knobs. Define model tiers by task. Decide which endpoints are approved for which repo classes. Set reasoning-effort defaults for review, refactor, test generation, and summarization. Capture token usage somewhere finance and platform engineering can reconcile. Otherwise every developer becomes their own agent FinOps department, which is as fun as it sounds.
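Capturing token usage "somewhere finance and platform engineering can reconcile" can start as simply as aggregating per-request usage events by team. This sketch assumes some BYOK proxy or export job already emits such events; `UsageEvent`, the per-million-token prices, and the team names are hypothetical placeholders, not real rates.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class UsageEvent:
    # Hypothetical record emitted once per model request.
    team: str
    repo: str
    model: str
    input_tokens: int
    output_tokens: int

# Placeholder prices in USD per million tokens: (input, output).
PRICES = {"premium-model": (10.0, 30.0), "utility-model": (0.5, 1.5)}

def cost_usd(e: UsageEvent) -> float:
    in_price, out_price = PRICES[e.model]
    return (e.input_tokens * in_price + e.output_tokens * out_price) / 1_000_000

def spend_by_team(events: list[UsageEvent]) -> dict[str, float]:
    """Aggregate estimated spend per team, rounded for reporting."""
    totals: dict[str, float] = defaultdict(float)
    for e in events:
        totals[e.team] += cost_usd(e)
    return {team: round(total, 4) for team, total in sorted(totals.items())}

events = [
    UsageEvent("payments", "billing-api", "premium-model", 120_000, 8_000),
    UsageEvent("payments", "billing-api", "utility-model", 40_000, 2_000),
    UsageEvent("web", "storefront", "utility-model", 10_000, 1_000),
]
print(spend_by_team(events))
```

Keeping `repo` on each event means the same log can later be sliced by repository class, which ties spend back to the endpoint-approval decisions above.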

Terminal safety is improving, but labels are not guardrails

The terminal changes show GitHub is paying attention to the parts of agent work that actually hurt. Expanded output compression for tests, builds, linters, Docker, and package managers can reduce token waste and keep the model from drowning in logs. Secret prompts for passwords, passphrases, PINs, and verification codes stay in the terminal instead of being sent to the LLM. Background-command cleanup is a welcome bit of runtime hygiene. The new VSCODE_AGENT environment variable gives CLIs a way to adapt when invoked by an agent.
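For CLI authors, `VSCODE_AGENT` suggests a simple pattern: detect the agent context and emit compact, machine-friendly output instead of a human-oriented listing. This sketch assumes only that the variable is present and non-empty when an agent launches the command; it does not rely on any particular value, and `render_test_summary` is a hypothetical helper.

```python
import os
from typing import Mapping

def invoked_by_agent(env: Mapping[str, str] = os.environ) -> bool:
    # Assumption: a non-empty VSCODE_AGENT means an agent launched this process.
    return bool(env.get("VSCODE_AGENT"))

def render_test_summary(results: dict[str, bool], env: Mapping[str, str] = os.environ) -> str:
    """Full per-test listing for humans; one-line failures-only summary for agents."""
    failed = [name for name, ok in results.items() if not ok]
    if invoked_by_agent(env):
        # Compact: one line, no colors or progress output, only what the model needs.
        return f"{len(results) - len(failed)} passed, {len(failed)} failed: {', '.join(failed) or 'none'}"
    return "\n".join(f"{'PASS' if ok else 'FAIL'}  {name}" for name, ok in results.items())

results = {"test_login": True, "test_checkout": False, "test_search": True}
print(render_test_summary(results, env={"VSCODE_AGENT": "1"}))
print(render_test_summary(results, env={}))
```

Passing `env` explicitly keeps the function testable; in a real CLI the default `os.environ` is what gets used.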

The experimental AI-generated command-risk explanations are useful with an asterisk. They can teach developers why a command deserves attention: writes to the repo, network access, deletion, permission changes, package installs, secret-adjacent behavior. But a generated risk label is not a security boundary. The policy still needs to be deterministic: allowlists, deny rules, human confirmations, sandbox modes, and logs. If a generated label says “low risk” and the command mutates deployment config, the label is wrong; the system still needs to stop it.
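Here is what "deterministic policy with an advisory label" can look like in practice. This is a deliberately crude sketch: the allowlist, deny patterns, and sensitive paths are hypothetical, and the AI label is only ever allowed to add friction, never to relax a rule. It is not how VS Code evaluates terminal commands.

```python
import re
import shlex
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"  # pause for an explicit human confirmation
    DENY = "deny"

# Hypothetical rules; a real team would maintain these per repository class.
ALLOWLIST = {"ls", "cat", "git", "pytest", "npm"}  # crude: covers every subcommand
DENY_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/"),            # recursive delete from the root
    re.compile(r"\bcurl\b.*\|\s*(ba)?sh\b"),  # pipe a download straight into a shell
]
SENSITIVE_PATHS = ("deploy/", "infra/", ".github/workflows/")

def decide(command: str, ai_risk_label: str) -> Decision:
    """Deterministic rules decide; the AI label can add friction but never remove it."""
    if any(p.search(command) for p in DENY_PATTERNS):
        return Decision.DENY
    try:
        argv = shlex.split(command)
    except ValueError:  # unbalanced quotes: do not guess, ask a human
        return Decision.CONFIRM
    if not argv or argv[0] not in ALLOWLIST:
        return Decision.CONFIRM
    if any(path in arg for arg in argv[1:] for path in SENSITIVE_PATHS):
        return Decision.CONFIRM
    return Decision.ALLOW if ai_risk_label == "low" else Decision.CONFIRM

# A "low risk" label does not get a deployment-config change past the rules.
print(decide("git rm deploy/prod.yaml", ai_risk_label="low"))
```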

The sandbox retry behavior deserves the same careful reading. Retrying network-dependent terminal commands with broader network permissions while preserving filesystem protections is pragmatic. Falling back to unsandboxed execution if that still fails may be acceptable for a solo developer’s toy project. It is not something an enterprise should enable casually across sensitive repos. Privilege widening should be visible, logged, and ideally require explicit human confirmation. Convenience is not the enemy; silent escalation is.
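The "visible, logged, confirmed" rule can be made explicit as an escalation ladder. This sketch follows the order described above (sandboxed, then broader network access with filesystem protections kept, then unsandboxed), but the `run` and `confirm` callbacks, the policy flag, and the log format are hypothetical; it is an illustration of the policy, not VS Code's retry implementation.

```python
from typing import Callable

# Escalation order; only the last step drops filesystem protection.
LADDER = ["sandboxed", "sandboxed+network", "unsandboxed"]

def run_with_escalation(
    command: str,
    run: Callable[[str, str], bool],  # hypothetical runner: run(command, mode) -> succeeded?
    confirm: Callable[[str], bool],   # hypothetical prompt: True if a human approves
    log: list[str],
    allow_unsandboxed: bool = False,  # per-repo policy switch, off by default
) -> bool:
    """Try each mode in order, logging every widening step; never go unsandboxed silently."""
    for mode in LADDER:
        if mode == "unsandboxed":
            if not allow_unsandboxed:
                log.append(f"{command!r}: unsandboxed fallback disabled by policy")
                return False
            if not confirm(f"Run {command!r} without a sandbox?"):
                log.append(f"{command!r}: human declined unsandboxed run")
                return False
        log.append(f"{command!r}: attempting in mode {mode}")
        if run(command, mode):
            return True
    return False

def needs_network(command: str, mode: str) -> bool:
    # Stand-in runner: the command fails until network access is widened.
    return mode != "sandboxed"

log: list[str] = []
ok = run_with_escalation("npm install", needs_network, confirm=lambda q: False, log=log)
print(ok, log)
```

Network widening happens automatically here because filesystem protections survive it; only the step that removes them needs both a policy switch and a human.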

The integrated browser improvements are practical: device emulation; viewport, area, and full-page screenshots captured into chat context; favorites; and local HTML preview without an extension. This is good product work because many agent failures are context failures. A screenshot of a broken responsive layout beats a vague prompt about “the header looking wrong.” The caution is familiar: screenshots can contain customer data, tokens, internal hostnames, or private dashboards. Visual context is still context.

There is one more small detail worth noticing: VS Code 1.123 adds a two-hour delay before auto-updating newly published extensions, except for trusted publishers such as Microsoft, GitHub, and OpenAI. That is not specifically an agent feature, but it belongs in the same security story. Editors are now agent hosts. Extension supply chain risk and agent authority are converging. If the editor can host long-running agents with terminal, browser, and repo access, extension update policy is part of the runtime threat model.
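The rule as described is simple enough to state as code, which also makes it easy to audit. This is a minimal sketch of that behavior, a two-hour hold on newly published versions with an exemption for trusted publishers; `TRUSTED_PUBLISHERS` and `may_auto_update` are illustrative assumptions, not VS Code's implementation or its actual trusted-publisher list.

```python
from datetime import datetime, timedelta, timezone

HOLD = timedelta(hours=2)
TRUSTED_PUBLISHERS = {"microsoft", "github", "openai"}  # illustrative, not the real list

def may_auto_update(publisher: str, published_at: datetime, now: datetime) -> bool:
    """Trusted publishers update immediately; everyone else waits out the hold period."""
    if publisher.lower() in TRUSTED_PUBLISHERS:
        return True
    return now - published_at >= HOLD

now = datetime(2026, 6, 3, 12, 0, tzinfo=timezone.utc)
print(may_auto_update("some-vendor", now - timedelta(minutes=30), now))
print(may_auto_update("GitHub", now - timedelta(minutes=5), now))
```

A hold like this presumably exists to give the ecosystem a window to spot and pull a malicious release before it reaches editors that now host agents with terminal access.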

The practical rollout is not complicated. Enable the Agents window preview for developers already doing serious Copilot or remote-agent work, not the entire org by default. Decide which repositories may sync sessions to GitHub accounts. Require BYOK users to expose token usage and follow model-tier defaults. Treat command-risk explanations as advisory. Audit sandbox network retry and unsandbox fallback behavior before using it on sensitive code. Use screenshots for UI work, but keep private data out of visual prompts.

This is a strong release because it stops pretending the future of coding agents is a nicer chat box. The IDE is becoming the place where agent work is launched, watched, resumed, audited, budgeted, and reviewed. Looks good to me, with the usual caveat that every convenience feature here is also a governance decision waiting for a default.

Sources: GitHub Changelog, “GitHub Copilot in Visual Studio Code, May releases”; Visual Studio Code 1.123 release notes; GitHub Copilot SDK GA changelog; GitHub Copilot CLI changelog.
