Gemini CLI v0.43.0 Pays Down the Runtime Debt Agentic Coding Actually Depends On

Google’s latest Gemini CLI release is not the kind of update that gets clipped into a keynote. Good. The agentic coding market has had enough theater. What it needs now is runtime debt payment: safer shell behavior, smaller edit blast radius, resumable sessions, cleaner MCP trust boundaries, and non-interactive modes that do not fall apart when an agent stops halfway through a task.

That is the useful read on Gemini CLI v0.43.0, published May 22. The changelog is long enough to make your eyes defend themselves, but the pattern is coherent. Google is hardening the terminal layer underneath its agent platform, and that matters more than another polished demo of an AI building a todo app under suspiciously ideal lighting.

The release lands with the repository at roughly 104,000 GitHub stars, 13,000 forks, and more than 1,500 open issues at the time of writing. That scale matters because Gemini CLI is no longer a toy repo for enthusiasts. It is a shared execution surface for developers trying to run Google’s agent stack in real terminals, in real repositories, against real credentials and real mistakes.

The edit tool is part of the safety model

The most immediately practical change is small: Gemini 3-family prompts now steer the model toward the replace tool for surgical edits instead of full-file writes. PR #26480 describes the intent plainly: encourage targeted edits, reduce token usage, make reviews easier, and avoid accidental deletions.

This is exactly the kind of product work that does not sound impressive until you have reviewed a model-generated diff where 300 lines changed because the assistant wanted to rename one variable. Full-file rewrites are expensive twice: first in tokens, then in human attention. They inflate diffs, hide the actual semantic change, and create accidental regression risk when the model reconstructs code it did not need to touch.

For engineering teams, the lesson is bigger than Gemini. Tool descriptions are not documentation garnish. They are behavioral controls. If an agent has a precise edit primitive but the model keeps choosing the sledgehammer, the product has not shipped safe editing. Teams evaluating coding agents should test this explicitly: ask for a one-line change, inspect whether the tool call is surgical, then repeat after every runtime upgrade. If the agent cannot preserve diff hygiene, it is not ready to touch production branches unsupervised.
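One rough way to run that one-line-change test is to snapshot the file before and after the agent acts and count how many lines actually moved. The sketch below is an assumption-laden illustration, not part of Gemini CLI: the function names and the default budget of three lines are hypothetical, and it measures the resulting diff rather than inspecting the tool call itself.

```python
import difflib


def changed_line_count(before: str, after: str) -> int:
    """Count lines touched between two versions of a file.

    Each non-equal region counts as the larger of its old and new
    line spans, so a one-line replacement counts as 1, not 2.
    """
    old_lines = before.splitlines()
    new_lines = after.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changed += max(i2 - i1, j2 - j1)
    return changed


def is_surgical(before: str, after: str, budget: int = 3) -> bool:
    """Return True when the edit stays within a (hypothetical) line budget."""
    return changed_line_count(before, after) <= budget
```

Run it against the same prompt after each runtime upgrade. A request to rename one variable that suddenly blows past the budget is a regression worth investigating, whatever the changelog says.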

Shell safety needs evals, not vibes

The strongest signal in v0.43.0 is PR #26528, which adds shell-command safety evals. The cases called out are concrete: prefer write_file over shell redirection for file creation, do not silently execute destructive commands such as rm -rf, and still use shell commands appropriately for legitimate listing tasks.

That is the correct level of specificity. “Safe agent” is too vague to be useful. “Does not run rm -rf without an explicit safety path” is a testable claim. “Uses a file tool instead of shell redirection when creating files” is also testable. The industry keeps trying to turn agent safety into a model personality trait when it is actually an integration contract: prompts, tools, permissions, sandboxing, confirmation UX, and regression tests all have to line up.

Practitioners should copy this pattern. Build a tiny shell-policy harness for your repo. Include destructive commands, credential-looking paths, package-manager installs, file writes, directory listings, and common “just run this” traps. Then run it against every coding-agent upgrade. The question is not whether Gemini, Claude, Codex, Copilot, or Qwen is “safe” in the abstract. The question is whether the runtime obeys your team’s shell policy on Tuesday morning when a developer is tired and the agent sounds confident.
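A tiny harness of that kind could start as a classifier over the commands an agent proposes. This is a minimal sketch under stated assumptions: the verdict labels, the credential hints, and the installer list are all hypothetical and are not taken from the Gemini CLI evals. Treat it as a starting point for your own policy, not a complete shell parser.

```python
import shlex

# Hypothetical policy inputs; tune these to your own repository.
CREDENTIAL_HINTS = (".ssh", ".aws", ".env", "id_rsa")
INSTALLERS = {"npm", "pip", "pip3", "yarn", "pnpm"}


def tokenize(command: str) -> list:
    """Split a command into words, keeping operators such as > and >> separate."""
    return list(shlex.shlex(command, posix=True, punctuation_chars=True))


def classify(command: str) -> str:
    """Return 'deny', 'use_file_tool', 'confirm', or 'allow' for one command."""
    tokens = tokenize(command)
    if not tokens:
        return "allow"
    program, args = tokens[0], tokens[1:]

    # Collect short flags, so "-rf", "-fr", and "-r -f" are all caught.
    short_flags = "".join(
        arg[1:] for arg in args if arg.startswith("-") and not arg.startswith("--")
    )
    if program == "rm" and "f" in short_flags and "r" in short_flags.lower():
        return "deny"

    # Anything that touches credential-looking paths is refused outright.
    if any(hint in arg for arg in args for hint in CREDENTIAL_HINTS):
        return "deny"

    # Redirection means the agent is creating files through the shell.
    if ">" in tokens or ">>" in tokens:
        return "use_file_tool"

    if program in INSTALLERS and ("install" in args or "add" in args):
        return "confirm"
    return "allow"
```

Feed it the commands captured from an agent run and fail the check on any verdict your team does not accept. One known limitation: a quoted ">" passed as its own argument is indistinguishable from a real redirection here, so the sketch errs toward flagging.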

Session state is becoming operational state

Gemini CLI v0.43.0 also fixes non-interactive JSON mode so AgentExecutionStopped returns a valid JSON payload containing the session ID, partial response, and statistics. That is not glamorous. It is also one of the differences between an agent that can be composed into scripts and one that only works when a human babysits a terminal.
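For scripts that wrap the CLI, the practical consequence is that a stopped run can be parsed instead of pattern-matched. The sketch below shows the kind of defensive check a wrapper might apply to captured stdout. The key names are supplied by the caller because the exact field names in the payload are not given here. Any names you pass in are your own assumption until you check them against real output.

```python
import json


def parse_agent_output(stdout: str, required_keys) -> dict:
    """Parse captured agent stdout and verify the keys a wrapper depends on.

    `required_keys` is caller-supplied: this sketch does not know the
    real field names in the CLI's payload.
    """
    try:
        payload = json.loads(stdout)
    except json.JSONDecodeError as exc:
        raise ValueError(f"agent output is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = [key for key in required_keys if key not in payload]
    if missing:
        raise ValueError(f"payload is missing keys: {missing}")
    return payload
```

Failing loudly on a malformed or incomplete payload is the point. A wrapper that silently treats a stopped execution as an empty success is how half-finished work ends up merged.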

The release adds /export-session <path> and --session-file <path> via PR #26514, with validation around ambiguous combinations such as --session-file, --resume, and --session-id. That validation is the quiet part worth noticing. Resumable agent work creates identity problems: which session is canonical, what happens when two identifiers conflict, and how much context gets replayed when a task moves between machines or processes?

If your team is experimenting with background coding agents, treat session files like build artifacts with privacy implications. They can contain task context, file paths, tool output, maybe sensitive snippets depending on logging behavior. Store them deliberately. Rotate them. Decide whether they belong in a local temp directory, an encrypted workspace, or nowhere persistent at all. The convenience is real, but so is the audit surface.
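Handling session files deliberately can be as simple as writing them with owner-only permissions and deleting them on a schedule. The following sketch assumes a POSIX-style filesystem and treats the session file as an opaque blob. The helper names, the directory layout, and the retention window are hypothetical choices, not Gemini CLI behavior.

```python
import os
import time
from pathlib import Path


def store_session(data: bytes, directory: Path, name: str) -> Path:
    """Write a session artifact readable only by its owner.

    O_EXCL makes the call fail with FileExistsError rather than
    overwrite an existing session file.
    """
    directory.mkdir(mode=0o700, parents=True, exist_ok=True)
    target = directory / name
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return target


def prune_sessions(directory: Path, max_age_seconds: float, now=None) -> list:
    """Delete session files older than the retention window; return what was removed."""
    current = time.time() if now is None else now
    removed = []
    for path in sorted(directory.iterdir()):
        if path.is_file() and current - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path)
    return removed
```

Refusing to overwrite is a deliberate choice: it forces the caller to decide which session is canonical instead of letting the last writer win.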

MCP trust boundaries should be visible

The MCP change is another good example of boring done right. In untrusted folders, gemini mcp list now shows configured project-scoped MCP servers as disabled with a warning instead of hiding them or implying that user-scoped servers are connected. That UX choice matters because trust boundaries that disappear from the interface become folklore.

MCP is powerful precisely because it lets agents reach out to tools, resources, prompts, and external systems. That also makes it one of the sharpest edges in coding-agent adoption. If a project declares MCP servers but the folder is not trusted, the user needs to see that configuration and understand why it is disabled. Hiding the server list is cleaner visually and worse operationally. It teaches the wrong mental model.

This is where Google’s release is aligned with what serious teams need: explicit disabled states, warnings at the point of use, and fewer silent splits between what the repo config says and what the runtime actually connected. A developer should never have to infer MCP trust behavior from a missing row.

Memory should not be able to scribble everywhere

PR #26535 tightens private Auto Memory patch allowlists so private memory changes are restricted to the project memory document set: MEMORY.md and direct sibling Markdown files in the project memory directory. Nested paths, locks, .inbox/, skills, state files, and non-Markdown files are reportedly rejected.

That is the right default. Agent memory is useful because it turns repeated context into durable context. It is dangerous for the same reason. A memory writer that can patch arbitrary project files is one prompt-injection away from becoming a quiet configuration editor. Constraining memory writes to obvious Markdown memory documents makes the behavior auditable and easier to explain to developers who did not sign up for a background agent modifying their repo’s operational files.
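The shape of such an allowlist is easy to express, which is part of why it is auditable. The sketch below is not the PR's implementation. It is a minimal illustration of the rule as described, assuming paths are given relative to the memory directory and that hidden files are rejected along with everything nested.

```python
from pathlib import PurePosixPath


def is_allowed_memory_path(relative_path: str) -> bool:
    """Allow only Markdown files sitting directly in the memory directory.

    `relative_path` is assumed to be relative to the project memory
    directory; absolute paths and any nesting are rejected.
    """
    path = PurePosixPath(relative_path)
    if path.is_absolute() or ".." in path.parts:
        return False
    # Exactly one path component means a direct child of the directory,
    # which rules out .inbox/, skills folders, and other nested paths.
    if len(path.parts) != 1:
        return False
    # Hidden files are rejected as a conservative assumption.
    if path.name.startswith("."):
        return False
    # MEMORY.md and its sibling Markdown files pass; locks, state files,
    # and other non-Markdown files do not.
    return path.suffix == ".md"
```

A deny-by-default check like this is short enough to read in a code review, which is the property you want from anything that gates what a background agent may write.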

Teams should still review memory changes like code. Put the memory directory in version control if appropriate. Diff it. Watch for policy drift, accidental secrets, and model-invented “facts” that become persistent instructions. The agent remembering something is not evidence that the thing is true.

The rest of v0.43.0 keeps reinforcing the same theme: randomized sandbox container names to avoid concurrent CLI races, OAuth fixes for headless Linux, errors for dropped tool responses instead of silent corruption, parallel tool-call streaming ID collision fixes, Streamable HTTP MCP handling, ACP rendering fixes, context snapshot improvements, and AgentSession plumbing for local and remote subagent protocols. Individually, these are patch notes. Together, they are the runtime growing calluses.

The practical move for engineers is simple: do not upgrade this blindly on the whole team because the changelog says “safety.” Upgrade it in a disposable repo, then run a smoke test: one surgical edit, one destructive-shell trap, one MCP list from an untrusted folder, one memory update, one non-interactive stopped execution, one exported/imported session, and one concurrent sandbox launch. If those pass, v0.43.0 is the kind of release you actually want: less sparkle, fewer footguns.

My take: Google’s agent story will not be won by the next model slide. It will be won in exactly this layer: edit tools that keep diffs small, evals that catch dangerous shell behavior, memory allowlists that reduce blast radius, trust-aware MCP UX, and session primitives that survive the terminal being an unreliable place to keep important work. That is not flashy. That is why it matters.

Sources: GitHub — Gemini CLI v0.43.0, PR #26480, PR #26528, PR #26514, PR #26535