
Codex Auto-Review Turns Approval Fatigue Into a Policy Problem — Which Is Exactly Where It Belongs

Anatoliy Kolodkin

13 May 2026 • 5 min read

Codex Auto-review is one of the least flashy and most important pieces of agent infrastructure OpenAI has documented lately. The feature does not promise a smarter model, a bigger context window, or another benchmark trophy. It attacks a more practical failure mode: if coding agents are useful enough to run for real work, they will hit permission prompts constantly, and humans will eventually stop reading those prompts.

That is not a moral failure. It is a user-interface failure. “Allow this command?” is usually the wrong question. The meaningful question is closer to: does this action cross a trust boundary in a way that could leak private data, weaken security, mutate state, spend money, destroy work, or publish something externally? Codex Auto-review tries to move that judgment from repetitive modal friction into a policy-aware reviewer agent.

The key phrase in OpenAI’s docs is effectively “reviewer swap, not permission grant.” Auto-review does not expand writable roots, enable network access, weaken protected paths, or change the main agent’s sandbox. The main Codex agent still runs with the same filesystem and network boundaries. Auto-review changes who evaluates approval requests that already exist. That distinction is the difference between a governance feature and a vibes-based “let the agent approve itself” disaster.

The boundary has to exist before a reviewer can defend it

Auto-review applies when approvals are interactive: approval_policy = "on-request" or a granular approval policy that surfaces the relevant prompt category. If approval_policy = "never", there is nothing for the reviewer to review. Eligible requests include escalated shell or exec calls, network requests blocked by policy, file edits outside writable roots, MCP or app tool calls requiring approval, and Browser Use access to new websites or domains. Computer Use app approvals remain separate and still go directly to the user.
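A small routing sketch in Python makes the scope easier to see. The function route_request, the Route enum and the request-kind labels are hypothetical names chosen for illustration. They are not Codex internals. The sketch also simplifies granular approval policies by treating every policy other than "never" as interactive. Notice what the function does not take as input: any sandbox setting. It only decides who answers a prompt that already exists.

```python
from enum import Enum


class Route(Enum):
    NO_PROMPT = "no_prompt"          # approval_policy = "never": no prompt is surfaced
    AUTO_REVIEWER = "auto_reviewer"  # the reviewer agent answers the prompt
    USER = "user"                    # a person answers the prompt


# Hypothetical labels for the request categories the docs list as eligible.
ELIGIBLE_KINDS = frozenset({
    "escalated_exec",        # escalated shell or exec calls
    "blocked_network",       # network requests blocked by policy
    "write_outside_roots",   # file edits outside writable roots
    "tool_needs_approval",   # MCP or app tool calls requiring approval
    "browser_new_domain",    # Browser Use access to a new website or domain
})


def route_request(approval_policy: str, kind: str, auto_review: bool) -> Route:
    """Decide who evaluates an approval request (illustrative, not Codex source)."""
    if approval_policy == "never":
        # No interactive approvals, so there is nothing for a reviewer to review.
        return Route.NO_PROMPT
    if kind == "computer_use_app":
        # Computer Use app approvals stay separate and go straight to the user.
        return Route.USER
    if auto_review and kind in ELIGIBLE_KINDS:
        return Route.AUTO_REVIEWER
    return Route.USER


# The sandbox is not an input: enabling auto_review swaps the reviewer, nothing else.
print(route_request("on-request", "blocked_network", auto_review=True))
print(route_request("on-request", "computer_use_app", auto_review=True))
```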

This is the first operational lesson: a reviewer agent cannot save a bad sandbox. If your default configuration already allows broad network access, wide filesystem writes, generic command prefixes, and every MCP server under the sun, Auto-review will never see many of the actions you care about. You need a real boundary before boundary review means anything.

OpenAI’s docs push teams toward the right fix: adjust the sandbox instead of training the reviewer to approve noisy escalations forever. Add narrow writable roots. Prefer precise command prefix rules such as ["cargo", "test"] or ["pnpm", "run", "lint"]. Avoid broad rules like ["python"] or ["curl"], because those are not safe operations; they are tool-shaped tunnels. A command prefix is a capability grant. Treat it like one.
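Writing the matcher down shows why a prefix counts as a capability grant. The code below is a hypothetical token-prefix matcher, not the Codex rule engine. It assumes rules are compared against a command line that has already been split into tokens. A narrow rule admits only the workflow it names. A one-token rule like ("python",) admits arbitrary code.

```python
from typing import Sequence

# Hypothetical allowlist written in the narrow style the docs recommend.
NARROW_RULES: list[tuple[str, ...]] = [
    ("cargo", "test"),
    ("pnpm", "run", "lint"),
]

# A broad rule of the kind the docs warn against.
BROAD_RULES: list[tuple[str, ...]] = [("python",)]


def allowed_by_prefix(argv: Sequence[str], rules: list[tuple[str, ...]]) -> bool:
    """Return True if argv begins with every token of at least one rule."""
    return any(tuple(argv[: len(rule)]) == rule for rule in rules)


print(allowed_by_prefix(["cargo", "test", "--workspace"], NARROW_RULES))  # True: the intended use
print(allowed_by_prefix(["cargo", "publish"], NARROW_RULES))              # False: a different capability
# The broad rule silently covers arbitrary code execution:
print(allowed_by_prefix(["python", "-c", "import shutil; shutil.rmtree('src')"], BROAD_RULES))  # True
```

Whatever the matching engine, the sandbox boundary is only as tight as its widest rule.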

The reviewer agent receives a compact transcript and the exact approval request: user messages, surfaced assistant updates, relevant tool calls and outputs, and the proposed action. Hidden assistant reasoning is explicitly not included. That is a sensible privacy and architecture line. The reviewer should judge the proposed boundary crossing based on observable task context and tool evidence, not on internal model scratchpad text that neither users nor auditors can reason about.
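The shape of that input can be sketched as a filter over the transcript. The item kinds, the max_items cap and build_review_request are hypothetical stand-ins. The only idea taken from the docs is that observable context and the exact request go in, while hidden reasoning stays out. The real reviewer also selects the relevant tool calls. This sketch approximates that crudely by keeping only the most recent observable items.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptItem:
    kind: str  # hypothetical: "user_message", "assistant_update", "tool_call", "tool_output", "hidden_reasoning"
    text: str


# Kinds forwarded to the reviewer. Hidden reasoning is deliberately not in this set.
OBSERVABLE_KINDS = frozenset({"user_message", "assistant_update", "tool_call", "tool_output"})


def build_review_request(transcript: list[TranscriptItem], proposed_action: str,
                         max_items: int = 20) -> dict:
    """Build a compact reviewer input from observable context plus the exact proposed action."""
    observable = [item for item in transcript if item.kind in OBSERVABLE_KINDS]
    recent = observable[-max_items:] if max_items > 0 else []  # crude stand-in for "relevant"
    return {
        "context": [{"kind": item.kind, "text": item.text} for item in recent],
        "proposed_action": proposed_action,
    }


transcript = [
    TranscriptItem("user_message", "Fix the flaky parser test."),
    TranscriptItem("hidden_reasoning", "Maybe the fixture lives in ~/.config?"),
    TranscriptItem("tool_output", "1 failed, 41 passed"),
]
request = build_review_request(transcript, "read ~/.config/app/fixture.json")
print([entry["kind"] for entry in request["context"]])  # ['user_message', 'tool_output']
```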

Denial semantics are the feature teams should copy

The most interesting part of Auto-review is not approval. It is denial. When the reviewer denies a request, Codex sends a strong instruction back to the main agent: do not pursue the same outcome through a workaround, indirect execution, or policy circumvention. Continue only with a materially safer alternative or stop and ask the user.

That sounds obvious until you have watched agents loop. A naive agent denied permission to read a token file may try a different shell command. Denied permission to reach a domain, it may try another route. Denied permission to modify a protected path, it may generate a script that modifies it later. The problem is not malice. The problem is goal pressure without enough policy memory. Denial has to become part of the task state, not just a failed tool call.
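One way to make denial part of task state is to key it by the outcome an action tries to achieve, not by the command string. This Python sketch is hypothetical throughout. TaskState, the outcome labels and next_step are illustrative names. The genuinely hard part is deriving a trustworthy outcome label from an arbitrary command, and the sketch assumes that problem away.

```python
from dataclasses import dataclass, field


@dataclass
class TaskState:
    """Hypothetical agent task state that remembers denied outcomes, not just failed calls."""
    denied_outcomes: set[str] = field(default_factory=set)

    def record_denial(self, outcome: str) -> None:
        self.denied_outcomes.add(outcome)

    def next_step(self, outcome: str) -> str:
        # Same outcome by a different route is still blocked. Only a materially
        # different outcome, or a person, can move the task forward.
        if outcome in self.denied_outcomes:
            return "stop_and_ask_user"
        return "proceed"


state = TaskState()
# `cat secrets/token` was denied; the outcome label is what gets remembered.
state.record_denial("read:secrets/token")
# A later `python -c "print(open('secrets/token').read())"` maps to the same outcome.
print(state.next_step("read:secrets/token"))  # stop_and_ask_user
print(state.next_step("read:src/parser.rs"))  # proceed
```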

OpenAI also documents a rejection circuit breaker: the current implementation interrupts after 3 consecutive denials or 10 denials within the last 50 reviews in the same turn. The exact numbers matter less than the shape. Long-running coding agents need loop brakes. A model that keeps asking for unsafe authority is not making progress; it is converting ambiguity into risk.
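The breaker itself is only a few lines of bookkeeping. This Python sketch uses the thresholds the docs cite: 3 consecutive denials, or 10 denials among the last 50 reviews in a turn. It is not Codex's implementation. DenialCircuitBreaker and record are hypothetical names, and each instance is assumed to live for exactly one turn.

```python
from collections import deque


class DenialCircuitBreaker:
    """Per-turn loop brake for reviewer denials (illustrative, not Codex source)."""

    def __init__(self, max_consecutive: int = 3, max_in_window: int = 10, window: int = 50) -> None:
        self.max_consecutive = max_consecutive
        self.max_in_window = max_in_window
        self.recent: deque[bool] = deque(maxlen=window)  # True means the review was a denial
        self.consecutive = 0

    def record(self, denied: bool) -> bool:
        """Record one review outcome; return True if the turn should be interrupted."""
        self.recent.append(denied)  # the deque drops reviews older than the window
        self.consecutive = self.consecutive + 1 if denied else 0
        too_many_in_a_row = self.consecutive >= self.max_consecutive
        too_many_in_window = sum(self.recent) >= self.max_in_window
        return too_many_in_a_row or too_many_in_window


breaker = DenialCircuitBreaker()
for denied in (True, True, True):
    tripped = breaker.record(denied)
print(tripped)  # True: three consecutive denials
```

Create a fresh breaker at the start of each turn. Otherwise, denials from an earlier turn that has already been resolved would count against the next one.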

The override path is deliberately narrow. Denials are retained so the TUI can open an Auto-review Denials picker with /approve. Codex records up to 10 recent denials per thread, and any override applies only to the exact denied action for one retry. That is the right granularity. A human override should not become a new standing permission class because someone needed to unblock one action in one context.
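This granularity is easy to get wrong in code. Below is a hedged sketch of a per-thread denial log with one-shot overrides. Action, DenialLog and their methods are hypothetical names. The capacity of 10 mirrors the documented limit. Overrides match on the full action, meaning the tool plus its exact arguments, and that is what stops an override from widening into a permission class.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical exact description of a request: tool name plus full argument vector."""
    tool: str
    argv: tuple[str, ...]


class DenialLog:
    """Per-thread record of recent denials with single-use, exact-match overrides."""

    def __init__(self, capacity: int = 10) -> None:
        self.denials: deque[Action] = deque(maxlen=capacity)  # oldest denials fall off
        self.pending_overrides: set[Action] = set()

    def record_denial(self, action: Action) -> None:
        self.denials.append(action)

    def override(self, action: Action) -> None:
        """A person approves one retry of a denied action."""
        if action not in self.denials:
            raise ValueError("only a recorded denial can be overridden")
        self.pending_overrides.add(action)

    def consume_override(self, action: Action) -> bool:
        """True once for the exact overridden action; False for anything else."""
        if action in self.pending_overrides:
            self.pending_overrides.remove(action)
            return True
        return False


log = DenialLog()
push = Action("shell", ("git", "push", "--force", "origin", "main"))
log.record_denial(push)
log.override(push)
print(log.consume_override(push))  # True: the single approved retry
print(log.consume_override(push))  # False: the override is spent
print(log.consume_override(Action("shell", ("git", "push", "--force", "origin", "release"))))  # False
```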

Approval fatigue is a security bug

The reason Auto-review matters is not that developers hate clicks, although they do. It matters because repetitive low-context prompts train bad behavior. If every harmless test command, dependency read, or local file inspection interrupts work, users become approval vending machines. Once that happens, the approval system is theater: it technically exists, but nobody is making a meaningful decision.

The industry is converging on the same answer from different directions. GitHub Copilot CLI is auto-approving read-only gh commands like list, view, status, and diff. Codex is documenting Auto-review for sandbox-boundary requests. Browser agents are adding host-level allowlists. MCP and plugin ecosystems are forcing teams to think about tool authority. The pattern is clear: serious agents need permission taxonomies, not endless yes/no dialogs.

A useful taxonomy separates read, write, mutate, publish, spend, delete, authenticate, exfiltrate, and weaken-security operations. Reading a public docs page is not the same as reading browser history. Running npm test is not the same as running curl against an unfamiliar domain. Viewing a pull request is not the same as merging it. Uploading a file is not the same as generating it locally. Good agent UX makes those distinctions precise enough that approvals are rare, meaningful, and auditable.
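A taxonomy like that translates naturally into data. The sketch below is hypothetical. The Op enum mirrors the categories listed above. The per-action labels and the AUTO_OK policy are illustrative judgments, not anything Codex ships. The extra private-data flag is needed because the operation type alone does not capture sensitivity: reading a public docs page and reading browser history are both reads.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Op(Enum):
    READ = auto()
    WRITE = auto()
    MUTATE = auto()
    PUBLISH = auto()
    SPEND = auto()
    DELETE = auto()
    AUTHENTICATE = auto()
    EXFILTRATE = auto()
    WEAKEN_SECURITY = auto()


@dataclass(frozen=True)
class Classified:
    ops: frozenset[Op]
    touches_private_data: bool = False


# Hypothetical policy: only local reads and writes may be settled without a person.
AUTO_OK = frozenset({Op.READ, Op.WRITE})


def needs_human(action: Classified) -> bool:
    """A person decides if the action touches private data or any op falls outside AUTO_OK."""
    return action.touches_private_data or not action.ops <= AUTO_OK


# Illustrative labels for the pairs discussed above.
EXAMPLES = {
    "read public docs page": Classified(frozenset({Op.READ})),
    "read browser history": Classified(frozenset({Op.READ}), touches_private_data=True),
    "npm test": Classified(frozenset({Op.READ, Op.WRITE})),
    "curl unfamiliar domain": Classified(frozenset({Op.READ, Op.EXFILTRATE})),
    "view pull request": Classified(frozenset({Op.READ})),
    "merge pull request": Classified(frozenset({Op.MUTATE, Op.PUBLISH})),
    "generate file locally": Classified(frozenset({Op.WRITE})),
    "upload file": Classified(frozenset({Op.PUBLISH, Op.EXFILTRATE})),
}

for name, action in EXAMPLES.items():
    print(f"{name:24} -> {'human' if needs_human(action) else 'auto'}")
```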

The open-source guardian policy behind Codex calls out private data exfiltration, credential probing, broad security weakening, destructive actions, unsafe git operations, and untrusted external destinations as high-risk categories. That is the right neighborhood. Enterprises can customize policy with guardian_policy_config; individuals can define local [[auto_review].policy] entries in config.toml; managed requirements take precedence. The hierarchy matters because local convenience should not quietly override enterprise risk boundaries.
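The precedence rule is simple enough to sketch. Tracking which layer set each decision also pays off later in audits. This is a hypothetical Python model, not the format of guardian_policy_config or config.toml. Layers are plain dictionaries ordered from lowest to highest precedence. The category keys loosely follow the guardian policy's high-risk list, and the "auto_review" and "always_human" values are invented for illustration.

```python
def resolve_policy(layers: list[tuple[str, dict[str, str]]]) -> dict[str, tuple[str, str]]:
    """Merge policy layers, lowest precedence first.

    Returns category -> (decision, name of the layer that set it). A later layer
    overrides an earlier one for every key it defines.
    """
    resolved: dict[str, tuple[str, str]] = {}
    for layer_name, entries in layers:
        for category, decision in entries.items():
            resolved[category] = (decision, layer_name)
    return resolved


local = {"unsafe_git": "auto_review", "untrusted_destinations": "auto_review"}
managed = {"unsafe_git": "always_human", "credential_probing": "always_human"}

policy = resolve_policy([("local", local), ("managed", managed)])
print(policy["unsafe_git"])              # ('always_human', 'managed'): local convenience loses
print(policy["untrusted_destinations"])  # ('auto_review', 'local'): the managed layer is silent here
```

Adding more layers only means adding more entries to the list, in precedence order.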

For teams adopting Codex, the rollout plan should be boring. First, make the sandbox tight: repo writes only, narrow scratch directories, no default outbound network access unless the workflow requires it, and explicit MCP/tool approvals. Second, classify which escalation types Auto-review may approve and which always require a person: credentials, browser profiles, production resources, broad git history rewrites, external uploads, tunnels, billing changes, and security-control changes should be conservative by default. Third, export the decision trail. If nobody can later explain why an action was approved, that was not governance. It was vibes with logs missing.
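Exporting the decision trail needs nothing exotic. A minimal sketch assumes a JSON Lines sink and a hypothetical ReviewDecision record. Its field names are illustrative, not a Codex log format. The record captures the questions a later auditor will ask: what was requested, who decided, why, and under which policy layer.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ReviewDecision:
    """Hypothetical audit record for one approval decision."""
    thread_id: str
    action: str          # the exact request, as the reviewer saw it
    decision: str        # e.g. "approved", "denied", "user_override"
    decided_by: str      # e.g. "auto_review" or "user"
    rationale: str       # the reviewer's stated reason
    policy_source: str   # which policy layer the deciding rule came from
    timestamp: str


def make_decision(thread_id: str, action: str, decision: str, decided_by: str,
                  rationale: str, policy_source: str) -> ReviewDecision:
    """Stamp a decision with the current UTC time in ISO 8601 form."""
    now = datetime.now(timezone.utc).isoformat()
    return ReviewDecision(thread_id, action, decision, decided_by, rationale, policy_source, now)


def to_jsonl(decisions: list[ReviewDecision]) -> str:
    """Serialize decisions as JSON Lines, one self-describing object per line."""
    return "\n".join(json.dumps(asdict(d), sort_keys=True) for d in decisions)


trail = [
    make_decision("t-1", "git push --force origin main", "denied", "auto_review",
                  "history rewrite on a shared branch", "managed"),
]
print(to_jsonl(trail))
```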

Also resist the temptation to make Auto-review compensate for bad task design. If a prompt is vague, the agent will request more authority as it explores. If the task names target files, validation commands, non-goals, and stopping conditions, the agent has less reason to wander across boundaries. The cheapest approval is the one you never need because the work was scoped correctly.
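A well-scoped task can even be machine-checkable. The TaskSpec class below is a hypothetical Python sketch, not a Codex feature. It records the four elements named above: target files, validation commands, non-goals and a stopping condition. It also shows two things they buy you: an in-scope check for file edits, and exact command prefixes derived from the validation commands.

```python
import fnmatch
import shlex
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    """Hypothetical task description with explicit scope."""
    goal: str
    target_files: list[str]              # glob patterns the agent is expected to edit
    validation_commands: list[str]       # commands whose success defines "done"
    non_goals: list[str] = field(default_factory=list)
    stop_when: str = "all validation commands pass"

    def edit_in_scope(self, path: str) -> bool:
        # fnmatchcase is case-sensitive on every OS; note that "*" also matches "/".
        return any(fnmatch.fnmatchcase(path, pattern) for pattern in self.target_files)

    def prefix_rules(self) -> list[tuple[str, ...]]:
        """Turn each validation command into an exact token prefix."""
        return [tuple(shlex.split(command)) for command in self.validation_commands]


spec = TaskSpec(
    goal="fix the off-by-one in the tokenizer",
    target_files=["src/tokenizer/*.rs", "tests/tokenizer_*.rs"],
    validation_commands=["cargo test -p tokenizer"],
    non_goals=["no dependency upgrades", "no public API changes"],
)
print(spec.edit_in_scope("src/tokenizer/lexer.rs"))  # True
print(spec.edit_in_scope("Cargo.toml"))              # False: outside scope, worth a question
print(spec.prefix_rules())                           # [('cargo', 'test', '-p', 'tokenizer')]
```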

My take: Auto-review is less about convenience than survivability. Teams cannot manually approve every safe action, and they cannot blindly permit every risky one. The sustainable middle is a real sandbox boundary, a reviewer policy, denial semantics, circuit breakers, overrides scoped to exact actions, and audit trails. Coding agents are becoming runtimes with control planes. Codex Auto-review is one of the clearest signs OpenAI understands that.

Sources: OpenAI Developers — Auto-review, OpenAI Codex changelog, OpenAI Codex guardian policy, OpenAI Developers — Plugins, OpenAI Developers — Remote connections
