Codex Hooks Going GA Makes Agent Policy Programmable — and Also Reviewable

Hooks are the kind of feature that rarely wins a launch-day popularity contest and then quietly becomes the thing enterprises cannot deploy without. OpenAI’s Codex Hooks going generally available is not glamorous. It is also one of the more important Codex platform moves this week, because it gives teams a programmable place to put policy, logging, validation, and local context around an agent that can read code, call tools, and ask for permission to mutate a repo.

The short version: Codex can now run lifecycle hooks at key moments in a session. The longer, more useful version: agent configuration is becoming part of the software supply chain. If a repo can carry instructions, plugins, MCP configs, and hook definitions that influence what the agent sees and does, those files deserve the same suspicion and review discipline teams already apply to CI scripts. “It is just agent config” is the new “it is just YAML.” Famous last words.

OpenAI describes hooks as a way to send conversations to logging or analytics, scan prompts for API keys, summarize conversations into persistent memories, run validation checks when turns stop, and customize prompting in specific directories. Discovery spans user and project surfaces: ~/.codex/hooks.json, ~/.codex/config.toml, project .codex/hooks.json, project .codex/config.toml, and inline [hooks] tables. Matching hooks from multiple files all run; higher-precedence config layers do not simply replace lower-precedence hooks.
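To make the "scan prompts for API keys" use case concrete, here is a minimal sketch of the matching logic such a hook might run. The pattern names and regular expressions are illustrative assumptions, not a vetted secret-detection ruleset. How a hook reports its result back to Codex is defined by the hooks documentation and is deliberately not shown.

```python
import re

# Illustrative patterns only (assumptions); real key formats vary by provider.
SECRET_PATTERNS = {
    "openai_style_key": re.compile(r"\bsk-[A-Za-z0-9]{20,}"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def find_secrets(prompt: str) -> list[str]:
    """Return the name of every pattern that matches somewhere in the prompt."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(prompt)]
```

A real scanner would want a maintained ruleset and some thought about false positives. The point is only that the check itself is a few lines of reviewable code.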

That last detail matters. This is not one global callback you can reason about in isolation. Multiple command hooks matching the same event can launch concurrently, and one hook cannot assume it runs before another or prevents another from starting. If you are building policy on top of hooks, design for composition and conflict. If you need hard ordering, hooks alone are probably not your enforcement layer.

Programmable policy is not the same as a sandbox

The current event list gives a good map of where OpenAI thinks teams need control: SessionStart, UserPromptSubmit, PreToolUse, PermissionRequest, PostToolUse, and Stop. Command hooks receive JSON on stdin with fields such as session id, current working directory, hook event name, model, permission mode, tool name, tool input, and tool response depending on the event.
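As a rough sketch of what consuming that payload could look like, the function below parses a JSON string into a small record. The key names (`session_id`, `cwd`, `hook_event_name`, `tool_name`, `tool_input`) are assumptions inferred from the field descriptions above, not confirmed schema, so every lookup tolerates a missing key.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class HookEvent:
    session_id: Optional[str]
    cwd: Optional[str]
    event_name: Optional[str]
    tool_name: Optional[str]
    tool_input: Any


def parse_hook_event(raw: str) -> HookEvent:
    """Parse a hook payload. Key names are assumed, so missing keys become None."""
    data = json.loads(raw)
    return HookEvent(
        session_id=data.get("session_id"),
        cwd=data.get("cwd"),
        event_name=data.get("hook_event_name"),
        tool_name=data.get("tool_name"),
        tool_input=data.get("tool_input"),
    )
```

In an actual command hook, the input would come from `sys.stdin.read()`. Keeping the parser separate from stdin makes it easy to unit test against captured payloads.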

PermissionRequest is especially interesting. When Codex is about to ask for approval, a hook can allow, deny, or decline to decide, and if multiple hooks return decisions, any deny wins. That is the shape of a real policy surface. A team could deny networked package installs in certain repos, auto-allow known-safe read commands, require manual approval for deployment scripts, or block tool calls that match known secret paths. Done well, this reduces approval fatigue without turning the agent loose.
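The "any deny wins" rule is easy to model when reasoning about how several hooks compose. The sketch below is a mental model, not Codex's implementation. The string values, the use of `None` for "declined to decide", and the behavior when no hook denies are all assumptions made for illustration.

```python
from typing import Iterable, Optional

ALLOW = "allow"
DENY = "deny"


def combine_decisions(decisions: Iterable[Optional[str]]) -> Optional[str]:
    """Merge hook decisions; None means a hook declined to decide."""
    seen = [d for d in decisions if d is not None]
    if DENY in seen:
        return DENY  # any deny wins, as the documentation describes
    if ALLOW in seen:
        return ALLOW  # assumption: an allow stands when nothing denies
    return None  # assumption: nobody decided, so the normal approval flow applies
```

Because a single deny overrides everything else, order does not matter in this model, which fits the fact that hooks may run concurrently.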

But hooks are not containment. OpenAI says this plainly in the PreToolUse documentation: it is a guardrail, not a complete enforcement boundary. It can intercept supported Bash invocations, apply-patch style edits, and MCP tool calls, but does not intercept every shell path, WebSearch, or non-shell/non-MCP tool. PostToolUse can review output and replace a result with feedback, but it cannot undo side effects that already happened. That distinction is the whole story.

Teams should therefore use hooks for instrumentation and workflow policy, not as a substitute for OS-level sandboxing, restricted credentials, network controls, and repo-level permissions. A hook that says “do not touch production” is nice. A runtime that cannot reach production without a separate credential boundary is better. The right stack is layered: sandbox first, permissions second, hooks third, model instructions somewhere below “please be careful” but above “vibes.”

Repo-local hooks should be reviewed like executable code

The trust model is the responsible part of the design. Hooks are enabled by default, but users can disable them with [features] hooks = false, and admins can force hooks off or on through managed requirements. Project-local hooks only load when the .codex/ layer is trusted. Non-managed command hooks require review. Managed hooks can be enforced by admins. Plugin-bundled hooks are off by default; enabling [features] plugin_hooks = true lets plugins load lifecycle hooks, but those hooks remain non-managed and require trust review.

This is exactly the right kind of friction. A project-local hook can run commands. A plugin-bundled hook can bring behavior a developer may not notice if they only skim the README. A managed enterprise hook can alter what gets logged, approved, or denied across an organization. These are not passive docs. They are executable policy and context mutation points.

The supply-chain implication is obvious. We already have AGENTS.md, CLAUDE.md, GEMINI.md, MCP server configs, skills, plugins, browser extensions, local memories, and now Codex Hooks. Every one of those can shape the agent’s behavior. Some are text; some are code; some can reach tools. The boundary between “instructions” and “automation” is getting blurry, and attackers love blurry boundaries because reviewers do not know which mental model to apply.

Practitioners should respond with boring controls. Keep user-level hooks personal and low-risk: load notes, summarize sessions, run local validators. Treat project hooks like CI: code review them, pin dependencies, avoid fetching remote scripts at runtime, keep them short, and document ownership. Put managed hooks under the same change-control process as other enterprise policy. Log decisions from permission hooks. Test hooks against representative tool calls. Make sure a broken hook fails in the direction you actually intend.
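One way to make failure direction explicit is to wrap the policy in a small function that names what happens when it crashes. This is a sketch under stated assumptions: the event shape (`tool_input` with a `command` key), the `deny_deploy_scripts` policy, and the decision strings are all hypothetical and not taken from the Codex documentation.

```python
from typing import Callable, Optional

Policy = Callable[[dict], Optional[str]]


def decide_safely(policy: Policy, event: dict, on_error: Optional[str] = "deny") -> Optional[str]:
    """Run a policy and return on_error if it raises, so failure direction is explicit."""
    try:
        return policy(event)
    except Exception:
        return on_error


def deny_deploy_scripts(event: dict) -> Optional[str]:
    """Hypothetical policy: deny commands that mention deploy.sh, otherwise abstain."""
    command = event["tool_input"]["command"]  # raises KeyError on malformed events
    return "deny" if "deploy.sh" in command else None
```

With the default, a malformed event fails closed: `decide_safely(deny_deploy_scripts, {})` returns `"deny"`. Passing `on_error=None` makes the same failure abstain instead. Either choice can be right for a given hook, as long as someone made it on purpose and covered it with a test against representative tool calls.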

There is also a product lesson here for teams building their own agent harnesses. Lifecycle boundaries are where governance belongs. Do not hide policy in prompts. Give the runtime explicit events, structured inputs, clear precedence, auditable decisions, and admin-controlled enforcement. Then be honest about what the mechanism cannot stop. The worst security feature is the one that sounds like a sandbox and behaves like a suggestion.

Hooks going GA is good news. It means Codex is growing from a clever coding interface into a programmable operations surface. But LGTM’s approval comes with the same note we would leave on a pull request adding a new CI job: useful, reviewable, and absolutely capable of breaking things. Ship it — after someone reads the hook.

Sources: OpenAI Developers, OpenAI, OpenAI Developers Changelog, OpenAI Codex release 0.130.0