Codex v0.133.0 Turns Agent Goals Into Enterprise Runtime State

Codex v0.133.0 is not a release about making the model look clever in a demo. It is about making the work unit around the model less hand-wavy. OpenAI is turning “keep going until this is done” from a prompt into a persisted, metered, policy-bound object, and that is the kind of boring runtime work that separates a coding assistant from an enterprise agent.

The release, published May 21 at 16:48 UTC, makes goals stable and enabled by default. That one line carries more weight than another benchmark claim would. A goal gives Codex a durable thing to work toward across turns: something that can be stored, updated, accounted for, blocked, resumed, inspected, and exposed to clients. Without that object, long-running agentic coding is basically a chat transcript with ambition.

OpenAI backed the change with a dedicated goals_1.sqlite database, separate migrations, startup and telemetry metadata, codex doctor visibility, and repair-path plumbing. That sounds like database housekeeping because it is database housekeeping. It is also exactly what teams need if they want agent work to become auditable infrastructure instead of terminal folklore.

Goals are the new job queue

Professional software work rarely fits into one model response. “Migrate this package,” “make this flaky test reliable,” “remove the deprecated API,” and “ship the auth refactor” are multi-turn tasks with tool calls, partial failures, interruptions, and review checkpoints. The old unit of work for chat-based coding tools was the turn. The operational unit teams actually need is closer to a job.

Codex is moving in that direction. PR #23696 persists active-goal progress across tool-finish, turn-stop, and turn-abort lifecycle hooks, tracking token deltas and wall-clock progress snapshots while emitting ThreadGoalUpdated events. That is a small API surface with a big implication: clients can stop guessing whether the agent is still making progress. They can observe it.

That matters for CI, background workers, team queues, and remote coding sessions. A goal that records token usage and elapsed time can feed budget controls. A goal that emits progress events can feed dashboards. A goal that survives turns can be resumed without pretending the transcript itself is the state machine. The model may still be probabilistic; the runtime around it does not have to be mush.
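To make the budget-control point concrete, here is a minimal sketch of a client-side guard that consumes goal progress updates. The GoalProgress shape, its field names, and the Budget thresholds are assumptions made for illustration; the release notes describe token deltas and wall-clock snapshots, not this exact payload.

```python
from dataclasses import dataclass


@dataclass
class GoalProgress:
    """Hypothetical progress update; NOT the real ThreadGoalUpdated payload."""
    goal_id: str
    token_delta: int        # tokens consumed since the previous update
    elapsed_seconds: float  # wall-clock time since the goal started


@dataclass
class Budget:
    max_tokens: int
    max_seconds: float


class GoalBudgetTracker:
    """Accumulates token deltas per goal and flags budget overruns."""

    def __init__(self, budget: Budget) -> None:
        self.budget = budget
        self.tokens_used: dict[str, int] = {}

    def record(self, update: GoalProgress) -> str:
        # Deltas are summed per goal, so the running total survives across turns.
        total = self.tokens_used.get(update.goal_id, 0) + update.token_delta
        self.tokens_used[update.goal_id] = total
        if total > self.budget.max_tokens:
            return "over_token_budget"
        if update.elapsed_seconds > self.budget.max_seconds:
            return "over_time_budget"
        return "ok"
```

A dashboard or CI worker could call record() on every update and pause or escalate the goal when the result is anything other than "ok". The point is that the decision rests on runtime state, not on rereading the transcript.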

There is a migration caveat. The dedicated goal database does not backfill old experimental goal rows. Teams that tested earlier goal support should treat v0.133.0 as a boundary, not a seamless continuation. That is annoying, but defensible. Stable runtime features deserve clean storage contracts, and sometimes the right answer is to avoid dragging experimental schema ghosts into the permanent layer.

Remote control gets a human-readable contract

The other enterprise-shaped change is codex remote-control. The command now starts remote control, waits for readiness, prints the machine's status, and stays alive until Ctrl-C. The example output is deliberately concrete: “This machine is available for remote control as com-97826. Press Ctrl-C to stop.” If startup fails, Codex reports the managed app-server binary path and version.

This is the kind of UX that looks minor until you have to support a fleet of developer machines. A background daemon that “probably registered” is not an operational surface. A foreground command that waits for readiness and names the machine gives the operator a contract: either the app-server is running, the control path is registered, and the machine is available, or the command fails loudly enough to debug.
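A fleet script could lean on that contract by watching for the readiness line. The sketch below assumes the status message keeps the wording of the example output in the release notes, which may not be a stable format; parse_ready_line is a hypothetical helper, not part of Codex.

```python
import re
from typing import Optional

# Assumption: the readiness line matches the example wording from the release notes.
READY_PATTERN = re.compile(
    r"This machine is available for remote control as (\S+?)\.(?:\s|$)"
)


def parse_ready_line(line: str) -> Optional[str]:
    """Return the machine name if the line announces readiness, else None."""
    match = READY_PATTERN.search(line)
    return match.group(1) if match else None
```

A supervisor could feed each line of the command's output through this function and treat a missing match before process exit as a failed start, then collect the binary path and version from the error output for debugging.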

That is especially important on Windows, where enterprise developer environments often live and where agent vendors too often discover platform reality after launch. The release includes stronger Windows sandbox integration, fixes for app-server socket reuse with the wrong current working directory, startup and shutdown races, realtime websocket compatibility, and remote-control-adjacent pain points. The community signal is small but telling: Reddit chatter around Codex remote control on Windows has been about app-server workarounds and remoteControl/enable flags, not about abstract agent theory.

Permission profiles become product surface

Permission profiles are the less glamorous but more durable story. v0.133.0 adds typed list APIs, cursor pagination, optional cwd, descriptions, managed requirements.toml, runtime refresh, and inheritance with cycle and undefined-parent rejection. That is policy infrastructure. If an admin or client cannot list the policy catalog, understand inheritance, and refresh permissions without ritual restarts, the system is not really governable.
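For clients, cursor pagination usually means looping until the server stops returning a cursor. The sketch below is generic: fetch_page, the "items" key, and the "next_cursor" key are placeholders for whatever the typed list API actually returns, which the release notes do not spell out.

```python
from typing import Any, Callable, Optional


def list_all(fetch_page: Callable[[Optional[str]], dict]) -> list[Any]:
    """Drain a cursor-paginated listing into one list.

    fetch_page is a hypothetical callable that takes a cursor (None for the
    first page) and returns {"items": [...], "next_cursor": str or None}.
    """
    items: list[Any] = []
    seen_cursors: set[str] = set()
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        items.extend(page["items"])
        cursor = page.get("next_cursor")
        if cursor is None:
            return items
        if cursor in seen_cursors:
            # A repeated cursor would loop forever; stop instead of guessing.
            raise RuntimeError(f"repeated cursor {cursor!r}")
        seen_cursors.add(cursor)
```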

Inheritance is particularly important because policy copy-paste is how enterprise controls drift. Teams want a base profile for “normal repo work,” a stricter one for production-adjacent code, maybe a permissive one for disposable sandboxes, and clear rejection when someone creates a cycle or references a profile that does not exist. Failing closed on malformed policy is not a nice-to-have. It is the minimum bar for an agent that can edit code and run tools.
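Failing closed on bad inheritance is simple to state in code. What follows is a minimal sketch of a resolver that rejects cycles and undefined parents. The "extends" key and the setting names are invented for illustration and are not Codex's actual profile schema.

```python
class ProfileError(ValueError):
    """Raised when a profile chain is malformed; callers should deny, not guess."""


def resolve_profile(name: str, profiles: dict[str, dict]) -> dict:
    """Merge a profile with its ancestors, with child settings overriding parents.

    Assumes each profile is a dict with an optional "extends" key naming its
    parent (a hypothetical schema).
    """
    chain: list[str] = []
    seen: set[str] = set()
    current = name
    while current is not None:
        if current in seen:
            raise ProfileError(f"inheritance cycle at {current!r}")
        if current not in profiles:
            raise ProfileError(f"undefined profile {current!r}")
        seen.add(current)
        chain.append(current)
        current = profiles[current].get("extends")

    merged: dict = {}
    # Apply the root ancestor first so that each child overrides its parent.
    for profile_name in reversed(chain):
        for key, value in profiles[profile_name].items():
            if key != "extends":
                merged[key] = value
    return merged
```

The design choice that matters is that both error cases raise instead of falling back to a default profile. A resolver that quietly picks something permissive is exactly the kind of helpfulness a policy layer should refuse.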

Plugin discovery also grows up in this release, exposing marketplace-aware list output, installed versions, visible marketplace roots, and remote collection support. Extensions can observe subagent start and stop, tool execution, turn metadata, async approval, and turn processing. That observability layer will matter as Codex becomes less of a single terminal tool and more of a platform other tools attach to.

For practitioners, the evaluation checklist is straightforward. Create a goal, run a multi-turn task with real tool calls, abort it, resume it, and verify ThreadGoalUpdated events, token deltas, wall-clock accounting, and the goal database path in codex doctor. Start codex remote-control in the foreground, then intentionally break app-server startup and check whether the error gives you the binary path and version. Define inherited permission profiles, create an inheritance cycle, and confirm Codex rejects it instead of trying to be helpful. Then repeat all three tests on Windows if that is where your developers work.

The bigger point: enterprise coding agents are not won by another “watch it build an app” clip. They are won in storage boundaries, readiness checks, permission inheritance, observable lifecycle events, and failure modes that tell the truth. Codex v0.133.0 is not flashy. Good. Flash is not what you want at the policy boundary.

Sources: OpenAI Codex v0.133.0 release, Codex developer docs, OpenAI Codex repository