
Codex 0.135 Alpha Is Turning Agent Memory, Search, and Goal Accounting Into Runtime Primitives

Anatoliy Kolodkin

28 May 2026 • 5 min read

Codex 0.135.0-alpha.2 is not the kind of release that gets a product-launch video. Good. The interesting part of coding agents right now is not whether they can write another demo todo app; it is whether their runtime has enough explicit state, accounting, and diagnostics that a serious engineering team can let them touch real work without operating on vibes.

OpenAI published rust-v0.135.0-alpha.2 for Codex on May 27 at 21:38 UTC, with npm showing @openai/codex@0.135.0-alpha.2 a little over half an hour later at 22:10 UTC. That cadence matters because 0.134.0 had only just moved to the stable line the previous day. Less than 30 hours later, the alpha branch was already carving out new runtime primitives: a dedicated memory database, config-gated memory tools, standalone web search, goal usage-limit accounting, session-level analytics, extension idle hooks, and better traces for MCP tool-listing stalls.

Read as a release note, it looks like plumbing. Read as a product direction, it is OpenAI admitting the obvious: coding agents are becoming long-running systems, and long-running systems need state boundaries, stop reasons, observability, and boring operational hygiene.

Memory is becoming a subsystem, not a prompt trick

The headline change is PR #24591, which moves generated memory rows and their stage-one/stage-two job state out of state_5.sqlite and into a dedicated memories_1.sqlite runtime database. Thread metadata stays in the state database, while memory-owned pipeline data gets its own storage boundary.

That sounds like housekeeping until you have to debug or govern agent memory. Memory is not just “extra context.” It has its own lifecycle: generation, citation, validation, deletion, reset, rebuilding, and accidental leakage. Splitting memory state from canonical thread state gives operators a cleaner answer to questions they should already be asking: where does memory live, what can rebuild it, what can delete it, and does clearing memory damage the thread record?
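One way to check that storage boundary on a staged machine is to open both database files read-only and list their tables. The sketch below assumes the files sit directly under the Codex home directory (taken here as `CODEX_HOME`, falling back to `~/.codex`). That location and the helper names are assumptions for illustration. The sketch reads only SQLite's own `sqlite_master` catalog, so it makes no claims about which table names Codex actually uses.

```python
from __future__ import annotations

import os
import sqlite3
from contextlib import closing
from pathlib import Path

# File names come from PR #24591; their location under CODEX_HOME is an assumption.
EXPECTED_DBS = ("state_5.sqlite", "memories_1.sqlite")


def list_tables(db_path: Path) -> list[str]:
    """Return the table names in a SQLite file, opened read-only."""
    uri = db_path.resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def describe_storage(codex_home: Path) -> dict[str, list[str] | None]:
    """Map each expected database file to its tables, or None if it does not exist."""
    report: dict[str, list[str] | None] = {}
    for filename in EXPECTED_DBS:
        path = codex_home / filename
        report[filename] = list_tables(path) if path.exists() else None
    return report


if __name__ == "__main__":
    # Hypothetical default location; adjust for your installation.
    home = Path(os.environ.get("CODEX_HOME", Path.home() / ".codex"))
    for name, tables in describe_storage(home).items():
        print(name, "missing" if tables is None else tables)
```

If `memories_1.sqlite` still reports as missing after memory features have run, that is worth investigating before a wider rollout. A fuller check would also compare row counts in both files before and after a memory reset.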

Codex is also adding a conservative gate around dedicated memory tools. PR #24600 introduces [memories].dedicated_tools, defaulting to false, so native memory operations such as list, read, search, and ad hoc note creation are not exposed merely because memory prompts are enabled. That default is the right call. A memory system that quietly influences prompts is already sensitive; a memory system that exposes callable tools is an authority surface. Teams should treat that flag like they treat shell access or MCP registration: enable it only in profiles where the operator understands what the tool can reveal and who can call it.

Web search moves into the agent runtime

Standalone web search is the second important seam. PR #23823 adds an extension-backed web.run tool behind standalone_web_search. The implementation builds search context from persisted history using a small tail heuristic: the previous user message, assistant text between the last two user turns capped at roughly 1,000 tokens, and the current user message.
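The tail heuristic is easier to reason about, and to review as policy, when it is written out. The sketch below illustrates the behavior described in PR #23823; it is not that PR's code. It assumes the persisted history already ends with the current user message, approximates tokens by splitting on whitespace, and keeps the most recent tokens when the assistant text exceeds the cap. All three are assumptions, and the real implementation may count and trim differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Message:
    role: str  # "user", "assistant", or anything else (e.g. tool output)
    text: str


def build_search_context(history: list[Message], cap_tokens: int = 1000) -> list[str]:
    """Return the previous user message, capped assistant text, and current user message."""
    user_positions = [i for i, m in enumerate(history) if m.role == "user"]
    if not user_positions:
        return []
    current = user_positions[-1]
    if len(user_positions) < 2:
        # No earlier user turn: the current message is the only context.
        return [history[current].text]
    previous = user_positions[-2]

    # Collect only assistant text that sits between the last two user turns.
    words: list[str] = []
    for m in history[previous + 1 : current]:
        if m.role == "assistant":
            words.extend(m.text.split())  # whitespace split stands in for a tokenizer

    # Keep the newest cap_tokens words (guard: words[-0:] would keep everything).
    capped = " ".join(words[-cap_tokens:]) if cap_tokens > 0 else ""

    parts = [history[previous].text]
    if capped:
        parts.append(capped)
    parts.append(history[current].text)
    return parts
```

Every string in the returned list is a candidate to leave the machine inside a search query, which is why the choice of what goes into it is a governance question as much as an engineering one.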

That context strategy is practical, but it is also a policy decision disguised as an engineering detail. If an agent can construct web-search queries from conversational history, teams need to decide which histories may leave the local boundary, which projects are allowed to search at all, and whether search invocations are logged with enough detail to audit later. A 1k-token cap is a helpful constraint. It is not a substitute for governance.

This is where Codex and GitHub Copilot are converging from different directions. Copilot’s recent memory controls are about repository/user-scoped facts across GitHub-native surfaces. Codex’s changes are more runtime-shaped: local databases, explicit config gates, extension-backed tools, analytics metadata, and trace points. The comparison is no longer “which assistant is smarter?” The better question is: which runtime lets a platform team inspect, scope, delete, meter, and audit the agent’s state without pretending hidden context is magic?

Usage limits need real terminal states

PR #24628 is easy to underestimate because “goal usage limits” sounds like billing furniture. It is not. When a workspace usage limit stops a turn, Codex now accounts for current goal progress, marks the active or budget-limited goal as UsageLimited, clears active goal accounting, and prevents later token/tool events from charging usage to the stopped goal.
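The ordering described there (account first, then mark the goal, then clear the active slot so late events have nowhere to land) can be modeled as a small state machine. This is a hypothetical sketch, not Codex's implementation: `GoalStatus`, `GoalAccountant`, and every status other than `UsageLimited` are names invented here, and only token usage is tracked to keep it short.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class GoalStatus(Enum):
    # Only "UsageLimited" comes from the release notes; the rest are illustrative.
    ACTIVE = "Active"
    BUDGET_LIMITED = "BudgetLimited"
    USAGE_LIMITED = "UsageLimited"
    COMPLETE = "Complete"


@dataclass
class Goal:
    name: str
    status: GoalStatus = GoalStatus.ACTIVE
    tokens_charged: int = 0


@dataclass
class GoalAccountant:
    active: Goal | None = None
    pending_tokens: int = 0  # usage seen since the last flush

    def record_tokens(self, n: int) -> None:
        # With no active goal there is nothing to charge, so late events are dropped.
        if self.active is not None:
            self.pending_tokens += n

    def flush(self) -> None:
        # Move pending usage onto the active goal, if there is one.
        if self.active is not None:
            self.active.tokens_charged += self.pending_tokens
        self.pending_tokens = 0

    def on_usage_limit(self) -> None:
        goal = self.active
        if goal is None:
            return
        self.flush()  # 1. account progress made before the stop
        if goal.status in (GoalStatus.ACTIVE, GoalStatus.BUDGET_LIMITED):
            goal.status = GoalStatus.USAGE_LIMITED  # 2. record a distinct stop reason
        self.active = None  # 3. clear active accounting
```

A deliberate usage-limit test against the real runtime should check the same two things this toy model makes explicit: the goal ends with a usage-limit stop reason, and no further usage is charged to it afterward.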

That distinction is essential for any agent that can pursue work across many steps. “Done,” “blocked,” “failed,” and “stopped because the workspace hit a usage limit” are different operational states. Collapsing them into one vague failure bucket creates support tickets, bad dashboards, and misleading automation. If your agent budget expires halfway through a migration, the correct next action is not the same as a test failure or a missing permission.

For teams experimenting with Codex goals, this is the kind of edge case worth testing deliberately. Set a constrained workspace budget. Run a multi-step goal. Confirm the UI, logs, and analytics say the work stopped because of usage limits rather than because the model gave up. Agent cost controls only matter if the runtime can explain what the budget interrupted.

MCP diagnostics are becoming table stakes

The release also adds traces around one of the least glamorous but most painful failure modes: stalled MCP tool listing. PR #24667 instruments the pre-stream tool-router path, the MCP manager read lock, and per-server MCP startup snapshots. Translation: when Codex appears to be stuck in “Thinking,” feedback logs should make it easier to tell whether the choke point is MCP startup, tool listing, lock contention, backend latency, or something else entirely.
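The diagnostic idea is to time each suspect separately so that a stall has a name. The asyncio sketch below is not Codex's tracing code, which is written in Rust. It is a hypothetical illustration that times the wait for a manager lock apart from each server's tool listing, and gives every listing a timeout so that one hung server surfaces as `timeout` instead of an indefinite "Thinking". A plain `asyncio.Lock` stands in for the read lock; the server callables and the 5-second default are assumptions.

```python
from __future__ import annotations

import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable

ListTools = Callable[[], Awaitable[list[str]]]


@dataclass
class ListingTrace:
    server: str
    elapsed_s: float
    outcome: str  # "ok", "timeout", or "error: <repr of exception>"
    tool_count: int = 0


async def traced_list_tools(server: str, list_tools: ListTools, timeout_s: float) -> ListingTrace:
    """Time one server's tool listing and classify how it ended."""
    start = time.monotonic()
    try:
        tools = await asyncio.wait_for(list_tools(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return ListingTrace(server, time.monotonic() - start, "timeout")
    except Exception as exc:  # e.g. a broken schema or a crashed server
        return ListingTrace(server, time.monotonic() - start, f"error: {exc!r}")
    return ListingTrace(server, time.monotonic() - start, "ok", len(tools))


async def list_all_tools(
    servers: dict[str, ListTools], manager_lock: asyncio.Lock, timeout_s: float = 5.0
) -> tuple[float, list[ListingTrace]]:
    """Record lock wait separately from per-server listing time."""
    wait_start = time.monotonic()
    async with manager_lock:
        lock_wait_s = time.monotonic() - wait_start
        traces = await asyncio.gather(
            *(traced_list_tools(name, fn, timeout_s) for name, fn in servers.items())
        )
    return lock_wait_s, list(traces)
```

A long `lock_wait_s` with fast listings points at contention; a short wait with one `timeout` trace points at a specific server.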

That matters because MCP is becoming the integration layer for coding agents. Integration layers fail in boring ways: slow startup, broken schemas, auth prompts, server crashes, environment mismatches. A user sees an agent doing nothing. An operator needs to know which server blocked, when it started, whether tools were listed, and whether a lock was held too long. If MCP is the USB-C port for agents, MCP observability cannot remain a debug afterthought.

The same pattern shows up in smaller changes. PR #24655 adds session_id to runtime analytics events so root threads and subagent threads can be grouped. PR #24368 adds request-kind metadata for foreground turns, startup prewarm, compaction, and detached memory model requests, plus a window_id context-window identifier. PR #24744 adds an on_thread_idle lifecycle hook for extensions. PR #24637 makes standalone updates run noninteractively with CODEX_NON_INTERACTIVE=1. None of this wins a benchmark. All of it helps when the agent runs for hours, spawns subagents, compacts context, calls tools, searches the web, remembers prior work, and stops only when a goal or budget says so.
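With `session_id` on every event, reconstructing a session becomes a group-by. The sketch assumes events arrive as dictionaries with `session_id`, `thread_id`, and `request_kind` keys. The latter two key names and the kind strings in the example are hypothetical, chosen to echo the categories PR #24368 describes (foreground turns, startup prewarm, compaction, detached memory requests).

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SessionSummary:
    threads: set[str] = field(default_factory=set)  # root plus any subagent threads
    kinds: Counter = field(default_factory=Counter)  # request count per request kind


def summarize_sessions(events: list[dict]) -> dict[str, SessionSummary]:
    """Group analytics events by session so subagent work rolls up to its session."""
    sessions: dict[str, SessionSummary] = {}
    for event in events:
        summary = sessions.setdefault(event["session_id"], SessionSummary())
        summary.threads.add(event["thread_id"])
        # Events without a kind are counted under "unknown" rather than dropped.
        summary.kinds[event.get("request_kind", "unknown")] += 1
    return sessions
```

Comparing compaction or prewarm counts against foreground turns per session is one way to see how much of a long run went to maintenance rather than to the user's work.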

The compare range from rust-v0.134.0 to rust-v0.135.0-alpha.2 backs up the scale of the plumbing: 86 commits, 300 files changed, 9,456 additions, 1,932 deletions, and 11,388 total changed lines. New or heavily changed files include a 921-line doctor thread inventory module, a 581-line additional-context test suite, and transport/client tracking changes in the app-server layer. This is not a cosmetic alpha tag.

The community reaction is basically silence. Exact-match Hacker News searches for the release and the notable PR titles returned zero stories during the research window. That is normal. Runtime seams rarely trend until they fail. But the repository itself is not quiet: the research snapshot showed more than 86,000 stars, more than 12,000 forks, and more than 5,000 open issues. At that scale, “boring” runtime controls become the difference between adoption and incident response.

The practical guidance is simple: do not spray this alpha across every developer laptop because the version number moved. Do stage it if you are evaluating Codex as an agent runtime rather than a single-user CLI. Inspect whether memories_1.sqlite is created and reset the way you expect. Confirm dedicated memory tools stay off unless explicitly enabled. Review how standalone web search constructs query context and where that activity is logged. Force a usage-limit stop and verify goal accounting. Break an MCP server on purpose and see whether the new traces identify the stall.

Codex 0.135 alpha is a state-control release. OpenAI is turning memory, search, goals, MCP diagnostics, extension lifecycle, and analytics into explicit runtime surfaces. That is less exciting than a demo. It is also more important. The next phase of coding-agent competition will be won by the systems that make autonomy governable, not the ones that merely make it look fluent.

Sources: GitHub release — openai/codex 0.135.0-alpha.2, GitHub compare — rust-v0.134.0...rust-v0.135.0-alpha.2, npm package — @openai/codex, PR #24591, PR #23823, PR #24628, PR #24667
