
Codex Fixes macOS stderr Corruption Without Blinding Developers to Diagnostics

Anatoliy Kolodkin

26 May 2026

A terminal UI bug in Codex sounds like small potatoes until you have watched diagnostic text paint over the prompt where a developer is trying to steer an agent. Then it stops being cosmetic. It becomes a trust problem: did the tool corrupt my input, lose my command, leak something into the wrong stream, or just make the terminal look haunted?

OpenAI’s May 25 fix for macOS stderr corruption is useful because it separates the terminal UI from raw process output without taking the lazy escape hatch of throwing diagnostics away. PR #24459, merged at 2026-05-25T19:53:40Z, adds a terminal stderr guard so runtime diagnostics written directly to process stderr do not overwrite Codex’s inline composer while the TUI owns the viewport. The change landed with 2 commits, 4 changed files, 311 additions, and 5 deletions. The new codex-rs/tui/src/tui/terminal_stderr.rs accounts for 275 of the added lines and includes regression coverage at the file-descriptor (fd) level.
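"fd-level" has a practical meaning here. The guard has to act on file descriptor 2 itself, not on Rust's std::io::stderr() handle, because a diagnostic printed by the C runtime or the system allocator may write straight to the descriptor. The Rust sketch below shows the general technique and is not the code in terminal_stderr.rs. The StderrGuard name, the use of /dev/null as the destination, and the Unix-only FFI declarations for dup, dup2, and close are illustrative assumptions. The FFI declarations are needed because the standard library has no dup2 wrapper.

```rust
use std::fs::OpenOptions;
use std::io;
use std::os::raw::c_int;
use std::os::unix::io::AsRawFd;

// POSIX calls from the C library that std already links on Unix targets.
unsafe extern "C" {
    fn dup(fd: c_int) -> c_int;
    fn dup2(src: c_int, dst: c_int) -> c_int;
    fn close(fd: c_int) -> c_int;
}

const STDERR_FD: c_int = 2;

/// Hypothetical guard: while it is alive, fd 2 points at /dev/null, so raw
/// writes to process stderr cannot reach the terminal. Dropping it puts the
/// original stderr back.
pub struct StderrGuard {
    saved_fd: c_int,
}

impl StderrGuard {
    pub fn suppress() -> io::Result<StderrGuard> {
        let devnull = OpenOptions::new().write(true).open("/dev/null")?;
        // Keep a duplicate of the real stderr so it can be restored later.
        let saved_fd = unsafe { dup(STDERR_FD) };
        if saved_fd < 0 {
            return Err(io::Error::last_os_error());
        }
        // Point fd 2 at /dev/null. This also catches writes that bypass
        // Rust's std::io::stderr(), such as messages from the C runtime.
        if unsafe { dup2(devnull.as_raw_fd(), STDERR_FD) } < 0 {
            let err = io::Error::last_os_error();
            unsafe { close(saved_fd) };
            return Err(err);
        }
        // `devnull` is closed when it goes out of scope; fd 2 keeps its own
        // reference to the same open file.
        Ok(StderrGuard { saved_fd })
    }
}

impl Drop for StderrGuard {
    fn drop(&mut self) {
        // Restore the original stderr, then release the saved duplicate.
        unsafe {
            dup2(self.saved_fd, STDERR_FD);
            close(self.saved_fd);
        }
    }
}
```

A caller would create the guard when the TUI takes over the screen and drop it at each handoff. Because restoration happens in Drop, an early return also gives stderr back.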

The bug it fixes, issue #17139, had been open since 2026-04-08. A user running Codex v0.118.0 on Darwin 24.6.0 arm64 saw the macOS message "MallocStackLogging: can't turn off malloc stack logging because it was not enabled" printed over and over, flooding Terminal.app until it filled the terminal. The issue collected 8 comments and closed when the fix merged.

The terminal is part of the runtime

Agent products love to talk about reasoning, models, tool use, and autonomy. Fine. But developers still experience most coding agents through a terminal, IDE, browser tab, or chat pane. If the interaction surface is unreliable, the agent feels unreliable no matter how clever the underlying model is. A corrupted composer is not just ugly output. It is a broken control channel.

That is why the design choice in #24459 matters. The regression test is explicit: tui::terminal_stderr::tests::suppresses_stderr_only_while_terminal_is_owned verifies that stderr is suppressed while Codex owns the terminal and restored at handoff boundaries. Those boundaries include external interactive programs, suspend/resume, panic handling, and normal shutdown. In other words, Codex is not globally swallowing diagnostics. It is protecting the inline UI only during the period where raw bytes would corrupt the viewport.

That distinction is the whole story. Terminal ownership is contextual. When Codex controls the screen, direct stderr output can land in the middle of the user’s input and break the illusion — and sometimes the reality — of a stable interaction. When Codex hands control back to an external tool or exits, stderr should behave like stderr. A broad suppression fix would have solved the visible problem by creating a worse debugging problem. This fix draws a narrower boundary.
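The handoff boundaries listed above can be modeled as a small ownership wrapper that holds a stream guard only while the TUI owns the terminal. This Rust sketch is hypothetical. TerminalOwnership, acquire, hand_off, and release are illustrative names. G stands for any guard type whose Drop restores normal stderr, such as an RAII type that redirects fd 2 and puts it back on drop.

```rust
/// Hypothetical owner of the terminal's stderr policy. `G` is any guard whose
/// Drop restores normal stderr; `F` builds a fresh guard on each acquisition.
pub struct TerminalOwnership<G, F>
where
    F: FnMut() -> G,
{
    make_guard: F,
    guard: Option<G>,
}

impl<G, F> TerminalOwnership<G, F>
where
    F: FnMut() -> G,
{
    /// The TUI takes the screen: start suppressing raw stderr.
    pub fn acquire(make_guard: F) -> Self {
        let mut owner = TerminalOwnership { make_guard, guard: None };
        owner.reacquire();
        owner
    }

    /// True while stderr is being suppressed on the TUI's behalf.
    pub fn owns_terminal(&self) -> bool {
        self.guard.is_some()
    }

    /// Handoff boundary (external interactive program, suspend/resume):
    /// restore stderr, run `during`, then take the terminal back.
    pub fn hand_off<R, H: FnOnce() -> R>(&mut self, during: H) -> R {
        self.guard = None; // dropping the guard restores stderr
        let result = during();
        self.reacquire();
        result
    }

    /// Normal shutdown: give stderr back for good.
    pub fn release(mut self) {
        drop(self.guard.take());
    }

    fn reacquire(&mut self) {
        if self.guard.is_none() {
            self.guard = Some((self.make_guard)());
        }
    }
}
```

hand_off covers external interactive programs and suspend/resume, and release covers normal shutdown. If the closure passed to hand_off panics, the guard has already been dropped, so stderr stays restored. Panics elsewhere need one more step. Rust runs the panic hook, which prints the panic message, before unwinding drops any guard. A real implementation would therefore also restore stderr inside a custom hook installed with std::panic::set_hook.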

The validation detail is also instructive. The PR reports that the lint command just argument-comment-lint-from-source -p codex-tui passed and that just test -p codex-tui ran the new stderr-guard regression test. Two unrelated guardian-policy tests failed, but those failures already existed before the change. That is the kind of release note serious engineers should appreciate: it tells you what was tested, where the new coverage lives, and which failures were already known. Agent tools need more of this, not less.

Do not harden by blinding yourself

The follow-up PR #24479 is the better engineering lesson. It merged 33 minutes later, at 2026-05-25T20:26:11Z, with 1 commit, 2 additions, 10 deletions, and 2 changed files. Instead of stripping macOS malloc diagnostic environment variables, it preserves MallocStackLogging* and MallocLogFile* while retaining existing LD_ and DYLD_ hardening.
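The policy change in #24479 can be expressed as a filter over environment variables. This sketch assumes that hardening is prefix-based. The STRIPPED_PREFIXES and PRESERVED_PREFIXES constants and both function names are hypothetical and are not Codex's actual code. The real hardening may also apply LD_ and DYLD_ rules per platform rather than together.

```rust
/// Hypothetical prefixes for dynamic-loader variables a hardening step removes.
const STRIPPED_PREFIXES: &[&str] = &["LD_", "DYLD_"];

/// macOS allocator diagnostics that a developer may have enabled on purpose.
const PRESERVED_PREFIXES: &[&str] = &["MallocStackLogging", "MallocLogFile"];

/// Decide whether one environment variable should be removed.
pub fn is_stripped(key: &str) -> bool {
    // Preservation wins, so allocator diagnostics survive any future
    // widening of the strip list.
    if PRESERVED_PREFIXES.iter().any(|p| key.starts_with(*p)) {
        return false;
    }
    STRIPPED_PREFIXES.iter().any(|p| key.starts_with(*p))
}

/// Keep every variable except the stripped ones, preserving order.
pub fn hardened_env<I>(vars: I) -> Vec<(String, String)>
where
    I: IntoIterator<Item = (String, String)>,
{
    vars.into_iter().filter(|(key, _)| !is_stripped(key)).collect()
}
```

The result could feed a child process through std::process::Command, for example by calling env_clear() and then envs(hardened_env(std::env::vars())). The explicit preserve list is deliberately redundant today, since no stripped prefix matches the Malloc variables. It exists so that allocator diagnostics stay safe if someone later adds a broader prefix to the strip list.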

That reversal matters because “hardening” often becomes a blunt instrument. Before the stderr guard, removing malloc diagnostic environment variables may have looked like a practical way to stop allocator warnings from trashing the composer. Once the guard protects the composer, stripping those variables no longer fixes anything and only takes diagnostics away from the people who asked for them. Developers who intentionally enable malloc diagnostics are usually debugging hard memory or runtime problems. Silencing those signals because a terminal UI cannot handle stderr makes the wrong layer take the hit.

This is a pattern teams should watch for in every coding-agent runtime. Security and reliability controls should reduce risk without destroying legitimate observability. Scrubbing dangerous dynamic-loader variables can be sensible. Removing developer-enabled allocator diagnostics because the UI gets messy is not. Redirecting output at the right ownership boundary is better than pretending the output should not exist.

The community reaction was appropriately quiet. Hacker News had no meaningful discussion specifically about the macOS stderr and malloc fixes during the research window. The real signal was the bug report itself. It described a concrete, user-visible failure that persisted across versions and disrupted Terminal.app workflows. Fixing it required fd-level handling rather than another “please upgrade your terminal” shrug. Not every important runtime fix goes viral. Most of them just stop making operators mutter at their screens.

For practitioners, the immediate action is simple. If your macOS Codex sessions suffered from composer corruption or malloc warning floods, watch for the release containing #24459 and #24479, then retest in the exact environment where the bug appeared: Terminal.app or iTerm, tmux if you use it, your usual shell, your usual process-hardening variables, and any wrapper scripts around Codex. Test active TUI input, launching an external interactive program, suspend/resume, panic or abnormal exit if you have a safe way to simulate it, and normal shutdown.

If your team relies on macOS allocator diagnostics, confirm those environment variables survive Codex process hardening. If you wrap Codex in automation, avoid adding your own broad stderr swallowing unless you really understand the terminal ownership model. And if you are building your own agent CLI, copy the principle rather than the exact implementation: own the terminal explicitly, suppress hostile bytes only while you own it, restore normal streams at handoff, and add regression tests at the file-descriptor level. “The UI got weird” is often an underspecified runtime contract, not a user-support category.
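The last piece of that advice is to write regression tests at the file-descriptor level. Such a test points fd 2 at a file you control, produces output the way a C library would, and asserts on what arrived. The following is a minimal Rust sketch for a Unix target. capture_fd2, raw_stderr_write, and the scratch-file naming are hypothetical helpers, and the FFI declarations stand in for the libc crate.

```rust
use std::fs::OpenOptions;
use std::io::{self, Read, Seek, SeekFrom};
use std::os::raw::{c_int, c_void};
use std::os::unix::io::AsRawFd;

// POSIX calls from the C library that std already links on Unix targets.
unsafe extern "C" {
    fn dup(fd: c_int) -> c_int;
    fn dup2(src: c_int, dst: c_int) -> c_int;
    fn close(fd: c_int) -> c_int;
    fn write(fd: c_int, buf: *const c_void, count: usize) -> isize;
}

const STDERR_FD: c_int = 2;

/// Write bytes straight to fd 2, the way a C-level diagnostic would,
/// bypassing Rust's std::io::stderr() and any test-harness capture.
pub fn raw_stderr_write(bytes: &[u8]) {
    unsafe {
        write(STDERR_FD, bytes.as_ptr() as *const c_void, bytes.len());
    }
}

/// Puts the saved descriptor back onto fd 2 when dropped, even if the
/// captured closure panics.
struct RestoreStderr(c_int);

impl Drop for RestoreStderr {
    fn drop(&mut self) {
        unsafe {
            dup2(self.0, STDERR_FD);
            close(self.0);
        }
    }
}

/// Hypothetical test helper: point fd 2 at a scratch file, run `f`,
/// restore fd 2, and return every byte that reached the descriptor.
pub fn capture_fd2<F: FnOnce()>(f: F) -> io::Result<Vec<u8>> {
    let path = std::env::temp_dir().join(format!("fd2-capture-{}.log", std::process::id()));
    let mut scratch = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open(&path)?;

    let saved = unsafe { dup(STDERR_FD) };
    if saved < 0 {
        return Err(io::Error::last_os_error());
    }
    let restore = RestoreStderr(saved);
    if unsafe { dup2(scratch.as_raw_fd(), STDERR_FD) } < 0 {
        return Err(io::Error::last_os_error());
    }

    f();
    drop(restore); // fd 2 points at the real stderr again

    // fd 2 shared the scratch file's offset, so rewind before reading.
    let mut captured = Vec::new();
    scratch.seek(SeekFrom::Start(0))?;
    scratch.read_to_end(&mut captured)?;
    std::fs::remove_file(&path)?;
    Ok(captured)
}
```

A test in this style could assert two things: bytes written with raw_stderr_write while a suppression guard is held never reach the capture, and they arrive again once the guard is dropped. Writing through the raw descriptor matters because Rust's test harness captures eprint! output at the Rust level. An eprint!-based test could therefore pass even if fd 2 were still leaking. Because fd 2 is process-wide, such tests should also run serially.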

The larger take is that agent runtime quality is not just model quality. It is stream handling, terminal ownership, subprocess hygiene, diagnostics, panic behavior, and respect for the developer’s debugging tools. Codex’s stderr fix is a small macOS patch with a useful philosophy behind it: protect the interactive surface, but do not solve UI corruption by making the system harder to inspect. That is the kind of boring correctness coding agents need if they want to be trusted with non-boring work.

Sources: OpenAI Codex PR #24459, PR #24479, issue #17139, commit 8a94430.
