
Codex on Mobile Is Not About Coding on a Phone. It Is About Keeping Long-Running Agents Inside the Approval Loop.

Anatoliy Kolodkin

15 May 2026 • 4 min read

Codex landing in the ChatGPT mobile app is easy to misunderstand. This is not OpenAI trying to convince serious developers to review a refactor with their thumbs. It is OpenAI acknowledging the actual shape of agentic coding work: the agent runs for a while, hits a decision point, needs permission, shows a diff, asks whether to continue, and either gets unblocked or wastes the next hour doing nothing useful.

That makes mobile less of a coding surface and more of an interrupt surface. The phone is not replacing the workstation. It is becoming the place where a developer can keep a long-running agent inside the loop without staying chained to the machine that owns the repository, credentials, plugins, browser state, and tools. That is useful. It is also exactly where the security model starts to get interesting.

OpenAI says more than 4 million people now use Codex every week, which explains the product pressure. Once a tool has that many users, every stalled approval is no longer just an annoyance; it is lost agent throughput. Codex in ChatGPT mobile can follow active threads, review outputs, approve commands, change models, and start new work. The connected machine keeps the files, credentials, permissions, and local setup; the phone receives screenshots, terminal output, diffs, test results, and approval prompts in real time.

The architecture choice matters. OpenAI is not uploading your whole development environment into the phone. Remote access depends on a connected host (currently a Mac running the Codex app, with Windows support promised) and a secure relay layer, so that trusted machines can be reached from authorized ChatGPT devices without being exposed directly to the public internet. Remote SSH is now generally available too, letting Codex discover hosts from ~/.ssh/config, start a remote app server over SSH, and run threads against a remote filesystem and shell.
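
Host discovery from ~/.ssh/config also means that file quietly defines what an agent can be pointed at. The sketch below is not how Codex parses the file; it is a simplified, assumed reading of the ssh_config format that lists concrete Host aliases and skips wildcard patterns, which is enough to audit what is discoverable. The function name is hypothetical, and quoted aliases and Include directives are deliberately ignored.

```python
def concrete_hosts(config_text: str) -> list[str]:
    """Return non-wildcard Host aliases from ssh_config-style text."""
    hosts: list[str] = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # ssh_config allows "Host name" and "Host=name"; keywords are case-insensitive.
        parts = line.replace("=", " ", 1).split(None, 1)
        if len(parts) < 2 or parts[0].lower() != "host":
            continue
        for alias in parts[1].split():
            # Skip patterns such as "*", "web-?" or negations like "!prod".
            if not any(ch in alias for ch in "*?!") and alias not in hosts:
                hosts.append(alias)
    return hosts
```

Running it over the contents of your own config (for example, `concrete_hosts(open(path).read())`) gives a quick inventory to compare against the hosts you actually want an agent to reach.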

The approval screen is now part of the threat model

The practical benefit is obvious: start a bug investigation while waiting for coffee, steer a refactor during a commute, or approve the next step in a test-fix loop before the agent goes idle. But the approval context is now security infrastructure. A shell command approved from a phone while half-reading a diff between meetings is not the same control as a command approved at a workstation with the repository, logs, and architecture in view.

That does not mean mobile approvals are a bad idea. It means they need to be designed and governed as high-value decisions, not chat notifications with an “OK” button. A good approval prompt should show the command, working directory, target files, network intent, sandbox reason, and likely side effects. A bad one turns “agent autonomy” into “developer consents to mystery meat.” Teams should be especially careful with destructive commands, dependency installation, network access, cloud CLIs, database clients, and anything that touches credentials or production-adjacent systems.
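
To make that concrete, here is a minimal sketch of an approval record and a triage function. Everything in it is an assumption for illustration: `ApprovalRequest`, `risk_flags`, and the command lists are hypothetical and are not a Codex schema or API. The point is only that the fields named above can be carried explicitly, and that a prompt missing them can itself be treated as a warning sign.

```python
import shlex
from dataclasses import dataclass, field

# Illustrative, deliberately incomplete lists; a real policy would be broader.
DESTRUCTIVE = {"rm", "dd", "mkfs", "shred"}
INSTALLERS = {"pip", "npm", "yarn", "cargo", "brew", "apt", "apt-get"}
CLOUD_AND_DB = {"az", "aws", "gcloud", "kubectl", "terraform", "psql", "mysql"}


@dataclass
class ApprovalRequest:
    """Hypothetical shape for what a prompt should carry; not a Codex schema."""
    command: str
    working_directory: str
    target_files: list[str] = field(default_factory=list)
    wants_network: bool = False
    sandbox_reason: str = ""
    likely_side_effects: str = ""


def risk_flags(req: ApprovalRequest) -> list[str]:
    """Return reasons this request deserves more than a one-tap approval."""
    try:
        tokens = shlex.split(req.command)
    except ValueError:
        return ["unparseable command"]
    # Compare the basename of every token, so "/bin/rm" and "sudo rm" both match.
    names = {token.rsplit("/", 1)[-1] for token in tokens}
    flags = []
    if names & DESTRUCTIVE:
        flags.append("destructive command")
    if names & INSTALLERS:
        flags.append("dependency installation")
    if names & CLOUD_AND_DB:
        flags.append("cloud CLI or database client")
    if req.wants_network:
        flags.append("network access")
    if not (req.sandbox_reason and req.likely_side_effects):
        flags.append("incomplete context")
    return flags
```

Matching on every token is intentionally conservative: it will occasionally flag a harmless argument, which is the cheaper mistake when the reviewer is on a phone.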

The host inheritance model is both the feature and the risk. OpenAI’s remote-connection docs say Codex can use the connected host’s projects, threads, files, credentials, permissions, plugins, Computer Use, browser setup, MCP servers, skills, and local tools. That is why it can do real work instead of pretending a sandboxed chatbot knows your environment. It is also why the host should be treated like an agent control plane. If that machine has broad cloud credentials, persistent browser sessions, production dashboards, and local MCP servers wired into internal systems, mobile Codex can steer work inside a very large blast radius.

For Microsoft and Azure-heavy teams, this lands in the same conversation as Azure-hosted Codex, Copilot CLI, Agent 365, Entra Agent ID, Defender, Purview, and Foundry governance. The market is converging on the same hard question from different directions: where does the agent run, whose identity does it use, what credentials are reachable, what tools can it call, who approved the mutation, and can the organization reconstruct the chain after something goes wrong?

Hooks are useful guardrails, not magic armor

The mobile launch arrived alongside several enterprise-flavored Codex updates: general availability for both Remote SSH and Hooks, programmatic access tokens for Business and Enterprise workspaces, and HIPAA-compliant local Codex use for eligible ChatGPT Enterprise customers. The least flashy of those, Hooks, may be the most operationally important.

Hooks let teams customize the agent loop around events such as SessionStart, PreToolUse, PermissionRequest, PostToolUse, UserPromptSubmit, and Stop. They can scan prompts for secrets, log conversations, run validators, create memories, and customize behavior per repository or directory. That is exactly the kind of local policy plumbing coding agents need.

But OpenAI is careful about the boundary: PreToolUse is described as “a guardrail rather than a complete enforcement boundary” because not every shell or tool path is intercepted. That honesty is important. Hooks are a way to encode process. They are not a sandbox, a kernel policy, or a replacement for filesystem isolation and network controls. If a team treats hooks as the blast wall, it is building on wishful thinking with YAML.
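
As an illustration of the kind of check a hook might run, here is a minimal secret-scanning sketch. The payload shape is an assumption: the `prompt` field, the `evaluate` function, and the `allow`/`reason` result are hypothetical and do not reproduce the Codex hook schema, so check the hooks docs for the real event format before wiring anything up. The two patterns are widely used heuristics, not a complete detector.

```python
import re

# Common heuristics only; real scanners carry many more patterns.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_secrets(text: str) -> list[str]:
    """Return the names of the patterns that match anywhere in the text."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]


def evaluate(event: dict) -> dict:
    """Decide on a hypothetical event; "prompt" is an assumed field name."""
    hits = find_secrets(event.get("prompt", ""))
    if hits:
        return {"allow": False, "reason": "possible secrets: " + ", ".join(hits)}
    return {"allow": True, "reason": ""}
```

A check like this catches careless pastes. It does nothing about a tool path the hook never sees, which is exactly the limit the docs describe.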

Programmatic access tokens deserve the same sober treatment. Tying automation to a ChatGPT workspace identity is better than a generic, anonymous API key floating around CI. It gives administrators a more meaningful audit story. But it also creates familiar automation-secret problems: leaked tokens, untrusted runners, shared identities, overly long expirations, forked pull requests, and credentials that outlive the workflow that justified them. OpenAI’s own token guidance warns about those failure modes and recommends finite expirations, including 7, 30, 60, and 90 days.
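
A token inventory check is easy to sketch. The version below assumes you keep your own record of issued tokens; `TokenRecord` and `policy_violations` are hypothetical names, not part of any OpenAI API, and the 90-day ceiling simply takes the longest of the expirations mentioned above as an example policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Example ceiling: the longest of the finite expirations mentioned above.
MAX_LIFETIME = timedelta(days=90)


@dataclass
class TokenRecord:
    """One row of a team-maintained token inventory (hypothetical shape)."""
    name: str
    created_at: datetime
    expires_at: Optional[datetime]


def policy_violations(tokens: list[TokenRecord], now: datetime) -> list[tuple[str, str]]:
    """Return (token name, problem) pairs for tokens that break the policy."""
    problems = []
    for token in tokens:
        if token.expires_at is None:
            problems.append((token.name, "no expiration"))
        elif token.expires_at - token.created_at > MAX_LIFETIME:
            problems.append((token.name, "lifetime exceeds 90 days"))
        elif token.expires_at <= now:
            problems.append((token.name, "expired but still in inventory"))
    return problems
```

Run on a schedule, a check like this turns "rotate tokens" from a good intention into a list of names someone has to answer for.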

The practitioner move is not to ban this. It is to narrow it. Use dedicated devboxes or managed hosts rather than personal laptops full of unrelated secrets. Keep remote hosts behind VPN or mesh networking instead of exposed listeners. Separate agent worktrees from human worktrees. Use read-only or workspace-write defaults. Require stronger approval for network access and destructive operations. Export prompts, tool calls, diffs, test results, and approval decisions to a place security and engineering can query. Rotate tokens. Review hooks like production configuration.
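
For the export step, one simple option is an append-only log of JSON lines in which each record carries the hash of the previous line, so edits and reordering are detectable. This is a sketch under assumptions: the record fields are whatever your team chooses to capture, the function names are hypothetical, and nothing here reflects a Codex export format.

```python
import hashlib
import json


def append_record(log: list[str], record: dict) -> str:
    """Append a record as a JSON line chained to the previous line's hash."""
    prev = hashlib.sha256(log[-1].encode()).hexdigest() if log else ""
    line = json.dumps({**record, "prev_sha256": prev}, sort_keys=True)
    log.append(line)
    return line


def verify(log: list[str]) -> bool:
    """Check that every line still points at the hash of the line before it."""
    prev = ""
    for line in log:
        if json.loads(line).get("prev_sha256") != prev:
            return False
        prev = hashlib.sha256(line.encode()).hexdigest()
    return True
```

For example, appending an approval record such as `{"surface": "mobile", "command": "pytest", "decision": "approved"}` and later calling `verify(log)` confirms the chain is intact. The chain only proves internal consistency; shipping the lines to storage the host cannot rewrite is what makes it evidence.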

The Hacker News reaction captured the split well: many developers immediately understood the value of steering long-running work away from the desk, while skeptics pointed out that this can make every walk and coffee line another place to keep working. One blunt version of the security concern was that if the phone controls the desktop running the agent, you can wipe the desktop from the phone. That is not anti-AI melodrama. That is the permission model stated without product copy.

Codex mobile is useful because coding agents are asynchronous. It is risky for the same reason. The agent does not become safer because the approval button moved into a polished app; it becomes safer when approval UX, host isolation, credentials, logs, and organizational policy are treated as one system. The phone is now in the development loop. The grown-up response is to make the loop auditable before it becomes muscle memory.

Sources: OpenAI, OpenAI Codex remote connections docs, OpenAI Codex hooks docs, OpenAI Codex access token docs, Hacker News discussion
