
Google’s Managed Agents Turn the Codex Pattern Into an API Product

Anatoliy Kolodkin

23 May 2026 • 4 min read

Google is not copying Codex feature-for-feature. It is productizing the same underlying bet: agents need managed runtimes, not heroic prompt windows.

Google launched Managed Agents in the Gemini API, letting developers spin up an Antigravity-powered agent with one API call. The agent can reason, use tools, execute code, manage files, and browse the web inside an isolated Linux environment; sessions can preserve files and state across follow-up calls. This is the Codex/Claude Code/Copilot cloud-agent pattern moving down a layer: not just a product interface, but managed agent infrastructure exposed as an API.

Managed agents are the real product surface

Managed Agents are rolling out in preview through the Gemini API, the Interactions API, and Google AI Studio.
Google says a single call can provision an agent that reasons, uses tools, and executes code in an isolated, ephemeral Linux environment.
The agent is powered by the Antigravity agent harness and built on Gemini 3.5 Flash.
Each interaction creates or receives an environment that can be resumed later, preserving files and state for multi-turn sessions.
Developers can define custom agents with markdown instruction/skill files such as AGENTS.md and SKILL.md.
Google says the harness can execute code, manage files, browse the web, and fetch/process live data.
The related I/O developer highlights announced Antigravity 2.0, Antigravity CLI, Antigravity SDK, enterprise integration with Google Cloud projects, Android support in Google AI Studio, Workspace API integrations, and export from AI Studio to Antigravity.
Google says Gemini 3.5 Flash is built for long-horizon agentic tasks and claims it is 4x faster than other frontier models, measured in output tokens per second.
The Gemini 3.5 Flash docs list a 1,048,576-token input limit, 65,536-token output limit, and support for code execution, caching, file search, function calling, URL context, search grounding, structured outputs, and thinking.
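
Those limits are easy to turn into a pre-flight check. The sketch below is one rough way to do it, assuming the caller already has token estimates for each chunk of context; the labels, the estimates, and both function names are made up for illustration and are not part of any Google SDK.

```python
# Limits as listed in the Gemini 3.5 Flash docs cited in this article.
INPUT_TOKEN_LIMIT = 1_048_576
OUTPUT_TOKEN_LIMIT = 65_536


def remaining_input_budget(estimated_tokens: dict[str, int]) -> int:
    """Return the input tokens left after the listed context parts.

    estimated_tokens maps a hypothetical label (e.g. "repo") to a
    caller-supplied token estimate. A negative result means the planned
    request would exceed the input limit.
    """
    return INPUT_TOKEN_LIMIT - sum(estimated_tokens.values())


def fits_limits(estimated_tokens: dict[str, int], max_output_tokens: int) -> bool:
    """Check a planned request against both the input and output limits."""
    return (
        remaining_input_budget(estimated_tokens) >= 0
        and 0 < max_output_tokens <= OUTPUT_TOKEN_LIMIT
    )


# Made-up estimates for a single agent turn.
turn = {"repo": 700_000, "logs": 250_000, "browser_trace": 150_000}
print(remaining_input_budget(turn))
print(fits_limits(turn, 8_192))
```

With these invented estimates the turn overshoots the input limit by 51,424 tokens, so the check fails before anything is sent.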

There was no clearly separate HN front-page thread for the Managed Agents announcement during research, but the main Gemini 3.5 Flash thread had 410 points and 322 comments and repeatedly circled the same issues Managed Agents will inherit: pricing, cache behavior, reliability, and whether Google’s integration layer is stronger than its standalone CLI story. One commenter said the “Antigravity harness is really well done” while still preferring DeepSeek on price. Others questioned whether Google’s caching is reliable enough for agentic workloads, where cost depends on how cheaply the same context can be reused across turns. That reaction is useful because managed agents are not judged only on model intelligence; they are judged on boring infrastructure traits: isolation, resumability, logs, billing, failure modes, and whether the API behaves the same way twice.

Google is making the agent runtime itself a product. That matters more than another demo where an AI builds a toy app. The hard part of production coding agents is not the prompt; it is the harness: sandboxing, tool access, file state, resumable sessions, web access, code execution, error recovery, and observability. OpenAI Codex, Claude Code, GitHub Copilot cloud agent, Cursor, and OpenClaw all orbit the same center of gravity: a model is only as useful as the environment it can safely act inside.

Managed Agents are Google’s answer to “why should every startup rebuild the same sandbox?” If the API really gives developers an isolated Linux environment, resumable state, tool use, and custom agent definitions through markdown skills, it compresses a lot of scaffolding. A team can prototype a research agent, code-maintenance agent, or internal ops agent without standing up its own job runner, container pool, filesystem persistence layer, and browser/tool gateway on day one. That is attractive. It is also where lock-in starts to look like convenience wearing a nicer jacket.

The comparison to Codex is unavoidable. Codex-style tools gave developers a local or cloud agent that can read a repo, edit files, run commands, and produce a diff. Google is generalizing that pattern into Antigravity surfaces: desktop, CLI, SDK, Gemini API, AI Studio, Android Studio, and enterprise cloud projects. If OpenAI’s developer story is strongest when a builder wants a coding agent close to their repo and terminal, Google’s story is becoming “agent infrastructure across every surface we own.” That is a different kind of competition.

For practitioners, the first question should be boundaries. What can the managed agent access? How are tools declared? Where are logs stored? How long does environment state persist? Can secrets be scoped per interaction? What audit trail exists for file changes, web browsing, and code execution? Can you reproduce a run? Can you pin model versions? Can you export the state if you leave? Google’s launch post gives the product shape, but production teams need those operational answers before this belongs anywhere near customer data or privileged internal systems.

The second question is cost per useful outcome. Agentic workflows are context-heavy and retry-heavy. A million-token context window is powerful, but if the agent keeps dragging an entire repo, browser traces, logs, and generated artifacts through every turn, the unit economics depend on cache hit rates and model routing. The HN caching debate is not pedantry. It is the invoice.
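
To see how much the cache hit rate moves the bill, here is a minimal sketch of input-token cost for a session that resends the same context every turn. The price, the discount for cached tokens, and the function itself are illustrative assumptions, not Google's published rates or billing rules.

```python
def session_input_cost(
    context_tokens: int,
    turns: int,
    cache_hit_rate: float,
    price_per_million: float,
    cached_discount: float,
) -> float:
    """Estimate input cost for a session that resends its context each turn.

    All inputs are hypothetical. cached_discount is the fraction of the
    normal price charged for tokens served from cache.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    total_tokens = context_tokens * turns
    cached = total_tokens * cache_hit_rate
    uncached = total_tokens - cached
    # Cached tokens count as a discounted fraction of a full-price token.
    billed_equivalent = uncached + cached * cached_discount
    return billed_equivalent / 1_000_000 * price_per_million


# Invented numbers: 500k tokens of context, 20 turns, $1.00 per million
# input tokens, cached tokens billed at 25% of the normal rate.
for rate in (0.0, 0.5, 0.9):
    cost = session_input_cost(500_000, 20, rate, 1.00, 0.25)
    print(f"cache hit rate {rate:.0%}: ${cost:.2f}")
```

Under these made-up inputs the same session costs $10.00 with no cache hits, $6.25 at a 50% hit rate, and $3.25 at 90%. The real numbers will differ, but the shape is the point: an unreliable cache can triple the input bill for identical work.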

The optimistic read: Managed Agents could make serious agent workflows accessible to smaller teams that cannot justify building their own runtime. The skeptical read: it may produce a wave of “one API call” agents whose permissions, state, and failure modes nobody understands. The difference will be whether developers treat managed agents like infrastructure (versioned, tested, monitored, permissioned) or like a magic endpoint.

Read this as Google packaging the coding-agent runtime, not merely launching another model wrapper. The take: managed agents are useful if they replace undifferentiated sandbox plumbing, and dangerous if they convince teams to skip runtime governance because the sandbox is someone else’s problem.

For teams comparing coding-agent stacks, the practical checklist is simple: record which surface triggered the agent, which model or runtime handled the work, which permissions were active, and what evidence reviewers can inspect later. If a vendor cannot answer those questions, the feature is still a demo no matter how polished the dropdown looks.
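
That checklist fits in a very small record. The sketch below shows one possible shape, assuming a team logs one entry per agent run; the class name, field names, and example values are hypothetical and do not correspond to any vendor's schema.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRunRecord:
    """One reviewable record per agent run (illustrative fields only)."""

    surface: str  # which surface triggered the agent, e.g. "cli" or "api"
    model: str    # model or runtime identifier that handled the work
    permissions: list[str] = field(default_factory=list)  # active permissions
    evidence: list[str] = field(default_factory=list)     # logs, diffs, traces

    def missing_fields(self) -> list[str]:
        """Return checklist items that were left empty.

        An empty list is treated as "not recorded", so a run with no
        permissions should say so explicitly (e.g. ["none"]).
        """
        missing = []
        if not self.surface.strip():
            missing.append("surface")
        if not self.model.strip():
            missing.append("model")
        if not self.permissions:
            missing.append("permissions")
        if not self.evidence:
            missing.append("evidence")
        return missing


# Hypothetical example values.
record = AgentRunRecord(
    surface="cli",
    model="example-model-v1",
    permissions=["read:repo"],
    evidence=["runs/example/run.log"],
)
print(record.missing_fields())
```

A record with anything in `missing_fields()` is a run nobody can review later, which is a reasonable signal to block it from going anywhere near production.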

Sources: Google — Introducing Managed Agents in the Gemini API, Google — Developer highlights from I/O 2026, Google — Gemini 3.5 model announcement, Google AI docs — Gemini 3.5 Flash model card page, HN discussion — Gemini 3.5 Flash
