
OpenAI’s Agents SDK 0.14.5 Is Another Reminder That Sandboxes, Not Agent Loops, Are Where Frameworks Get Real

Anatoliy Kolodkin

23 Apr 2026 • 4 min read

The easiest way to misunderstand agent frameworks in 2026 is to keep thinking the hard part is the loop. The loop is solved enough. The real problems are workspace lifetime, interrupted approvals, partial streaming, and all the ugly ways state gets lost after the demo is over. That is why OpenAI’s Agents SDK 0.14.5 matters more than its patch number suggests. It is another release about making the runtime behave when files, humans, and sandboxes are involved.

The official changelog is short. OpenAI added an idle-timeout option for Modal sandboxes, fixed how tool outputs are handled when a human-in-the-loop run resumes, and backfilled streamed terminal output. That reads like maintenance. It is actually a concise summary of the three things that most often make agent systems feel fake in production: a workspace that lives too long or dies too fast, an approval flow that cannot resume cleanly, and a tool stream that looks complete until the final envelope drops the important part.

OpenAI’s own documentation has been getting clearer about where the SDK wants to compete. The package describes itself as a production-ready upgrade to Swarm with a deliberately small set of primitives, but the docs spend real space on sandbox agents, resumable sandbox sessions, persistent sessions, guardrails, tracing, and human-in-the-loop controls. That is the right emphasis. Nobody serious is buying an agent framework in 2026 because it knows how to call tools in a loop. They are buying it because they need that loop to survive contact with actual work.

Idle timeout is really about cost discipline and workflow semantics

The headline feature in 0.14.5 is a configurable idle_timeout for Modal-backed sandbox sessions. Superficially, that sounds like a cost-control knob. It is that, but it is also more. The minute you give an agent a real workspace, inactivity becomes a first-class design problem. Kill the sandbox too early and you lose expensive setup state or tear down a workflow that is legitimately paused while it waits for a human approval. Keep it alive too long and you pay for dead sessions while widening the blast radius of whatever secrets, files, or mounted state are still sitting there.

A good sandbox runtime needs separate notions of total lifetime and idle lifetime. OpenAI’s patch moves in exactly that direction. According to the release notes and the linked pull request, the timeout is persisted in the Modal sandbox session state and applied both when a fresh sandbox is created and when one is restored from a snapshot. That persistence detail matters. If the timeout policy disappears on resume, the framework is not managing runtime semantics; it is just sprinkling options over one code path.
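To make the "persist it or lose it" point concrete, here is a minimal sketch of a session state that carries both a total-lifetime and an idle-lifetime budget through serialization. `SandboxLifetimePolicy`, `SandboxSessionState`, and `termination_reason` are hypothetical names for illustration, not the Agents SDK's or Modal's API. The only claim is structural: because the policy travels inside the saved state, a restored session enforces the same rules as a fresh one.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SandboxLifetimePolicy:
    """Hypothetical policy object, not the SDK's actual type."""
    max_lifetime_s: float  # hard cap on total sandbox age
    idle_timeout_s: float  # how long the sandbox may sit with no activity


@dataclass
class SandboxSessionState:
    """Hypothetical serializable session state that carries its own policy."""
    sandbox_id: str
    created_at: float
    last_activity_at: float
    policy: SandboxLifetimePolicy

    def to_json(self) -> str:
        # asdict() recurses into the nested policy dataclass.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> SandboxSessionState:
        data = json.loads(raw)
        # Rebuild the policy so a restored session keeps the same limits.
        data["policy"] = SandboxLifetimePolicy(**data["policy"])
        return cls(**data)


def termination_reason(state: SandboxSessionState, now: float) -> str | None:
    """Return why the sandbox should be reclaimed, or None if it may keep running."""
    if now - state.created_at >= state.policy.max_lifetime_s:
        return "max_lifetime"
    if now - state.last_activity_at >= state.policy.idle_timeout_s:
        return "idle"
    return None


if __name__ == "__main__":
    fresh = SandboxSessionState("sbx-1", 0.0, 0.0, SandboxLifetimePolicy(3600.0, 300.0))
    restored = SandboxSessionState.from_json(fresh.to_json())
    print(termination_reason(restored, now=120.0))  # None: still within both budgets
    print(termination_reason(restored, now=400.0))  # idle: 400 s without activity
```

Keeping the two checks separate is the design choice the article argues for: a session can be young but idle, or busy but too old, and each case deserves its own reason code.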

For teams running coding agents, document processors, or review flows inside isolated workspaces, this is the difference between a believable runtime and a dressed-up REPL. Infrastructure people notice these details immediately because they map to budget control, cleanup guarantees, and failure modes. Product teams should notice them too, because they decide whether a paused workflow is still there when a user comes back to it.

Resume correctness is where human-in-the-loop either becomes real or stays theater

The more revealing fix may be the HITL resume patch. OpenAI says the bug affected cases where one response mixed approval-gated and non-approval tool calls, leading resumed requests to fail with “No tool output found” because some locally generated outputs were treated as already acknowledged. This is exactly the kind of edge case that makes human approval look easy in a demo and maddening in production.
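The shape of that bug is easier to see in a toy model. The sketch below is not the Agents SDK's internal code. `ToolCall`, `PendingTurn`, `run_until_interrupt`, and `resume` are hypothetical names, and the model assumes the simplest possible protocol, in which the server must receive one output per tool call before it continues. The invariant it illustrates: an output produced locally while the run was paused stays unacknowledged until it is actually sent.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    call_id: str
    name: str
    needs_approval: bool


@dataclass
class PendingTurn:
    """Hypothetical bookkeeping for one model response that was interrupted."""
    calls: list[ToolCall]
    outputs: dict[str, str] = field(default_factory=dict)  # produced locally
    acknowledged: set[str] = field(default_factory=set)    # already sent to the server


def run_until_interrupt(turn: PendingTurn, execute: Callable[[ToolCall], str]) -> list[ToolCall]:
    """Execute calls that need no approval; return the calls waiting on a human."""
    waiting = []
    for call in turn.calls:
        if call.needs_approval:
            waiting.append(call)
        else:
            # Produced locally but NOT marked acknowledged: the run paused
            # before this output was ever sent. Marking it acknowledged here
            # is the toy-model version of the bug, because resume() would then
            # omit it and the server would find no output for this call.
            turn.outputs[call.call_id] = execute(call)
    return waiting


def resume(turn: PendingTurn, approvals: dict[str, bool],
           execute: Callable[[ToolCall], str]) -> list[dict[str, str]]:
    """Build the tool outputs to send with the resumed request."""
    for call in turn.calls:
        if call.needs_approval and call.call_id not in turn.outputs:
            approved = approvals.get(call.call_id, False)
            turn.outputs[call.call_id] = execute(call) if approved else "rejected by operator"
    # Send every output the server has not seen yet, gated or not.
    payload = [
        {"call_id": cid, "output": out}
        for cid, out in turn.outputs.items()
        if cid not in turn.acknowledged
    ]
    turn.acknowledged.update(p["call_id"] for p in payload)
    return payload


if __name__ == "__main__":
    turn = PendingTurn([ToolCall("c1", "read_file", False), ToolCall("c2", "delete_branch", True)])
    run_until_interrupt(turn, lambda c: f"ok:{c.name}")
    # Both c1 (auto-executed before the pause) and c2 (approved) are sent.
    print(resume(turn, {"c2": True}, lambda c: f"ok:{c.name}"))
```

The mixed case is the interesting one because the non-gated output is created in one phase and delivered in another; any bookkeeping that conflates "produced" with "delivered" loses it.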

Human-in-the-loop is now mandatory checkbox territory across the framework market. Microsoft Agent Framework has approval modes. PydanticAI supports approval gating. LangChain and Deep Agents keep leaning into operator controls. But the difference between a feature and a workflow is resumability. If an interrupted request cannot correctly recover mixed tool-output state, then the framework has not really solved approval. It has solved a screenshot.

This is where OpenAI deserves some credit. The SDK keeps spending release energy on the runtime path rather than inventing more orchestration mythology. That is less exciting for social media and more useful for anyone trying to ship an agent that interacts with real users and real systems.

Stream recovery is a reminder that tool outputs are part of your data model

The terminal-output backfill fix is the third clue. OpenAI patched a case where streamed terminal output needed to be recovered from response.output_item.done events when the final response.output payload was empty. Again, boring sentence, important implication. The framework cannot assume the provider’s final envelope will present the truth in the neatest shape. Sometimes the stream contains the truth and the aggregate object loses it.
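A minimal sketch of the backfill idea, written over plain dicts rather than the SDK's typed event objects. The event names `response.output_item.done` and `response.completed` follow the Responses API streaming events; the dict shapes used here (`output_index`, `item`, `response.output`) are simplified assumptions, and `collect_output` is a hypothetical helper, not OpenAI's actual fix.

```python
from __future__ import annotations

from collections.abc import Iterable
from typing import Any


def collect_output(events: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return the response's output items, preferring the final payload but
    falling back to items captured from streamed output_item.done events."""
    streamed: dict[int, dict[str, Any]] = {}
    final_output: list[dict[str, Any]] = []

    for event in events:
        kind = event.get("type")
        if kind == "response.output_item.done":
            # Keep each completed item, keyed by its position in the response.
            streamed[event["output_index"]] = event["item"]
        elif kind == "response.completed":
            final_output = event["response"].get("output") or []

    if final_output:
        return final_output
    # The final envelope was empty: rebuild the output from the stream, in order.
    return [streamed[i] for i in sorted(streamed)]


if __name__ == "__main__":
    item = {"type": "example_tool_output", "output": "tests passed"}  # illustrative shape
    events = [
        {"type": "response.output_item.done", "output_index": 0, "item": item},
        {"type": "response.completed", "response": {"output": []}},
    ]
    print(collect_output(events))  # the streamed item survives the empty envelope
```

Treating the stream as a second source of truth costs one small dict per response, which is cheap insurance against an aggregate object that arrives oddly shaped.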

That matters because agent runtimes are increasingly judged on output fidelity, not just token generation. If a tool ran, emitted useful data, and the framework failed to preserve it because a final payload came back oddly shaped, the operator does not care that the bug was technically subtle. They care that the run became untrustworthy.

There is a broader market pattern here. OpenAI, Microsoft, LangChain, and PydanticAI are all paying down runtime debt around state continuity, typed boundaries, structured outputs, and sandbox behavior. The industry has finally started admitting what practitioners knew a year ago: the difficult part of agent infrastructure is not making the model call a tool. It is keeping the runtime honest across interruptions, transports, and partial outputs.

What engineers should actually do

If you are using OpenAI’s Agents SDK in anything artifact-heavy or approval-heavy, this is a release to test rather than casually absorb. Validate sandbox timeout behavior with your real session lengths, not toy examples. Separate total session lifetime from inactivity tolerance based on the workflow, not gut feel. For human-in-the-loop paths, simulate mixed responses where some tool calls need approval and others do not. If resume semantics are important to your product, make them part of CI. And for any terminal or streamed tool output, verify that your observability path captures both incremental events and final aggregated responses.
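For the timeout advice in particular, one way to replace gut feel with data is to size both budgets from recorded session activity. The sketch below is hypothetical: `recommend_timeouts`, the nearest-rank percentile, the 95th-percentile default, and the 1.5x headroom are illustrative assumptions, not SDK parameters, and the results would still need to be mapped onto whatever options your sandbox backend actually exposes.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeoutRecommendation:
    idle_timeout_s: float
    max_lifetime_s: float


def nearest_rank(values: list[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def recommend_timeouts(sessions: list[list[float]], pct: float = 95.0,
                       headroom: float = 1.5) -> TimeoutRecommendation:
    """Derive separate idle and total budgets from recorded activity.

    Each session is a sorted list of activity timestamps in seconds. The idle
    budget comes from the longest pause inside each session; the total budget
    comes from each session's overall span, so the two are sized independently.
    """
    longest_gaps: list[float] = []
    spans: list[float] = []
    for stamps in sessions:
        if len(stamps) < 2:
            continue  # a single event says nothing about pauses or duration
        gaps = [b - a for a, b in zip(stamps, stamps[1:])]
        longest_gaps.append(max(gaps))
        spans.append(stamps[-1] - stamps[0])
    if not spans:
        raise ValueError("need at least one session with two or more events")
    return TimeoutRecommendation(
        idle_timeout_s=nearest_rank(longest_gaps, pct) * headroom,
        max_lifetime_s=nearest_rank(spans, pct) * headroom,
    )


if __name__ == "__main__":
    # Toy traces: one session includes a long approval pause (180 s).
    recorded = [[0, 10, 20, 200], [0, 5, 65], [0, 30]]
    print(recommend_timeouts(recorded, pct=100.0))
```

Sizing the idle budget from the longest in-session pause, rather than from average activity, is deliberate: approval waits are exactly the pauses a too-aggressive idle timeout would cut off.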

Also, be honest about what framework you are buying. OpenAI’s package still markets minimal primitives, but the real value is increasingly in the runtime layer underneath those primitives: sessions, resumability, tracing, sandboxes, and guardrails. That is a good direction. It is also a reminder that “minimal API surface” and “simple runtime” are not the same thing.

My take: 0.14.5 is another sign that OpenAI understands where the framework battle has moved. Not to handoff diagrams, not to multi-agent theater, but to workspace semantics and state continuity. That is the right fight. The teams that win this category will be the ones whose sandboxes feel boring, resumable, and expensive only when they are supposed to be.

Sources: OpenAI Agents SDK 0.14.5 release notes, OpenAI Agents SDK documentation
