OpenClaw’s Gemini Image Bug Was Really an HTTP/2 Trust-Boundary Bug in Disguise
For a certain class of infrastructure bug, the worst outcome is not a hard failure. It is a misleading one. OpenClaw users thought Google Gemini image generation was timing out upstream, because the error that surfaced was blunt and familiar: HTTP/2 stream timeout after 12 seconds. The more interesting reality was that Google had already done the work. The image request completed in AI Studio logs. The failure happened in OpenClaw’s own transport path, which turns this from a provider hiccup into a much more useful story about agent-platform engineering.
The regression showed up in OpenClaw 2026.4.15 against both google/gemini-3-pro-image-preview and google/gemini-3.1-flash-image-preview. The reporter’s setup was ordinary enough to matter: macOS on Apple Silicon, Node v22.22.0, a stable global install, Google configured as the default image generation provider, and provider fallback disabled. The request itself was not exotic either: a straightforward image-generation flow with a reference image, the kind of workflow people use when they are actually building product assets rather than stress-testing edge cases. Yet OpenClaw failed with HTTP/2: "stream timeout after 12000" while Google’s side still showed the job completed. That narrows the blame quickly. If the provider finishes and the client times out below the tool’s advertised 60-second timeout, the abstraction is leaking.
The patch in PR #68114 is small, but it says a lot. In the fetchWithSsrFGuard path, when pinDns is disabled and no explicit dispatcher policy is supplied, OpenClaw now creates a dedicated HTTP/1 agent with HTTP/2 disabled rather than letting the request inherit ambient Undici behavior. The stated motivation is pragmatic: with Undici 8, the global dispatcher can negotiate HTTP/2 via ALPN, and that negotiation was surfacing a 12-second stream timeout below OpenClaw’s own provider timeout contract. In other words, the tool said one thing about its timeout model while the transport stack quietly enforced another. That is not just annoying. It is the kind of inconsistency that makes operators stop trusting the platform’s guarantees.
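The shape of that fix can be sketched as a small decision rule over the dispatcher. To be clear about what is sourced and what is not: pinDns and the guarded-fetch path are named in the PR, but the function, types, and the behavior of the pinned-DNS branch below are my illustration, not OpenClaw’s actual code. Undici does expose an `allowH2` option on its Agent, which is the kind of knob such a fix would turn off.

```typescript
// Illustrative sketch of the dispatcher-selection rule described in PR #68114.
// Names and types here are hypothetical; only pinDns and the "dedicated
// HTTP/1 agent" outcome come from the PR description.

type DispatcherPolicy =
  | { kind: "explicit" }                       // caller supplied a dispatcher
  | { kind: "pinned-dns" }                     // pinning path; unchanged by the patch
  | { kind: "dedicated-h1"; allowH2: false };  // the new fallback branch

interface GuardedFetchOptions {
  pinDns: boolean;
  explicitDispatcher?: unknown; // e.g. a caller-supplied undici Dispatcher
}

function selectDispatcher(opts: GuardedFetchOptions): DispatcherPolicy {
  // An explicit dispatcher is explicit policy: it always wins.
  if (opts.explicitDispatcher !== undefined) {
    return { kind: "explicit" };
  }
  // The pinned-DNS branch builds its own transport; the PR leaves it alone.
  if (opts.pinDns) {
    return { kind: "pinned-dns" };
  }
  // The fixed branch: no pinning, no explicit policy. Do not inherit the
  // ambient global dispatcher (which may ALPN-negotiate HTTP/2); force a
  // dedicated HTTP/1 agent instead, roughly: new Agent({ allowH2: false }).
  return { kind: "dedicated-h1", allowH2: false };
}
```

The point of the rule is the last branch: the absence of explicit policy no longer means "whatever the runtime negotiates," it means a deliberately pinned HTTP/1 transport.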
This is where the bug gets more interesting than the fix. Most agent frameworks still behave like application code that happened to grow some plugins. But the moment a framework brokers model APIs, media uploads, browser sessions, outbound fetch rules, and SSRF protection, it stops being just an app. It becomes networking middleware. At that layer, defaults are not innocent. The global dispatcher is policy. ALPN negotiation is policy. A silent jump from one transport behavior to another is a trust-boundary decision, even if it arrived accidentally through a library upgrade.
The OpenClaw team was right to narrow the scope. The PR explicitly leaves redirect handling, SSRF policy, audit capture, and config surface unchanged. Tests moved in the expected direction, with the fetch-guard SSRF suite rising from 41 of 41 to 42 of 42, adjacent src/infra/net tests passing 87 of 87, and neighboring provider tests passing 21 of 21. That is what a disciplined same-day regression fix should look like. But the bot review on the PR also raised a second-order concern worth taking seriously: if you construct a direct agent when no dispatcher policy exists, you may bypass host-level global dispatcher controls such as proxy routing, egress monitoring, custom trust stores, or corporate logging. That does not automatically make the patch wrong. It does make the tradeoff visible, and visibility is the point.
This is what transport debt looks like in an agent platform
The useful lesson here is that agent products now fail at the seams between policy layers. OpenClaw promises SSRF-guarded outbound fetches. It promises provider calls that respect configured timeout behavior. It promises a single surface where a tool call like image generation should either work or fail for understandable reasons. But if one branch quietly inherits ambient runtime transport behavior while other branches honor platform policy, you no longer have one network stack. You have several. They just happen to share a logo.
That matters more for image generation than it might seem. Text completions usually fail fast and produce obvious errors. Media workflows are long-lived, heavier, and more likely to expose differences in streaming semantics, socket reuse, protocol negotiation, and timeout layering. That makes them an early-warning system for infrastructure cracks. When a model platform “works fine except images,” it is often revealing that the control plane was designed around short request-response flows and only later stretched into multimodal infrastructure.
There is a broader industry pattern here too. AI operators keep getting told that model choice is the main source of reliability variance. Sometimes it is. Just as often, the problem is below the model line: HTTP client behavior, retry policy, proxy routing, socket lifetime, TLS posture, or internal timeout propagation. The more orchestration layers a platform adds, the more important it becomes to own transport deterministically instead of inheriting whatever the runtime thinks is reasonable this week.
What practitioners should do now
If you run OpenClaw in production, do not file this one under “Gemini bug fixed, move on.” Treat it as a prompt to audit your own trust model around outbound model traffic.
First, check whether your deployments rely on ambient Undici global dispatcher behavior for proxying, egress restrictions, or observability. If they do, pay attention to the concern raised in review: forcing a dedicated HTTP/1 agent may solve the timeout regression while also changing how traffic traverses your network controls. That is not hypothetical. In many real environments, the global dispatcher is where the compliance and networking story actually lives.
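The bypass concern is mechanical, and worth stating precisely: in undici, a per-request dispatcher takes precedence over whatever was installed via setGlobalDispatcher, so a forced dedicated agent routes around a globally configured ProxyAgent entirely. The helper below is a hypothetical audit aid, not an OpenClaw or undici API; it just encodes that precedence rule.

```typescript
// Hypothetical audit helper: classifies whether a request's traffic will
// traverse controls attached to the global dispatcher (e.g. a ProxyAgent
// installed via undici's setGlobalDispatcher). Not a real OpenClaw API.

interface EgressContext {
  globalDispatcherIsProxy: boolean;    // host installed a proxying global dispatcher
  requestForcesOwnDispatcher: boolean; // e.g. the fix's dedicated HTTP/1 agent
}

function egressPath(ctx: EgressContext): "via-proxy" | "direct" {
  // A per-request dispatcher takes precedence over the global one, so
  // forcing a dedicated agent routes around the proxy (and anything else
  // the global dispatcher enforced: egress monitoring, trust stores, logs).
  if (ctx.requestForcesOwnDispatcher) return "direct";
  return ctx.globalDispatcherIsProxy ? "via-proxy" : "direct";
}
```

If your compliance story assumes every outbound request flows through the global dispatcher, the "direct" row of that table is exactly the gap to go audit.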
Second, test long-running media paths separately from ordinary text completions. Provider health dashboards can look green while image, audio, or file-heavy operations fail under different transport assumptions. If you only test prompt-response chat flows, you are verifying the least interesting part of the stack.
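One way to make that separation concrete is a probe matrix that gives media flows their own budgets and streaming coverage instead of reusing chat-path checks. The flow classes, budgets, and field names below are assumptions for illustration, not OpenClaw configuration.

```typescript
// Illustrative probe matrix: long-lived media flows get their own synthetic
// checks. Budgets and flow names are assumptions, not OpenClaw settings.

type FlowClass = "chat" | "image" | "audio" | "file";

interface ProbeSpec {
  flow: FlowClass;
  timeoutMs: number;          // budget the probe itself enforces
  exercisesStreaming: boolean; // whether the probe holds a long-lived stream
}

function probeFor(flow: FlowClass): ProbeSpec {
  switch (flow) {
    case "chat":
      // Short request/response; fails fast and exercises little of the
      // transport stack beyond basic connectivity.
      return { flow, timeoutMs: 15_000, exercisesStreaming: false };
    case "image":
    case "audio":
    case "file":
      // Long-lived and heavy. The probe's own budget must comfortably
      // outlast any hidden lower-layer timeout (such as a 12-second h2
      // stream timeout) or the probe can never observe it firing.
      return { flow, timeoutMs: 120_000, exercisesStreaming: true };
  }
}
```

The design choice worth copying is the asymmetry: the media probes are deliberately slower and stream-holding, because that is the only way they can catch the class of failure this incident exposed.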
Third, treat timeout contracts as part of your platform API. If your tools advertise a 60-second timeout, make sure nothing lower in the stack can silently terminate at 12 seconds unless you can explain that behavior and monitor it. Hidden lower-layer timeouts are one of the fastest ways to turn an infrastructure product into folklore.
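A simple way to reason about this: whichever layer fires first is the real contract, so the effective timeout is the minimum across every layer that can terminate the request. The sketch below assumes you can enumerate those layers; the layer names and numbers are illustrative, chosen to mirror this incident.

```typescript
// Sketch of a timeout-contract check. Assumes you can enumerate every
// layer that can terminate a request; names and values are illustrative.

interface TimeoutLayer {
  name: string;
  millis: number;
}

function effectiveTimeout(layers: TimeoutLayer[]): TimeoutLayer {
  // Whichever layer fires first wins, so the binding contract is the minimum.
  return layers.reduce((min, l) => (l.millis < min.millis ? l : min));
}

const layers: TimeoutLayer[] = [
  { name: "tool contract (advertised)", millis: 60_000 },
  { name: "HTTP client headers/body timeout", millis: 300_000 },
  { name: "ambient h2 stream timeout", millis: 12_000 },
];

const binding = effectiveTimeout(layers);
// In this configuration the 12s stream timeout, not the advertised 60s,
// is the contract users actually experience.
```

A check like this belongs in CI: assert that the advertised contract is the minimum of the enumerated layers, and the build fails the day a dependency upgrade quietly introduces a lower one.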
My read is simple. OpenClaw did not really have a Gemini problem. It had an infrastructure-contract problem that happened to show up through Gemini image generation. That is actually the more useful story, because model bugs come and go. Transport correctness compounds. The platforms worth trusting over the next year will be the ones that stop treating networking behavior as incidental plumbing and start treating it as part of the security and reliability model they are selling.
Sources: OpenClaw PR #68114, OpenClaw issue #68104, Undici client docs, OpenClaw docs