Bernstein v2.5.0 Treats Every Agent Host Like an Untrusted Dependency

Bernstein v2.5.0 starts from the assumption most agent platforms are still politely avoiding: the coding-agent ecosystem is not going to be clean. Real teams will not standardize on one blessed host, one tool server, one auth story, and one pristine audit trail. They will have Claude Desktop on one machine, Claude Code in a terminal, Cursor in another workflow, Continue or Cline in a legacy repo, Zed for a subset of developers, Aider in a corner, and a pile of MCP servers written by people with different threat models and different weekends available.

The orchestrator’s job is not to make that look elegant. It is to make the handshakes explicit, signed, bounded, and reviewable. Bernstein v2.5.0 is interesting because it treats every agent host and every MCP server like an untrusted dependency. That is the correct posture. It is also the posture more agent tooling will be forced to adopt once the demo phase gives way to production workflows.

The release adds signed A2A capability cards, lineage-chain interop, a hardened MCP client, prompt-catalogue and OAuth 2 PKCE discovery metadata for its MCP server, host registration across seven desktop/agent tools, deterministic replay improvements, and a privacy cleanup that removes hardcoded private observability endpoints from shipped defaults. The release spans 22 commits since v2.4.0; at the time of research capture the repo showed 431 stars, 43 forks, five open issues, an Apache-2.0 license, and a May 20 publish timestamp.

The handshake is the product

The maintainer puts the release’s center of gravity plainly: “The piece that kept blocking me on multi-host runs was the lack of a real handshake.” That is the right sentence to underline. Agent delegation without a real handshake is just trust-fall engineering. One process claims it can do something, another process believes it, and the operator discovers too late that the policies, tools, cost limits, or audit boundaries did not match.

Bernstein’s A2A capability cards are an attempt to put structure around that moment. A process can mint a signed manifest describing its identity, its advertised tools, its supported policies (such as cost cap, redaction tier, and sandbox profile), its public key, and an expiry. The consumer verifies the signature against a trusted-issuer set and refuses delegation when the advertised policies do not meet the operator’s required policies. The body is JCS-canonical JSON signed as a detached JWS with Ed25519, according to the research brief.

The important phrase is not “signed manifest.” It is “refuses to delegate.” Security theater signs things and then lets the workflow continue after a warning. Governance enforces a fail-closed comparison between what a peer claims and what the operator allows. If a coding agent is about to hand work to another host that has broader tool access, weaker redaction, or a looser sandbox, the correct answer is not a tasteful yellow triangle. The correct answer is no.
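A fail-closed comparison is simple enough to sketch. The Python below is a minimal illustration, not Bernstein's implementation: the field names, the tier orderings, and the delegation_refusals helper are all hypothetical, and it assumes the card's signature has already been verified against the trusted-issuer set. The point it shows is that a missing, unknown, or weaker value is treated as a refusal rather than a warning.

```python
from dataclasses import dataclass

# Hypothetical orderings: a higher rank means a stricter setting.
# These are illustrative tiers, not Bernstein's real policy names.
REDACTION_RANK = {"none": 0, "basic": 1, "strict": 2}
SANDBOX_RANK = {"host": 0, "container": 1, "microvm": 2}


@dataclass(frozen=True)
class Policies:
    cost_cap_usd: float    # maximum spend the operator will accept
    redaction_tier: str    # key into REDACTION_RANK
    sandbox_profile: str   # key into SANDBOX_RANK


def delegation_refusals(advertised: dict, required: Policies, now: float) -> list[str]:
    """Return reasons to refuse; an empty list means delegation may proceed.

    Assumes the signature check already passed. Anything missing or
    unrecognised counts as a failure, so the comparison fails closed.
    """
    reasons = []

    expires_at = advertised.get("expires_at")
    if not isinstance(expires_at, (int, float)) or expires_at <= now:
        reasons.append("card expired or missing expiry")

    cap = advertised.get("cost_cap_usd")
    if not isinstance(cap, (int, float)) or cap > required.cost_cap_usd:
        reasons.append("cost cap missing or looser than required")

    # Unknown tier names map to -1, which is below every real rank.
    redaction = REDACTION_RANK.get(str(advertised.get("redaction_tier")), -1)
    if redaction < REDACTION_RANK[required.redaction_tier]:
        reasons.append("redaction tier missing or weaker than required")

    sandbox = SANDBOX_RANK.get(str(advertised.get("sandbox_profile")), -1)
    if sandbox < SANDBOX_RANK[required.sandbox_profile]:
        reasons.append("sandbox profile missing or looser than required")

    return reasons
```

A caller would delegate only when the returned list is empty, and log the reasons otherwise. Returning reasons instead of a bare boolean keeps the refusal reviewable.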

The lineage-chain interop matters for the same reason. Bernstein wraps its signed tracker-audit chain into the A2A envelope under bernstein.lineage_v2 and appends a cross-org boundary marker on receipt. That sounds bureaucratic until you need to explain which host made which decision, which tool produced which output, and where responsibility crossed from one operator domain to another. Multi-agent coding without lineage is distributed blame assignment with nicer logs.
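To make the lineage idea concrete, here is a rough Python sketch of a hash-linked audit chain that gets a boundary marker appended on receipt. It is an assumption-laden illustration: the entry layout, the cross_org_boundary marker name, and the receive function are hypothetical, and it uses plain SHA-256 linking where Bernstein's chain is signed.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder previous-hash for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    # Sorted keys and fixed separators give a stable byte encoding to hash.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: list, payload: dict) -> list:
    """Return a new chain with payload linked to the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"prev": prev_hash, "payload": payload, "hash": _digest(prev_hash, payload)}
    return chain + [entry]


def verify_chain(chain: list) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


def receive(chain: list, receiver_org: str) -> list:
    """Verify an incoming chain, then record where it crossed operators."""
    if not verify_chain(chain):
        raise ValueError("lineage chain failed verification")
    marker = {"type": "cross_org_boundary", "receiver": receiver_org}  # hypothetical marker
    return append_entry(chain, marker)
```

Because each entry commits to the one before it, the receiving side can append its marker without being able to quietly rewrite what the sender recorded.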

MCP servers need dependency hygiene, not vibes

The MCP client work is the release’s most practical section. The maintainer writes that upstream servers will “return malformed responses, hang mid-stream, demand re-auth, lie about their capability manifest.” The fix is not optimism. Bernstein now validates capability cards before tool calls, retries with continuation on dropped streamed calls, uses idempotency keys when a server has no resumption support, preserves partial output on cancellation, meters cost per server, contains schema violations, and marks a misbehaving server degraded for the rest of the task.
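Two of those behaviours, per-server metering and degraded-server state, fit in a few lines. The sketch below is a hypothetical Python wrapper, not Bernstein's client: ServerLedger, SchemaViolation, and the invoke/validate callables are names invented for illustration, and invoke is assumed to return a (result, cost) pair.

```python
from collections import defaultdict


class SchemaViolation(Exception):
    """Raised when a server response does not match the expected shape."""


class ServerLedger:
    """Tracks spend and health per upstream server for the life of one task."""

    def __init__(self) -> None:
        self.cost_usd = defaultdict(float)  # server name -> accumulated spend
        self.degraded = set()               # servers refused for the rest of the task

    def call(self, server: str, invoke, validate):
        """Run one tool call against `server`, metering and validating it.

        `invoke` returns (result, cost_usd); `validate` raises
        SchemaViolation if the result has the wrong shape.
        """
        if server in self.degraded:
            raise RuntimeError(f"{server} is degraded for this task")

        result, cost = invoke()
        # Charge the server even when its response turns out to be unusable.
        self.cost_usd[server] += cost

        try:
            validate(result)
        except SchemaViolation:
            # Contain the violation: one bad response sidelines only this server.
            self.degraded.add(server)
            raise
        return result
```

Keeping the ledger keyed by server name is what lets an operator later answer which upstream spent the money and which one broke the run.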

This is the agent-security checklist in release-note form. MCP made tool integration easier, but it also made tool risk easier to hide. A coding agent can call a friendly-sounding tool that burns budget, leaks context, returns malformed JSON, or hangs a run for twenty minutes. If all spend and errors collapse into one aggregate “agent run” line item, operators cannot debug the bill or the failure. Per-server cost metering and degraded-server state are not enterprise niceties; they are how you keep a multi-tool agent from turning one flaky dependency into a system-wide hallucination machine.

The server side got prompt-catalogue plus OAuth 2 PKCE discovery metadata so that auto-discovering hosts expecting RFC 8414-style (authorization server metadata) and RFC 9728-style (protected resource metadata) surfaces do not skip Bernstein. That is another boring-but-real detail. The future of coding-agent infrastructure will not be one monolith. It will be host discovery, tool manifests, auth metadata, policy comparison, and enough graceful degradation that a bad server does not poison the whole task.
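For readers who have not looked at those two RFCs, the documents involved are small JSON objects served from well-known paths. The Python sketch below builds generic examples of each. The field names come from the RFCs; the URLs, the /authorize and /token paths, and the helper functions are placeholders, and nothing here is taken from Bernstein's actual metadata.

```python
import json


def authorization_server_metadata(issuer: str) -> dict:
    """RFC 8414-style document, served at /.well-known/oauth-authorization-server."""
    return {
        "issuer": issuer,
        "authorization_endpoint": f"{issuer}/authorize",  # placeholder path
        "token_endpoint": f"{issuer}/token",              # placeholder path
        "response_types_supported": ["code"],
        "grant_types_supported": ["authorization_code"],
        # Advertising S256 tells clients that PKCE is supported.
        "code_challenge_methods_supported": ["S256"],
    }


def protected_resource_metadata(resource: str, issuer: str) -> dict:
    """RFC 9728-style document, served at /.well-known/oauth-protected-resource."""
    return {
        "resource": resource,
        "authorization_servers": [issuer],
        "bearer_methods_supported": ["header"],
    }


if __name__ == "__main__":
    issuer = "https://auth.example.com"      # hypothetical issuer
    resource = "https://mcp.example.com"     # hypothetical MCP server URL
    print(json.dumps(authorization_server_metadata(issuer), indent=2))
    print(json.dumps(protected_resource_metadata(resource, issuer), indent=2))
```

A discovering host typically fetches the protected-resource document first, follows authorization_servers to the issuer, and then reads the issuer's metadata to learn its endpoints and PKCE support.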

Bernstein also added desktop-register --host <name> for Claude Desktop, Claude Code, Cursor, Continue, Cline, Zed, and Aider, and running bernstein doctor --substrate reports missing or stale registrations. This is where agent infrastructure meets the actual workstation. Host-specific config is not glamorous, but if registration is manual and inconsistent, policy is manual and inconsistent too.

The privacy fix is the credibility test

The most reassuring change may be the embarrassing one. Bernstein’s shipped wheel had two private observability hostnames baked in as defaults, one for a GlitchTip DSN and one for a telemetry endpoint. The release says the package did not actually reach out without consent because the backends soft-failed when environment variables were unset. Still, defaults sitting in shipped code are a latent leak. A future integration reads a config path differently, a “soft fail” becomes a real request, and suddenly a tool meant for operator control is pointing at the maintainer’s infrastructure.

v2.5.0 removes those defaults and adds a regression test asserting zero operator-private host, IP, or DSN matches in src/. That is the right fix. Not a paragraph promising to be careful. A test that fails the build if the problem comes back. Agent tooling is going to touch source code, credentials, tickets, customer context, and private design docs. If a project cannot be strict about its own telemetry defaults, it has no business asking to orchestrate everyone else’s tools.
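A regression test of that kind can be very small. The Python below is a sketch of the general technique, not Bernstein's test: the deny-list patterns, the find_private_references helper, and the example domain are all hypothetical stand-ins for whatever the operator considers private.

```python
import re
from pathlib import Path

# Hypothetical deny-list; a real one would name the operator's actual
# domains and address ranges.
PRIVATE_PATTERNS = [
    re.compile(r"[\w.-]+\.internal\.example\.com"),        # private hostnames
    re.compile(r"\b10\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"),      # 10.0.0.0/8 addresses
    re.compile(r"\b192\.168\.\d{1,3}\.\d{1,3}\b"),         # 192.168.0.0/16 addresses
    re.compile(r"https?://[0-9a-f]{8,}@[\w.-]+/\d+"),      # DSN-shaped URLs
]


def find_private_references(root: Path) -> list:
    """Return (path, line number, matched text) for every deny-list hit."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            for pattern in PRIVATE_PATTERNS:
                match = pattern.search(line)
                if match:
                    hits.append((str(path), lineno, match.group(0)))
    return hits


def test_no_private_infrastructure_in_src():
    # Fails the build if a private host, IP, or DSN reappears in the source tree.
    assert find_private_references(Path("src")) == []
```

Reporting the file, line, and matched text makes the failure actionable: whoever breaks the build sees exactly which default leaked back in.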

The deterministic replay changes round out the same story. Session IDs are bound deterministically so a replayed run reproduces its event stream without colliding with a sibling. The supervisor enforces bounded respawn budgets and parks exhausted agents instead of looping forever. On-disk state gets versioned migrations for .sdd/. Runs surface memorable deterministic names so an operator can refer to “the brisk-sparrow run” instead of a UUID. These details make incident review possible. They also make the system feel less haunted.
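None of these mechanisms needs much machinery. The Python sketch below shows one plausible way to get deterministic session IDs, memorable run names, and a bounded respawn budget. It is illustrative only: the namespace, the word lists, and every function and class name are assumptions, not Bernstein's scheme.

```python
import hashlib
import uuid

# Hypothetical namespace and word lists, chosen for illustration only.
SESSION_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "sessions.example.invalid")
ADJECTIVES = ["brisk", "calm", "dusty", "eager"]
BIRDS = ["sparrow", "heron", "finch", "kestrel"]


def session_id(run_id: str, agent_role: str, attempt: int) -> uuid.UUID:
    """Same inputs always give the same ID; sibling runs differ by run_id."""
    return uuid.uuid5(SESSION_NAMESPACE, f"{run_id}/{agent_role}/{attempt}")


def run_name(run_id: str) -> str:
    """Derive a stable adjective-bird name from the run ID's hash."""
    digest = hashlib.sha256(run_id.encode("utf-8")).digest()
    adjective = ADJECTIVES[digest[0] % len(ADJECTIVES)]
    bird = BIRDS[digest[1] % len(BIRDS)]
    return f"{adjective}-{bird}"


class RespawnBudget:
    """Allows a fixed number of respawns, then parks the agent."""

    def __init__(self, max_respawns: int) -> None:
        self.remaining = max_respawns
        self.parked = False

    def try_respawn(self) -> bool:
        if self.remaining <= 0:
            self.parked = True  # exhausted: stop looping and wait for an operator
            return False
        self.remaining -= 1
        return True
```

With word lists this short, names will collide across runs, so a real scheme would need larger lists or a suffix. The property that matters for replay is that the ID and name are functions of the run's inputs rather than of a random source.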

Practitioners should steal the checklist, even if they never install Bernstein: require signed capability manifests for delegation; compare policies before crossing host boundaries; meter MCP spend per server; contain schema violations; degrade bad upstreams; preserve partial output on cancellation; bound respawns; make replay deterministic; remove private infrastructure from defaults; and fuzz request schemas so invalid user input fails as a 422 at the boundary, not a 500 deep in the task store.

The caveat is equally worth keeping. The release says the new transports are functional but not load-tested at adversarial scale, and full token issuance/OIDC federation is deferred. Good. That is an honest boundary. Agent infrastructure needs more of that: specific claims, explicit non-claims, and protocols that fail closed when reality gets messy.

The editorial take: Bernstein v2.5.0 is agent governance moving from slogans to protocol surfaces. Once coding agents span hosts and MCP servers, trust has to be signed, metered, replayable, and revocable. Otherwise it is distributed prompt injection with a nicer dashboard.

Sources: GitHub — Bernstein v2.5.0, Bernstein v2.5.0 release notes, PR #1698 — A2A capability cards and lineage interop, PR #1692 — hardened MCP client, PR #1694 — remove operator infrastructure defaults.