MCP Cross-Agent Invocation Still Needs a Real User Role

OpenClaw’s MCP bridge has a small role bug with a large architectural smell: messages_send can put text into another session’s history, but it cannot reliably wake the receiving agent because it hardcodes the message as role: "assistant". In human terms, the message gets filed as something the agent said, not something the agent was asked to do.

That distinction is the whole story. Multi-agent systems do not just need pipes between runtimes. They need invocation semantics. A pipe moves bytes. An invocation carries authority, intent, provenance, and consequences. If the role is wrong, the transport can be perfectly healthy and the workflow still dies quietly.

Delivery is not invocation

Issue #86049 was opened on May 24, 2026 (13:03:49 UTC), fresh enough that broad community reaction had not formed yet. The report is still worth attention because it is precise. In openclaw mcp serve, the MCP bridge tool messages_send always writes outbound messages as assistant. OpenClaw’s agent loop responds to user messages, so an external MCP client can send text that lands in the conversation history, but the target agent does not wake up and act on it.

The reproduction path is straightforward: start OpenClaw Gateway with an active session; connect an MCP client such as Hermes, Codex, or Claude Code through openclaw mcp serve; call messages_send with a session_key and text; observe that the message lands with the assistant role; then observe that the agent ignores it. The reporter reproduced this on OpenClaw 2026.5.20 with Gateway local loopback and Hermes running DeepSeek V4 Pro.

The proposed fix is intentionally small: add an optional role parameter to the tool input, constrained to "user" or "assistant", and default to today’s assistant behavior when the caller does not specify one. The reported source locations are in the built MCP CLI bundle around the tool schema and OpenClawChannelBridge.sendMessage. In code, this is not a moon landing.
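
To make the shape of that proposal concrete, here is a minimal Python sketch of the input handling it implies. This is not OpenClaw's code: the function name parse_send_input and the error messages are hypothetical, and only session_key, text, and the optional role come from the issue as described above.

```python
from typing import Any

ALLOWED_ROLES = ("user", "assistant")
DEFAULT_ROLE = "assistant"  # keeps today's behavior when the caller omits role


def parse_send_input(args: dict[str, Any]) -> dict[str, str]:
    """Validate hypothetical messages_send arguments and apply the role default."""
    session_key = args.get("session_key")
    text = args.get("text")
    if not isinstance(session_key, str) or not session_key:
        raise ValueError("session_key must be a non-empty string")
    if not isinstance(text, str) or not text:
        raise ValueError("text must be a non-empty string")

    # role is optional; anything outside the two allowed values is rejected
    role = args.get("role", DEFAULT_ROLE)
    if role not in ALLOWED_ROLES:
        raise ValueError(f"role must be one of {ALLOWED_ROLES}, got {role!r}")

    return {"session_key": session_key, "text": text, "role": role}
```

Existing callers that never pass role keep getting assistant, so the change is backward compatible. A caller that wants to wake the agent has to say so explicitly.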

In platform terms, though, it is a useful fault line. A message with role: "assistant" means “recorded assistant-side content.” A message with role: "user" means “new external instruction; wake the loop.” That role field is not UI decoration. It is control flow.
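
A toy model shows why the role is control flow rather than a label. The ToySession class below is a hypothetical stand-in, not OpenClaw's agent loop; it assumes only what the issue reports, namely that the loop acts on user messages and leaves assistant messages alone.

```python
from dataclasses import dataclass, field


@dataclass
class ToySession:
    """Hypothetical stand-in for an agent session; not OpenClaw's real loop."""
    history: list[dict[str, str]] = field(default_factory=list)
    wake_count: int = 0

    def append(self, role: str, text: str) -> bool:
        """Record the message, then return True only if the loop was woken."""
        self.history.append({"role": role, "text": text})
        if role == "user":
            self.wake_count += 1  # a real runtime would schedule a model turn here
            return True
        return False  # delivered to history, but nothing acts on it


session = ToySession()
delivered_only = session.append("assistant", "Please summarize the build log.")
invoked = session.append("user", "Please summarize the build log.")
print(len(session.history), session.wake_count)  # two messages recorded, one wake
```

Both calls succeed from the transport's point of view. Only the second one is an invocation.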

MCP standardizes capability, not trust

The Model Context Protocol gives hosts, clients, and servers a shared JSON-RPC grammar for exposing tools, context, and workflows. That is useful. It does not magically decide whether a bridged message should be treated as a user instruction, an assistant note, a tool transcript, a system event, or a delegated task from another autonomous process. That semantic layer still belongs to the application runtime.

This is where a lot of early MCP enthusiasm gets sloppy. Developers see a common protocol and assume interop is the hard part. Interop is only the first hard part. The second hard part is authority. If Agent A can send a message into Agent B’s session, is that equivalent to a human user speaking? Is it a lower-trust delegated request? Does it need approval? Can it target any session, or only sessions the caller owns? Does it carry an audit trail? Can it wake a sleeping agent, start tools, spend tokens, or trigger external side effects?

OpenClaw’s current behavior is safe-ish by accident and broken by design. Treating all MCP-originated sends as assistant messages prevents arbitrary external clients from waking agents through this path, but it also makes legitimate cross-agent invocation inert. Flipping the default to user would fix the workflow and create a different problem: a logging or relay bridge would become a command channel overnight.

The optional-role proposal is the right minimum viable shape because it makes intent explicit. A caller that wants to append assistant-side text can keep the default. A caller that wants to wake the receiving agent must ask for role: "user". That still leaves policy work to do, but at least the runtime stops pretending those two actions are the same.

The missing metadata is provenance

A robust fix should not stop at role. Conversation role and message origin solve different problems. The model needs a simple role so the turn fits the chat mechanics: user, assistant, tool, system, depending on the runtime. Operators need provenance: where did this instruction come from, who or what initiated it, was there a human in the loop, which MCP client sent it, and what authority did that client have?

The practical pattern is to allow role: "user" for loop semantics while attaching audit metadata such as origin=mcp, caller=hermes, human=false, delegated_by=session_x, or client_id. That way the receiving agent can wake up, but logs and policy engines do not misrepresent an autonomous delegation as a direct human instruction. This distinction matters the first time an agent sends another agent a request that deletes files, posts to Slack, opens a PR, or spends money through an API.
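
One way to picture that split is an envelope that keeps the chat role and the provenance side by side. The sketch below is illustrative only: the class names and the for_model and for_audit methods are assumptions, and the field names simply mirror the examples in the paragraph above rather than any existing OpenClaw or MCP schema.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Provenance:
    origin: str                         # e.g. "mcp"
    caller: str                         # e.g. "hermes"
    human: bool                         # was a person directly in the loop?
    delegated_by: Optional[str] = None  # e.g. "session_x"
    client_id: Optional[str] = None


@dataclass(frozen=True)
class BridgedMessage:
    role: str
    text: str
    provenance: Provenance

    def for_model(self) -> dict[str, str]:
        """What the chat turn sees: role and content, nothing else."""
        return {"role": self.role, "content": self.text}

    def for_audit(self) -> dict[str, Any]:
        """What logs and policy engines see: the full envelope."""
        return asdict(self)


msg = BridgedMessage(
    role="user",
    text="Open a PR for the fix.",
    provenance=Provenance(origin="mcp", caller="hermes", human=False,
                          delegated_by="session_x"),
)
```

The model gets a plain user turn, so the loop wakes. The audit record still says human=False, so nobody later mistakes the delegation for a person typing.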

There is also an idempotency problem waiting nearby. Cross-agent calls need retry behavior, dedupe keys, and clear delivery receipts. If an MCP client sends a user-role instruction and times out waiting for acknowledgement, should it retry? If it retries, does the receiving agent perform the task twice? A message bridge that only thinks in terms of “append text to session” has no good answer. A task invocation layer can.
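
A minimal sketch of what a dedupe key buys, assuming the caller supplies the key and the receiver remembers it. InvocationInbox and Receipt are hypothetical names, and the in-memory dict stands in for whatever durable store a real runtime would need (along with expiry, which is omitted here).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Receipt:
    dedupe_key: str
    status: str  # "accepted" on first delivery, "duplicate" on a retry


@dataclass
class InvocationInbox:
    """Hypothetical receiver that runs each dedupe key at most once."""
    seen: dict[str, Receipt] = field(default_factory=dict)
    executed: list[str] = field(default_factory=list)

    def submit(self, dedupe_key: str, text: str) -> Receipt:
        if dedupe_key in self.seen:
            # Retry of a known invocation: acknowledge it, do not run it again.
            return Receipt(dedupe_key, "duplicate")
        self.executed.append(text)  # stand-in for actually waking the agent
        receipt = Receipt(dedupe_key, "accepted")
        self.seen[dedupe_key] = receipt
        return receipt


inbox = InvocationInbox()
first = inbox.submit("task-42", "Post the release notes.")
retry = inbox.submit("task-42", "Post the release notes.")  # client timed out and retried
```

With the key, a timed-out client can retry safely and learn from the receipt that the work was already accepted. Without it, the only honest answer to "should I retry?" is a shrug.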

That is why the issue is more interesting than the patch size. It marks the point where OpenClaw users are treating MCP not merely as “tools for one assistant,” but as a bus between agent harnesses: Hermes, Codex, Claude Code, OpenClaw sessions, and whatever comes next. Once that happens, message role, permissions, audit, and lifecycle are not implementation details. They are the product.

What builders should test now

If you are building MCP bridges, add tests that assert behavior, not just delivery. “Message appears in history” is not enough. Tests should cover whether the recipient loop was triggered, whether assistant-role messages remain inert, whether user-role messages are gated by permission, and whether audit metadata survives the bridge. The failure mode here is especially dangerous because it looks successful in logs: the text arrived. The agent just did nothing.

For enterprise deployments, the governance question is sharper: which MCP clients are allowed to send user-role instructions? A local trusted harness may be allowed to wake a development agent. A third-party MCP server should probably not be allowed to wake a production-connected assistant just because it can speak JSON-RPC. Session targeting also needs constraints. A client should not be able to spray user-role instructions across arbitrary sessions without explicit binding.
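
As a rough sketch of such a gate, assume each MCP client has a policy record listing the sessions it is bound to and whether it may send user-role messages. The names ClientPolicy and authorize_send, the client identifiers, and the deny-by-default choices are all illustrative assumptions, not an existing OpenClaw feature.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ClientPolicy:
    may_send_user_role: bool = False  # waking an agent is opt-in
    bound_sessions: frozenset[str] = field(default_factory=frozenset)


def authorize_send(
    policies: dict[str, ClientPolicy],
    client_id: str,
    session_key: str,
    role: str,
) -> tuple[bool, str]:
    """Return (allowed, reason) for a bridged send; unknown clients are denied."""
    policy = policies.get(client_id)
    if policy is None:
        return False, "unknown client"
    if session_key not in policy.bound_sessions:
        return False, "session not bound to client"
    if role == "user" and not policy.may_send_user_role:
        return False, "client may not send user-role instructions"
    return True, "ok"


policies = {
    "local-harness": ClientPolicy(True, frozenset({"dev-session"})),
    "third-party": ClientPolicy(False, frozenset({"prod-session"})),
}
```

Here the local harness can wake dev-session, while the third-party client can still append assistant-side notes to prod-session but cannot turn them into instructions. Neither can address a session it was never bound to.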

OpenClaw’s recent runtime-governance work makes this issue timely. The project is tightening plugin metadata, sub-agent bootstrap context, Gateway startup behavior, and cross-channel delivery semantics. MCP invocation belongs in that same bucket. It is not enough for agent platforms to expose more surfaces. They have to define what those surfaces mean when one autonomous system talks to another.

My take: MCP is becoming the USB-C port for agents, but ports still need directionality, permissions, and labels. OpenClaw’s messages_send role gap is small in code and large in meaning. Multi-agent platforms need explicit invocation semantics, not just a pipe between chat logs.

Sources: OpenClaw issue #86049, Model Context Protocol specification, MCP schema, OpenClaw v2026.5.22 release