n8n 2.22.0 Shows AI Workflow Builders Need Evals, MCP OAuth Correctness, and Self-Healing Plans More Than Prompt Polish

n8n 2.22.0 is what happens when “AI builds your workflow” leaves the demo booth and runs into graph validators, OAuth audiences, credential setup flows, and users who expect the thing to fix its own mistakes.

The release is long, but the interesting thread is coherent: n8n is moving more product knowledge out of prompt vibes and into executable contracts. Plan-tool validator errors now return to the LLM as tool results instead of aborting the run. Switch fallback behavior is described in metadata and evals so generated workflows wire correctly. MCP OAuth audience and discovery bugs are fixed. Instance AI gets better first-turn guidance, trace metadata, resumed-run telemetry, and fewer unnecessary credential prompts.

That may sound like a bag of bug fixes. It is actually a roadmap for serious AI workflow builders: validators, evals, OAuth correctness, domain-scoped credentials, and traces matter more than another paragraph of prompt polish.

The validator should be part of the loop, not the end of the run.

The most instructive fix is PR #30592. The orchestrator’s plan tool used to throw a MastraError when plannedTaskService.createPlan() hit validator failures. Now those failures are returned as a structured tool result: {result: 'Error: <validator message>...', taskCount: 0}. That difference is the whole agentic-builder argument in miniature.

If a deterministic validator rejects a graph, that is not necessarily an exceptional condition. It is feedback. The LLM emitted an invalid plan; the system knows why; the next useful action is to let the model see the complaint and try again. Throwing an exception aborts the run and converts recoverable structure feedback into a product failure. Returning the validator message as a tool result turns the validator into a repair mechanism.

The PR notes the issue surfaced repeatedly in a multi-turn eval suite: on multi-source workflows, the agent emitted checkpoint tasks with deps: [] in roughly one in four runs. That number matters. It says the failure was not a freak edge case. It was sampling variance meeting a strict graph contract. Prompting can reduce the rate, but it will not eliminate malformed structure. Runtime repair is the correct layer for this class of problem.
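
A minimal sketch of that pattern, assuming a simplified task shape and a single validator rule. PlannedTask, validatePlan, and planTool are hypothetical names for illustration, not n8n's actual code; only the { result, taskCount } shape comes from the PR description.

```typescript
// Hypothetical shapes for illustration; not n8n's actual types.
interface PlannedTask {
  id: string;
  kind: "build" | "checkpoint";
  deps: string[];
}

interface PlanToolResult {
  result: string;
  taskCount: number;
}

// Stand-in validator: returns one message per violated rule.
// The single rule here (a checkpoint must depend on something) is an
// assumption modeled on the deps: [] failure described above.
function validatePlan(tasks: PlannedTask[]): string[] {
  const errors: string[] = [];
  for (const task of tasks) {
    if (task.kind === "checkpoint" && task.deps.length === 0) {
      errors.push(`checkpoint task "${task.id}" has no dependencies`);
    }
  }
  return errors;
}

// The plan tool reports validator failures as data the model can read,
// instead of throwing and aborting the run.
function planTool(tasks: PlannedTask[]): PlanToolResult {
  const errors = validatePlan(tasks);
  if (errors.length > 0) {
    return { result: `Error: ${errors.join("; ")}`, taskCount: 0 };
  }
  return { result: "Plan accepted", taskCount: tasks.length };
}
```

Because the rejection comes back as an ordinary tool result, the orchestrator can hand it to the model on the next turn and let it resubmit a corrected plan.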

Workflow nodes need machine-readable product knowledge.

PR #30449 applies the same philosophy to Switch V3 fallback behavior. Generated workflows that need a catch-all branch must enable fallbackOutput: 'extra' before wiring the default path. Humans can learn that from docs. Models need the rule near the decision point. n8n adds fallback hints, updates stale SDK examples, introduces a SWITCH_FALLBACK_OUTPUT_DISABLED validation warning, and adds deterministic Instance AI coverage for fallback wiring.
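
The fallback rule can be expressed as a small deterministic check. This is a sketch under stated assumptions: the SwitchNode and Connection shapes are simplified and hypothetical, and it assumes the extra fallback output sits directly after the rule outputs. Only the fallbackOutput: 'extra' setting and the SWITCH_FALLBACK_OUTPUT_DISABLED code come from the release notes.

```typescript
// Hypothetical, simplified shapes; the real workflow JSON is richer.
interface SwitchNode {
  name: string;
  ruleCount: number;
  fallbackOutput?: "none" | "extra";
}

interface Connection {
  from: string;
  outputIndex: number;
  to: string;
}

interface ValidationWarning {
  code: string;
  node: string;
  message: string;
}

function checkSwitchFallback(
  node: SwitchNode,
  connections: Connection[],
): ValidationWarning[] {
  // Assumption: rule outputs occupy indexes 0..ruleCount-1 and the
  // fallback output, when enabled, is the next index.
  const fallbackIndex = node.ruleCount;
  const wiresFallback = connections.some(
    (c) => c.from === node.name && c.outputIndex === fallbackIndex,
  );
  if (wiresFallback && node.fallbackOutput !== "extra") {
    return [
      {
        code: "SWITCH_FALLBACK_OUTPUT_DISABLED",
        node: node.name,
        message: `"${node.name}" wires a default path but fallbackOutput is not 'extra'`,
      },
    ];
  }
  return [];
}
```

A warning with a stable code is something both a model and a regression test can act on, which a paragraph of documentation is not.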

This is how AI workflow builders should evolve. Every workflow node has hidden operational knowledge: what outputs exist, when a branch is enabled, which credentials are optional, which settings are mutually dependent, which defaults are safe, and which graph shapes are invalid. If that knowledge lives only in prose docs or prompt templates, the builder will eventually wire something impossible. If it lives in metadata, validators, SDK examples, warnings, and eval fixtures, the model has a fighting chance and the product has a regression suite.

The validation list for #30449 included targeted tests for validation, SDK workflow-pattern prompts, Switch node behavior, binary eval checks, and an Instance AI subagent fixture. That is the part worth copying. Evals should not just ask whether the generated workflow “looks good.” They should exercise the actual node contracts that tend to break.
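
To make "exercise the actual node contracts" concrete, here is a hedged sketch of a binary eval check: each check returns pass or fail for a generated workflow, with no rubric scoring. The workflow shape, the check name, and the runChecks helper are all hypothetical and far simpler than a real eval harness.

```typescript
// Hypothetical, flattened workflow shape used only for this sketch.
interface WorkflowNode {
  name: string;
  type: string;
  fallbackOutput?: string;
}

interface GeneratedWorkflow {
  nodes: WorkflowNode[];
}

interface BinaryCheck {
  name: string;
  passes(workflow: GeneratedWorkflow): boolean;
}

// Passes only if the workflow contains a Switch node and every Switch
// node has its fallback output enabled.
const switchFallbackEnabled: BinaryCheck = {
  name: "switch-fallback-enabled",
  passes: (workflow) => {
    const switches = workflow.nodes.filter((n) => n.type === "switch");
    return (
      switches.length > 0 &&
      switches.every((n) => n.fallbackOutput === "extra")
    );
  },
};

// Runs every check and returns a name -> pass/fail map.
function runChecks(
  workflow: GeneratedWorkflow,
  checks: BinaryCheck[],
): Record<string, boolean> {
  const results: Record<string, boolean> = {};
  for (const check of checks) {
    results[check.name] = check.passes(workflow);
  }
  return results;
}
```

Running the same prompt many times and counting how often each check fails gives a failure rate per contract, which is the kind of number a regression suite can track.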

MCP OAuth is now real enough to have boring OAuth bugs.

The MCP fixes are the release’s governance layer. PR #30055 aligns issued OAuth token aud values with the advertised MCP resource URL. Before the fix, the server advertised ${baseUrl}/mcp-server/http as its resource but issued tokens with aud="mcp-server-api", a non-URL literal that did not match it. Verification continues to accept legacy tokens until natural rollover, which is the practical migration choice.

That bug is exactly the kind of thing that makes clients, servers, and operators disagree about who is wrong. OAuth audience values are not decorative. They are how a resource server knows the token was meant for it. MCP’s rise means tool servers now inherit all the old identity plumbing problems from web APIs, except the clients are often agents and workflow builders rather than humans reading error messages.
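
A sketch of what audience checking with a legacy grace period might look like. The function names are hypothetical and this is not n8n's verification code; the two audience values are the ones named in the PR description, and the handling of aud as either a string or an array follows the JWT claim definition.

```typescript
// Legacy literal that older tokens carry, per the PR description.
const LEGACY_AUDIENCE = "mcp-server-api";

// The advertised MCP resource URL for a given instance base URL.
function expectedAudience(baseUrl: string): string {
  return `${baseUrl.replace(/\/+$/, "")}/mcp-server/http`;
}

// Accepts a token if any aud value matches the resource URL, or the
// legacy literal so already-issued tokens keep working until they expire.
function isAudienceAccepted(
  aud: string | string[] | undefined,
  baseUrl: string,
): boolean {
  const values = aud === undefined ? [] : Array.isArray(aud) ? aud : [aud];
  const accepted = [expectedAudience(baseUrl), LEGACY_AUDIENCE];
  return values.some((value) => accepted.indexOf(value) !== -1);
}
```

Once legacy tokens have rolled over, dropping LEGACY_AUDIENCE from the accepted list leaves a strict URL match.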

PR #30231 adds origin-only authorization-server discovery fallback for path-bearing MCP server URLs such as https://mcp.atlassian.com/v1/mcp. Real services often put the MCP endpoint under a path while publishing OAuth metadata at the origin. If the client searches only the path shape it expects, “connect my tool” becomes an auth debugging session. PR #30343 tightens the credential side: service-specific MCP OAuth credentials can be used with the MCP endpoint domain while unrelated domains remain blocked. The test plan explicitly says a request to google.com should fail and only the service MCP endpoint domain should be allowed.
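
The discovery fallback can be sketched as an ordered list of metadata URLs to try. This assumes the RFC 8414 well-known path and a path-aware first attempt followed by the origin-only one; discoveryCandidates is a hypothetical helper, and the order n8n actually uses may differ.

```typescript
// Builds the authorization-server metadata URLs to try, in order, for
// an MCP server URL. Path-aware first, then origin-only as a fallback.
function discoveryCandidates(serverUrl: string): string[] {
  const url = new URL(serverUrl);
  const wellKnown = "/.well-known/oauth-authorization-server";
  // Drop trailing slashes so a bare origin yields an empty path.
  const path = url.pathname.replace(/\/+$/, "");
  const candidates: string[] = [];
  if (path !== "") {
    candidates.push(`${url.origin}${wellKnown}${path}`);
  }
  candidates.push(`${url.origin}${wellKnown}`);
  return candidates;
}

const candidates = discoveryCandidates("https://mcp.atlassian.com/v1/mcp");
// candidates[0]: https://mcp.atlassian.com/.well-known/oauth-authorization-server/v1/mcp
// candidates[1]: https://mcp.atlassian.com/.well-known/oauth-authorization-server
```

A client would request each candidate in turn and stop at the first one that returns valid metadata.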

That is the correct direction. MCP convenience should not become bearer-token sprawl. If a credential exists to call an Atlassian-style MCP endpoint, the workflow builder should not be able to wave it at arbitrary domains because the model found a URL-shaped string in context.
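
One way to picture the domain boundary, as a sketch only: compare the hostname of an outgoing request against the hostname of the MCP endpoint the credential was created for. isRequestAllowed is a hypothetical helper, and exact hostname matching over HTTPS is an assumption; n8n's actual rule may be broader or narrower.

```typescript
// Returns true only when the request targets the same host as the MCP
// endpoint the credential belongs to. Unparseable URLs are rejected.
function isRequestAllowed(mcpEndpoint: string, requestUrl: string): boolean {
  try {
    const allowed = new URL(mcpEndpoint);
    const target = new URL(requestUrl);
    return (
      target.protocol === "https:" && target.hostname === allowed.hostname
    );
  } catch {
    return false;
  }
}
```

Comparing parsed hostnames rather than string prefixes matters here: a lookalike host such as mcp.atlassian.com.evil.example starts with the right characters but is a different domain.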

Trace the first visible thing, not just the final answer.

Instance AI gets a set of fixes that read like support-ticket archaeology. PR #30315 adds first-turn guidance and trace metadata including first_visible_state, first_tool_name, cancellation_type, and idle_tail_ms, while preserving timeout details across active, suspended, and pending-confirmation timeout paths. PR #30335 fixes credit accounting and telemetry for resumed Instance AI runs so successful resumed runs go through the same first-thread credit claim path as foreground runs.

These are not model improvements. They are operational affordances. If a user says the AI builder “did nothing,” the trace should show whether it made a silent tool call, timed out, suspended, waited for confirmation, or showed a visible state. If a resumed run succeeds, billing and telemetry should treat it consistently rather than creating a special accounting shadow path.
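
As an illustration of how those fields could answer a "did nothing" report, here is a small sketch. The four field names come from the PR description; their types, the example values, and the explainQuietRun helper are assumptions.

```typescript
// Field names from the release notes; value types are assumptions.
interface RunTrace {
  first_visible_state: string | null;
  first_tool_name: string | null;
  cancellation_type: string | null;
  idle_tail_ms: number;
}

// Turns trace metadata into a one-line explanation for support triage.
function explainQuietRun(trace: RunTrace): string {
  if (trace.cancellation_type !== null) {
    return `cancelled (${trace.cancellation_type}) after ${trace.idle_tail_ms} ms idle`;
  }
  if (trace.first_visible_state === null && trace.first_tool_name !== null) {
    return `tool "${trace.first_tool_name}" ran before anything was shown`;
  }
  if (trace.first_visible_state === null) {
    return "nothing visible was ever shown";
  }
  return `first visible state: ${trace.first_visible_state}`;
}
```

The value of recording these fields is that each branch above maps to a different fix: a timeout budget, a missing progress indicator, or a run that never started.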

The credential-flow changes matter for the same reason. PR #30451 updates the planner prompt so planning does not block on credential or timezone questions when the builder can proceed with named credentials, a single matching credential, or a mocked/setup flow. PR #30638 grounds setup guidance in the actual inline AI Assistant setup card, forbids user-facing instructions to open the editor or canvas or to click a generic Setup button, and makes build-first/setup-after-verification the default path for missing credentials.

This is product-specific grounding, and it is necessary. AI builders should not ask the user to solve setup before the system has built enough of the workflow to know what setup is actually required. They also should not hallucinate UI instructions that do not match the product. “Build first, verify, then guide setup through the real inline card” is not glamorous. It is how you stop an AI assistant from becoming a confusing support article generator.
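
The "do not block early" rule reads naturally as a small decision function. This is a sketch with hypothetical names; the first three outcomes follow the cases listed above, while asking the user when several credentials match and none is named is an assumption the release notes do not spell out.

```typescript
type CredentialDecision =
  | { kind: "use"; credentialName: string }
  | { kind: "build-then-setup" }
  | { kind: "ask" };

// available: names of existing credentials that match the node's type.
// requestedName: a credential the user named explicitly, if any.
function decideCredential(
  available: string[],
  requestedName?: string,
): CredentialDecision {
  if (requestedName !== undefined) {
    // A named credential is used if it exists; otherwise build first
    // and guide setup after verification.
    return available.indexOf(requestedName) !== -1
      ? { kind: "use", credentialName: requestedName }
      : { kind: "build-then-setup" };
  }
  const only = available.length === 1 ? available[0] : undefined;
  if (only !== undefined) {
    return { kind: "use", credentialName: only };
  }
  if (available.length === 0) {
    return { kind: "build-then-setup" };
  }
  // Assumption: several unnamed matches is the one case worth a question.
  return { kind: "ask" };
}
```

Only the last branch interrupts the user, which keeps planning moving in every case where the builder already has enough to proceed.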

There was no meaningful public reaction during the research window. Hacker News searches for n8n 2.22.0, Instance AI, and MCP OAuth returned nothing useful; Reddit was noisy. That silence is not evidence that the release is unimportant. AI workflow-builder bugs tend to surface as failed customer runs, eval regressions, and support tickets long before they become public discourse.

Practitioners running n8n as an AI automation platform should upgrade and test the contract edges. Generate workflows with invalid plan graphs and confirm validator feedback returns to the model. Exercise Switch defaults and fallback branches. Verify MCP OAuth discovery against path-based servers. Audit service-specific MCP credentials for domain scoping. Inspect Instance AI traces for first visible state, timeout details, cancellation type, and resumed-run behavior. Regression-test missing-credential flows so the builder does not block too early or give instructions for a UI that is not actually on screen.

The take: n8n is turning AI workflow generation from “prompt the builder harder” into a governed runtime with validators, evals, OAuth correctness, credential boundaries, and traces. That is what the category needs. Prompt polish is the least durable layer.

Sources: n8n 2.22.0 release, PR #30592, PR #30055, PR #30231, PR #30449, PR #30315, PR #30343