
QwenPaw 1.1.11 Beta Makes Personal Agents Less Like Chatbots and More Like Governed Clients

Anatoliy Kolodkin

09 Jun 2026 • 7 min read

QwenPaw 1.1.11 beta is a reminder that personal agents are quietly becoming client platforms. The old framing was simple: install an assistant, chat with it, maybe connect a few tools. The new framing is less cozy and more useful: decide which tools it can see, which model it may call, which protocol clients can control it, which plugins can alter its interface, and how much evidence you get when something goes wrong.

That is the interesting part of 1.1.11-beta.2. The release has plenty of UI and channel polish, but the signal is governance: per-server MCP tool whitelisting, richer ACP metadata, plugin extension infrastructure, zero-config free providers with OAuth, browser-control fixes, context-compaction repairs, and a much more serious test pipeline. This is not QwenPaw trying to win a chatbot beauty contest. It is QwenPaw becoming a governed agent client, which is the only kind of personal agent worth letting near real workflows.

GitHub timestamps the release at 12:07:22 UTC on June 9. The comparison from v1.1.11-beta.1 to 1.1.11-beta.2 contains 20 commits and 164 files changed; compared with stable v1.1.10, the beta branch is 50 commits ahead and touches 256 files. The desktop bundles are not exactly featherweight: roughly 577 MB for the macOS zip, 570 MB for the Windows setup, 492 MB for the Tauri macOS bundle, and 367 MB for the Tauri Windows installer, each with SHA-256 digest metadata. This is a full client, not a small wrapper around an API key.
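Since each bundle ships with a SHA-256 digest, it is worth checking a download before installing it. The snippet below is a minimal sketch using only Python's standard library; the function names are illustrative, and the published digest is whatever value you copy from the release page.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so a ~500 MB bundle never sits fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published_hex):
    """Compare against the published digest, ignoring case and stray whitespace."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

A mismatch means the file is corrupt or not the artifact the release describes; in either case, do not install it.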

The MCP whitelist is the feature teams should test first

PR #5002 adds per-server MCP tool whitelisting with a frontend toggle UI. The implementation includes a PUT /mcp/tools/{key} endpoint, list_all_tools() for management, runtime hot-patching without reconnecting, persisted configuration, constructor-based whitelist injection, and filtering in list_tools(). It also preserves a useful policy distinction: None means no whitelist configured, while [] means explicitly allow nothing.
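The None-versus-[] distinction is easy to get wrong with a truthiness check, so here is a minimal sketch of the filtering rule. It is not QwenPaw's code: the function name and the sample tool names are hypothetical, and only the semantics (None means unrestricted, [] means nothing allowed) come from the PR description.

```python
from __future__ import annotations

from typing import Optional, Sequence


def filter_tools(tool_names: Sequence[str], whitelist: Optional[Sequence[str]]) -> list[str]:
    """Return the tools an agent may see under a per-server whitelist."""
    if whitelist is None:
        # No whitelist configured: everything the server exposes stays visible.
        return list(tool_names)
    allowed = set(whitelist)
    # An empty whitelist reaches this line and produces an empty result.
    return [name for name in tool_names if name in allowed]


server_tools = ["read_file", "write_file", "delete_file"]  # hypothetical tool names
print(filter_tools(server_tools, None))           # all three tools
print(filter_tools(server_tools, []))             # nothing
print(filter_tools(server_tools, ["read_file"]))  # only read_file
```

Note that writing `if not whitelist:` instead of `if whitelist is None:` would silently turn "allow nothing" into "allow everything", which is exactly the failure this distinction exists to prevent.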

That last detail is not pedantry. MCP servers often expose bundles that are too broad for the agent sitting in front of them: filesystem operations, browser actions, issue trackers, search, internal APIs, vector stores, maybe secrets-adjacent tools depending on the server. “Connect this server” is a dangerous unit of permission. “Connect these two tools from this server” is closer to how real organizations manage access.

The point for practitioners is that MCP governance should be evaluated at the tool boundary, not the server boundary. A server can be trustworthy in general and still expose tools that are inappropriate for a particular assistant, channel, user, or task. QwenPaw’s per-server whitelist gives operators a way to express that. The next questions are equally important: is the whitelist visible in logs, can it be exported or reviewed, does it survive restart, and can a remote client infer which tools are intentionally hidden versus unavailable?

There is also an interoperability footnote here. Earlier QwenPaw work rewrote MCP tool names that violate OpenAI- or Anthropic-style regexes and mapped sanitized aliases back to real names on dispatch. The whitelist builds on that adapter layer. That is the right kind of boring. MCP sounds universal until model providers, tool schemas, naming rules, and client expectations collide. Agent clients that survive will need this kind of translation layer instead of pretending every server naturally speaks every provider’s dialect.
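To make the adapter idea concrete, here is a rough sketch of name sanitization with a reverse map for dispatch. It is an illustration, not QwenPaw's implementation: the pattern below (letters, digits, underscore, hyphen, at most 64 characters) is an assumption about what provider-style rules look like, and the function name and collision-suffix scheme are invented for the example.

```python
import re

# Assumed provider-style rule; check the actual provider documentation.
VALID_NAME = re.compile(r"^[A-Za-z0-9_-]{1,64}$")
MAX_LEN = 64


def build_alias_map(real_names):
    """Map provider-safe aliases back to the real MCP tool names."""
    alias_to_real = {}
    for real in real_names:
        if VALID_NAME.match(real):
            alias = real
        else:
            # Replace every disallowed character and enforce the length cap.
            alias = re.sub(r"[^A-Za-z0-9_-]", "_", real)[:MAX_LEN] or "tool"
        # Two real names can sanitize to the same alias; add a numeric suffix.
        candidate, n = alias, 1
        while candidate in alias_to_real:
            suffix = f"_{n}"
            candidate = alias[: MAX_LEN - len(suffix)] + suffix
            n += 1
        alias_to_real[candidate] = real
    return alias_to_real


aliases = build_alias_map(["search", "files/read", "files.read"])
print(aliases)
```

The model only ever sees the keys; on dispatch, the client looks up the alias and calls the server with the real name. A whitelist layered on top has to decide which of the two names it stores, and the real name is the safer choice because aliases can shift when the tool list changes.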

ACP metadata turns slash-command soup into client behavior

PR #4949 extends the QwenPaw ACP server so clients receive first-class metadata about commands, errors, tools, agent/model state, and file links. It advertises a curated command subset — clear, compact, mission, and skills — while deliberately not advertising operations that have native ACP affordances. /model maps to session/set_model, approvals map to session/request_permission, /stop maps to session/cancel, and /new maps to session/new.

That design choice deserves credit. Slash commands are convenient for humans and terrible as a universal control plane. If a client needs to change models, request permission, cancel work, or start a session, it should not be parsing magic strings from a chat transcript. It should call structured methods with typed results and predictable errors. QwenPaw’s curated advertising suggests the project understands the split: prose commands where text is natural, protocol operations where clients need reliable UI.
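The split between advertised prose commands and native protocol operations can be sketched as a small routing table. The method names and the command subset come from the PR description above; the `route` function and its return shape are hypothetical, written only to show how a client might decide which path a given input takes.

```python
# Commands the server advertises as slash commands (per the PR description).
ADVERTISED_COMMANDS = {"clear", "compact", "mission", "skills"}

# Operations with native ACP affordances, deliberately not advertised.
# Approvals also map to session/request_permission, but they are initiated
# by the agent rather than typed by the user, so they are not listed here.
NATIVE_METHODS = {
    "model": "session/set_model",
    "stop": "session/cancel",
    "new": "session/new",
}


def route(command):
    """Classify user input as a protocol call, a slash command, or unknown."""
    name = command.strip().lstrip("/")
    if name in NATIVE_METHODS:
        return ("protocol", NATIVE_METHODS[name])
    if name in ADVERTISED_COMMANDS:
        return ("slash", "/" + name)
    return ("unknown", name)


print(route("/model"))    # routed to a structured method
print(route("/compact"))  # stays a slash command
```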

This matters for more than developer ergonomics. A personal agent that spans desktop, browser, Telegram, Feishu, DingTalk, Discord, and ACP clients needs consistent semantics. If cancellation works in one surface but not another, a user will eventually discover it at the worst possible time. If model switching is a command in one place and a state mutation in another, logs become harder to interpret. Protocol metadata is not glamorous, but it is how an assistant stops being a chat box and starts behaving like a client platform.

Plugins are useful; plugin surfaces are supply-chain surfaces

PR #4997 adds plugin extension infrastructure: menu, route, and slot registries; chat extension APIs; a host SDK; data-driven layouts; and an audit API called QwenPaw.audit.overrides() to inspect registered extensions. The diff is massive — 6,499 additions, 686 deletions, and 43 files changed — and the PR body reportedly still says “WIP — Do not merge.” That awkwardness is useful signal. This is a beta substrate, not a settled extension ecosystem.

The direction is obvious. QwenPaw wants to host agentic apps and workflow-specific augmentations, not just ship one bundled assistant. That can be powerful. A plugin could add domain-specific UI, expose a workflow panel, register a chat extension, or inject context that makes an agent much more useful for a particular team.

It can also become the new browser-extension problem. If plugins can modify UI, route behavior, prompt context, or tool affordances, then plugins are supply-chain artifacts. They need review, provenance, versioning, and an answer to the question “what changed my assistant?” The audit API is encouraging because visibility is the first step. Beta users should still be conservative: install no plugins at first, record baseline behavior, then add extensions one at a time and inspect what they register.
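The one-at-a-time approach amounts to diffing registry snapshots. The sketch below assumes you have captured the output of the audit API before and after installing a plugin and reduced it to a mapping of registry name to registered identifiers. That shape is an assumption made for illustration (the real return format of QwenPaw.audit.overrides() is not described in the release notes), and the plugin identifiers are made up.

```python
def diff_registrations(baseline, current):
    """Return what was registered in `current` that was absent from `baseline`."""
    added = {}
    for registry in sorted(set(baseline) | set(current)):
        before = set(baseline.get(registry, []))
        after = set(current.get(registry, []))
        new_entries = sorted(after - before)
        if new_entries:
            added[registry] = new_entries
    return added


# Hypothetical snapshots: registry name -> identifiers registered in it.
baseline = {"menu": ["settings"], "route": ["/chat"]}
with_plugin = {
    "menu": ["settings", "acme-panel"],
    "route": ["/chat"],
    "slot": ["chat.footer"],
}
print(diff_registrations(baseline, with_plugin))
```

Keeping these diffs alongside the plugin version gives you a concrete answer the next time someone asks what changed the assistant.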

Provider convenience changes the privacy and cost story

PR #5049 adds zero-config free model providers and one-click OAuth authentication. The release trail describes OpenCode/Kilo free providers with dynamic model fetching, an OpenRouter OAuth popup with session fallback, Tauri desktop handling through openExternalLink plus polling, FREE/PRO tabs in the model selector, provider grouping, view-more pagination, and inline API-key entry on provider cards.

That is an adoption wedge. Most personal-agent projects lose users at setup: model keys, provider names, base URLs, local services, desktop permissions, chat-channel credentials. Reducing that friction is sensible. But provider convenience is also where costs and privacy get slippery. A user who believes they are “running locally” may route a task to a hosted free model. A team that standardizes on one endpoint may accidentally leak different workloads to different providers through a UI default. OAuth improves UX, but it also changes token storage, revocation, and account-linking behavior.

The practical advice is not “avoid OAuth” or “avoid free models.” It is: inspect defaults. Check which provider is active before sensitive work. Confirm whether logs include prompts, tool results, or file links. Verify where tokens live on disk. Switch between local, free, and paid providers and watch whether context limits, compaction, and tool behavior follow the active model instead of falling back to stale assumptions.

That last point connects to PR #5021, which fixes /compact and auto-compaction ignoring the model’s max_input_length when agent.json lacks active_model. The bug caused compaction to fall back to a 128K default even when the configured model had a different context window. In a multi-provider agent, context-window metadata is not a documentation field. It changes when the assistant compresses history, what it preserves, and when it starts forgetting important tool output.
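A small sketch shows why the fallback matters. Everything here is illustrative rather than QwenPaw's actual logic: the resolution order, the `default_model` parameter, the registry shape, and the 80% trigger ratio are all assumptions, and 128,000 is used as a stand-in for the "128K" default.

```python
DEFAULT_MAX_INPUT = 128_000  # stand-in for the 128K fallback (exact value assumed)


def resolve_max_input_length(agent_config, model_registry, default_model=None):
    """Prefer the agent's active model, then a global default, then the constant."""
    model_name = agent_config.get("active_model") or default_model
    model = model_registry.get(model_name) if model_name else None
    if model and model.get("max_input_length"):
        return model["max_input_length"]
    return DEFAULT_MAX_INPUT


def should_compact(used_tokens, max_input_length, ratio=0.8):
    """Trigger compaction once usage crosses a fraction of the context window."""
    return used_tokens >= int(max_input_length * ratio)


registry = {"small-local": {"max_input_length": 32_000}}  # hypothetical model entry
agent_json = {}  # no active_model, the case the bug report describes

stale = resolve_max_input_length(agent_json, registry)
fixed = resolve_max_input_length(agent_json, registry, default_model="small-local")
print(should_compact(30_000, stale), should_compact(30_000, fixed))
```

With 30,000 tokens of history, the stale 128K assumption says there is plenty of room while the real 32K window is nearly full. In practice that means the request overflows or gets truncated before compaction ever runs.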

Browser control and tests are where the demo becomes software

QwenPaw also picked up browser-control fixes that look small until an agent tries to use a real web app. PR #4905 adds page-coordinate click support for Canvas and WebGL interfaces where DOM selectors or refs do not exist. PR #4944 adds browser-specific user data directories for cross-browser switching, reuses wait_time as a CDP connection timeout, and adds launch-failure hints for incompatible profile directories.

Coordinates are sometimes necessary. They are also dangerous if the agent silently falls back from “click this known element” to “click this area of the page” without making that explicit. The right behavior is priority and visibility: selector/ref when available, coordinates only when requested or clearly needed, and logs that make the chosen path obvious. Browser profile isolation is the same genre of operational hygiene. Nobody wants a desktop agent that feels intelligent until it corrupts a browser profile and starts failing in ways that look supernatural.
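The priority-and-visibility rule is simple enough to sketch. The request shape and function name below are hypothetical and do not reflect QwenPaw's browser tool schema; the point is only that the chosen path is decided in one place and logged every time.

```python
import logging
from dataclasses import dataclass
from typing import Optional, Tuple

log = logging.getLogger("browser.click")


@dataclass
class ClickRequest:
    selector: Optional[str] = None
    ref: Optional[str] = None
    coordinates: Optional[Tuple[int, int]] = None  # page coordinates (x, y)


def choose_click_strategy(req: ClickRequest):
    """Prefer a selector, then a ref; use coordinates only when explicitly given."""
    if req.selector:
        choice = ("selector", req.selector)
    elif req.ref:
        choice = ("ref", req.ref)
    elif req.coordinates is not None:
        choice = ("coordinates", req.coordinates)
    else:
        # Never guess a location: refuse instead of silently clicking somewhere.
        raise ValueError("click request has no selector, ref, or coordinates")
    log.info("click strategy=%s target=%r", *choice)
    return choice


print(choose_click_strategy(ClickRequest(selector="#submit", coordinates=(10, 20))))
print(choose_click_strategy(ClickRequest(coordinates=(640, 360))))
```

When both a selector and coordinates are supplied, the selector wins, so a coordinate click only happens when nothing more precise was available.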

The testing work is the quiet confidence builder. PR #4945 expands hermetic integration coverage from 162 to 217 cases, adding 55 agent-scoped P0 contract tests across routing, skills, tools config, heartbeat runs, console chat SSE, and MCP OAuth PKCE. PR #5054 adds a full E2E Playwright CI pipeline with backend startup in isolated temp directories, auth disabled for test runs, subprocess coverage, graceful shutdown, and nightly integration into a four-tier coverage report. That is the difference between a hobby assistant and software that might survive real users.

For builders, the evaluation plan is concrete. Connect one MCP server and disable most of its tools. Confirm that None and [] behave differently. Connect an ACP client and verify model switching, cancellation, approvals, and command discovery through structured affordances. Avoid plugins until you can inspect what the baseline registers. Test browser control on a Canvas-heavy application. Switch providers and watch compaction thresholds. Then read logs as if you were doing incident response, because eventually you will be.

The editorial take: QwenPaw 1.1.11 beta is interesting because it is making personal agents less like chatbots and more like governed clients. That is less marketable than “your AI assistant can do everything,” but infinitely more useful. The future personal agent is not the one with the most charming greeting. It is the one that can tell you what it is allowed to do, which model it used, which tools it hid, which plugin changed the UI, and why the browser clicked where it clicked.

Sources: QwenPaw 1.1.11-beta.2 release, QwenPaw repository, MCP whitelist PR #5002, ACP metadata PR #4949, plugin infrastructure PR #4997, provider OAuth PR #5049, compaction fix PR #5021.
