Qwen Code’s May 29 Nightly Pushes the Agent Out of the Terminal and Into Team Chat

Anatoliy Kolodkin

28 May 2026 • 6 min read

Qwen Code’s May 29 nightly is not a model launch, which is precisely why it is worth paying attention to. The release moves Alibaba’s terminal-first coding agent into Feishu/Lark team chat and adds telemetry groundwork for measuring skill-driven response-time improvements. That sounds like plumbing because it is. But for coding agents, plumbing is the product after the demo ends.

The release tag, v0.16.1-nightly.20260529.7bed56b9b, was published at 00:40 UTC on May 29, with npm metadata showing the matching package published seconds earlier at 00:40:06.832 UTC. The package now has 458 published versions; latest remains 0.16.2, while this build sits on the nightly channel. In other words: do not treat this as the stable upgrade path for every workstation. Do treat it as a very clear signal about where Qwen Code is heading.

The chat adapter is a runtime boundary, not a novelty bot

The headline change is PR #4379, a Feishu/Lark channel adapter that adds nearly 4,000 lines across 19 files. It introduces a new packages/channels/feishu package, documentation, media and markdown utilities, adapter tests, channel registry wiring, dependency updates, and build changes. More importantly, it turns Qwen Code from something a developer invokes in a terminal into something a team can summon from chat.

That shift changes the operational model. A terminal session usually has one user, one working directory, one conversation, and one person responsible for cleaning up the mess. A team-chat agent has to handle direct messages, group chats, quoted messages, previous bot cards, concurrent requests, stop requests, attachments, and humans who assume the bot understands conversational context even when the platform’s threading model does not make that easy.

The adapter’s feature list suggests the maintainers understand that difference. It supports WebSocket and Webhook modes, with WebSocket as the default. That default matters because many internal teams can open an outbound long-lived connection far more easily than they can expose a public callback URL. The adapter also supports interactive card streaming with throttled in-place updates, a stop button, quote/reply context retrieval for text and card messages, image and file attachments, DM/group usage, and per-message state isolation for concurrent requests.

The group-chat defaults are also sane. The docs require a Feishu organization account, a bot-capable app, Long Connection event subscriptions, im.message.receive_v1, and permissions including im:message, im:message:send_as_bot, and im:resource. For groups, requireMention defaults to true, which is the correct default unless your goal is to build the most expensive reply-guy in the company. Group policy can be set to allowlist or open, but the existence of that policy knob is itself the point: chat integration is access control, not just UX.
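To make the access-control point concrete, here is a minimal sketch of the gating decision a chat adapter has to make before any model call happens. It is not the adapter's code. Only the requireMention default and the allowlist/open policy values come from the docs as described above; the GroupPolicy shape, the allowed_groups field, the allowlist default, and the rule that DMs always pass are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GroupPolicy:
    # Hypothetical config shape. Only the mention default and the
    # "allowlist" / "open" values are taken from the article.
    require_mention: bool = True
    group_policy: str = "allowlist"  # assumed default; the other value is "open"
    allowed_groups: set[str] = field(default_factory=set)


def should_respond(policy, chat_type, chat_id, mentions_bot):
    """Decide whether the bot may act on an incoming message."""
    if chat_type == "dm":
        # Assumption: direct messages bypass the group rules.
        return True
    if policy.group_policy == "allowlist" and chat_id not in policy.allowed_groups:
        return False
    if policy.require_mention and not mentions_bot:
        return False
    return True


policy = GroupPolicy(allowed_groups={"eng-platform"})
print(should_respond(policy, "group", "eng-platform", mentions_bot=True))   # True
print(should_respond(policy, "group", "eng-platform", mentions_bot=False))  # False
print(should_respond(policy, "group", "random-chat", mentions_bot=True))    # False
```

The ordering is the design choice worth copying: the allowlist check runs before the mention check, so a mention in an unapproved group never reaches the agent.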

Practitioners should treat this as infrastructure. Before putting a coding agent into Feishu or Lark, decide which groups can invoke it, which repository or working directory it runs from, what credentials are in the environment, whether it can write files, what tools are enabled, how attachments are stored, and where logs go. The adapter gives Qwen Code a new surface area. It does not make the security model disappear.

Streaming cards are what terminal UX becomes in a group

The interactive-card design is more than polish. Long LLM responses in chat are usually scroll damage: a thousand words, a few code blocks, maybe a table, and now everyone’s phone is a log viewer. Qwen Code’s adapter renders responses as native Feishu markdown cards, updates them in place on a 1.5-second throttle, reacts with an “OnIt” emoji while processing, and can collapse long responses into expandable sections. The docs even call out card-size limits for very long responses with many tables.
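The throttled in-place update is a small pattern, and a rough sketch shows why it works. The class below is an illustration rather than the adapter's implementation: the 1.5-second interval comes from the article, while ThrottledCard, the send_update callback, and the injectable clock are hypothetical names chosen to keep the example testable.

```python
import time


class ThrottledCard:
    """Coalesces streamed text into at most one card update per interval."""

    def __init__(self, send_update, interval=1.5, clock=time.monotonic):
        self._send_update = send_update  # hypothetical callback that patches the card
        self._interval = interval
        self._clock = clock
        self._text = ""
        self._last_sent = None
        self._dirty = False

    def append(self, chunk):
        """Buffer a streamed chunk and update the card if the window has elapsed."""
        self._text += chunk
        self._dirty = True
        now = self._clock()
        if self._last_sent is None or now - self._last_sent >= self._interval:
            self._flush(now)

    def finish(self):
        """Push the final text even if the throttle window has not elapsed."""
        if self._dirty:
            self._flush(self._clock())

    def _flush(self, now):
        self._send_update(self._text)
        self._last_sent = now
        self._dirty = False


card = ThrottledCard(send_update=print)
for chunk in ["Reading files", "...", " done."]:
    card.append(chunk)  # only the first chunk is sent immediately
card.finish()           # the remaining text lands in one final update
```

The final flush is the part that is easy to forget. Without it, the last chunks of a response can sit in the buffer while the card shows stale text.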

That is the correct product instinct. Once an agent moves into chat, the output artifact has to behave like a status object, not a transcript dump. Users need to see that work is happening, stop it from the same place they requested it, and read the final result without the bot consuming the entire room. If Qwen Code wants to operate in team channels, this is the minimum viable control plane.

There is also a subtle concurrency issue here. In a terminal, if two prompts collide, the user knows who typed both of them. In a group chat, two people can mention the bot at the same time, attach different files, quote different prior messages, and expect separate answers. Per-message state isolation is not an implementation detail; it is the difference between a useful shared assistant and a race condition wearing a bot avatar.
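One way to picture per-message state isolation is a registry keyed by the triggering message, so that output buffers, attachments, and stop signals never leak between requests. This is a sketch under assumptions, not the adapter's design: RequestState, RequestRegistry, and their fields are hypothetical.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class RequestState:
    # Everything one request needs lives here and nowhere else.
    message_id: str
    sender: str
    attachments: list = field(default_factory=list)
    output: str = ""
    cancelled: threading.Event = field(default_factory=threading.Event)


class RequestRegistry:
    """Tracks in-flight requests, one isolated state object per message."""

    def __init__(self):
        self._lock = threading.Lock()
        self._states = {}

    def start(self, message_id, sender, attachments=()):
        state = RequestState(message_id, sender, list(attachments))
        with self._lock:
            self._states[message_id] = state
        return state

    def stop(self, message_id):
        """Signal cancellation for one request; returns False if it is unknown."""
        with self._lock:
            state = self._states.get(message_id)
        if state is None:
            return False
        state.cancelled.set()
        return True

    def finish(self, message_id):
        with self._lock:
            self._states.pop(message_id, None)


registry = RequestRegistry()
alice = registry.start("msg-1", "alice", ["spec.pdf"])
bob = registry.start("msg-2", "bob")
registry.stop("msg-1")  # the stop button on Alice's card
print(alice.cancelled.is_set(), bob.cancelled.is_set())  # True False
```

Keying on the message rather than the chat or the user is what lets the stop button on one card cancel only the request that produced that card.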

The response-time work starts with admitting there is no speedup yet

The second notable change, PR #4565, is less visible but arguably more important. It adds telemetry foundations for skill-based response-time optimization: roughly 2,400 lines across 11 files, including design docs under docs/design/rt-optimization/, telemetry type changes, qwen-logger wiring, scheduler prompt_id propagation, and tests across loggers, qwen-logger, skills, and the core tool scheduler.

The most honest sentence in the change is that it does not deliver measurable response-time improvement yet. Good. Agent latency is full of fake wins. A model gets faster, but the loop count goes up. A scheduler saves milliseconds, but the agent burns another full turn because the tool returned incomplete data. A skill seems useful in one demo, but nobody can tie its launch event to the tool calls and final answer in production traces.

PR #4565 attacks the measurement problem first. The maintainers note that QwenLogger.logSkillLaunchEvent existed but had zero callers, so backends consuming the qwen-logger pipeline silently missed skill_launch events. The fix mirrors the existing logToolCall pattern and propagates prompt_id from the user turn into SkillLaunchEvent through SkillToolInvocation.setPromptId(id) and a scheduler path. The goal is simple: join the skill that fired with the tool calls and answer that followed.
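The propagation pattern is easy to sketch outside the codebase. The Python below is a stand-in written for this article, not a port of the PR: the class and method names are snake_case analogues of the identifiers mentioned above, and the Logger class, the schedule function, and the hasattr check are assumptions about how such wiring could look.

```python
from dataclasses import dataclass


@dataclass
class SkillLaunchEvent:
    skill_name: str
    prompt_id: str


class Logger:
    """Stand-in for a telemetry sink; it just collects events."""

    def __init__(self):
        self.events = []

    def log_skill_launch_event(self, event):
        self.events.append(event)


class SkillToolInvocation:
    def __init__(self, skill_name, logger):
        self.skill_name = skill_name
        self._logger = logger
        self._prompt_id = ""

    def set_prompt_id(self, prompt_id):
        self._prompt_id = prompt_id

    def execute(self):
        # The logging method is actually called here, so it has a caller.
        self._logger.log_skill_launch_event(
            SkillLaunchEvent(self.skill_name, self._prompt_id)
        )
        return f"ran {self.skill_name}"


def schedule(invocation, prompt_id):
    """Hypothetical scheduler path: hand the turn id over before executing."""
    if hasattr(invocation, "set_prompt_id"):
        invocation.set_prompt_id(prompt_id)
    return invocation.execute()


logger = Logger()
schedule(SkillToolInvocation("code-review", logger), prompt_id="turn-42")
print(logger.events)  # [SkillLaunchEvent(skill_name='code-review', prompt_id='turn-42')]
```

The important property is that the identifier is set before execution, so the launch event and every later tool call in the same turn carry the same key.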

That is the right primitive if you care about real agent performance. Response time is not just tokens per second. It is the number of model-tool-observe loops needed to reach an answer. The design docs reportedly make the same argument: the biggest lever may be skill and tool design, not framework scheduling. If a better skill returns a complete answer in round one, a three-round loop can become a two-round loop. That dwarfs most micro-optimizations.
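A back-of-the-envelope model makes the comparison concrete. The numbers below are invented for illustration and do not come from the PR or the design docs; the model assumes every round costs one model call, one tool call, and a fixed scheduling overhead.

```python
def turn_seconds(rounds, model_seconds, tool_seconds, overhead_seconds=0.0):
    """Wall time for one turn under a simple per-round cost model."""
    return rounds * (model_seconds + tool_seconds + overhead_seconds)


# Illustrative figures only: 4 s per model call, 1 s per tool call,
# 0.25 s of scheduling overhead per round.
baseline = turn_seconds(3, 4.0, 1.0, 0.25)          # 15.75
faster_scheduler = turn_seconds(3, 4.0, 1.0, 0.0)   # 15.0, overhead removed entirely
better_skill = turn_seconds(2, 4.0, 1.0, 0.25)      # 10.5, one round removed

print(baseline, faster_scheduler, better_skill)
```

Under these assumptions, eliminating all scheduling overhead saves 0.75 seconds, while removing one round saves 5.25 seconds.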

Teams evaluating Qwen Code should copy the method, not just the feature. Instrument skill launches. Propagate a turn identifier. Join tool calls to the skill that caused them. Track loop count, wall time, token use, and cancellation. Only then decide whether a “faster” workflow is actually faster, or just better at hiding where it spends time.
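The join itself can be sketched in a few lines once every event carries the turn identifier. The event schema here is hypothetical: the field names and the model_response, tool_call, and cancel event types are assumptions for illustration, and only skill_launch and prompt_id are taken from the change described above.

```python
from collections import defaultdict


def summarize_turns(events):
    """Group flat telemetry events by prompt_id and compute per-turn metrics."""
    turns = defaultdict(lambda: {
        "skills": [], "loops": 0, "tool_calls": 0, "tokens": 0,
        "first_ts": None, "last_ts": None, "cancelled": False,
    })
    for event in events:
        turn = turns[event["prompt_id"]]
        ts = event["ts"]
        turn["first_ts"] = ts if turn["first_ts"] is None else min(turn["first_ts"], ts)
        turn["last_ts"] = ts if turn["last_ts"] is None else max(turn["last_ts"], ts)
        turn["tokens"] += event.get("tokens", 0)
        kind = event["type"]
        if kind == "skill_launch":
            turn["skills"].append(event["name"])
        elif kind == "model_response":
            turn["loops"] += 1  # one model response counts as one loop
        elif kind == "tool_call":
            turn["tool_calls"] += 1
        elif kind == "cancel":
            turn["cancelled"] = True

    summary = {}
    for prompt_id, turn in turns.items():
        summary[prompt_id] = {
            "skills": turn["skills"],
            "loops": turn["loops"],
            "tool_calls": turn["tool_calls"],
            "tokens": turn["tokens"],
            "wall_seconds": turn["last_ts"] - turn["first_ts"],
            "cancelled": turn["cancelled"],
        }
    return summary


events = [
    {"prompt_id": "p1", "type": "skill_launch", "name": "review", "ts": 0.0},
    {"prompt_id": "p1", "type": "model_response", "ts": 2.0, "tokens": 300},
    {"prompt_id": "p1", "type": "tool_call", "name": "read_file", "ts": 2.5},
    {"prompt_id": "p1", "type": "model_response", "ts": 5.0, "tokens": 200},
]
print(summarize_turns(events)["p1"])
```

With a summary like this per turn, comparing two versions of a skill becomes a comparison of loop counts and wall time for the same kind of request, not a comparison of impressions.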

The pattern across the week is becoming hard to miss

This nightly lands after a dense run of Qwen Code operational work. Stable v0.16.2 shipped local project memory, background-agent caps, credential redaction, dangerous-interpreter filtering, Token Plan cache-control support, compaction changes, HTTP spans, headless budgets, worktree startup, auto-skill defaults, and skill overwrite protection. The May 28 nightly fixed startup-warning visibility and OpenTelemetry LogToSpan diagnostics so failures were useful without corrupting the terminal UI.

The May 29 nightly adds the next layer: multi-surface operation and measurable loop optimization. That is a more interesting story than another benchmark slide. Qwen Code is being shaped into an agent runtime that can exist in a terminal, scripts, IDEs, daemons, and now team chat. Each new surface brings a new class of failure: noisy group behavior, confused context, unbounded concurrency, unreadable output, missing cancellation, and telemetry gaps that make optimization indistinguishable from vibes.

The practical takeaway is blunt. If you are comparing Qwen Code with Claude Code, Codex, Copilot, Gemini CLI, OpenClaw, or local Qwen/Ollama workflows, stop scoring only model output. Score the operating surface. Can the agent be invoked from the places your team actually works? Can it isolate concurrent users? Can it cancel cleanly? Can it preserve traceability across skills and tools? Can you tell whether a workflow improved because the model got faster, the skill got better, or the loop got shorter?

This is still a nightly, and the Feishu adapter is not a full enterprise governance story by itself. But the direction is credible. Qwen Code is moving from “terminal assistant for a developer” toward “agent runtime a team might operate.” That is where the coding-agent race is going, whether the marketing pages admit it or not.

Sources: GitHub release: QwenLM/qwen-code v0.16.1-nightly.20260529.7bed56b9b, Qwen Code repository, Feishu/Lark channel docs, Feishu adapter PR #4379, skill-based RT telemetry PR #4565, OpenTelemetry trace concepts
