Crush Nightly Makes Skills Visible, Which Is Exactly Where Agent Runtimes Are Headed

Charm’s latest Crush nightly looks like a tiny UI release if you read it quickly. A skill picker gets descriptions. Invoked skills render as attachments with a marker. A few chores land around Windows test flakes and signatures. Nobody is going to make a launch video for that.

Nor does it need one. But anyone building or running coding agents should pay attention, because this is exactly the class of change that separates an agentic coding toy from an agent runtime you can actually debug.

The May 25 nightly is four commits ahead of Crush v0.71.0. The substantive work is PR #2970, which adds descriptions to the user-invocable skills list and renders invoked skills as custom attachments. The test fixture uses front matter with name, description, and user-invocable: true, then expects .agents/skills/test-skill/SKILL.md to show up as an attachment rather than as ordinary chat text.
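For readers who have not written a skill file, the shape of that fixture can be sketched in a few lines. This is a minimal Python sketch, assuming conventional `---`-delimited front matter with flat `key: value` pairs. The keys mirror the fixture described above, but the sample values and the `parse_front_matter` helper are hypothetical. It is not a full YAML parser and not Crush's own loader.

```python
# Minimal sketch, not Crush's loader: split a SKILL.md file into front matter
# and body, assuming "---"-delimited flat "key: value" pairs (not full YAML).

# Hypothetical sample: keys follow the PR's fixture, values are illustrative.
SAMPLE_SKILL_MD = """\
---
name: test-skill
description: Example skill used to check how invoked skills render.
user-invocable: true
---
# Test skill

Instructions for the agent go here.
"""


def parse_front_matter(text: str) -> tuple[dict[str, object], str]:
    """Return (metadata, body); metadata is empty if there is no front matter."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, object] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the skill body.
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            value = value.strip()
            # Bare true/false become booleans; everything else stays a string.
            if value.lower() in ("true", "false"):
                meta[key.strip()] = value.lower() == "true"
            else:
                meta[key.strip()] = value
    # No closing delimiter: treat the whole file as body.
    return {}, text


meta, body = parse_front_matter(SAMPLE_SKILL_MD)
print(meta["name"], meta["user-invocable"])
```

The metadata is what a picker can list and a reviewer can lint; the body is the instruction text that ends up in the agent's context.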

That detail matters. A normal message looks like user intent. An attachment looks like context or capability being supplied to the agent. When skills can package repeatable workflows, policies, debugging procedures, review heuristics, release playbooks, and tool instructions, the difference between those two visual states is not decoration. It is governance.

Skills are plugins, even when they are just Markdown

The agent ecosystem has spent the last year rediscovering plugins under different names. Claude Code has skills, subagents, plugins, MCP servers, and managed configuration. Qwen Code is putting diagnostic procedures in .qwen/skills/. Zed added skills and global agent instructions. OpenCode is making diffs, errors, MCP auth, and session state more explicit. Crush is now making skill invocation visible in the picker and transcript.

Different products, same architectural drift: the coding agent is no longer a text box. It is a runtime with loadable behavior, privileged tools, memory, prompts, policies, provider state, and UI state. Once that is true, the user needs a way to understand what the runtime loaded before it touched the repo.

Skills are powerful because they let teams encode behavior outside the model. A skill can say how to debug memory leaks, how to run release verification, how to review security-sensitive changes, how to migrate a service, how to inspect CI, or how to avoid a known footgun in the codebase. That is useful operational knowledge. It is also a behavior supply chain. A poorly named skill with vague instructions can quietly change what the agent sees, what tools it prefers, and how it interprets the task.

Descriptions are the minimum viable control plane. A skill with a good description can be reviewed, searched, taught, linted, and audited. A skill without one is just a folder that mutates agent behavior. Crush is only surfacing the description today, but that is the first step toward richer policy: owner, version, allowed tools, intended scope, required environment, risk level, and whether a human can invoke it directly.

The front matter in PR #2970 is therefore more interesting than it looks. user-invocable: true is a product decision masquerading as metadata. It says not every skill is necessarily for direct human selection. Some may be internal. Some may be runtime-selected. Some may be too dangerous or too specific to expose casually. Once teams have dozens of skills, that distinction becomes essential.
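The distinction can be illustrated with a toy picker filter. This sketch assumes a hypothetical `Skill` record whose `user_invocable` field mirrors the `user-invocable` key; the skill names and descriptions are invented, and the sketch says nothing about how Crush itself treats a missing key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """Hypothetical in-memory view of a skill's front matter."""
    name: str
    description: str
    user_invocable: bool  # mirrors the "user-invocable" front matter key


def picker_entries(skills: list[Skill]) -> list[str]:
    """Entries a human-facing picker might show, sorted by name."""
    entries = []
    for skill in sorted(skills, key=lambda s: s.name):
        if not skill.user_invocable:
            continue  # runtime-selected skills stay out of the picker
        desc = skill.description.strip() or "(no description)"
        entries.append(f"{skill.name}: {desc}")
    return entries


skills = [
    Skill("release-check", "Run the release verification checklist before tagging.", True),
    Skill("ci-triage", "Read failing CI logs and summarize the first real error.", True),
    Skill("repo-conventions", "Coding conventions the runtime loads on its own.", False),
]
for entry in picker_entries(skills):
    print(entry)
```

Keeping runtime-only skills out of the list is the point: the picker shows humans what they can deliberately choose, and every entry carries its description.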

The transcript needs a visual grammar for authority

Agent sessions already contain too many things that look like the same thing. A user message, a model reply, a tool call, a tool result, an MCP response, a file attachment, a skill activation, a policy injection, a system instruction, and a provider-side continuation event can all collapse into “stuff in the scrollback.” That is fine for demos and terrible for incident review.

The attachment marker is small, but it points at the right design principle: authority should be visible. If the runtime attached a skill, say so in a way the human can notice later. If the agent invoked a workflow, leave an artifact. If a capability changed the context window, do not bury it as another paragraph of chat text.
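One way to encode that principle is to give every transcript entry an explicit kind and render each kind differently. The sketch below is hypothetical: the `EntryKind` values and the text markers are invented for illustration and do not reproduce Crush's actual attachment marker.

```python
from dataclasses import dataclass
from enum import Enum


class EntryKind(Enum):
    USER_MESSAGE = "user"
    MODEL_REPLY = "model"
    TOOL_CALL = "tool"
    SKILL_ATTACHMENT = "skill"


@dataclass(frozen=True)
class TranscriptEntry:
    kind: EntryKind
    text: str
    source: str = ""  # e.g. the file a skill was loaded from


# Hypothetical markers; what matters is that each kind renders differently.
MARKERS = {
    EntryKind.USER_MESSAGE: "you>",
    EntryKind.MODEL_REPLY: "agent>",
    EntryKind.TOOL_CALL: "[tool]",
    EntryKind.SKILL_ATTACHMENT: "[skill attached]",
}


def render(entry: TranscriptEntry) -> str:
    """Prefix each entry with its kind marker; skills also show their source."""
    line = f"{MARKERS[entry.kind]} {entry.text}"
    if entry.kind is EntryKind.SKILL_ATTACHMENT and entry.source:
        line += f" ({entry.source})"
    return line


session = [
    TranscriptEntry(EntryKind.USER_MESSAGE, "Why is the nightly build failing?"),
    TranscriptEntry(EntryKind.SKILL_ATTACHMENT, "test-skill", ".agents/skills/test-skill/SKILL.md"),
    TranscriptEntry(EntryKind.MODEL_REPLY, "Checking the CI logs first."),
]
for entry in session:
    print(render(entry))
```

Because the kind is data rather than formatting, the same record can drive the terminal UI, an exported log, and an incident timeline.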

This is the same reason diff viewers, explicit permission prompts, MCP server lists, command approval logs, and token/cost breakdowns matter. Agentic coding fails in the seams. The model may be impressive, but the trust decision happens in the UI: what did it load, what did it run, what did it edit, what authority did it have, and what can I inspect before approving the patch?

Crush’s prior v0.71.0 release reinforces the point. It added user-invocable skills, tightened permission prompts for chained Bash commands involving pipes and redirection, added fallback token and cost estimation when providers omit usage data, improved Bedrock defaults, and continued experimental client-server mode behind CRUSH_CLIENT_SERVER=1. Those are not benchmark features. They are operating-surface features: permissions, economics, deployment shape, and visible context.
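The chained-command point is easy to underestimate: `go test ./...` and `go test ./... | tee out.log` look similar, but the second also writes a file. A rough way to spot such chains is to tokenize with Python's `shlex` in punctuation mode and look for operator tokens. This is a hypothetical sketch of the idea, not Crush's permission logic, and it is deliberately conservative: a quoted `'|'` is also flagged, and input that cannot be parsed is treated as needing a prompt.

```python
import shlex

# Characters the shell treats as command separators or redirections.
SHELL_PUNCT = set("();<>|&")


def shell_operators(command: str) -> list[str]:
    """Return operator tokens such as '|', '&&' or '>' found in a command."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    # In punctuation mode, runs of ();<>|& come back as their own tokens.
    return [tok for tok in lexer if tok and set(tok) <= SHELL_PUNCT]


def needs_chain_prompt(command: str) -> bool:
    """Hypothetical rule: anything chained or redirected gets a fuller prompt."""
    try:
        return bool(shell_operators(command))
    except ValueError:
        # Unbalanced quotes: fail closed and ask the human.
        return True


for cmd in ["go test ./...", "go test ./... | tee out.log"]:
    print(cmd, "->", needs_chain_prompt(cmd))
```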

The nightly’s skill-picker work belongs in that family. It makes agent behavior more inspectable before and after invocation. That is how these tools become less haunted.

What teams should steal from this patch

If you use Crush, the immediate move is boring and valuable: audit your .agents/skills/ directory. Add descriptions that say what each skill does, when to use it, and when not to use it. Mark only the workflows that humans should intentionally select as user-invocable: true. Try the nightly in a disposable repo and verify that invoked skills appear as attachments in the transcript. If the description is too vague to help a teammate pick the right workflow, it is too vague for an agent runtime.
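A first pass at that audit can be scripted. The sketch below assumes the `.agents/skills/<name>/SKILL.md` layout from the PR's test fixture and the same simplified `---` front matter. The 20-character threshold and the rule that `user-invocable` should be set explicitly are arbitrary house rules chosen for illustration, not Crush requirements.

```python
import sys
import tempfile
from pathlib import Path


def read_front_matter(path: Path) -> dict[str, str]:
    """Flat 'key: value' pairs between leading '---' lines (simplified, not YAML)."""
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # unterminated front matter counts as missing


def audit_skills(root: Path, min_desc_len: int = 20) -> list[str]:
    """Report problems for every root/<name>/SKILL.md (illustrative house rules)."""
    problems = []
    for skill_file in sorted(root.glob("*/SKILL.md")):
        meta = read_front_matter(skill_file)
        label = skill_file.parent.name
        desc = meta.get("description", "")
        if not desc:
            problems.append(f"{label}: missing description")
        elif len(desc) < min_desc_len:
            problems.append(f"{label}: description too vague ({desc!r})")
        if meta.get("user-invocable", "").lower() not in ("true", "false"):
            problems.append(f"{label}: user-invocable not set explicitly")
    return problems


def demo() -> list[str]:
    """Audit a throwaway fixture with one weak and one well-described skill."""
    fixtures = {
        "debug": "---\nname: debug\ndescription: Debug.\n---\n",
        "release-check": (
            "---\nname: release-check\n"
            "description: Run the release verification checklist before tagging.\n"
            "user-invocable: true\n---\n"
        ),
    }
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name, text in fixtures.items():
            (root / name).mkdir()
            (root / name / "SKILL.md").write_text(text, encoding="utf-8")
        return audit_skills(root)


if __name__ == "__main__":
    # Usage: python audit_skills.py .agents/skills  (no argument runs the demo)
    found = audit_skills(Path(sys.argv[1])) if len(sys.argv) > 1 else demo()
    for problem in found:
        print(problem)
```

Pointing it at a real checkout means passing the skills directory as the first argument; without one it audits a temporary fixture, where `debug` is flagged twice and `release-check` passes.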

If you maintain an internal agent platform, steal the larger pattern. Every loadable behavior should have a name, description, visible activation event, and transcript artifact. Ideally it should also have ownership, versioning, review status, allowed tools, and scope. Treat skills like lightweight packages. They may be Markdown, but they carry operational intent.
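As a concrete starting point, that richer metadata can be modeled as a small manifest plus an activation record written to the transcript or an audit log. Every field name here (`owner`, `version`, `reviewed`, `allowed_tools`, `scope`) and every tool name is hypothetical; they stand in for the properties listed in the previous paragraph, not for features Crush ships today.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class SkillManifest:
    """Hypothetical manifest; field names are illustrative, not a Crush format."""
    name: str
    description: str
    owner: str
    version: str
    reviewed: bool
    allowed_tools: frozenset[str] = field(default_factory=frozenset)
    scope: str = "repo"

    def problems(self) -> list[str]:
        """List gaps that a review step might refuse to ship."""
        issues = [
            f"missing {attr}"
            for attr in ("name", "description", "owner", "version")
            if not getattr(self, attr).strip()
        ]
        if not self.reviewed:
            issues.append("not reviewed")
        return issues

    def permits(self, tool: str) -> bool:
        """True only for tools the manifest explicitly allows."""
        return tool in self.allowed_tools


def activation_record(skill: SkillManifest, invoked_by: str) -> str:
    """One JSON line describing an activation, for the transcript or an audit log."""
    return json.dumps(
        {
            "event": "skill_activated",
            "skill": skill.name,
            "version": skill.version,
            "owner": skill.owner,
            "scope": skill.scope,
            "invoked_by": invoked_by,  # e.g. "user" or "runtime"
            "at": datetime.now(timezone.utc).isoformat(),
        },
        sort_keys=True,
    )


ci_triage = SkillManifest(
    name="ci-triage",
    description="Read failing CI logs and summarize the first real error.",
    owner="platform-team",
    version="1.2.0",
    reviewed=True,
    allowed_tools=frozenset({"read_file", "search"}),
)
print(ci_triage.problems(), ci_triage.permits("shell"))
print(activation_record(ci_triage, invoked_by="user"))
```

Defaulting `allowed_tools` to an empty set makes the manifest deny by default, which is the safer failure mode for a behavior supply chain.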

Security teams should care too. Skill visibility is not a replacement for sandboxing, permissions, or code review, but it is a prerequisite for all three. You cannot govern what nobody can see. You cannot write a policy for “some text the agent silently loaded.” You can write a policy for a named skill with metadata and invocation logs.

There is also a developer-experience angle. Good skill metadata reduces choice paralysis. A picker full of names like debug, review, and fix forces humans to guess. A picker with specific descriptions teaches the workflow at selection time. That matters because agent skills are not only instructions for models; they are documentation for teams.

None of this means Crush has solved the problem. Descriptions can lie. Attachments can be ignored. A malicious or sloppy skill can still do damage if the surrounding permission model is weak. But visible activation is the right primitive. The industry has already learned, painfully, that invisible automation feels magical until it fails. Then everyone asks for logs, provenance, and a timeline.

Charm’s nightly is a small patch with the right instinct: make the agent’s capability stack more legible. Skills are becoming plugins by another name. Plugins need metadata, visibility, and reviewable activation — not just a folder full of Markdown and a prayer that the user remembers what got loaded.

LGTM. Tiny diff, correct direction.

Sources: GitHub — Crush nightly, Crush compare v0.71.0...nightly, PR #2970, Crush v0.71.0, Crush repository