Grok Build’s Memory and Command Surface Turns the CLI Into a Stateful Agent Runtime

Persistent memory is where coding agents stop feeling like clever terminals and start behaving like stateful runtimes. That is useful. It is also where old assumptions, stale decisions, local hacks, and occasionally secrets can acquire a longer half-life than anyone intended. xAI’s refreshed Grok Build documentation is interesting not because it adds another slash command, but because it exposes memory, compaction, usage, session control, and automation as explicit surfaces developers have to operate.

Low-authority recaps have framed this as a /remember story. The official docs are more consequential than that. Grok Build now documents a command surface that includes /memory, /flush, /dream, /compact, /context, /usage, session resume/fork controls, plan mode, always-approve mode, headless JSON output, and ACP integration through grok agent stdio. That is not just UX polish. It is a runtime contract: what state exists, how it is inspected, when it is written, how it is compressed, and how much it costs.

The basics are straightforward. Grok Build’s terminal UI supports mode cycling with Shift+Tab. Plan mode blocks write tools, except for writes to the session plan file, and can stop to ask clarifying questions before edits. Always-approve skips permission prompts and can be started with grok --always-approve or toggled through /always-approve. Default permission behavior can be set in ~/.grok/config.toml by setting permission_mode under [ui] to "ask" or "always-approve", with ask as the default. Legacy keys such as approval_mode and yolo = true still work, but permission_mode takes precedence when both are present.
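
The precedence rule is easy to get wrong when old and new keys coexist in one file. The following is a minimal sketch of how a wrapper script might compute the effective mode from an already-parsed config. The function name, the placement of the legacy keys under [ui], and the mapping of yolo = true to always-approve are assumptions for illustration, not documented behavior.

```python
from typing import Any, Mapping

VALID_MODES = {"ask", "always-approve"}


def resolve_permission_mode(config: Mapping[str, Any]) -> str:
    """Return the effective permission mode from a parsed config mapping.

    `config` is what a TOML parser would return for ~/.grok/config.toml.
    ASSUMPTIONS (not confirmed by the docs): legacy keys live under [ui],
    approval_mode accepts the same values as permission_mode, and
    yolo = true means "always-approve".
    """
    ui = config.get("ui", {})

    # permission_mode takes precedence over any legacy key.
    mode = ui.get("permission_mode")
    if isinstance(mode, str) and mode in VALID_MODES:
        return mode

    # Legacy fallbacks (assumed mapping, see docstring).
    legacy = ui.get("approval_mode")
    if isinstance(legacy, str) and legacy in VALID_MODES:
        return legacy
    if ui.get("yolo") is True:
        return "always-approve"

    # Documented default.
    return "ask"
```

A check like this is most useful in CI, where a leftover yolo = true in a shared config can otherwise go unnoticed.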

Memory is context, not policy

The most important practitioner point is that agent memory should be treated as operational state, not magic. A stateless assistant forgetting your repo conventions wastes time. A stateful assistant remembering obsolete architecture, deprecated commands, accidental credentials, or one developer’s local workaround can actively mislead future work. xAI’s shell-provided commands make this explicit: /flush writes conversation memory to disk, /memory searches and edits persistent memory entries, and /dream triggers offline memory consolidation. Those are useful controls because they acknowledge that memory has a lifecycle.

Teams should build hygiene around that lifecycle. Decide which instructions belong in versioned repo files, which belong in user-local memory, and which should never be stored. Ban secrets from memory. Review memory entries after major corrections, before important edits, and when switching projects. Keep local session history out of source control. For shared automation, prefer explicit configuration and narrow allow rules over accumulated memory. The agent remembering “how we do things” is valuable only if humans can inspect and correct what it thinks “we” means.
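
Banning secrets from memory is easier to enforce with a tripwire than with a policy document. Here is a rough sketch of a scanner over memory entries, assuming the entries have already been exported as plain strings. How Grok Build stores memory on disk is not described here, so the input format, the function name, and the pattern list are all hypothetical.

```python
import re

# A deliberately small pattern set; real scanners use many more rules.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "assigned_credential": re.compile(
        r"(?i)\b(password|passwd|secret|token|api[_-]?key)\b\s*[:=]\s*\S{8,}"
    ),
}


def find_suspect_entries(entries):
    """Return (index, pattern_name) pairs for entries that look like secrets."""
    hits = []
    for index, text in enumerate(entries):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(text):
                hits.append((index, name))
    return hits


entries = [
    "Run cargo check before committing",
    "staging password: hunter2-example",
]
for index, name in find_suspect_entries(entries):
    print(f"entry {index} matches {name}; review before keeping it")
```

A hit is a prompt for human review, not proof of a leak. A clean scan does not prove the memory is free of sensitive material either.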

Claude Code is the obvious comparison point. Anthropic documents persistent memory through project/user-authored CLAUDE.md files and auto memory, while warning that memory is context rather than enforced configuration. That caveat matters across every coding agent. A memory entry can steer behavior; it should not be confused with policy. If the company requires a sandbox, a deny rule, a model restriction, or an approval gate, put that in managed configuration, not in a friendly note the model may summarize away later.

Compaction is lossy infrastructure

The command set also highlights the relationship between memory, context pressure, and cost. Grok exposes /context to inspect what is in play, /compact and /compact-mode to manage long sessions, and /usage to report consumption. Those three activities, inspecting context, compacting, and checking usage, belong in the normal operating loop for serious agent work. Long-running coding agents do not answer once; they gather context, call tools, edit files, compress history, resume later, and carry summaries forward. Every compression step can drop nuance. Every retained instruction can bias the next turn. Every additional chunk of context has a cost.

That is why compaction should not be treated as a harmless cleanup button. It is a lossy transformation of working state. Before compacting a long debugging or refactoring session, developers should know which constraints must survive: failing test names, architectural limits, security assumptions, migration requirements, ownership boundaries, and decisions already rejected. After compaction, sanity-check that the agent still remembers the important parts. If the summary quietly lost “do not touch the billing path,” the next diff may be technically impressive and organizationally radioactive.
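
One cheap way to run that sanity check is to write the must-survive constraints down before compacting and then test the post-compaction summary against them. The sketch below assumes the summary text has been copied out of the session by hand. The function name and the keyword-marker approach are illustrative and not part of Grok Build.

```python
def missing_constraints(summary: str, constraints: dict[str, list[str]]) -> list[str]:
    """Return names of constraints with no marker phrase in the summary.

    `constraints` maps a constraint name to marker phrases; a constraint
    counts as surviving if any of its phrases appears (case-insensitive).
    """
    lowered = summary.lower()
    return [
        name
        for name, phrases in constraints.items()
        if not any(phrase.lower() in lowered for phrase in phrases)
    ]


constraints = {
    "billing path is off limits": ["billing"],
    "known failing test": ["test_refund_rounding"],
}
summary = "Refactoring the invoice exporter; test_refund_rounding still fails."
print(missing_constraints(summary, constraints))
# ['billing path is off limits']
```

Keyword matching only catches constraints that vanished entirely. A summary can mention billing and still invert the instruction, so anything high-stakes should be restated to the agent explicitly after compaction.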

Usage visibility matters for the same reason. A coding agent with a 256K-context model and tool access can burn tokens without dramatic UI signals. A session that feels idle may have accumulated repo context, tool schemas, previous turns, image/text inputs, or long generated patches. /usage turns that into something developers can see. The actionable habit is simple: check context before long work, check usage after it, and correlate big jumps with prompts, tools, and compaction events. If an agent workflow cannot explain its spend, it is not ready to become infrastructure.
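
Correlating jumps with events needs nothing more than a running log of readings. The following sketch assumes a developer or wrapper script records a cumulative token count after each notable step. The UsageSnapshot type, its fields, and the threshold are hypothetical, and the format of /usage output is not assumed.

```python
from dataclasses import dataclass


@dataclass
class UsageSnapshot:
    label: str         # what happened just before this reading
    total_tokens: int  # cumulative token count noted at that point


def flag_jumps(snapshots, threshold):
    """Return (label, delta) for each step whose token growth exceeds threshold."""
    jumps = []
    for previous, current in zip(snapshots, snapshots[1:]):
        delta = current.total_tokens - previous.total_tokens
        if delta > threshold:
            jumps.append((current.label, delta))
    return jumps


log = [
    UsageSnapshot("session start", 1_000),
    UsageSnapshot("read repo layout", 41_000),
    UsageSnapshot("small edit", 43_000),
]
print(flag_jumps(log, threshold=10_000))
# [('read repo layout', 40000)]
```

Labeling each reading with the preceding action is what makes the spend explainable afterwards.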

Approval mode is an architecture decision

The permission model deserves the least casual reading. Plan mode is a strong default for unfamiliar repos because it blocks writes and forces the agent to articulate intent before changing files. Always-approve is powerful for constrained automation and dangerous in broad workspaces. xAI’s enterprise docs sharpen the distinction with additional modes for headless and CI-style use, including dontAsk, which silently denies anything without explicit allow rules, and acceptEdits, which auto-approves file edits while still prompting for shell commands.

The docs also name the kinds of operations teams should think about. Read-only actions such as reading files, listing directories, grepping, web search, writing todos, and selected safe shell commands like git status, git diff, cargo check, and kubectl get/logs/describe are treated differently from dangerous commands. Operations such as rm, chmod, chown, kill, pkill, killall, and git push prompt in ask mode but can be auto-approved in always-approve unless explicit deny rules block them. That last clause is the one platform teams should underline.
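
The interaction between modes and rules is easier to reason about as a decision function. This is a simplified model of the behavior described above, not xAI's implementation. The three-way read/edit/shell classification, the prefix-style rule format, the choice to let deny rules beat allow rules, and the assumption that read-only actions run without a prompt are all illustrative.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    PROMPT = "prompt"
    DENY = "deny"


def _matches(command, rules):
    # A rule matches the exact command or the command plus arguments.
    return any(command == rule or command.startswith(rule + " ") for rule in rules)


def decide(mode, kind, command="", allow=(), deny=()):
    """Sketch of a permission decision.

    mode: "ask", "always-approve", "dontAsk", or "acceptEdits"
    kind: "read", "edit", or "shell" (hypothetical classification)
    allow/deny: command prefixes such as "git push" (hypothetical format)
    """
    if _matches(command, deny):
        return Decision.DENY      # explicit deny rules block in every mode
    if _matches(command, allow):
        return Decision.ALLOW
    if mode == "dontAsk":
        return Decision.DENY      # nothing runs without an explicit allow rule
    if kind == "read":
        return Decision.ALLOW     # assumed: read-only actions never prompt
    if mode == "always-approve":
        return Decision.ALLOW     # includes rm, kill, git push
    if mode == "acceptEdits" and kind == "edit":
        return Decision.ALLOW
    return Decision.PROMPT


print(decide("always-approve", "shell", "git push origin main"))
print(decide("always-approve", "shell", "git push origin main", deny=("git push",)))
```

The two calls at the end show why the deny rule matters: the same command in the same mode goes from allowed to denied only because a rule was pinned in advance.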

Headless mode expands the blast radius. Grok can run with grok -p "Your prompt here", choose models, use named session IDs, resume or continue sessions, set working directories, and disable auto-update. Output can be plain text, a single final JSON result, or newline-delimited streaming JSON. ACP mode exposes Grok as a JSON-RPC agent over stdin/stdout through grok agent stdio, with documented initialize, authenticate, session/new, and session/prompt flows. Once an agent can be embedded in scripts, editors, bots, and remote workers, approval mode is no longer a personal preference. It is part of the system design.
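
To make the ACP flow concrete, here is a sketch of how a client might frame JSON-RPC 2.0 requests for the session/new and session/prompt steps as newline-delimited JSON. The method names come from the flows listed above. The helper names and every parameter shape (cwd, mcpServers, sessionId, prompt) are assumptions for illustration and should be checked against the xAI and ACP documentation. The code builds messages only and does not start the grok process.

```python
import json


def rpc_request(request_id, method, params):
    """Encode one JSON-RPC 2.0 request as a single newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    return json.dumps(message) + "\n"


def prompt_request(request_id, session_id, text):
    """Build a session/prompt request (parameter shape is an assumption)."""
    params = {"sessionId": session_id, "prompt": [{"type": "text", "text": text}]}
    return rpc_request(request_id, "session/prompt", params)


def parse_lines(stream_text):
    """Decode newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in stream_text.splitlines() if line.strip()]


# Hypothetical outgoing traffic; the session id would come from the
# agent's reply to session/new rather than being hard-coded.
outgoing = (
    rpc_request(1, "session/new", {"cwd": "/path/to/repo", "mcpServers": []})
    + prompt_request(2, "example-session", "Summarize the failing tests")
)
for message in parse_lines(outgoing):
    print(message["id"], message["method"])
```

In a real integration these lines would be written to the stdin of the agent process and replies read line by line from its stdout. Whatever permission mode that process runs under applies to every caller that can reach the pipe.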

For practitioners, the checklist is deliberately boring. Start in ask or plan mode for unfamiliar repositories. Use /context before long tasks and /usage after them. Inspect /memory after corrections and before high-stakes edits. Treat /dream and /flush as writes to operational state. Use /compact intentionally, then verify what survived. Avoid always-approve anywhere secrets, deployment scripts, broad shell access, or write-capable MCP tools are present unless sandbox profiles and deny rules are already pinned.

Grok Build is now much easier to compare with Claude Code and Codex because the conversation is no longer just about model quality. It is about memory, session state, command ergonomics, automation interfaces, permission modes, MCP/tool behavior, and cost visibility. That is the real coding-agent operating surface. xAI is documenting more of it, which is good. Now teams need to use those controls like controls, not decorations.

Sources: xAI Docs, xAI Headless & Scripting docs, xAI Enterprise Deployments docs, Claude Code memory docs, OpenAI Codex MCP docs