MiMo Code Is an OpenCode Fork With the Feature Agents Keep Faking: Memory

Anatoliy Kolodkin

11 Jun 2026 • 4 min read

MiMo Code is not interesting because Xiaomi shipped yet another terminal agent. The internet has enough of those. It is interesting because Xiaomi’s MiMo team looked at the failure mode every serious coding-agent user has hit — the agent forgets, drifts, compresses away the important part, then confidently declares victory — and built the product around that wound.

The first public release, v0.1.0, landed on June 10 as an MIT-licensed fork of OpenCode. That lineage matters. OpenCode already gives MiMo a credible chassis: terminal UI, provider flexibility, LSP support, MCP integration, plugins, and the basic loop of reading files, editing code, running commands, and managing Git. Xiaomi’s bet is that the next real advantage is not a shinier chat box. It is persistent state.

The release ships broadly enough to be taken seriously for a first tag: macOS builds for Apple Silicon and Intel, Linux builds for x64 and Arm including musl variants, and Windows builds for x64 and Arm. At the time of writing, one day after release, the repo already had roughly 3,155 stars, 229 forks, 199 open issues, and 48 reactions on the release. That is launch-day heat, not proof of maturity. But it is enough signal to justify a hard look.

The agent runtime is the product

Xiaomi’s technical post frames the problem correctly: coding agents are stateless model calls trapped inside stateful work. A model invocation does not remember your migration plan, your weird test fixture, or why one file must not be touched unless the runtime preserves that information and re-injects it at the right moment. Most products paper over that gap with conversation summaries. Summaries are useful until they become lossy compression of the one fact that mattered.

MiMo Code attacks this with a layered memory system: project memory in MEMORY.md, session checkpoints in checkpoint.md, scratch notes in notes.md, per-task progress logs, and SQLite FTS5 search over durable state. That is not glamorous. It is the architecture a working agent needs if it is expected to survive multi-hour work rather than one-shot patch generation.
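To make the search layer concrete, here is a minimal sketch of full-text search over durable notes using SQLite FTS5 from Python's standard library. The table name, columns, and sample rows are invented for illustration and say nothing about MiMo Code's actual schema. It also assumes the local SQLite build has FTS5 compiled in, which is common but not guaranteed.

```python
import sqlite3

# Hypothetical schema: one row per durable note, tagged with the file it came from.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE memory USING fts5(source, body)")
conn.executemany(
    "INSERT INTO memory (source, body) VALUES (?, ?)",
    [
        ("MEMORY.md", "Never edit the generated client in api/gen; regenerate it instead."),
        ("checkpoint.md", "Next action: migrate callers of the legacy auth helper."),
        ("notes.md", "The flaky fixture needs a fixed clock before tests pass."),
    ],
)


def search(query: str) -> list[str]:
    """Return the source files whose notes match the FTS5 query, best match first."""
    rows = conn.execute(
        "SELECT source FROM memory WHERE memory MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()
    return [row[0] for row in rows]
```

Calling `search("generated client")` finds the project-memory note because FTS5 treats space-separated terms as an implicit AND. The point of the sketch is that keyword search over plain text stays inspectable: the same rows the agent retrieves are rows a developer can read.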

The checkpoint schedule is the detail that stands out. Xiaomi says checkpoint extraction happens around 20%, 45%, and 70% of the configured context budget, with a rebuild near the true limit and a reconstructed prompt capped around 65K tokens. That is exactly the kind of boring runtime engineering that separates a demo from a tool. Waiting until the context window is almost full to summarize is like writing the incident report while the incident is still taking down the database. By then the model has less room to reason, middle-context facts are already fragile, and the summary becomes triage instead of state management.
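The schedule is easy to express in code. The sketch below takes the three percentages from Xiaomi's description and works out which checkpoints are due at a given usage level. The function name, the `done` bookkeeping, and the 200,000-token budget in the usage note are assumptions for illustration, not MiMo Code internals.

```python
# Percentages of the configured context budget, as described in Xiaomi's post.
CHECKPOINT_PERCENTS = (20, 45, 70)


def due_checkpoints(used_tokens: int, budget_tokens: int, done=frozenset()) -> list[int]:
    """Return the thresholds that usage has crossed and that are not yet checkpointed."""
    if budget_tokens <= 0:
        raise ValueError("budget_tokens must be positive")
    return [
        pct
        for pct in CHECKPOINT_PERCENTS
        # Integer comparison avoids floating-point edge cases at exact thresholds.
        if used_tokens * 100 >= pct * budget_tokens and pct not in done
    ]
```

With a hypothetical 200,000-token budget, `due_checkpoints(50_000, 200_000)` reports that the 20% checkpoint is due, long before the window is under pressure.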

The checkpoint writer records 11 fields, including current intent, next action, constraints, task tree, involved files, design decisions, runtime state, errors and fixes, and cross-task discoveries. That list is notable because it treats agent memory as operational state, not a diary. A developer resuming a task does not need “we worked on auth.” They need the exact constraint that made the previous approach invalid.
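One way to picture such a record is as a typed structure that renders to readable Markdown. The sketch below models only the nine fields named above; the remaining two are not listed here, so they are left out rather than guessed. The class name, the field types, and the rendering format are all assumptions.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Checkpoint:
    # Only the nine fields named in the article; the full schema reportedly has eleven.
    current_intent: str = ""
    next_action: str = ""
    constraints: list[str] = field(default_factory=list)
    task_tree: list[str] = field(default_factory=list)
    involved_files: list[str] = field(default_factory=list)
    design_decisions: list[str] = field(default_factory=list)
    runtime_state: str = ""
    errors_and_fixes: list[str] = field(default_factory=list)
    cross_task_discoveries: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render every field as a heading followed by text or a bullet list."""
        lines: list[str] = []
        for f in fields(self):
            value = getattr(self, f.name)
            lines.append("## " + f.name.replace("_", " ").capitalize())
            if isinstance(value, list):
                lines.extend(f"- {item}" for item in value)
            else:
                lines.append(value)
            lines.append("")
        return "\n".join(lines)
```

A checkpoint built with `constraints=["Do not touch legacy/session.py"]` renders that constraint as a bullet under its own heading, which is the kind of exact, resumable detail a prose summary tends to lose.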

Memory without auditability is just nondeterminism

MiMo’s choice to use readable Markdown files is more important than it looks. Agent memory is only safe if it can be inspected, edited, and deleted. Hidden embeddings can retrieve useful context, but they also create a second invisible program influencing the agent. If a coding agent misremembers a security exception, a deployment convention, or a product rule, developers need a way to correct the record without reverse-engineering a vector store.

This is where teams should be cautious and practical. Try MiMo Code in a disposable repo first. Inspect the generated memory files. Look at what gets written into MEMORY.md. Check whether secrets, tokens, customer names, or internal URLs leak into durable state. Then test resumption: stop a task halfway through, restart, and see whether the agent reconstructs the work accurately or just sounds confident.
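The leak check can be partly automated. Below is a rough sketch that scans memory files for a few obvious patterns. The regexes are illustrative and far from exhaustive, the internal-domain suffixes are made up, and the script assumes the memory files sit somewhere under the directory you point it at. A dedicated secret scanner is the better tool for a real audit.

```python
import re
from pathlib import Path

# Illustrative patterns only; a real audit should use a dedicated secret scanner.
PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "bearer_token": re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._\-]{20,}"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # Hypothetical internal suffixes; replace with your organization's domains.
    "internal_url": re.compile(r"https?://[\w.-]+\.(?:internal|corp|local)\b"),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for every suspicious line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


def scan_memory_files(root: str, names=("MEMORY.md", "checkpoint.md", "notes.md")) -> dict:
    """Scan the named memory files found anywhere under root."""
    results = {}
    for path in Path(root).rglob("*.md"):
        if path.name in names:
            hits = scan_text(path.read_text(encoding="utf-8", errors="replace"))
            if hits:
                results[str(path)] = hits
    return results
```

Run `scan_memory_files(".")` in the disposable repo after a session. An empty result does not prove the files are clean, but any hit is a reason to read that file line by line.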

The second major idea is stop-condition verification. MiMo’s Goal mechanism launches an independent model check whenever the agent tries to stop. Xiaomi says infinite-loop probability is below 0.5%, with false blocking more common than false passing. That tradeoff is the right one. False blocking wastes time. False passing ships incomplete work.

Any engineer who has used agents for nontrivial tasks knows the failure: the agent says tests pass after running only the convenient subset, claims a refactor is complete while ignoring generated clients, or updates implementation without migrating callers. A separate “are we actually done?” judge will not solve correctness, but it changes the default failure mode from confident abandonment to reviewable friction.
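The control flow behind a stop-condition check is simple enough to sketch. The version below is a generic pattern, not MiMo's Goal implementation: `step` and `judge` are hypothetical callables standing in for the working agent and the independent checker, and `max_rounds` is an assumed guard against looping forever.

```python
from typing import Callable


def run_until_verified(
    step: Callable[[], str],
    judge: Callable[[str], bool],
    max_rounds: int = 5,
) -> tuple[bool, int]:
    """Let the worker run; every time it claims to be done, ask a separate judge.

    Returns (verified, rounds_used). The round cap bounds the loop so that a
    judge that never approves cannot block forever.
    """
    for round_no in range(1, max_rounds + 1):
        claim = step()  # the worker's "I am done" report
        if judge(claim):  # independent check of that report
            return True, round_no
    return False, max_rounds
```

A judge that rejects "ran the convenient test subset" and accepts "full suite green, callers migrated" forces a second round. When the cap is reached, the unverified result goes back to a human, which is the reviewable friction described above.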

Five candidates is a cost decision, not a magic button

MiMo’s Max Mode samples five candidate solutions by default and judges them before execution. Xiaomi claims a 10–20% SWE-Bench Pro improvement over single sampling, at roughly 4–5x the token cost. That is a useful knob, not a default religion. Spend it on architecture changes, production migrations, flaky bug reproduction, or high-risk refactors. Do not spend it on renaming a variable.

The broader pattern is test-time compute for coding agents. Instead of trusting the first plan a model emits, the runtime can generate alternatives, compare them, and execute the best one. That is sensible. It also makes agent economics more explicit. If your team turns on multi-sampling everywhere, your bill becomes a governance problem disguised as productivity.
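Both halves of that tradeoff fit in a few lines. The sketch below shows generic best-of-n selection plus the cost arithmetic implied by a 4–5x multiplier. `generate` and `score` are hypothetical stand-ins for the model call and the judge, and nothing here reflects how Max Mode is actually implemented.

```python
from typing import Callable


def best_of_n(
    generate: Callable[[int], str],
    score: Callable[[str], float],
    n: int = 5,
) -> str:
    """Sample n candidate plans and return the one the judge scores highest."""
    candidates = [generate(i) for i in range(n)]
    return max(candidates, key=score)


def max_mode_token_range(single_sample_tokens: int) -> tuple[int, int]:
    """Estimated token spend at the roughly 4x to 5x multiplier Xiaomi cites."""
    return 4 * single_sample_tokens, 5 * single_sample_tokens
```

For a task that would cost 200,000 tokens in a single pass, `max_mode_token_range(200_000)` gives 800,000 to 1,000,000 tokens. That number is what belongs in the conversation about when multi-sampling is allowed.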

The Compose workflow direction may be the most important long-term piece. Xiaomi argues that natural-language skills are too ambiguous for large workflows and that orchestration should become executable logic: JavaScript scripts in a sandbox dispatching subagents through primitives like agent(), parallel(), and pipeline(). That is where the market is going. Prompts are fine for intent. Control flow belongs in code.
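To show what control flow in code looks like, here is a toy analog of those three primitives. Xiaomi describes JavaScript running in a sandbox; this sketch uses Python's asyncio instead, and every signature in it is an assumption. The `agent` function only echoes its input where a real runtime would dispatch a subagent.

```python
import asyncio


async def agent(role: str, task: str) -> str:
    # Stand-in for dispatching a subagent; a real runtime would call a model here.
    await asyncio.sleep(0)
    return f"[{role}] {task}"


async def parallel(*jobs):
    """Run independent jobs concurrently and return results in submission order."""
    return list(await asyncio.gather(*jobs))


async def pipeline(value, *stages):
    """Feed the output of each stage into the next one."""
    for stage in stages:
        value = await stage(value)
    return value


async def review_workflow(files: list[str]) -> str:
    # Fan out one reviewer per file, then hand all reviews to a summarizer.
    reviews = await parallel(*(agent("reviewer", name) for name in files))
    return await pipeline(
        reviews,
        lambda rs: agent("summarizer", "; ".join(rs)),
    )
```

Running `asyncio.run(review_workflow(["a.py", "b.py"]))` fans out two reviews and funnels them into one summary. The fan-out and the ordering live in the script, where they can be read and tested, rather than in a prompt the model may interpret differently each time.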

There are caveats. The repo is one day old. The issue count is already nontrivial. The public benchmark claims, including MiMo Code plus MiMo-V2.5-Pro outperforming Claude Code plus Claude Sonnet 4.6 across evaluations, need raw tables before they deserve strong belief. First releases tend to omit the messy parts, even when the builders are honest.

Still, MiMo Code is the cleanest public articulation this week of the agent-runtime thesis: models matter, but memory, checkpoints, verification, permissions, and workflow orchestration decide whether agents can do real engineering work. Xiaomi did not invent that thesis. It did ship a concrete, inspectable implementation built on a serious open-source base.

The practical take: do not rip out your current workflow for MiMo Code today. Do clone it, run it against a real but noncritical task, and study the memory model. If it works, the lesson is bigger than Xiaomi. The next coding-agent race will be won less by who has the loudest model announcement and more by who remembers what the agent was supposed to be doing.

Sources: XiaomiMiMo/MiMo-Code v0.1.0 release, Xiaomi MiMo technical blog, MiMo Code repository, OpenCode repository
