
Claude Replay Makes Agent Session Recovery Boring, Which Is Exactly the Point

Anatoliy Kolodkin

31 May 2026 • 5 min read

The least glamorous Claude Code feature request is also one of the most important: when an agent session dies, can you reconstruct what happened without reading a terminal scrollback like an accident report?

Claude Replay is a new Python tool that answers that question with the right kind of boring architecture. It passively records Claude Code sessions through hooks, stores events and checkpoints in local SQLite, and exposes recovery through CLI commands, MCP tools, a dashboard, a TUI, and self-contained HTML exports. The repository was created on May 31, shipped an initial 0.1.0 commit, and had no stars, forks, or issues at research time. That makes it early. It also makes the signal cleaner: this is not hype. It is infrastructure.

The README describes a simple flow. Claude Code hooks write to ~/.claude-replay/sessions.db. Readers (the CLI, dashboard, TUI, and MCP server) read from SQLite. Hook installation merges PreToolUse, PostToolUse, and Stop entries into ~/.claude/settings.json. The local server defaults to 127.0.0.1:8766, with a dashboard at /, an MCP SSE endpoint at /sse, JSON state at /api/state, and a health check at /status.
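A merge like the one the README describes might look roughly like this. It is a sketch, not Claude Replay's installer: the function name merge_hooks and the command string are hypothetical, and the nested "hooks" layout is an assumption based on the Claude Code hooks reference. The point it illustrates is that installation should add entries without clobbering hooks the user already has, and should be safe to run twice.

```python
import json

# Hook events named in the README.
HOOK_EVENTS = ("PreToolUse", "PostToolUse", "Stop")


def merge_hooks(settings: dict, command: str) -> dict:
    """Return a copy of `settings` with `command` registered for each event.

    Existing hook entries are preserved, and the merge is idempotent:
    running it again with the same command changes nothing.
    """
    merged = json.loads(json.dumps(settings))  # deep copy of JSON-compatible data
    hooks = merged.setdefault("hooks", {})
    for event in HOOK_EVENTS:
        groups = hooks.setdefault(event, [])
        already_installed = any(
            hook.get("command") == command
            for group in groups
            for hook in group.get("hooks", [])
        )
        if not already_installed:
            # Assumed entry shape: a group holding a list of command hooks.
            groups.append({"hooks": [{"type": "command", "command": command}]})
    return merged


if __name__ == "__main__":
    # Hypothetical pre-existing settings with one unrelated hook.
    current = {
        "hooks": {
            "PreToolUse": [
                {"matcher": "Bash", "hooks": [{"type": "command", "command": "lint.sh"}]}
            ]
        }
    }
    print(json.dumps(merge_hooks(current, "replay-hook"), indent=2))
```

Returning a new dictionary instead of mutating the input makes it easy to diff the before and after states, which is worth doing before anything writes to a file as sensitive as settings.json.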

That is not an attempt to make the agent smarter. It is an attempt to make the last hour of agent work recoverable. Good. The ecosystem has plenty of tools trying to improve the next action. It needs more tools that preserve evidence about the previous 200 actions.

Recovery is a runtime feature, not a chat UX nicety

Long-running agent sessions fail in boring ways: API errors, rate limits, terminal crashes, laptop sleep, context overflow, a bad tool call, a network hiccup, or the user realizing the session has wandered and hitting escape. When that happens, the problem is rarely “the model cannot continue.” The problem is state. What task was underway? Which files changed? Which commands ran? What did the agent believe it had already verified? Where was the last safe checkpoint? What should the next session avoid repeating?

Claude Replay’s MCP tools are named around that workflow: replay_status, replay_checkpoint, replay_resume, replay_sessions, and replay_export. The CLI adds commands including install, uninstall, status, sessions, resume, export, serve, tui, and reset. The dashboard polls every two seconds and offers a session list, timeline, copy-resume, and export actions. The optional TUI adds a session sidebar, live event feed, and detail inspector.

The most important technical detail is the restraint. The hook path is described as offline-first, making no network calls and completing in well under 50 ms with a single SQLite write. Large tool payloads are truncated at 8 KB per event. This is exactly how a recorder should behave. If the recording path starts doing network calls, summarization, remote writes, or complex processing, it becomes another failure mode inside the session it is meant to protect.
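To make the "single SQLite write" idea concrete, here is a minimal sketch of a recorder in that style. The table schema, column names, and function names are assumptions made for illustration and are not taken from Claude Replay's source. Only the 8 KB cap and the no-network, one-write shape come from the README's description.

```python
import json
import sqlite3
import time

MAX_PAYLOAD_BYTES = 8 * 1024  # the 8 KB per-event cap described above


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the event store and create the (hypothetical) events table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "id INTEGER PRIMARY KEY, session_id TEXT, event TEXT, "
        "ts REAL, payload TEXT, truncated INTEGER)"
    )
    return conn


def truncate_payload(payload: dict, limit: int = MAX_PAYLOAD_BYTES) -> tuple[str, bool]:
    """Serialize `payload` and cut it to at most `limit` UTF-8 bytes."""
    raw = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    if len(raw) <= limit:
        return raw.decode("utf-8"), False
    # errors="ignore" drops a multi-byte character split by the cut.
    return raw[:limit].decode("utf-8", errors="ignore"), True


def record_event(conn: sqlite3.Connection, session_id: str, event: str, payload: dict) -> None:
    """Write one event row: no network, no summarization, one INSERT."""
    body, truncated = truncate_payload(payload)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO events (session_id, event, ts, payload, truncated) "
            "VALUES (?, ?, ?, ?, ?)",
            (session_id, event, time.time(), body, int(truncated)),
        )


if __name__ == "__main__":
    db = open_db()
    record_event(db, "demo", "PostToolUse", {"output": "x" * 20000})
    print(db.execute("SELECT event, truncated, length(payload) FROM events").fetchall())
```

Storing a truncated flag alongside the payload matters for anything reading the trace later: a truncated payload is no longer guaranteed to be valid JSON, and a reader should know that the end of the output is missing.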

Passive hooks writing local state is the right default. It separates capture from interpretation. The agent can fail, the dashboard can crash, and the local database can still contain enough trace to resume or audit. That separation is an underrated design principle in agent tooling: the thing that records the session should be smaller and dumber than the thing being recorded.

A resume brief is evidence, not an instruction stream

The MCP integration is clever because it lets a future Claude Code session query prior session state directly. That turns replay from an external dashboard into a recovery tool the agent can use. Done well, the agent can ask what happened, generate a resume brief, inspect recent checkpoints, and continue without the user hand-stitching context from memory.

But replay data should be treated as evidence, not authority. Previous tool outputs can be stale. A recorded command's output may have been truncated before the error that showed it failed. A prior assistant message may contain a mistaken conclusion. A file may have changed since the checkpoint. If replay content is injected into the current model context and treated as unquestioned instruction, the recovery tool becomes a way to preserve old mistakes with more confidence.

The safe pattern is explicit: a resume brief should point to facts and files, not dictate actions. It should say which files were touched, what commands ran, what tests passed or failed, and what checkpoint was created. Then the current session should verify the working tree and source before editing. If an agent resumes from trace data without checking the repo, it is not recovering state; it is roleplaying continuity.
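One way to act on that is to compare what a brief claims against what is on disk before trusting it. The sketch below assumes a hypothetical brief format in which each touched file is paired with a SHA-256 digest recorded at checkpoint time. Claude Replay is not documented as storing such digests, so treat the data shape and the function names as illustrative only.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's current contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def check_brief(root: Path, recorded: dict[str, str]) -> dict[str, str]:
    """Classify each file named in a resume brief against the working tree.

    `recorded` maps a path relative to `root` to the digest captured at
    checkpoint time (a hypothetical brief format). Each file is reported
    as "unchanged", "changed", or "missing".
    """
    report = {}
    for rel_path, digest in recorded.items():
        target = root / rel_path
        if not target.is_file():
            report[rel_path] = "missing"
        elif file_digest(target) == digest:
            report[rel_path] = "unchanged"
        else:
            report[rel_path] = "changed"
    return report


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "app.py").write_text("print('v1')\n")
        brief = {"app.py": file_digest(root / "app.py"), "gone.py": "0" * 64}
        (root / "app.py").write_text("print('v2')\n")  # edited after the checkpoint
        print(check_brief(root, brief))
```

Anything reported as changed or missing is a cue to reread the file before editing, not a reason to discard the brief. The brief still says where to look.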

The 8 KB truncation policy is a practical tradeoff. Without limits, session databases will balloon quickly, especially when tools read large files or dump logs. With limits, forensic completeness is reduced. That is acceptable for many solo workflows, but teams using replay-style tooling for audit should decide whether they need richer telemetry elsewhere: model IDs, token usage, approval decisions, command arguments, MCP server names, sandbox identity, external side effects, and full artifact hashes. Claude Replay is closer to a session black box than a compliance flight recorder.

Local-only is the right privacy default, but it still needs policy

Session traces are sensitive. They can contain source code, file paths, shell commands, issue IDs, internal service URLs, environment names, snippets from logs, and accidental secrets. Claude Replay keeps the database under ~/.claude-replay/sessions.db and serves locally by default. That is the right posture. It avoids a SaaS trust boundary for a tool whose whole job is to remember what the agent saw and did.

Local does not mean harmless. Developers should know where the database lives, how to rotate or delete it, whether exports contain sensitive content, and whether binding the server outside loopback is ever allowed. For most environments, --host 0.0.0.0 should be treated as a footgun unless there is an explicit access-control story. An HTML export of an agent session can be as sensitive as a bug report, a build log, or a code review. Sometimes more.
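A guard against accidental exposure can be very small. The following sketch uses Python's standard ipaddress module to refuse non-loopback binds unless the caller opts in. The function names and the allow_remote parameter are hypothetical and do not describe Claude Replay's actual behavior.

```python
import ipaddress


def is_loopback_host(host: str) -> bool:
    """True if `host` is "localhost" or a loopback IP literal."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name): treat as non-loopback.
        return False


def check_bind(host: str, allow_remote: bool = False) -> str:
    """Return `host` if binding is acceptable, otherwise raise ValueError."""
    if not is_loopback_host(host) and not allow_remote:
        raise ValueError(
            f"refusing to bind to {host!r}: session traces would be reachable "
            "from other machines; pass allow_remote=True to override"
        )
    return host


if __name__ == "__main__":
    print(check_bind("127.0.0.1"))
    try:
        check_bind("0.0.0.0")
    except ValueError as exc:
        print(exc)
```

Making the unsafe path require an explicit flag puts the decision in a place where it can be reviewed, which is the access-control story in its simplest form.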

The project also hints at a larger shift in Claude Code practice. Hooks are no longer just automation glue. They are becoming the place where teams attach observability, recovery, policy, and state. That is healthy, provided the hook layer stays reviewable. Every hook that writes session data, calls a server, mutates settings, or exposes an MCP endpoint should be part of the same review path as developer tooling and CI scripts.

For practitioners, the action item is simple: if you let agents run long tasks, add checkpointing before you need it. Do not wait until a half-finished refactor disappears behind an API error. Generate resume briefs automatically. Keep local traces. Export evidence when an agent-authored diff needs review. Pair replay with tests and git status checks, not with blind trust.

The editorial take: Claude Replay is the right kind of boring. If an agent can spend an hour editing your repo, losing the session state should be treated like losing a build artifact, not a charming quirk of chat UIs. Recovery is part of the runtime now. Ship it before the terminal eats your afternoon.

Sources: Claude Replay README, Claude Code hooks reference, Claude Code MCP docs, Claude Replay CI run
