OpenClaw’s Corrupted-Header Bug Is a Small Parser Mistake With Full Transcript Data Loss
The most dangerous data-loss bugs rarely announce themselves with drama. They hide inside reasonable parser assumptions. OpenClaw issue #89037 is exactly that kind of bug: if the first JSONL line in a session file — the header — is corrupted or partially written, OpenClaw can skip it, decide the remaining valid transcript is not a valid session, and rewrite the file into a fresh empty session. A malformed first line becomes full transcript loss.
That should make every agent operator uncomfortable. Session files are not disposable cache in an agent runtime. They are task history, approval history, tool-call evidence, debugging context, and sometimes the only usable record of how an autonomous workflow got from request to result. If the runtime can still parse user and assistant rows after a damaged header, deleting them is not cleanup. It is evidence destruction with good intentions.
A one-line corruption turns into a rewrite
The reported path is specific. The audited version was main @ 0b5be66e, and the affected flows include openclaw --session <file>, --continue, and resume paths through SessionManager.open() and setSessionFile(). The loader function, loadEntriesFromFile(), skips JSONL lines it cannot parse. That is often sensible for append-only logs, because partial writes happen. The problem comes next: after skipping malformed lines, it validates only the first successfully parsed entry as the session header.
If the corrupted line was the header, the first successfully parsed entry is likely a normal message row. That row fails the header check. The function returns []. Then setSessionFile() cannot distinguish “this file was truly empty” from “this non-empty transcript has a corrupt header and valid messages.” It calls newSession() and rewriteFile(). The result is a fresh header and a lost transcript.
ClawSweeper source-reproduced the issue: a non-empty JSONL file with a malformed first line and valid later message rows caused the loader to return an empty entry list, after which the resume path rewrote the file. That is the key detail. This is not a theoretical complaint about parser style. It is a concrete recovery path where recoverable data is converted into unrecoverable absence.
JSONL recovery should be salvage-first
The bug is a reminder that append-only formats need different recovery instincts than structured all-or-nothing documents. If a JSON file is malformed at byte 10, perhaps the whole document is suspect. JSONL is different by design. One malformed line can coexist with hundreds of valid later lines. The runtime should treat that as partial corruption, not empty state.
That distinction matters more for agents than for ordinary chat logs. A session transcript may include tool outputs, approval decisions, generated artifacts, summaries, compaction boundaries, and channel delivery state. Losing those rows can force an agent to rerun work, duplicate tool calls, forget constraints, or lose the audit trail needed to understand a bad action. In a code agent, that can mean losing the record of why files changed. In an operations agent, it can mean losing the record of which commands were approved.
PR #89065 proposes the right shape of fix. Instead of assuming the header must be the first parsed entry, loadEntriesFromFile() locates a valid header with findIndex(). If messages exist without a valid header, setSessionFile() synthesizes a replacement header rather than truncating the transcript. The PR reportedly adds nine tests across two Vitest projects — 18 passing test cases across repeated project runs — covering truncated headers, missing IDs, header-not-at-index-zero cases, nonexistent files, empty files, corrupt-header recovery, and normal behavior.
That separates identity from history, which is exactly the right model. A missing or corrupt header is an identity problem. Valid message rows are history. If you can preserve history and repair identity, do that. If you cannot repair safely, make a backup and stop. The one thing recovery code should not do is quietly turn “partially damaged” into “empty.”
The missing pieces: backup and visible repair
The proposed fix still leaves room for stronger operational behavior. The research notes call out two omissions: no pre-rewrite .corrupt-<timestamp>.jsonl backup, and no visible warning when a header is repaired. Silent repair is better than silent deletion, but visible repair is better than both.
Why? Because recovery is not only about preserving data; it is about preserving operator trust. If an agent runtime synthesizes a header, the user should know that happened. If a file had a corrupt first line, the old bytes should be preserved before any rewrite. That gives maintainers a forensic trail and gives operators a rollback path if the synthetic identity is wrong. In systems that perform semi-autonomous work, recovery metadata is not noise. It is how you avoid lying about what the system knows.
There is also a product lesson here for anyone building agent infrastructure. “Resume session” is not a convenience feature once agents run real work. It is a durability contract. The runtime must define what happens after crashes, disk-full events, partial writes, interrupted compaction, interrupted tool calls, and upgrades that change header formats. If those cases are undefined, users will discover the policy at the worst possible time: after the only copy of the transcript is gone.
For practitioners, the immediate advice is simple. If OpenClaw reports session weirdness after a crash, upgrade, or disk issue, copy the session .jsonl before repeatedly opening it. Treat a suddenly blank session as a symptom, not proof that the old history was unrecoverable. If you operate agents for a team, add session-file backups to your runbook and keep an eye on fixes related to #89037, #89065, and follow-on recovery changes.
The broader take is sharper: agent session files are evidence, not cache. A runtime that can parse valid messages should preserve them until a human explicitly discards them. Anything else makes the system look clean by deleting the dirt, which is the wrong kind of reliability.
Sources: OpenClaw issue #89037, OpenClaw PR #89065, OpenClaw PR #89085, OpenClaw v2026.6.1-beta.1 release