Argus Gives Claude Code the Observability Layer Every Production Tool Needs: It Knows What the Agent Did, When, and What It Cost
Claude Code runs on your machine, touches your files, executes shell commands, and costs money per token. Yet most developers have no record of what an agent session did between "start" and "done" beyond what they remember from the scrollback. That was acceptable when AI coding assistance was an experiment. It is not acceptable now that agentic workflows are becoming the primary way software gets built in a growing number of organizations. The gap between "agent is running" and "someone understands what happened" is a blind spot with real operational and security consequences.
Argus, a same-day observability dashboard for Claude Code, is the first tool in the evening's crop that treats this gap as a product problem worth solving directly. It uses Claude Code's existing hooks (PreToolUse, PostToolUse, and Stop) to capture every tool call, subagent event, and cost estimate into a SQLite database, then surfaces the data in a React dashboard with session traces, flag rules, and a security alert feed. The key insight is not that Claude Code lacks hooks; it has them. The insight is that few people have wired them to persistent storage and a readable UI. Argus does that, treating agent sessions the way a good APM treats production services: you can see what happened, when, and what it cost, and you get alerted on suspicious patterns before they become incidents.
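To make the hook-to-storage idea concrete, here is a minimal sketch of what capturing a hook payload into SQLite could look like. It is not Argus's actual ingestion code. The raw_event table, the open_store and record_hook_payload helpers, and the payload field names (session_id, hook_event_name, tool_name) are all assumptions for illustration, on the premise that the hook delivers a JSON document describing the event.

```python
import json
import sqlite3
import time


def open_store(path=":memory:"):
    """Open (or create) a SQLite store with one hypothetical raw-event table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS raw_event (
               received_at REAL,
               session_id  TEXT,
               hook        TEXT,
               tool_name   TEXT,
               payload     TEXT
           )"""
    )
    return conn


def record_hook_payload(conn, raw_json):
    """Persist one hook payload. Field names are assumed, not taken from Argus."""
    payload = json.loads(raw_json)
    conn.execute(
        "INSERT INTO raw_event VALUES (?, ?, ?, ?, ?)",
        (
            time.time(),
            payload.get("session_id"),
            payload.get("hook_event_name"),
            payload.get("tool_name"),
            raw_json,  # keep the full payload so nothing is lost
        ),
    )
    conn.commit()


# Example: a made-up payload of the kind a PreToolUse hook might deliver.
conn = open_store()
example = json.dumps({
    "session_id": "s1",
    "hook_event_name": "PreToolUse",
    "tool_name": "Bash",
    "tool_input": {"command": "ls"},
})
record_hook_payload(conn, example)
```

In a real hook script the JSON would presumably arrive on standard input rather than from a literal. Storing the untouched payload alongside a few indexed columns keeps the log queryable without discarding detail.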
What the data model actually captures
The backend is FastAPI on port 7777 with SQLite via SQLModel for storage. The frontend is React, Tailwind, and Vite with Recharts for visualization on port 3000. The data model is where the design gets serious. A Session record tracks the parent_session_id for subagent hierarchy — important because Claude Code's agent tool can spawn subagents that spawn more subagents, and without parent tracking you get a flat log that obscures the actual call graph. An Event record captures tool_call, tool_result, subagent_spawn, compaction, and error events, with per-event input_tokens, output_tokens, cost_usd, and duration_ms.
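The shape of that model can be sketched as a plain SQLite schema. This is an illustration, not Argus's real schema: the project uses SQLModel, while this sketch uses the standard library's sqlite3 so it runs anywhere. The column names parent_session_id, input_tokens, output_tokens, cost_usd, and duration_ms come from the description above. The table names, the id and session_id columns, and the event_type column name are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE session (
    id                TEXT PRIMARY KEY,
    parent_session_id TEXT REFERENCES session(id)  -- NULL for a top-level session
);
CREATE TABLE event (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id    TEXT NOT NULL REFERENCES session(id),
    event_type    TEXT NOT NULL CHECK (event_type IN
        ('tool_call', 'tool_result', 'subagent_spawn', 'compaction', 'error')),
    input_tokens  INTEGER NOT NULL DEFAULT 0,
    output_tokens INTEGER NOT NULL DEFAULT 0,
    cost_usd      REAL    NOT NULL DEFAULT 0.0,
    duration_ms   INTEGER NOT NULL DEFAULT 0
);
"""


def create_db(path=":memory:"):
    """Create a database with the sketched session and event tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


# A root session, one subagent under it, and one costed tool call.
conn = create_db()
conn.execute("INSERT INTO session (id, parent_session_id) VALUES ('root', NULL)")
conn.execute("INSERT INTO session (id, parent_session_id) VALUES ('child', 'root')")
conn.execute(
    "INSERT INTO event (session_id, event_type, input_tokens, output_tokens,"
    " cost_usd, duration_ms) VALUES ('child', 'tool_call', 1200, 300, 0.0125, 840)"
)
```

The self-referencing parent_session_id column is what turns a flat log into a tree: each subagent row points at the session that spawned it.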
That granularity is what makes post-incident review actually possible. When something goes wrong in an agent session, the questions you need to answer are: which subagent started this chain? Which one hit an error? Which path consumed most of the budget? Without parent_session_id tracking and per-event cost attribution, those questions are unanswerable from the session log. With them, they are standard queries.
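As a rough illustration of "standard queries", the sketch below walks a subagent tree with a recursive SQL query, then ranks sessions by cost and lists the ones that hit an error. The table layout, the sample rows, and the helper names cost_by_session and sessions_with_errors are hypothetical. Only the parent_session_id, cost_usd, and error-event concepts come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE session (id TEXT PRIMARY KEY, parent_session_id TEXT);
CREATE TABLE event (session_id TEXT, event_type TEXT, cost_usd REAL);
INSERT INTO session VALUES ('root', NULL), ('a', 'root'), ('b', 'a'), ('other', NULL);
INSERT INTO event VALUES
    ('root',  'tool_call', 0.02),
    ('a',     'tool_call', 0.05),
    ('b',     'tool_call', 0.30),
    ('b',     'error',     0.0),
    ('other', 'tool_call', 9.99);
""")

# Recursive CTE: start at one session and follow parent_session_id downward.
TREE_CTE = """
WITH RECURSIVE tree(id, depth) AS (
    SELECT id, 0 FROM session WHERE id = ?
    UNION ALL
    SELECT s.id, t.depth + 1
    FROM session s JOIN tree t ON s.parent_session_id = t.id
)
"""


def cost_by_session(conn, root_id):
    """Return (session id, depth, cost) for the whole tree, most expensive first."""
    sql = TREE_CTE + """
    SELECT t.id, t.depth, COALESCE(SUM(e.cost_usd), 0) AS cost
    FROM tree t LEFT JOIN event e ON e.session_id = t.id
    GROUP BY t.id, t.depth
    ORDER BY cost DESC
    """
    return conn.execute(sql, (root_id,)).fetchall()


def sessions_with_errors(conn, root_id):
    """Return ids of sessions in the tree that recorded at least one error event."""
    sql = TREE_CTE + """
    SELECT DISTINCT t.id
    FROM tree t JOIN event e ON e.session_id = t.id
    WHERE e.event_type = 'error'
    """
    return [row[0] for row in conn.execute(sql, (root_id,))]
```

Calling cost_by_session(conn, 'root') ranks the nested subagent 'b' first, and the unrelated 'other' session never appears because it is not reachable from the root.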
The flag rules are the security-relevant layer. They catch dangerous patterns with severity levels: sudo, rm -rf, curl | bash, and chmod 777 trigger critical severity. Writes outside the project directory trigger a warning. A subagent with no parent session triggers an info-level alert. A single event costing more than $0.10 or a session totaling more than $1.00 triggers budget warnings. These are not hypothetical risks — they are documented failure modes from real agentic workflows that have gone sideways. Having a persistent, queryable log of every tool call with cost attribution means you can catch problems during a session, not just after.
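A simplified sketch of how rules like these might be expressed follows. The patterns, function names, and Flag type are illustrative assumptions and are not taken from the Argus source. Only the rule list, severities, and the $0.10 and $1.00 thresholds come from the description above. Real command matching would need to be more careful than these regular expressions.

```python
import os
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Flag:
    severity: str  # "critical", "warning", or "info"
    reason: str


# Hypothetical patterns for the four critical rules.
CRITICAL_PATTERNS = [
    (re.compile(r"\bsudo\b"), "sudo"),
    (re.compile(r"\brm\s+-(?:rf|fr)\b"), "rm -rf"),
    (re.compile(r"\bcurl\b[^|]*\|\s*bash\b"), "curl | bash"),
    (re.compile(r"\bchmod\s+777\b"), "chmod 777"),
]
EVENT_BUDGET_USD = 0.10
SESSION_BUDGET_USD = 1.00


def flag_command(command: str) -> List[Flag]:
    """Return a critical flag for each dangerous pattern found in a shell command."""
    return [Flag("critical", name) for rx, name in CRITICAL_PATTERNS if rx.search(command)]


def flag_write(target_path: str, project_dir: str) -> List[Flag]:
    """Warn when a write target resolves outside the project directory."""
    project = os.path.abspath(project_dir)
    target = os.path.abspath(os.path.join(project, target_path))
    inside = os.path.commonpath([project, target]) == project
    return [] if inside else [Flag("warning", "write outside project directory")]


def flag_costs(event_cost_usd: float, session_total_usd: float) -> List[Flag]:
    """Warn when a single event or the running session total exceeds its budget."""
    flags = []
    if event_cost_usd > EVENT_BUDGET_USD:
        flags.append(Flag("warning", "event over budget"))
    if session_total_usd > SESSION_BUDGET_USD:
        flags.append(Flag("warning", "session over budget"))
    return flags
```

Keeping each rule as a small pure function makes it easy to evaluate flags as events arrive, which is what allows alerts during a session rather than only afterward.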
Why local-first is the right call for this niche
Most existing LLM observability tools assume cloud API calls. You send your completion data to a vendor's service and get a dashboard. That model works fine when the security surface is "I sent a prompt and got a completion." It breaks down entirely when the agent has file system access, shell execution privileges, and can modify code in your repository. The security surface is largest in exactly the scenarios where sending session data to a third-party observability service is least appropriate.
Argus's local-first design sidesteps this entirely. No cloud upload. No external service to trust with your session data. SQLite on the developer's machine. There is no vendor. This is exactly the right approach for the agentic-coding observability niche, and it is surprising it took this long for someone to build it. The fact that it is MIT licensed and runs without relying on any external service is a statement of intent: this is infrastructure you own, not a SaaS product with a free tier.
What this means for teams running agentic workflows
If you are running Claude Code in a professional context — real repositories, real consequences, real budget — you need to be running Argus or something like it alongside it. The cost tracking alone is worth it. Token costs accumulate fast in long agent sessions, and without per-event attribution you have no idea which operations are expensive and which are cheap. The security alerts are the other critical layer. An agent that runs curl | bash from an untrusted source is a real risk. Having that flagged immediately rather than discovered after the fact is the difference between a near-miss and an incident.
The setup requires registering hooks in ~/.claude/settings.json — the exact snippet is in the repo's CLAUDE.md. It is a one-time configuration that delivers ongoing visibility. For teams that have been running agentic workflows without any observability, this is the upgrade that makes the practice sustainable.
The broader takeaway is that agentic coding tooling is graduating from experiments into infrastructure. When a practice becomes load-bearing in a team's development process, it gets the operational discipline that load-bearing systems deserve: monitoring, alerting, audit trails, and incident review capability. Argus is part of the first wave of tooling that treats agent sessions like production workloads rather than research experiments. That transition is happening now, and the teams that get ahead of it will be the ones that catch problems before they become incidents, not the ones that discover them in post-mortems.
Sources: GitHub / Argus, Claude Code, Anthropic Engineering / Managed Agents