Four Incidents in One Day: What Claude's April 28 Outage Pattern Actually Tells Us
April 28 was not a good day to be running Claude in production. Looking at the status page history, the service experienced four separate incidents within a single UTC day — a pattern that stands out even in a week where Claude's reliability has been inconsistent. Let me lay out what actually happened, because the timeline tells a more interesting story than any single incident headline.
The day started early. At 11:53 UTC, Haiku 4.5 hit an elevated error rate that lasted about 51 minutes and was resolved by 12:44 UTC. Thirty-five minutes after that resolution, at 13:22 UTC, a second incident began — elevated errors that Claude's status page describes as affecting the same range of services: Claude.ai, the API, and login paths for Claude Code. That one lasted 17 minutes and was resolved by 13:39 UTC. By 17:34 UTC, a third and more serious incident was underway — the one that generated the news coverage. This one was open-ended at the time, affecting Claude.ai access and authentication across the API and Claude Code, and it was not resolved until between 18:52 and 19:15 UTC. The status page shows a fourth incident at 12:24 UTC — "Claude Code Code Review intermittently wasn't starting sessions" — that was resolved in 11 minutes. All four incidents within roughly a 14-hour window.
What makes this pattern worth writing about is not any single outage. Seventeen-minute blips happen to every infrastructure provider. What matters is the distribution: Haiku errors, a brief API/auth blip, a session-management issue, and then a significant multi-hour incident affecting the full surface area of Claude's services. That is not one root cause propagating through a system. That is a service under enough load or architectural stress that multiple failure modes are firing independently in the same day.
The context that makes this pattern more notable than it would be in isolation: the past several weeks have included the April 25 Claude Opus 4.7 incident (affecting that specific model for about 35 minutes), the April 23 postmortem that Anthropic published acknowledging three separate product-layer causes of a quality regression, and multiple smaller incidents in late April. This is a company whose infrastructure is under measurable pressure. Whether that pressure comes from rapid user growth, architectural decisions made before scale was a priority, or something else entirely — Anthropic has not said. But the pattern is visible in the public incident history.
For practitioners, the practical questions are the same ones that come up every time a service goes down: did active Claude Code sessions keep working, and did the auth failures affect CI/CD pipelines? The status timeline suggests that during the 17:34–18:52 UTC incident, authentication was the primary failure mode. That means Claude Code sessions that were already running likely continued; new sessions and API token refreshes failed. For developers running automated agents through the API, the impact was different than for interactive users on claude.ai. Understanding which failure mode applies to your setup matters more than the headline "Claude is down."
What is also becoming visible is Anthropic's communication posture during incidents. The April 23 postmortem was unusually transparent — it named three specific product-layer changes as the cause of a quality regression and committed to fixes. The incident communications on April 28 followed the standard status-page format: investigating, identified, monitoring, resolved. There is no public postmortem yet for the April 28 pattern. Whether that transparency extends to operational incidents the way it did to the quality regression will be worth watching.
The broader signal is that Claude's infrastructure is being stress-tested in ways that its reliability story did not anticipate six months ago. The service has grown fast — Opus 4.7, Haiku 4.5, the Managed Agents launch, expanded API rate limits, and the steady expansion of Claude Code's enterprise footprint all add load and complexity. The incident history for the last two weeks suggests that Anthropic is managing that growth, but not always cleanly. For teams building on Claude as a primary tool, the practical takeaway is straightforward: treat Claude's services with the same availability skepticism you would apply to any critical infrastructure that is visibly still scaling. Pin versions, build retry logic, monitor your own error rates, and do not assume that a service operating normally at 9 AM will be equally healthy at 5 PM under a different load profile.
The April 28 pattern is probably not a new normal — infrastructure teams do fix these things — but it is a reminder that Claude is no longer a small, predictable service. It is a high-traffic platform that is still learning what its own limits are.