June 16, 2026 2 min read Tom Shafer

The Incidents page: spikes, releases, and correlation

Detecting error spikes and correlating them with release markers and affected traces, so an incident reads as a story instead of a list.

Next: an Incidents page in the Errors section.

A raw error list answers "what's broken." An incident view answers the better question: "what happened?" Those are different. The first is a feed; the second is a narrative with a beginning (a spike), a probable cause (a release), and a blast radius (the traces and queries it touched).

What it does

Spike detection: find the moments when error volume jumped, not just the steady background hum. A spike is the start of a story.
Release correlation: line up each spike against the release markers. When a spike begins minutes after 2.4.0 ships, the page puts those two facts next to each other and lets you draw the obvious conclusion.
Affected traces & slow queries: from a spike, jump straight to the requests and queries caught in it. Detection to diagnosis in one hop.
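
To make the spike-detection step concrete, here is a minimal Python sketch. It assumes errors arrive as a plain list of timestamps, buckets them per minute, and flags a bucket whose count clears both an absolute floor and a multiple of the trailing average. The name `find_spikes` and every parameter are hypothetical, picked for this example; the page itself may detect spikes differently.

```python
from collections import Counter
from datetime import datetime, timedelta


def find_spikes(error_times, bucket=timedelta(minutes=1), window=5,
                factor=3.0, min_count=5):
    """Return the start time of each run of spiking buckets.

    All thresholds here are illustrative assumptions, not the page's
    actual settings.
    """
    if not error_times:
        return []
    start = min(error_times)
    bucket_s = bucket.total_seconds()
    # Count errors per fixed-width bucket, indexed from the first error.
    counts = Counter(int((t - start).total_seconds() // bucket_s)
                     for t in error_times)
    series = [counts.get(i, 0) for i in range(max(counts) + 1)]

    spikes = []
    prev_flagged = False
    for i, count in enumerate(series):
        history = series[max(0, i - window):i]
        baseline = sum(history) / len(history) if history else 0.0
        # Floor the baseline at 1 so a silent history does not make
        # every small blip look like a spike.
        flagged = count >= min_count and count > factor * max(baseline, 1.0)
        if flagged and not prev_flagged:
            spikes.append(start + i * bucket)  # first bucket of the run
        prev_flagged = flagged
    return spikes
```

Given a background of one error a minute from 09:00 and a burst of twenty errors a minute starting at 09:12, this reports a single spike at 09:12: the second burst minute is still above the line, but it belongs to the same run.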

Why correlation beats detection

Anyone can detect a spike — it's a count crossing a line. The value is in the correlation: "errors spiked at 09:12, and 2.4.0 deployed at 09:07." That single juxtaposition turns a frantic "everything's on fire, where do I even look" into "roll back 2.4.0 and breathe." The page exists to manufacture that juxtaposition automatically.
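
That juxtaposition can be sketched as a simple lookup: given a spike time, pick the most recent release deployed within a short window before it. The `Release` type, the `probable_release` function, and the 30-minute `lookback` are assumptions made for this example, not the page's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Release:
    version: str
    deployed_at: datetime


def probable_release(spike_at, releases,
                     lookback=timedelta(minutes=30)) -> Optional[Release]:
    """Return the latest release deployed at most `lookback` before the spike."""
    candidates = [
        r for r in releases
        if timedelta(0) <= spike_at - r.deployed_at <= lookback
    ]
    # The most recent deploy before the spike is the likeliest suspect.
    return max(candidates, key=lambda r: r.deployed_at, default=None)


releases = [
    Release("2.3.9", datetime(2026, 6, 15, 14, 30)),  # made-up earlier release
    Release("2.4.0", datetime(2026, 6, 16, 9, 7)),
]
match = probable_release(datetime(2026, 6, 16, 9, 12), releases)
print(match.version if match else "no recent release")  # prints 2.4.0
```

A time window is a deliberately blunt heuristic. It suggests a suspect; it does not prove that the release caused the spike, which is why the page puts the two facts side by side and leaves the conclusion to you.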

It only works because the spine was already in place — releases are first-class, and traces are linkable. Each feature this sprint makes the next one cheaper.

Next: putting numbers on reliability with SLOs and error budgets.
