Lookout
3 min read · Tom Shafer · Deep dive

Designing a pluggable alert engine

A deep dive on Lookout's alert engine: the event catalogue, severity mapping, channels and subscriptions, cache-based deduplication, and the single notifier seam that makes adding a new alert a one-liner.

A detector that doesn't tell anyone is a diary, not an alert. This is a deep dive into the part that tells someone — and, more importantly, into the seam that lets every future detector reuse it without ceremony.

The design constraint

I knew the back half of this sprint would ship a dozen things that need to alert: SLO burn rate, anomaly detection, error spikes, watcher thresholds, quota, billing. If each one invented its own routing, I'd have a dozen subtly different alerting code paths to maintain. So the constraint was explicit: firing a new kind of alert should be a single function call, with zero new routing or delivery code.

The pieces

Five small, single-responsibility parts:

  • AlertEvents — the catalogue. Every alertable thing is a string key (issue.new, issue.fatal, uptime.down, slo.burn_rate, job.failed, quota.exceeded, …) with a label and description. A catalogue, not scattered string literals, so the UI and the router agree on what exists.
  • AlertSeverity — pure mapping from event key → critical | error | warning | info. One place decides "does this page someone or just log?" A fatal crash is critical; a slow-query warning isn't.
  • AlertChannel — a reusable, org-scoped destination with a type (slack, discord, teams, telegram, sms, webhook, email, and later pagerduty/opsgenie) and a JSON config.
  • AlertSubscription — wires (project|org, event_key) → channel. This is the matrix the user edits: send issue.fatal to #incidents.
  • AlertDelivery — an append-only log of everything sent, powering the alert-history view.
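As a rough illustration, the five parts can be written as plain PHP 8.1 types. This is a framework-free sketch, not Lookout's code: in a Laravel app these would more likely be Eloquent models plus config, and the field names, the catalogue labels, and every severity assignment other than "fatal is critical" are assumptions. `severityFor()` and `AlertSubscription::matches()` are hypothetical helpers.

```php
<?php
declare(strict_types=1);

// Hypothetical plain-PHP shapes for the five parts. Field names and the
// severity assignments (other than fatal => critical) are illustrative.

enum AlertSeverity: string
{
    case Critical = 'critical';
    case Error = 'error';
    case Warning = 'warning';
    case Info = 'info';
}

final class AlertEvents
{
    // The catalogue: every alertable key with a label and a description.
    public const ALL = [
        'issue.new'     => ['label' => 'New issue', 'description' => 'An error fingerprint was seen for the first time'],
        'issue.fatal'   => ['label' => 'Fatal crash', 'description' => 'A fatal error was reported'],
        'uptime.down'   => ['label' => 'Uptime check down', 'description' => 'A monitored endpoint stopped responding'],
        'slo.burn_rate' => ['label' => 'SLO burn rate', 'description' => 'The error budget is burning too fast'],
    ];

    public static function exists(string $key): bool
    {
        return array_key_exists($key, self::ALL);
    }
}

// Pure mapping from event key to severity: one place decides who gets paged.
function severityFor(string $eventKey): AlertSeverity
{
    return match ($eventKey) {
        'issue.fatal', 'uptime.down' => AlertSeverity::Critical,
        'issue.new', 'job.failed' => AlertSeverity::Error,
        'slo.burn_rate', 'quota.exceeded' => AlertSeverity::Warning,
        default => AlertSeverity::Info,
    };
}

// A reusable, org-scoped destination with a type and a JSON config.
final class AlertChannel
{
    public function __construct(
        public readonly int $orgId,
        public readonly string $type,   // 'slack', 'discord', 'webhook', 'email', ...
        public readonly array $config,  // decoded JSON, e.g. a webhook URL
    ) {}
}

// One cell of the matrix: (project or org, event key) -> channel.
final class AlertSubscription
{
    public function __construct(
        public readonly int $orgId,
        public readonly ?int $projectId, // null = applies to every project in the org
        public readonly string $eventKey,
        public readonly AlertChannel $channel,
    ) {}

    public function matches(int $projectId, string $eventKey): bool
    {
        return $this->eventKey === $eventKey
            && ($this->projectId === null || $this->projectId === $projectId);
    }
}

// Readonly fields: a delivery record is written once and never changed.
final class AlertDelivery
{
    public function __construct(
        public readonly string $eventKey,
        public readonly string $channelType,
        public readonly string $subject,
        public readonly \DateTimeImmutable $sentAt,
    ) {}
}
```

Keeping severity as a pure function means it can be unit-tested without a database, and in this sketch the subscription check is the only place that knows an org-wide row (`projectId === null`) applies to every project.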

The one seam everything funnels through

Every alert in the system goes through one function:

OrgAlertNotifier::send(
    $project,
    $eventKey,     // e.g. 'slo.burn_rate'
    $subject,
    $body,
    $dedupKey,     // e.g. "slo:{$slo->id}"
    $dedupTtl,
);

Internally, it resolves the subscriptions for (project, eventKey), expands email recipients (honoring opt-outs and quiet hours), checks the dedup gate, sends to every matched channel through a dispatcher, and logs each delivery. A scheduled evaluator computes threshold events on a timer, while immediate notifiers (new issue, fatal) call the same send() inline. Two trigger styles, one routing path.
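A minimal, framework-free sketch of that pipeline follows, assuming PHP 8.0+. `SketchNotifier`, the array-shaped subscriptions, the closure dispatcher and the injected `$now` clock are all hypothetical stand-ins for Lookout's models, channel dispatcher and `Cache::add`; email-recipient expansion (opt-outs, quiet hours) is left out to keep it short.

```php
<?php
declare(strict_types=1);

// Hypothetical, framework-free sketch of the send() pipeline. Subscriptions
// and the delivery log live in memory, the dispatcher is a closure, and the
// dedup gate is an array of expiries standing in for Cache::add.
// Email-recipient expansion (opt-outs, quiet hours) is omitted.
final class SketchNotifier
{
    /** @var array<string, int> dedup cache key => expiry (unix seconds) */
    private array $dedup = [];

    /** @var list<array{channel: string, event: string, subject: string}> */
    public array $deliveries = [];

    /**
     * @param list<array{project: ?int, event: string, channel: string}> $subscriptions
     *        A null project means an org-wide subscription.
     * @param \Closure(string, string, string): void $dispatch delivers to one channel
     */
    public function __construct(
        private array $subscriptions,
        private \Closure $dispatch,
    ) {}

    public function send(
        int $projectId,
        string $eventKey,
        string $subject,
        string $body,
        string $dedupKey,
        int $dedupTtl,
        int $now, // injected clock so the quiet window is testable
    ): bool {
        // 1. Resolve subscriptions: project rows plus org-wide rows, each channel once.
        $channels = [];
        foreach ($this->subscriptions as $sub) {
            if ($sub['event'] === $eventKey
                && ($sub['project'] === null || $sub['project'] === $projectId)) {
                $channels[$sub['channel']] = true;
            }
        }
        if ($channels === []) {
            return false; // nobody subscribed to this event
        }

        // 2. Dedup gate: the first trigger inside the window wins.
        $key = 'alert_dedup:' . sha1($eventKey . '|' . $dedupKey);
        if (($this->dedup[$key] ?? 0) > $now) {
            return false;
        }
        $this->dedup[$key] = $now + $dedupTtl;

        // 3. Send to every matched channel and log each delivery.
        foreach (array_keys($channels) as $channel) {
            $channel = (string) $channel;
            ($this->dispatch)($channel, $subject, $body);
            $this->deliveries[] = ['channel' => $channel, 'event' => $eventKey, 'subject' => $subject];
        }

        return true;
    }
}
```

Both trigger styles would reach the same method: a scheduled evaluator calls `send()` after computing a threshold, and an inline notifier calls it the moment a new issue or fatal arrives.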

Dedup is the unsung hero

The feature nobody requests and everybody needs. An error firing 10,000 times a minute must produce one alert per quiet window, not 10,000. The implementation is a single cache primitive:

if (! Cache::add('alert_dedup:'.sha1($eventKey.'|'.$dedupKey), true, $dedupTtl)) {
    return false; // already alerted within the window
}

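Cache::add only writes the key if it doesn't already exist, so the first trigger in the window wins and the rest no-op. That check-and-set is atomic when the cache store implements add natively (Redis and Memcached do); on a store without a native add, Laravel's cache repository falls back to a get followed by a put, which two workers can race. The caller picks the dedup key (per fingerprint, per project, per SLO) and the TTL (the quiet window). Simple, race-free on a store with an atomic add, and entirely the caller's policy.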
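To see the gate's behavior without a cache server, here is a hypothetical in-memory `DedupGate` that mimics the `Cache::add` contract (write only if the key is absent or expired, and report whether this call wrote it). It is a sketch for illustration only: inside one PHP process the check-and-set cannot race, but sharing the gate across workers needs a store with an atomic add.

```php
<?php
declare(strict_types=1);

// Framework-free stand-in for the Cache::add dedup gate. A single PHP process
// runs this sequentially, so the check-and-set is trivially atomic; across
// processes you would need a store whose add is atomic (e.g. Redis).
final class DedupGate
{
    /** @var array<string, int> cache key => expiry (unix seconds) */
    private array $expiries = [];

    // Same contract as Cache::add: store the key only if it is absent or
    // expired, and return whether this call was the one that stored it.
    private function add(string $key, int $ttl, int $now): bool
    {
        if (isset($this->expiries[$key]) && $this->expiries[$key] > $now) {
            return false;
        }
        $this->expiries[$key] = $now + $ttl;
        return true;
    }

    // Mirrors the article's gate: hash the event key plus the caller's dedup key.
    public function allow(string $eventKey, string $dedupKey, int $ttl, int $now): bool
    {
        return $this->add('alert_dedup:' . sha1($eventKey . '|' . $dedupKey), $ttl, $now);
    }
}
```

The per-fingerprint key is the caller's policy in action: 10,000 triggers for one fingerprint inside the window collapse to a single alert, while a different fingerprint, or the same fingerprint under a different event key, still gets through.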

The payoff: adding a detector is a one-liner

Because the seam is clean, every later detector this sprint (SLO burn rate, anomaly detection, incidents) added no routing or delivery code to the engine. Each one computes its own condition and calls send() with a new event key. Even the PagerDuty and Opsgenie channels and the native escalation layer plugged into the same surface.
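As a sketch of what "one call" looks like from the detector side, here is a hypothetical SLO burn-rate detector. It assumes the seam is exposed as an `AlertNotifier` interface (the article calls a static `OrgAlertNotifier::send()`; an interface is swapped in here so a fake can record calls), computes burn rate as the observed error rate divided by the error rate the SLO allows, and uses an assumed fast-burn threshold of 14.4 and a one-hour dedup window. All class names are illustrative.

```php
<?php
declare(strict_types=1);

// Hypothetical: the notifier seam as an interface so a detector can be tested
// with a fake. Parameter order follows the article's send() call.
interface AlertNotifier
{
    public function send(int $projectId, string $eventKey, string $subject,
                         string $body, string $dedupKey, int $dedupTtl): bool;
}

// Test double that records (event key, dedup key) pairs instead of sending.
final class RecordingNotifier implements AlertNotifier
{
    /** @var list<array{0: string, 1: string}> */
    public array $calls = [];

    public function send(int $projectId, string $eventKey, string $subject,
                         string $body, string $dedupKey, int $dedupTtl): bool
    {
        $this->calls[] = [$eventKey, $dedupKey];
        return true;
    }
}

// A hypothetical SLO burn-rate detector: all it owns is the condition.
final class SloBurnRateDetector
{
    public function __construct(
        private AlertNotifier $notifier,
        private float $threshold = 14.4, // assumed fast-burn threshold
    ) {}

    public function evaluate(int $projectId, int $sloId, float $target, float $errorRate): bool
    {
        // Burn rate = observed error rate / error rate the SLO allows.
        $burnRate = $errorRate / (1.0 - $target);
        if ($burnRate < $this->threshold) {
            return false;
        }

        // The one-liner: a new event key, a caller-chosen dedup key and quiet window.
        return $this->notifier->send(
            $projectId,
            'slo.burn_rate',
            sprintf('SLO %d burning at %.1fx', $sloId, $burnRate),
            sprintf('Error rate %.2f%% against a %.2f%% target.', $errorRate * 100, $target * 100),
            "slo:{$sloId}",
            3600,
        );
    }
}
```

A scheduled job would call `evaluate()` on each tick; the detector owns only the condition, while routing, dedup and delivery stay behind `send()`.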

That's the whole difference between an alert feature and an alert engine: the feature solves today's alert; the engine makes every future alert free.

Next: a run of new watchers and dashboard sections, starting with the Security section.

Tags: deep-dive, alerting, architecture, laravel