An escalation state machine with signed-link acknowledgement
A deep dive on building native on-call escalation in Laravel — the incident state machine, a per-minute scheduler that advances tiers, repeat-until-acknowledged, incident dedup, and stopping escalation with a signed URL.
PagerDuty and Opsgenie own escalation when you use them. Plenty of small teams don't, and for them a single Slack ping everyone ignores isn't on-call — it's hope. This is a deep dive into a lightweight escalation engine built right into the app.
The data model
Three tables:
escalation_policies: org-scoped (optionally project-scoped), with the set of event_keys it covers and a repeat_count.
escalation_policy_steps: ordered tiers, each with delay_minutes and a set of channel ids.
escalation_incidents: a live run of a policy, with status, current_step, repeats_done, and a next_run_at timestamp.
The incident is a small state machine: open → acknowledged | resolved | exhausted. The whole engine is just "advance open incidents whose next step is due, until something stops them."
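A minimal sketch of those transitions, assuming PHP 8.1+ enums. IncidentStatus and canTransitionTo are hypothetical names, not taken from the app. The sketch treats the three non-open statuses as terminal, which is how the diagram reads; the real app may allow more (for example acknowledged to resolved).

```php
<?php

// Hypothetical enum; the case values mirror the statuses named in the text.
enum IncidentStatus: string
{
    case Open = 'open';
    case Acknowledged = 'acknowledged';
    case Resolved = 'resolved';
    case Exhausted = 'exhausted';

    // Assumption: only an open incident may move; every other state is terminal.
    public function canTransitionTo(self $target): bool
    {
        return $this === self::Open && $target !== self::Open;
    }
}
```

IncidentStatus::Open->canTransitionTo(IncidentStatus::Acknowledged) is true, while any transition out of Acknowledged, Resolved, or Exhausted is rejected.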
Trigger: open an incident, fire step one
When a routed event fires, the manager opens an incident and immediately processes the first step. Dedup here lives in its own table rather than the alert cache: the manager won't open a second incident for a (policy, dedup_key) pair that already has an active incident or one created within the dedup window. Otherwise a flapping alert would spawn incidents endlessly.
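The dedup rule can be sketched in plain PHP, with incidents as arrays instead of Eloquent models. shouldOpenIncident, the array shape, and $windowSeconds are all hypothetical, and "active" is assumed to mean status open.

```php
<?php

/**
 * Decide whether a new incident may be opened for a (policy, dedup_key) pair.
 * Timestamps are Unix seconds. Hypothetical helper, not the app's code.
 *
 * @param array<int, array{policy_id:int, dedup_key:string, status:string, created_at:int}> $incidents
 */
function shouldOpenIncident(array $incidents, int $policyId, string $dedupKey, int $now, int $windowSeconds): bool
{
    foreach ($incidents as $row) {
        if ($row['policy_id'] !== $policyId || $row['dedup_key'] !== $dedupKey) {
            continue; // a different policy or key never blocks this one
        }
        $isActive = $row['status'] === 'open'; // assumption: active means open
        $isRecent = ($now - $row['created_at']) < $windowSeconds;
        if ($isActive || $isRecent) {
            return false; // an existing incident already covers this alert
        }
    }

    return true;
}
```

In the real app this would presumably be a single existence query against escalation_incidents rather than a loop over rows.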
The advance loop
A per-minute command — lookout:process-escalations — drives everything:
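// eachById pages by primary key, so rows that leave the result set mid-run are not skipped
EscalationIncident::query()
    ->where('status', 'open')
    ->whereNotNull('next_run_at')
    ->where('next_run_at', '<=', now())
    ->eachById(function (EscalationIncident $incident): void {
        $this->processIncident($incident);
    });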
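Registering the command to run every minute might look like the fragment below. It assumes a Laravel 11-style routes/console.php; the withoutOverlapping() call is an addition for illustration, not something the post describes.

```php
<?php

use Illuminate\Support\Facades\Schedule;

// Assumed location: routes/console.php (Laravel 11+ layout).
Schedule::command('lookout:process-escalations')
    ->everyMinute()
    ->withoutOverlapping(); // assumption: keeps two runs from firing the same step
```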
processIncident is the heart of the state machine. Fire the next step's channels, then schedule the one after it:
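$next = $incident->current_step + 1;
if ($next >= $steps->count()) { // ran out of steps
    if ($incident->repeats_done < $policy->repeat_count) {
        $incident->repeats_done++; // loop the whole sequence
        $next = 0;
    } else {
        return $incident->update(['next_run_at' => null]); // done; await ack
    }
}
$this->fireStep($incident, $steps[$next]);
// $steps is assumed to be a Collection: get() returns null past the last step
$following = $steps->get($next + 1);
if ($following !== null) {
    $nextRunAt = now()->addMinutes($following->delay_minutes);
} elseif ($incident->repeats_done < $policy->repeat_count) {
    // assumption: a repeat pass starts after the first step's delay
    $nextRunAt = now()->addMinutes($steps->first()->delay_minutes);
} else {
    $nextRunAt = null; // last step of the last pass: park the incident
}
$incident->update([
    'current_step' => $next,
    'next_run_at' => $nextRunAt,
]);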
Three behaviors fall out of that one function: tiered delays (each step schedules the next by its delay), repeat-until-acknowledged (loop back to step 0 while repeats_done < repeat_count), and natural termination (next_run_at = null parks the incident, still open, awaiting a human).
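To see the three behaviors on a timeline, here is a small simulation in plain PHP of an incident nobody acknowledges. simulateEscalation is a hypothetical helper, and it makes two simplifying assumptions: a step's delay is the wait before that step fires, and a repeat pass restarts at step 0 after step 0's own delay.

```php
<?php

/**
 * Simulate an unacknowledged incident.
 * $delays[i] stands in for step i's delay_minutes; minutes count from the trigger.
 *
 * @param int[] $delays
 * @return array<int, array{minute:int, step:int}>
 */
function simulateEscalation(array $delays, int $repeatCount): array
{
    if ($delays === []) {
        return []; // a policy without steps never fires
    }

    $fired = [['minute' => 0, 'step' => 0]]; // step one fires at trigger time
    $current = 0;
    $repeatsDone = 0;
    $clock = 0;

    while (true) {
        $next = $current + 1;
        if ($next >= count($delays)) {
            if ($repeatsDone >= $repeatCount) {
                break; // next_run_at = null: parked, still open
            }
            $repeatsDone++; // loop the whole sequence
            $next = 0;
        }
        $clock += $delays[$next]; // tiered delay before the step fires
        $fired[] = ['minute' => $clock, 'step' => $next];
        $current = $next;
    }

    return $fired;
}
```

With delays of 0, 5, and 15 minutes and a repeat_count of 1, simulateEscalation([0, 5, 15], 1) fires steps 0, 1, 2, 0, 1, 2 at minutes 0, 5, 20, 20, 25, and 40, then parks. In the real engine the per-minute tick would push the second step-0 firing to the next run.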
Stopping it: a signed URL
Every escalation message carries an acknowledge link. The trick is it needs no login — a responder on their phone at 2am shouldn't hit an auth wall. Laravel signed URLs are exactly right: the signature is the proof.
URL::signedRoute('escalations.ack', ['incident' => $incident->id]);
The route sits behind the signed middleware, so a tampered or unsigned link is rejected automatically. Acknowledging flips status to acknowledged and nulls next_run_at — the advance loop skips it forever after. There's an in-app ack on the live incident list too, but the signed link is the one that actually gets used.
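Why a signature can stand in for a login is easier to see with a toy version of an HMAC-signed link. The sketch below is plain PHP 8 and illustrates only the general idea; signUrl and hasValidSignature are hypothetical helpers, and Laravel's own implementation differs in detail.

```php
<?php

// Hypothetical helpers illustrating the idea behind signed links; not Laravel's code.
function signUrl(string $url, string $secret): string
{
    $signature = hash_hmac('sha256', $url, $secret);
    $separator = str_contains($url, '?') ? '&' : '?';

    return $url . $separator . 'signature=' . $signature;
}

function hasValidSignature(string $signedUrl, string $secret): bool
{
    $pos = strrpos($signedUrl, 'signature=');
    if ($pos === false || $pos === 0) {
        return false; // unsigned link
    }

    $original = substr($signedUrl, 0, $pos - 1); // also drops the "?" or "&"
    $given = substr($signedUrl, $pos + strlen('signature='));
    $expected = hash_hmac('sha256', $original, $secret);

    return hash_equals($expected, $given); // constant-time comparison
}
```

Changing the incident id in a signed link changes the expected HMAC, so the check fails without any session or user lookup. Only someone holding the server-side secret can mint a valid link.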
The design call: don't over-build
The interesting decision was restraint. PagerDuty exists and is excellent; I'm not out to clone it. No on-call calendars, no rotations, no override schedules. Just tiers, delays, repeat, and ack — for teams whose current alternative is nothing. It reuses the existing alert channels and notifier and only adds the "keep going until someone responds" loop on top. Right-sized beats feature-complete.
One capstone feature left: getting errors off the web and onto phones with mobile SDKs.