Building a service dependency map from trace spans
A deep dive into a feature that shipped without a single new line of instrumentation: the service dependency map. It covers extracting a destination per span op, portable JSON extraction across SQLite and MySQL, and collapsing spans into nodes and edges.
The realization: the graph is already in the data
A service map sounds like a big project — agents, a topology collector, a graph store. It wasn't, because the topology is already encoded in the trace spans I collect. Every outbound dependency a request touches is already a span, and every such span already carries the address of what it called:
- an HTTP-client span (op = http.client) has server.address
- a Redis span (op = db.redis) has its connection name
- a database span has its db.system
- a cache span has the store
So building the map is a read problem, not a collection problem. Walk the spans in the window, turn each into an edge this app → that dependency, and collapse duplicates into weighted nodes and edges.
Mapping a span to a node
The first job is normalizing heterogeneous spans into a typed destination. It's a match expression on op:
    $node = match (true) {
        $op === 'http.client'          => ['type' => 'http',  'name' => $data['server.address'] ?? 'unknown'],
        // Must precede the generic 'db.' arm, which would otherwise catch 'db.redis'.
        $op === 'db.redis'             => ['type' => 'redis', 'name' => $data['db.redis.connection'] ?? 'redis'],
        str_starts_with($op, 'db.')    => ['type' => 'db',    'name' => $data['db.system'] ?? 'database'],
        str_starts_with($op, 'cache.') => ['type' => 'cache', 'name' => 'cache'],
        default                        => null, // not an outbound dependency; skip
    };
Each non-null result is an edge from the application node to that dependency. Then you aggregate: group by (type, name), count calls, average duration, track error rate. Nodes get a weight (call volume), edges get a health (error rate, latency), and now you have a graph you can lay out.
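The aggregation step might look roughly like the sketch below. It is an illustration, not Lookout's actual code: the function name aggregateDependencies and the duration_ms and error fields are assumptions about how a span's timing and failure state reach this step.

```php
<?php

/**
 * Collapse typed destinations into one weighted entry per (type, name).
 * Input rows are hypothetical: ['type', 'name', 'duration_ms', 'error'].
 */
function aggregateDependencies(array $calls): array
{
    $nodes = [];
    foreach ($calls as $call) {
        // One bucket per (type, name) pair.
        $key = $call['type'] . ':' . $call['name'];
        $nodes[$key] ??= [
            'type'     => $call['type'],
            'name'     => $call['name'],
            'calls'    => 0,
            'errors'   => 0,
            'total_ms' => 0.0,
        ];
        $nodes[$key]['calls']++;
        $nodes[$key]['total_ms'] += $call['duration_ms'];
        if ($call['error']) {
            $nodes[$key]['errors']++;
        }
    }

    // Derive the health figures from the running totals.
    foreach ($nodes as &$node) {
        $node['avg_ms']     = $node['total_ms'] / $node['calls'];
        $node['error_rate'] = (float) $node['errors'] / $node['calls'];
    }
    unset($node); // break the reference left over from the loop

    return $nodes;
}

// Sample data, invented for the example.
$graph = aggregateDependencies([
    ['type' => 'http', 'name' => 'api.example.com', 'duration_ms' => 100.0, 'error' => false],
    ['type' => 'http', 'name' => 'api.example.com', 'duration_ms' => 300.0, 'error' => true],
    ['type' => 'db',   'name' => 'mysql',           'duration_ms' => 5.0,   'error' => false],
]);
```

Here calls serves as the node weight, while avg_ms and error_rate describe the health of the edge from the app to that dependency. Keeping running totals rather than a running average means the same buckets can be merged across time windows later.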
The portability detail: reading JSON the same way everywhere
Those destinations live in a JSON data column, and Lookout runs on both SQLite (tests) and MySQL (production). The portable way to read a flat dotted key out of JSON in both engines:
json_extract(data, '$."server.address"')
The quoting matters: the key is the literal string server.address (with a dot in it), so it must be quoted as one path segment — '$."server.address"' — not '$.server.address', which both engines would read as nested server → address. Get that wrong and every lookup silently returns null.
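One way to make the quoting hard to get wrong is to build the path in a single place. The sketch below is a hypothetical pair of helpers (jsonPathForFlatKey and jsonExtractExpr are invented names, not part of any library). It assumes the column name comes from trusted code, and it rejects keys that would need escaping rather than guessing at each engine's escape rules.

```php
<?php

/**
 * Build a JSON path that treats a flat dotted key as ONE segment.
 * Hypothetical helper for illustration.
 */
function jsonPathForFlatKey(string $key): string
{
    // Refuse empty keys and characters that would need escaping
    // inside the quoted segment or the surrounding SQL literal.
    if ($key === '' || strpbrk($key, "\"\\'") !== false) {
        throw new InvalidArgumentException("Unsupported JSON key: {$key}");
    }

    return '$."' . $key . '"';
}

/**
 * SQL expression that reads the key from a JSON column.
 * Assumes $column is a trusted identifier, never user input.
 */
function jsonExtractExpr(string $column, string $key): string
{
    return sprintf("json_extract(%s, '%s')", $column, jsonPathForFlatKey($key));
}

$expr = jsonExtractExpr('data', 'server.address');
```

One caveat worth checking against your own setup: MySQL's JSON_EXTRACT returns a JSON value, so a string result comes back wrapped in double quotes unless it is passed through JSON_UNQUOTE, whereas SQLite's json_extract returns plain text for string values. If the extracted names are compared or grouped in application code, they may need normalizing on the MySQL side.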
Why "read, don't collect" keeps winning
This is the second feature this sprint (rate-limit tracking was another) that shipped value with zero new instrumentation — just a smarter question asked of spans already flowing in. That's the compounding dividend of getting the trace data model rich enough early: the more context each span carries, the more features you can synthesize later without ever shipping an SDK change. Collect well once; harvest forever.
Next: tying errors to the releases that caused them — the Incidents page.