Lookout
By Tom Shafer | Deep dive | 3 min read

Building a service dependency map from trace spans

A deep dive on synthesizing a service/dependency graph from existing trace spans — extracting destinations per span op, portable JSON extraction across SQLite and MySQL, and collapsing spans into nodes and edges.

A deep dive into a feature that shipped without a single new line of instrumentation: the service dependency map.

The realization: the graph is already in the data

A service map sounds like a big project — agents, a topology collector, a graph store. It wasn't, because the topology is already encoded in the trace spans I collect. Every outbound dependency a request touches is already a span, and every such span already carries the address of what it called:

  • an HTTP-client span (op = http.client) has server.address
  • a Redis span (op = db.redis) has its connection name
  • a database span has its db.system
  • a cache span has the store

So building the map is a read problem, not a collection problem. Walk the spans in the window, turn each into an edge this app → that dependency, and collapse duplicates into weighted nodes and edges.

Mapping a span to a node

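The first job is normalizing heterogeneous spans into a typed destination. It's a match on op, and arm order matters: the exact db.redis check has to come before the generic db. prefix check, or Redis spans would be classified as databases: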

$node = match (true) {
    $op === 'http.client'          => ['type' => 'http',  'name' => $data['server.address'] ?? 'unknown'],
    $op === 'db.redis'             => ['type' => 'redis', 'name' => $data['db.redis.connection'] ?? 'redis'],
    str_starts_with($op, 'db.')    => ['type' => 'db',    'name' => $data['db.system'] ?? 'database'],
    str_starts_with($op, 'cache.') => ['type' => 'cache', 'name' => 'cache'],
    default                        => null, // not an outbound dependency; skip
};

Each non-null result is an edge from the application node to that dependency. Then you aggregate: group by (type, name), count calls, average the duration, and track the error rate. Each node gets a weight (call volume) and each edge gets a health reading (error rate, latency), which is enough to lay out a graph.
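The aggregation step can be sketched roughly as follows. This is a minimal illustration, not Lookout's actual code: the function names spanToNode and buildEdges, the in-memory span shape (op, data, duration_ms, error), and the type:name grouping key are all assumptions made for the example.

```php
<?php
declare(strict_types=1);

// Map one span to a typed destination, or null if it is not an outbound call.
// Same arms as the match expression above, wrapped in a (hypothetical) function.
function spanToNode(string $op, array $data): ?array
{
    return match (true) {
        $op === 'http.client'          => ['type' => 'http',  'name' => $data['server.address'] ?? 'unknown'],
        $op === 'db.redis'             => ['type' => 'redis', 'name' => $data['db.redis.connection'] ?? 'redis'],
        str_starts_with($op, 'db.')    => ['type' => 'db',    'name' => $data['db.system'] ?? 'database'],
        str_starts_with($op, 'cache.') => ['type' => 'cache', 'name' => 'cache'],
        default                        => null,
    };
}

// Collapse spans into one aggregate per (type, name) destination.
// Assumed span shape: ['op' => string, 'data' => array, 'duration_ms' => float, 'error' => bool].
function buildEdges(array $spans): array
{
    $edges = [];
    foreach ($spans as $span) {
        $node = spanToNode($span['op'], $span['data'] ?? []);
        if ($node === null) {
            continue; // not a dependency call
        }
        $key = $node['type'] . ':' . $node['name'];
        $edges[$key] ??= $node + ['calls' => 0, 'errors' => 0, 'total_ms' => 0.0];
        $edges[$key]['calls']++;
        $edges[$key]['errors'] += empty($span['error']) ? 0 : 1;
        $edges[$key]['total_ms'] += (float) ($span['duration_ms'] ?? 0);
    }
    // Derive the per-edge health figures once all calls are counted.
    foreach ($edges as &$edge) {
        $edge['avg_ms'] = $edge['total_ms'] / $edge['calls'];
        $edge['error_rate'] = $edge['errors'] / $edge['calls'];
    }
    unset($edge);
    return $edges;
}

// Made-up sample spans for illustration only.
$spans = [
    ['op' => 'http.client', 'data' => ['server.address' => 'api.example.com'], 'duration_ms' => 100.0, 'error' => false],
    ['op' => 'http.client', 'data' => ['server.address' => 'api.example.com'], 'duration_ms' => 300.0, 'error' => true],
    ['op' => 'db.query',    'data' => ['db.system' => 'mysql'],                'duration_ms' => 5.0,   'error' => false],
    ['op' => 'view.render', 'data' => [],                                      'duration_ms' => 12.0,  'error' => false],
];
$edges = buildEdges($spans);
```

With these sample spans, the two HTTP calls collapse into a single edge keyed http:api.example.com and the view.render span is skipped. In practice the grouping would more likely be pushed into SQL; doing it in PHP here just keeps the sketch self-contained.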

The portability detail: reading JSON the same way everywhere

Those destinations live in a JSON data column, and Lookout runs on both SQLite (tests) and MySQL (production). The portable way to read a flat dotted key out of JSON in both engines:

json_extract(data, '$."server.address"')

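The quoting matters: the key is the literal string server.address (with a dot in it), so it must be quoted as one path segment, '$."server.address"', not '$.server.address', which both engines would read as nested server → address. Get that wrong and every lookup silently returns null. One difference remains when the value is a string: MySQL's JSON_EXTRACT returns it as a JSON value with the double quotes still attached, while SQLite returns plain text, so unwrap it on the MySQL side (for example with JSON_UNQUOTE) or strip the quotes in application code before grouping on it.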
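One way to avoid getting the path quoting wrong by hand is to generate the expression from the key. The helper below is a hypothetical sketch (the name jsonFlatKey is not from Lookout); because it interpolates the column and key into SQL rather than binding them, it assumes both come from code and rejects anything outside a conservative character set.

```php
<?php
declare(strict_types=1);

// Build a json_extract() expression for a flat key that may contain dots.
// Hypothetical helper: column and key are validated because they are
// interpolated into the SQL text, not bound as parameters.
function jsonFlatKey(string $column, string $key): string
{
    if (!preg_match('/\A[A-Za-z_][A-Za-z0-9_]*\z/', $column)) {
        throw new InvalidArgumentException("Unsafe column name: {$column}");
    }
    if (!preg_match('/\A[A-Za-z0-9_.-]+\z/', $key)) {
        throw new InvalidArgumentException("Unsafe JSON key: {$key}");
    }
    // The double quotes make the whole key one path segment, dots included.
    return sprintf("json_extract(%s, '$.\"%s\"')", $column, $key);
}

$expr = jsonFlatKey('data', 'server.address');
```

Here $expr holds the same expression shown earlier, json_extract(data, '$."server.address"'), ready to drop into a SELECT or GROUP BY clause.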

Why "read, don't collect" keeps winning

This is the second feature this sprint (rate-limit tracking was another) that shipped value with zero new instrumentation — just a smarter question asked of spans already flowing in. That's the compounding dividend of getting the trace data model rich enough early: the more context each span carries, the more features you can synthesize later without ever shipping an SDK change. Collect well once; harvest forever.

Next: tying errors to the releases that caused them — the Incidents page.

Tags: deep-dive, tracing, performance, observability