Incident Workspace

Service Metrics

Environment

Production

Error Rate %: 12% (avg · 1h)

[Chart, -60m to now, y-axis 0% to 100%: steady at 1% for 20 samples, then 67% for the 4 most recent samples.]

P99 Latency: 8s (avg · 1h)

[Chart, -60m to now, y-axis 0 to 45s: steady at 80ms for 20 samples, then 45s for the 4 most recent samples.]

Request Volume: 12k req/min (avg · 1h)

[Chart, -60m to now, y-axis 0 to 14k: fluctuates between 11k and 13k req/min across all 24 samples, with no visible change during the incident.]

Active App Servers: 200 (avg · 1h)

[Chart, -60m to now, y-axis 0 to 200: constant at 200 servers across all 24 samples.]

Production Incident

API Degradation: Midnight Error Spike

On-Call Alert

The dashboard-service error rate spiked to 67% at exactly 00:01 UTC. A deployment happened 11 minutes earlier.

Pulse Analytics serves 800k daily active users. The service was healthy all day. Then, at exactly 00:01 UTC, the error rate jumped to 67% and P99 latency rose from 80ms to 45 seconds. A deployment had been made at 23:50 UTC, 11 minutes before the spike.
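The headline figures on the metrics panels are 1-hour averages, so they understate the current state: a 12% average error rate sits next to a live value of 67%. The following minimal sketch reproduces that arithmetic from the plotted samples (24 points over the last hour, 20 at baseline followed by 4 at the incident level). The variable and function names are illustrative assumptions and are not part of any dashboard API.

```python
# Samples as plotted on the panels: 24 points over the last hour,
# 20 at the healthy baseline followed by 4 at the incident level.
error_rate_pct = [1] * 20 + [67] * 4
p99_latency_s = [0.08] * 20 + [45.0] * 4


def summarize(samples):
    """Return the window average alongside the first and latest sample."""
    return {
        "hour_avg": sum(samples) / len(samples),
        "baseline": samples[0],
        "latest": samples[-1],
    }


error_summary = summarize(error_rate_pct)
latency_summary = summarize(p99_latency_s)

print(f"error rate: avg {error_summary['hour_avg']:.0f}%, "
      f"latest {error_summary['latest']}%")
print(f"p99 latency: avg {latency_summary['hour_avg']:.1f}s, "
      f"latest {latency_summary['latest']:.0f}s")
```

When triaging, the latest samples and the shape of the step change are more informative than the averaged headline number.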

You are on-call. Investigate the available telemetry, identify the root cause, and restore service.