๐Ÿ–ฅ๏ธ

Real incidents need a real screen.

Open senioreng.dev on your laptop for the full experience.

Liveยท00:00elapsed

Incident Workspace

Service Metrics

Environment

Production

P99 Latency ms

187ms

avg ยท 1h

450ms340ms225ms113ms0
55ms
54ms
56ms
55ms
54ms
55ms
55ms
54ms
450ms
450ms
450ms
450ms
23:0023:2023:4000:27

Error Rate %

1%

avg ยท 1h

5%4%3%2%0%
1%
1%
1%
1%
1%
1%
1%
1%
1%
1%
1%
1%
23:0023:2023:4000:27

Request Volume k/min

8k

avg ยท 1h

10k8k5k3k0
8k
8k
9k
8k
8k
9k
8k
8k
9k
8k
9k
8k
23:0023:2023:4000:27

Disk I/O %

41%

avg ยท 1h

100%75%50%25%0%
12%
12%
11%
12%
12%
11%
12%
12%
98%
98%
98%
98%
23:0023:2023:4000:27

All Services โ€” P99 Latency

All 8 services degraded simultaneously โ€” error rate 0.0% on all

api-gateway

Error rate: 0.0% ยท CPU: normal ยท Memory: normal

420ms

was 48ms

dashboard-service

Error rate: 0.0% ยท CPU: normal ยท Memory: normal

510ms

was 62ms

report-service

Error rate: 0.0% ยท CPU: normal ยท Memory: normal

680ms

was 71ms

auth-service

Error rate: 0.0% ยท CPU: normal ยท Memory: normal

390ms

was 44ms

analytics-service

Error rate: 0.0% ยท CPU: normal ยท Memory: normal

740ms

was 85ms

export-service

Error rate: 0.0% ยท CPU: normal ยท Memory: normal

890ms

was 94ms

notification-service

Error rate: 0.0% ยท CPU: normal ยท Memory: normal

360ms

was 41ms

billing-service

Error rate: 0.0% ยท CPU: normal ยท Memory: normal

480ms

was 55ms

Production Incident

The Morning Slowdown

On-Call Alert

Every service on the platform is degraded. P99 latency is 8ร— normal across all 8 services. No errors. No deployments. Started 40 minutes ago.

Vantage Analytics serves 3,800 B2B companies. Every single service degraded simultaneously 40 minutes ago โ€” not one service, all of them. No errors. No recent deployments. CPU, memory, and network are all normal. Something is causing platform-wide slowness. You're on-call.