Incident Workspace

Service Metrics

Environment

Production

Error Rate %

4%

avg · 1h

Samples, -60m to now (y-axis 0% to 18%): 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 5, 8, 11, 14, 16, 18

P95 Latency ms

843ms

avg · 1h

Samples in ms, -60m to now (y-axis 0 to 2.4s): 110, 110, 112, 110, 112, 110, 111, 110, 115, 120, 140, 180, 280, 450, 700, 1000, 1400, 1700, 1900, 2100, 2200, 2300, 2400, 2400

Throughput req/s

307

avg · 1h

Samples in req/s, -60m to now (y-axis 0 to 420): 200, 202, 198, 205, 200, 204, 208, 215, 228, 245, 265, 290, 318, 345, 368, 385, 396, 404, 410, 413, 416, 418, 419, 420

Success Rate %

94%

avg · 1h

Samples, -60m to now (y-axis 0% to 100%): 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 98, 97, 96, 95, 94, 93, 92, 91, 90, 89, 87, 85, 83, 81

internal-api

Incident started 12 minutes ago

CRITICAL

Throughput · CPU Saturation · Last 15 minutes (14:55, 15:00, 15:05, 15:10)

Production Incident

Internal API Reliability Incident

Incident Commander Update

Multiple teams are blocked on internal-api.

Multiple downstream services are reporting failures and elevated latency when calling internal-api.

Error rates have risen rapidly, request latency continues to increase, and customer-facing systems are beginning to experience degraded reliability.

You are the primary on-call engineer. Investigate the available telemetry, identify the root cause, and restore service stability.
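
One way to start on the telemetry is to check which signal left its baseline first. The sketch below is only an illustration: it copies the three series from the Service Metrics panels, treats the first six samples as the baseline, and flags the first sample more than 10% above that baseline. The helper name `onset_index`, the six-sample window, and the 10% factor are assumptions chosen for this example, not part of the incident workspace.

```python
from statistics import mean

# Series copied from the Service Metrics panels, oldest sample first.
error_rate_pct = [1] * 16 + [2, 3, 5, 8, 11, 14, 16, 18]
p95_latency_ms = [110, 110, 112, 110, 112, 110, 111, 110, 115, 120, 140, 180,
                  280, 450, 700, 1000, 1400, 1700, 1900, 2100, 2200, 2300,
                  2400, 2400]
throughput_rps = [200, 202, 198, 205, 200, 204, 208, 215, 228, 245, 265, 290,
                  318, 345, 368, 385, 396, 404, 410, 413, 416, 418, 419, 420]


def onset_index(series, baseline_samples=6, factor=1.1):
    """Return the index of the first sample above factor * baseline mean.

    The baseline is the mean of the first `baseline_samples` values
    (hypothetical defaults). Returns None if no sample exceeds it.
    """
    baseline = mean(series[:baseline_samples])
    for i in range(baseline_samples, len(series)):
        if series[i] > factor * baseline:
            return i
    return None


if __name__ == "__main__":
    signals = {
        "throughput": throughput_rps,
        "p95 latency": p95_latency_ms,
        "error rate": error_rate_pct,
    }
    # Print signals in the order they departed from baseline.
    for name, series in sorted(signals.items(),
                               key=lambda kv: onset_index(kv[1])):
        print(f"{name}: first departs from baseline at sample "
              f"{onset_index(series)}")
```

With the values above, throughput departs from baseline at sample 8, P95 latency at sample 10, and error rate at sample 16, so the traffic increase precedes the latency and error symptoms. That ordering is a hint about where to look next, not a root cause.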