

Incident Workspace

Service Metrics

Environment: Production

Error Rate (%)

1h average: 6%. Y-axis: 0% to 28%. X-axis: -60m to Now.

Samples, oldest to newest: 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 4, 7, 12, 18, 23, 27, 28

P95 Latency (ms)

1h average: 2.0 s. Y-axis: 0 to 4.2 s. X-axis: -60m to Now.

Samples, oldest to newest: 180 ms, 182 ms, 180 ms, 181 ms, 180 ms, 182 ms, 185 ms, 200 ms, 280 ms, 450 ms, 800 ms, 1.4 s, 2.1 s, 2.7 s, 3.2 s, 3.5 s, 3.7 s, 3.9 s, 4.0 s, 4.0 s, 4.1 s, 4.2 s, 4.2 s, 4.2 s

Inbound Request Volume (req/min)

1h average: 1.9k. Y-axis: 0 to 4.8k. X-axis: -60m to Now.

Samples, oldest to newest: 320, 318, 322, 320, 318, 320, 322, 320, 325, 340, 380, 480, 700, 1.1k, 1.8k, 2.7k, 3.5k, 4.0k, 4.4k, 4.6k, 4.7k, 4.8k, 4.8k, 4.8k

Success Rate (%)

1h average: 90%. Y-axis: 0% to 100%. X-axis: -60m to Now.

Samples, oldest to newest: 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 98, 97, 96, 94, 92, 89, 86, 83, 79, 76, 74, 72, 72, 71

payment-service

Status: CRITICAL. Incident started 18 minutes ago.

Chart: Inbound Volume and Error Rate, last 20 minutes (11:15 to 11:30)

Production Incident

Payment Service Outage

Incident Commander Update

Checkout failures are increasing rapidly across multiple customer-facing services.

payment-service is experiencing a severe outage. The error rate has climbed to 28%, P95 latency has risen from about 180 ms to 4.2 s, and order processing is failing across the platform.

Multiple upstream services report degraded health, request queues are backing up, and customers cannot complete purchases.

You are the primary on-call engineer. Investigate the available telemetry, identify what triggered the incident, determine the true root cause, and restore service stability before the outage spreads further.
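
One way to start on the telemetry is to ask which metric left its baseline first. The sketch below is a rough first pass, not a root-cause method: it copies the 24 samples from each Service Metrics panel (latency converted to milliseconds), treats the first six samples as the baseline, and reports the first sample that moves more than 5% away from it. The baseline window, the 5% threshold, and the names `SERIES`, `first_deviation`, and `onset_order` are assumptions made for illustration; they are not part of the workspace.

```python
from statistics import mean

# Samples copied from the dashboard panels, oldest to newest (24 per metric).
SERIES = {
    "request_volume_rpm": [320, 318, 322, 320, 318, 320, 322, 320, 325, 340,
                           380, 480, 700, 1100, 1800, 2700, 3500, 4000, 4400,
                           4600, 4700, 4800, 4800, 4800],
    "p95_latency_ms": [180, 182, 180, 181, 180, 182, 185, 200, 280, 450, 800,
                       1400, 2100, 2700, 3200, 3500, 3700, 3900, 4000, 4000,
                       4100, 4200, 4200, 4200],
    "error_rate_pct": [1] * 16 + [2, 4, 7, 12, 18, 23, 27, 28],
    "success_rate_pct": [99] * 10 + [98, 97, 96, 94, 92, 89, 86, 83, 79, 76,
                                     74, 72, 72, 71],
}

BASELINE_SAMPLES = 6   # assumption: the first six samples are "normal"
REL_THRESHOLD = 0.05   # assumption: a 5% relative change counts as a deviation


def first_deviation(values, baseline_n=BASELINE_SAMPLES,
                    rel_threshold=REL_THRESHOLD):
    """Return the index of the first post-baseline sample that differs from
    the baseline mean by more than rel_threshold, or None if none does."""
    baseline = mean(values[:baseline_n])
    for i in range(baseline_n, len(values)):
        if abs(values[i] - baseline) > rel_threshold * abs(baseline):
            return i
    return None


def onset_order(series):
    """Return (metric, onset_index) pairs sorted by onset, earliest first.
    Metrics that never deviate are omitted."""
    onsets = [(name, first_deviation(vals)) for name, vals in series.items()]
    return sorted((o for o in onsets if o[1] is not None), key=lambda o: o[1])


if __name__ == "__main__":
    for name, idx in onset_order(SERIES):
        print(f"{name}: first deviation at sample {idx} of {len(SERIES[name])}")
```

With these thresholds, latency and request volume cross their baselines several samples before the error rate does. Ordering alone does not establish cause, and the integer-rounded percentage panels react late under a relative threshold, so treat the result as a pointer to which logs and dashboards to open next.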