Incident Workspace

Service Metrics

Environment: Production

Error Rate %: 7% (avg · 1h)
Chart, 09:00 to 09:49 (y-axis 0% to 18%): 1% for the first eight samples, then 18% for the last four.

P95 Latency ms: 1.1s (avg · 1h)
Chart, 09:00 to 09:49 (y-axis 0 to 3.2s): 44ms to 46ms for the first eight samples, then 3.2s for the last four.

Request Volume k/min: 11k (avg · 1h)
Chart, 09:00 to 09:49 (y-axis 0 to 15k): steady between 10k and 12k across all twelve samples.

Success Rate %: 94% (avg · 1h)
Chart, 09:00 to 09:49 (y-axis 0% to 100%): 99% to 100% for the first eight samples, then 82% for the last four.
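
The headline figures on the four panels are consistent with a plain arithmetic mean of the twelve plotted samples. The sketch below checks that; the dashboard does not say how it aggregates, so the averaging method and the variable names are assumptions for illustration only.

```python
from statistics import mean

# Twelve samples per panel, as plotted between 09:00 and 09:49.
error_rate_pct = [1] * 8 + [18] * 4
p95_latency_ms = [45, 45, 45, 44, 46, 45, 45, 44, 3200, 3200, 3200, 3200]
request_volume_k = [10, 10, 11, 10, 11, 10, 11, 10, 10, 11, 12, 11]
success_rate_pct = [99, 99, 99, 99, 100, 99, 99, 100, 82, 82, 82, 82]

# Assumption: the "avg · 1h" headline is an unweighted mean of the samples.
averages = {
    "error_rate_pct": mean(error_rate_pct),
    "p95_latency_ms": mean(p95_latency_ms),
    "request_volume_k": mean(request_volume_k),
    "success_rate_pct": mean(success_rate_pct),
}

for name, value in averages.items():
    print(f"{name}: {value:.1f}")
```

The means come out near 6.7%, 1,097ms, 10.6k, and 93.5%, which round to the displayed 7%, 1.1s, 11k, and 94%. Request volume stays flat while errors and latency jump, which suggests the degradation is not driven by a traffic surge.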

employee-service · Meridian HR

600 enterprise clients experiencing degraded employee search — HR workflows blocked

CRITICAL

P95 Latency: 3,200ms
Baseline P95: 45ms
Degradation: 71×
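
The degradation figure is the ratio of current P95 to baseline P95. A minimal sketch of that arithmetic, together with a check against the 500ms contractual P95 limit stated in this workspace, follows; the function names are hypothetical and not part of employee-service.

```python
def degradation_factor(current_ms: float, baseline_ms: float) -> float:
    """Return how many times slower the current P95 is than the baseline."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    return current_ms / baseline_ms


def breaches_sla(p95_ms: float, limit_ms: float = 500.0) -> bool:
    """The contract requires P95 to be under the limit, so equal counts as a breach."""
    return p95_ms >= limit_ms


factor = degradation_factor(3200, 45)
print(f"{factor:.1f}x")      # 71.1x
print(breaches_sla(3200))    # True
print(breaches_sla(45))      # False
```

The displayed 71× is this ratio rounded to the nearest whole number.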

Affected endpoints

POST /employees/search                  3,200ms  TIMEOUT
GET /headcount/analytics/departments    11ms     HEALTHY
GET /employees/:id                      3ms      HEALTHY
GET /employees/count                    8ms      HEALTHY

Client impact

Enterprise clients affected: 600 (all clients; employee search is core workflow)
HR workflows blocked: ~8,400/hr (employee lookups, onboarding, payroll)
SLA breach: Active (P95 must be under 500ms per contract)
Support tickets opened: 47 (in last 15 minutes)

Production Incident

Employee Search Latency Crisis

Incident Commander Update

Employee search requests are timing out across enterprise customers, impacting critical HR workflows.

The employee-service is experiencing severe latency degradation. P95 latency has increased from 45ms to over 3 seconds, and timeout errors are affecting employee search functionality across the platform.

All 600 enterprise clients are affected by failures when searching employee records. Only POST /employees/search is degraded; the other endpoints remain healthy, but search-related workflows are breaching the contractual SLA target of P95 under 500ms.

You are the primary on-call engineer. Investigate the recent migration, analyze application and database telemetry, identify the true cause of the latency spike, and restore service performance before customer impact escalates further.