Incident Workspace
Service Metrics
Environment: Production
Error Rate: 11% (avg, 1h)
P95 Latency: 3.7 s (avg, 1h)
Request Volume: 13k/min (avg, 1h)
Success Rate: 89% (avg, 1h)
auth-service
Incident started 14 minutes ago
🛠️ Incident Mitigations
Choose operational mitigations and debugging actions. Every decision consumes time and affects the incident.
Investigate first
Check at least 3 data points in the left panel before applying any mitigation. Acting without data makes incidents worse.
Production Incident
Incident Commander Update
Login and authentication failures are increasing rapidly across customer-facing applications.
The auth-service is experiencing a critical outage. Error rates have climbed above 34%, request latency has increased from milliseconds to more than 12 seconds, and authentication requests are timing out throughout the platform.
Customer login attempts are failing, worker threads appear heavily blocked, and autoscaling has not improved service health despite additional capacity being provisioned.
You are the primary on-call engineer. Investigate the latest deployment, analyze traces and runtime behavior, identify the true root cause of the failure, and restore authentication services before the outage spreads further.
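To see why blocked worker threads matter at these latencies, a rough back-of-the-envelope sketch using Little's law (in-flight requests = arrival rate × time in system) may help. It uses the 13k/min request volume and the 3.7 s and 12 s latency figures quoted above. The 50 ms baseline is an assumption standing in for "milliseconds", and treating a P95 or peak latency as if it were the mean overstates the result, so read the numbers as an upper-bound estimate, not a measurement.

```python
def in_flight_requests(requests_per_min: float, latency_s: float) -> float:
    """Little's law: L = lambda * W.

    requests_per_min: arrival rate (the dashboard's "Request Volume").
    latency_s: time each request spends in the service, in seconds.
    Returns the approximate number of requests held open at once.
    """
    arrival_rate_per_s = requests_per_min / 60.0
    return arrival_rate_per_s * latency_s


REQUEST_VOLUME_PER_MIN = 13_000  # from the dashboard (1h average)

scenarios = {
    "assumed healthy baseline (50 ms, hypothetical)": 0.05,
    "dashboard P95 (3.7 s)": 3.7,
    "incident update (12 s)": 12.0,
}

for label, latency_s in scenarios.items():
    concurrent = in_flight_requests(REQUEST_VOLUME_PER_MIN, latency_s)
    print(f"{label}: ~{concurrent:.0f} requests in flight")
```

Under these assumptions the same traffic that needs about a dozen concurrent workers at 50 ms needs roughly 800 at 3.7 s and roughly 2,600 at 12 s. If each request pins a worker thread while it waits, added instances only help when the wait is not caused by something they all share, which would be consistent with autoscaling failing to improve service health here.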