Incident Workspace
Service Metrics
Environment
Production
Error Rate %
10%
avg · 1h
P95 Latency ms
2.1s
avg · 1h
Request Volume k/min
12k
avg · 1h
Success Rate %
90%
avg · 1h
order-service
Incident started 9 minutes ago
Open senioreng.dev on your laptop for the full experience.
Incident Workspace
Environment
Production
Error Rate %
10%
avg · 1h
P95 Latency ms
2.1s
avg · 1h
Request Volume k/min
12k
avg · 1h
Success Rate %
90%
avg · 1h
Incident started 9 minutes ago
Production Incident
Incident Commander Update
Checkout requests are failing at an increasing rate and customer orders are no longer being processed reliably.
The order-service is experiencing a major production incident. Error rates have climbed above 30%, request latency has increased from milliseconds to several seconds, and order creation requests are frequently timing out.
Customers attempting to place orders are encountering failures during checkout. Autoscaling has already been triggered, but service performance continues to degrade despite healthy infrastructure metrics.
You are the primary on-call engineer. Examine the available telemetry, determine what changed in the latest deployment, identify the underlying bottleneck, and restore normal order processing before revenue impact grows further.