AI is making it easier than ever to write code.
The real skill now is debugging, reviewing code, and preventing disasters in production.
SeniorEng puts you inside real production incidents and pull requests.
Investigate systems, click through logs, read the diffs, and find the bug.
Payment Service: Error rate 94% · 6 min ago
Your Move
Hint available · Score tracked
PR #4471: Add pagination to /orders endpoint
Medium · 15 min
Problems
Train real-world engineering judgment.
Debugging Problems
How it works
You're paged in
An alert fires. Read the incident brief. What's down, what's burning.
Investigate on the left sidebar
Logs, metrics, traces, Slack threads, all on the left sidebar. The cause is buried in there. So are the red herrings.
Mitigate on the right panel
Take action on the right panel. Stuck? Use a hint. Your score tracks how fast you solve it and whether you reach for the right lever.
Easy
Internal API Reliability Incident
Investigate a sudden degradation of a critical API.
Email Service Failure Incident
Investigate degradation of email service.
Payment Failures. 33% of Requests
Exactly 33% of payment requests are failing. Everything else looks fine.
Medium
Payment Service Outage
Investigate a sudden degradation of Payment Service.
Netflix Down on Christmas Eve
100% stream start failure on peak viewing night. ELB has zero healthy instances. Inspired by a real incident.
Job Queue Degradation
Investigate a sudden degradation of a critical Job Queue.
YouTube, Gmail and GMeet Down
YouTube, Gmail, Meet, and Drive all went down at once. Inspired by a real incident.
Large Latency Incident
Investigate a sudden increase in latencies.
Order Service Failure
Investigate a sudden degradation of Order Service.
API Degradation. Midnight Error Spike
Error rate jumps to 67% at exactly midnight.
503s Across All Endpoints
503s across every endpoint, including login. Traffic is normal.
Catalog Under Pressure
DB CPU jumped from 9% to 94% with no traffic spike. Every product page is timing out.
The Morning Slowdown
Every service on the platform degraded simultaneously. No errors. No deployments.
Hard
$440M in 45 Minutes
Market open 17 minutes ago. P&L is down $218M at $11M/minute. No errors. All 8 trading servers appear healthy. Inspired by a real incident.
Auth Service Failure
Investigate a production outage in the Auth Service.
All Products Down. Zero External Traffic
All three products dropped to zero external traffic simultaneously. Internal health is green. Inspired by a real incident.
Orders Placed. Never Arriving
Orders are being placed but never appearing. Investigate.
Error Rate Climbing. Service Degraded
Error rate spiked after a recent deploy. Investigation is ongoing.
Object Storage Down. 100% 503s
All object storage operations in us-east-1 returning 503. Downstream services cascading. Inspired by a real incident.
Error Rate Rising to 100%
Error rate was 28%. A fix was attempted. Now it's 100%.
OOM Kills Every 18 Hours
Service crashes with OOM every 18 hours. Memory climbs at 180MB/hour. CPU is normal.
Database Inconsistent Writes
Database writes returning inconsistent results across nodes. Inspired by a real incident.
Queue Depth Incident
One merchant's payments are 45 minutes delayed. 14 other merchants: instant. Error rate: 0%.
19 Datacenters Down. Simultaneously.
A single network config change took 19 Cloudflare PoPs offline at once. 80% of requests failing globally. Inspired by a real incident.
Checkout, Cart, Billing. 100% Errors
Three services failed at the same time. The service they depend on recovered 4 minutes ago.
Every Windows Host. Simultaneously.
8.5 million Windows machines BSOD'd simultaneously. No code deploy in 72 hours. macOS and Linux hosts are completely fine. Inspired by a real incident.
Cache Errors. 18 Services Down.
18 unrelated services are all failing with cache errors. Memory is at 42%. Replication is fine.
Code Review Problems
How it works
Read the PR
A pull request just landed for review. Go through the changes and understand what's changing and why.
Spot the issues
Find the bugs, security holes, and performance traps hiding in the code. The left sidebar has context and Slack threads. Some issues are obvious. Most aren't.
Make your verdict
Approve, request changes, or block on the right panel. Hints are available if you need them. Your score tracks what you caught and what you missed.
Easy
The Polite Failure
Review a critical report generation PR.
Webhook Integration for Payment Events
Review a critical webhook integration PR.
The Workspace Connect
Review a Slack OAuth integration PR for a fintech SaaS.
Forgot Password Flow
Review a self-serve password reset PR that will clear 200+ weekly support tickets.
Medium
SSL Certificate Verification Patch
Review a security patch to the TLS certificate verification function. Ships in tonight's release. Inspired by a real incident.
Route Dashboard Reads to Read Replica
Review a critical performance enhancement PR.
The Onboarding Gap
Review a critical production onboarding flow PR.
Adding Pagination
Review a critical pagination PR.
The Cascade Confusion
Review a critical performance enhancement PR.
The Config Cache
Review a Redis config caching PR for a multi-region SaaS.
The Document Share
Review a document fetch endpoint PR for a legal SaaS.
The Case History Search
Review a date-range search PR against a 50M-row table.
Multi-Currency Checkout
Review a currency conversion PR ahead of a major EU launch.
Referral Codes at Signup
Review a referral code system PR before a growth campaign launch.
Real-Time Shipping Estimates
Review a checkout PR integrating a real-time shipping API.
Automatic Push Notification Retry
Review a retry logic PR for 1.2M daily push notifications.
Hard
Added Lock to Prevent Duplication
Review a distributed Redis lock PR.
The Retry Fix
Review an idempotent payment retry PR for a payments startup.
WAF Rule: SQL Injection Detection
Review a Lua WAF rule that runs on every HTTP request at the edge. Inspired by a real incident.
Link Preview Cards
Review a link preview feature PR for a B2B messaging app on AWS.
Remove legacy_plan_id Migration
Review a schema cleanup PR that removes a deprecated column.