Building Resilient APIs Under Heavy Load
An API that performs well at 100 requests per second may completely fall apart at 10,000. The failure modes are usually predictable. Building resilience means understanding them in advance.
Common failure modes under load
- Database connection pool exhaustion: connections are finite. Under load, requests queue waiting for a free connection, latency climbs, and once requests arrive faster than connections are released, the queue grows without bound.
- Cascading timeouts: a slow downstream service causes upstream requests to time out; those timeouts trigger retries, and the retries multiply the load on the service that was already slow.
- Memory pressure: unbounded in-memory queues or caches fill available RAM and cause OOM crashes.
- Hot endpoints: a single expensive endpoint (full-text search, complex aggregation) absorbs all capacity under load.
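The retry multiplication in the cascading-timeouts case compounds at every hop that retries. As a rough worked example, assume a chain client -> A -> B -> C in which each of the three hops makes up to 3 attempts (one try plus two retries) and every attempt times out. This Python sketch (function names are hypothetical) computes the worst case in closed form and again by walking the retry tree. One original request becomes 3^3 = 27 calls against C.

```python
def retry_amplification(attempts_per_hop: int, hops: int) -> int:
    """Worst-case calls reaching the deepest service for one original request.

    Assumes every retrying hop makes `attempts_per_hop` total attempts
    (one try plus retries) and that every attempt times out.
    """
    if attempts_per_hop < 1 or hops < 0:
        raise ValueError("attempts_per_hop must be >= 1 and hops >= 0")
    return attempts_per_hop ** hops


def simulate_calls(attempts_per_hop: int, hops: int) -> int:
    """Same quantity, counted by walking the retry tree explicitly."""
    if hops == 0:
        return 1  # reached the deepest service: exactly one call lands there
    # Each attempt at this hop triggers the full retry tree of the hops below it.
    return sum(simulate_calls(attempts_per_hop, hops - 1) for _ in range(attempts_per_hop))


# client -> A -> B -> C: three retrying hops, 3 attempts each.
print(retry_amplification(3, 3), simulate_calls(3, 3))  # 27 27
```

Letting only one layer retry (often the outermost) keeps the worst case at that layer's attempt count, 3 here instead of 27; backoff with jitter then spreads those retries out in time.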
Patterns that help
- Concurrency limits: cap the number of simultaneous requests each handler will process. Excess requests get a 503, not a hang.
- Circuit breakers: after repeated failures, stop calling the failing downstream service and return a fallback response immediately, then periodically let a trial request through to detect recovery.
- Load shedding: under extreme load, deliberately reject low-priority requests rather than accepting all of them and serving none well.
- Edge rate limiting: reject abusive traffic before it reaches your application layer at all.
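A concurrency limit fits in a few lines of Python asyncio. This is a minimal illustration, assuming a single-threaded event loop and a hypothetical handler that returns a (status, body) tuple rather than any particular web framework's response type. The limiter checks and rejects immediately instead of awaiting a semaphore, since waiting would just rebuild the unbounded queue that causes pool exhaustion.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

Response = Tuple[int, str]  # hypothetical (status_code, body) pair, not a framework type


class ConcurrencyLimiter:
    """Serve at most `max_in_flight` requests at once; reject the rest with 503."""

    def __init__(self, max_in_flight: int) -> None:
        self.max_in_flight = max_in_flight
        self.in_flight = 0

    async def handle(self, handler: Callable[[], Awaitable[Response]]) -> Response:
        # No await between the check and the increment, so on a single event loop
        # this pair cannot interleave with another coroutine.
        if self.in_flight >= self.max_in_flight:
            return (503, "service unavailable: too many concurrent requests")
        self.in_flight += 1
        try:
            return await handler()
        finally:
            self.in_flight -= 1  # release the slot even if the handler raises


async def slow_handler() -> Response:
    await asyncio.sleep(0.05)  # stand-in for real work such as a database query
    return (200, "ok")


async def main() -> List[int]:
    limiter = ConcurrencyLimiter(max_in_flight=2)
    # Five requests arrive together; only two slots exist.
    results = await asyncio.gather(*(limiter.handle(slow_handler) for _ in range(5)))
    return [status for status, _ in results]


statuses = asyncio.run(main())
print(statuses)  # [200, 200, 503, 503, 503]
```

With a limit of 2 and five simultaneous requests, the first two occupy the slots and the other three get an immediate 503 while the first two are still working.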
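A circuit breaker can be modeled as a small state machine: closed (calls pass through), open (calls go straight to the fallback), and half-open (after a cooldown, one trial call decides whether to close again). This is a single-threaded Python sketch, not a particular library's API; the threshold, the timeout, and the injectable `clock` parameter are assumptions added so the example runs deterministically.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Closed -> open -> half-open breaker; a single-threaded sketch."""

    def __init__(self, failure_threshold: int, reset_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # consecutive failures that open the circuit
        self.reset_timeout = reset_timeout          # seconds to stay open before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None      # None means the circuit is closed

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_timeout:
            return fallback()  # open: fail fast without touching the downstream service
        # Closed, or half-open (cooldown elapsed): let this call through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial call
            return fallback()
        self.failures = 0       # any success closes the circuit
        self.opened_at = None
        return result


# Usage with a fake clock so the example is deterministic.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=3, reset_timeout=10.0, clock=lambda: now[0])
downstream_calls: List[int] = [0]


def flaky() -> str:
    downstream_calls[0] += 1
    raise ConnectionError("downstream unavailable")


def cached() -> str:
    return "cached response"


for _ in range(5):
    breaker.call(flaky, cached)
print(downstream_calls[0])  # 3: calls 4 and 5 were short-circuited

now[0] = 10.0  # cooldown elapsed; the next call is a half-open trial
trial = breaker.call(lambda: "fresh", cached)
print(trial)  # fresh
```

A production breaker would typically count failures over a sliding time window rather than consecutively, and cap how many trial calls run at once while half-open.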
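Load shedding differs from a flat concurrency limit in that it decides which requests to drop. One simple scheme, sketched here with assumed numbers, gives each priority class a share of total capacity: low-priority traffic is refused once the server is half full, normal traffic at 80%, and only critical traffic may use the last slots. The `Priority` classes, the example request types, and the share values are hypothetical.

```python
from enum import IntEnum
from typing import Dict


class Priority(IntEnum):
    LOW = 0       # hypothetical example: analytics beacons, prefetches
    NORMAL = 1
    CRITICAL = 2  # hypothetical example: login, checkout


class LoadShedder:
    """Admit a request only while in-flight work is below its priority's share of capacity."""

    # Fraction of total capacity each priority may use (assumed values).
    SHARE: Dict[Priority, float] = {
        Priority.LOW: 0.5,
        Priority.NORMAL: 0.8,
        Priority.CRITICAL: 1.0,
    }

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.in_flight = 0

    def try_admit(self, priority: Priority) -> bool:
        if self.in_flight >= self.capacity * self.SHARE[priority]:
            return False  # shed: the caller should answer 503 right away
        self.in_flight += 1
        return True

    def release(self) -> None:
        self.in_flight -= 1  # call when an admitted request finishes


shedder = LoadShedder(capacity=10)
for _ in range(8):
    shedder.try_admit(Priority.NORMAL)  # fills the server to 8 in flight
low = shedder.try_admit(Priority.LOW)            # False: 8 >= 5
normal = shedder.try_admit(Priority.NORMAL)      # False: 8 >= 8
critical = shedder.try_admit(Priority.CRITICAL)  # True: 8 < 10
print(low, normal, critical)  # False False True
```

Reserving headroom this way means a flood of low-value requests can never consume the capacity that critical requests need.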
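Edge rate limiting is usually configured in a reverse proxy, API gateway, or CDN rather than written by hand, but the token bucket is one common algorithm behind it. This Python sketch keeps one bucket per client key (for example, an IP address); `rate`, `burst`, and the injectable `clock` are assumptions for illustration, not any product's settings.

```python
import time
from typing import Callable, Dict, List, Tuple


class TokenBucketLimiter:
    """Per-client token bucket: refills `rate` tokens/second, holds at most `burst`."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock
        # client_key -> (tokens remaining, time of last update)
        self.buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, client_key: str) -> bool:
        now = self.clock()
        # New clients start with a full bucket.
        tokens, last = self.buckets.get(client_key, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[client_key] = (tokens - 1, now)
            return True
        self.buckets[client_key] = (tokens, now)
        return False  # caller should answer 429 Too Many Requests


now = [0.0]
limiter = TokenBucketLimiter(rate=1.0, burst=3.0, clock=lambda: now[0])
first: List[bool] = [limiter.allow("203.0.113.7") for _ in range(5)]
print(first)   # [True, True, True, False, False]
now[0] = 2.0   # two seconds later, two tokens have refilled
second: List[bool] = [limiter.allow("203.0.113.7") for _ in range(3)]
print(second)  # [True, True, False]
```

The `buckets` dictionary grows with every new client key; a real limiter would evict idle buckets, or it becomes exactly the kind of unbounded in-memory structure listed under memory pressure.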