Rate Limiting Strategies for Modern APIs
Rate limiting protects APIs from abuse, overload, and cost exploitation. But a naive per-IP limit is easy to evade. Effective rate limiting requires choosing the right algorithm, the right key, and the right response behavior.
Algorithms
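- Fixed window: count requests in a fixed time bucket (e.g., 100 per minute). Simple, but a client can send a full quota at the end of one window and another at the start of the next, briefly reaching up to twice the limit.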
- Sliding window: count requests over a rolling time window. Smoother and harder to game at boundaries.
- Token bucket: a bucket refills at a constant rate; each request consumes a token. Allows controlled bursting up to the bucket size.
- Leaky bucket: requests enter a queue and are processed at a fixed rate; requests that arrive when the queue is full are rejected. Smooths traffic but adds latency under load.
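As a rough illustration of two of the algorithms above, here is a minimal in-memory sketch of a token bucket and a sliding window (log variant) in Python. The class names, parameters, and the injectable `clock` argument are assumptions made for this example. A production limiter would typically keep this state in a shared store and handle concurrency, which this sketch does not.

```python
import time
from collections import deque


class TokenBucket:
    """Refills at `rate` tokens per second and holds at most `capacity` tokens."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.updated = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Credit tokens for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class SlidingWindowLog:
    """Allows at most `limit` requests in any rolling span of `window` seconds."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.stamps = deque()  # timestamps of accepted requests, oldest first

    def allow(self):
        now = self.clock()
        # Drop timestamps that have aged out of the rolling window.
        while self.stamps and now - self.stamps[0] >= self.window:
            self.stamps.popleft()
        if len(self.stamps) < self.limit:
            self.stamps.append(now)
            return True
        return False
```

With `TokenBucket(rate=2, capacity=3)`, a client can burst three requests immediately and then sustain two per second. The sliding window log stores one timestamp per accepted request, so its memory use grows with the limit; counter-based approximations trade some accuracy for less state.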
Key selection
IP-based limits are the usual baseline but also the weakest option: a large botnet trivially distributes requests across many IPs, while legitimate users behind a shared NAT or proxy get throttled together. For authenticated endpoints, key by API token, user ID, or a session fingerprint. For unauthenticated endpoints, combine IP with a TLS fingerprint.
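One way to express that precedence is a small helper that picks the most specific identifier available. This is only a sketch: the function name, the key prefixes, and the idea that a TLS fingerprint string (for example, a JA3-style hash computed by an upstream proxy) is passed in are all assumptions for illustration.

```python
import hashlib


def rate_limit_key(client_ip, api_token=None, user_id=None, tls_fingerprint=None):
    """Return the limiter key for a request, preferring the strongest identity."""
    if api_token:
        # Hash the token so the raw secret is never stored as a limiter key.
        digest = hashlib.sha256(api_token.encode("utf-8")).hexdigest()[:16]
        return f"token:{digest}"
    if user_id:
        return f"user:{user_id}"
    if tls_fingerprint:
        # Unauthenticated traffic: combine IP with the TLS fingerprint.
        return f"anon:{client_ip}:{tls_fingerprint}"
    # Last resort: plain per-IP limiting.
    return f"ip:{client_ip}"
```

For example, `rate_limit_key("203.0.113.7", user_id="u42")` yields `"user:u42"`, so the same user is counted against one bucket regardless of which IP they connect from.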
Response behavior
For well-behaved clients, return HTTP 429 (Too Many Requests) with a Retry-After header so they know when to retry. For abusive clients, silently drop or throttle requests without signaling the limit, which makes it harder for attackers to probe the threshold and calibrate their request rate.
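The two behaviors can be sketched as a framework-agnostic helper. The function name, the `abusive` flag, and the convention of returning `None` to mean "drop without responding" are assumptions for this example; how a client gets classified as abusive is outside its scope.

```python
import math


def limit_response(retry_after_seconds, abusive=False):
    """Decide what to send to a client that has exceeded its limit.

    Returns (status, headers, body), or None when the caller should drop
    the request without sending any response.
    """
    if abusive:
        # Give no signal, so the limit cannot be probed from the outside.
        return None
    # Retry-After takes whole seconds; round up so clients never retry too early.
    seconds = max(1, math.ceil(retry_after_seconds))
    return 429, {"Retry-After": str(seconds)}, "Too Many Requests"
```

Rounding up matters here: a client that honors a value rounded down would retry slightly early and be rejected again.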