May 2, 2026·7 min read

Rate Limiting Strategies for Modern APIs

Rate limiting protects APIs from abuse, overload, and cost exploitation, but a naive per-IP limit is easy to evade. Effective rate limiting requires choosing the right algorithm, the right key, and the right response behavior.

Algorithms

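Fixed window: count requests in a fixed time bucket (e.g., 100 per minute). Simple, but it allows bursts at window boundaries: a client can send 100 requests at the end of one window and 100 more at the start of the next, doubling the effective rate for a short period.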
Sliding window: count requests over a rolling time window. Smoother and harder to game at boundaries.
Token bucket: a bucket refills at a constant rate; each request consumes a token. Allows controlled bursting up to the bucket size.
Leaky bucket: requests enter a queue and are processed at a fixed rate. Smooths traffic but adds latency under load, and requests that arrive while the queue is full are rejected.
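
As an illustration of the token bucket described above, here is a minimal single-process sketch in Python. The class name TokenBucket, its parameters (rate, capacity, cost), and the injectable clock are assumptions made for this example, not part of any particular library. A production limiter would typically keep this state in a shared store and guard it against concurrent access.

```python
import time
from typing import Callable


class TokenBucket:
    """Illustrative token bucket: refills at `rate` tokens per second, up to `capacity`."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock              # injectable so the bucket can be tested without sleeping
        self.tokens = capacity          # start full, so an initial burst is allowed
        self.last_refill = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and consume `cost` tokens if enough are available."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill lazily based on elapsed time, never exceeding the bucket size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With rate=1.0 and capacity=3, a client can send three requests back to back, after which it is held to roughly one request per second until the bucket refills. The capacity is the knob that bounds the burst size.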

Key selection

IP-based limits are the baseline but also the weakest option, since a large botnet can trivially distribute requests across many IPs. For authenticated endpoints, key by API token, user ID, or session fingerprint. For unauthenticated endpoints, combine the IP with a TLS fingerprint.
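
The key-selection order above could be sketched roughly as follows. RequestInfo and its field names are hypothetical stand-ins for whatever a real framework exposes, and hashing the API token before using it as a key is an added assumption, meant only to keep raw credentials out of the limiter's storage.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestInfo:
    """Hypothetical request summary; the field names are illustrative only."""
    ip: str
    api_token: Optional[str] = None
    user_id: Optional[str] = None
    tls_fingerprint: Optional[str] = None  # e.g. a hash derived from the TLS ClientHello


def rate_limit_key(req: RequestInfo) -> str:
    """Pick the strongest identifier available for this request."""
    if req.api_token:
        # Assumption: store a digest rather than the raw token.
        digest = hashlib.sha256(req.api_token.encode("utf-8")).hexdigest()
        return f"token:{digest}"
    if req.user_id:
        return f"user:{req.user_id}"
    if req.tls_fingerprint:
        # Unauthenticated: combine IP with the TLS fingerprint.
        return f"anon:{req.ip}|{req.tls_fingerprint}"
    # Last resort: IP only, the weakest key.
    return f"ip:{req.ip}"
```

Two requests carrying the same API token map to the same key even when they arrive from different IPs, which is the property that makes token-based keys harder to evade than IP-based ones.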

Response behavior

Return 429 (Too Many Requests) with a Retry-After header for well-behaved clients so they know when to retry. For abusive clients, silently drop or throttle requests without signaling the limit, which makes it harder for attackers to probe and calibrate their request rate.
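
One way to express this split in code is sketched below. The function name limit_response and the abusive flag are assumptions for illustration. How a client gets classified as abusive, and how the caller actually drops a connection, depend on the server framework and are left out.

```python
import math
from typing import Dict, Optional, Tuple


def limit_response(
    retry_after_seconds: float, abusive: bool
) -> Optional[Tuple[int, Dict[str, str]]]:
    """Decide what to send once a client has exceeded its limit.

    Returns (status, headers) for well-behaved clients, or None to tell the
    caller to drop the request without signaling the limit.
    """
    if abusive:
        return None
    # Retry-After takes whole seconds; round up so clients never retry too early.
    delay = max(1, math.ceil(retry_after_seconds))
    return 429, {"Retry-After": str(delay)}
```

For a token bucket, a reasonable value for retry_after_seconds is the time until the next token becomes available, so compliant clients retry just as capacity returns.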