Rate Limiting Auth: Stopping Brute Force and Stuffing

Why Rate Limiting Is the First Line of Authentication Defense

Every authentication endpoint on the public internet is under attack. Within seconds of exposing a new login page, automated tooling begins probing it with credentials harvested from breaches, stealer logs, and underground markets. Rate limiting is the first and cheapest control you can deploy to slow those attacks, and when implemented correctly it is the difference between a handful of nuisance login failures and a full credential stuffing campaign that drains accounts overnight.

But rate limiting is not a single technique. It is a family of algorithms, key strategies, and response behaviors that must be tuned together. Overly strict rate limiting creates false positives that lock out legitimate users behind corporate NAT gateways. Overly permissive rate limiting lets attackers grind through millions of username/password pairs unchecked. This post walks through the algorithms, keying strategies, and integration patterns that separate effective brute force protection from security theater.

Algorithm Choices: Token Bucket, Leaky Bucket, and Sliding Window

The three canonical rate limiting algorithms each behave differently under burst traffic, and picking the wrong one for authentication can let attacks through.

Token Bucket

In the token bucket model, each key has a bucket that holds up to N tokens. Tokens refill at a steady rate, and every request consumes one. When the bucket is empty, requests are rejected. Token bucket is forgiving of short bursts, which matches legitimate human behavior: a user who mistypes their password twice and then succeeds on the third attempt looks healthy.

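Pros: Simple to implement, tolerates bursts, memory efficient (two values per key: the token count and the last refill time). In Redis this is typically a short Lua script so the refill and decrement happen atomically; INCR plus EXPIRE on its own gives you a fixed window counter, not a token bucket.
Cons: Attackers can learn the bucket capacity and refill rate and pace requests to stay just under them.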
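A minimal in-memory sketch of the token bucket follows. It keeps state in a Python object rather than Redis, and the class name and parameters (`capacity`, `refill_per_sec`) are illustrative; a shared-store version would keep the same two values per key and update them atomically.

```python
class TokenBucket:
    """Token bucket for a single rate-limit key (illustrative, single process)."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float) -> None:
        self.capacity = capacity              # maximum burst size
        self.refill_per_sec = refill_per_sec  # steady-state refill rate
        self.tokens = capacity                # a new bucket starts full
        self.last = now                       # time of the last refill

    def allow(self, now: float) -> bool:
        # Add tokens in proportion to elapsed time, never beyond capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        # Each request spends one token; an empty bucket rejects.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In a login handler you would keep one bucket per key (for example in a dict keyed by account) and call `allow(time.monotonic())` before verifying the password. With `capacity=3` and `refill_per_sec=0.5`, a user gets three quick tries, then one more every two seconds.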

Leaky Bucket

Leaky bucket enforces a strict outflow rate. Requests queue into the bucket and drain at a fixed interval. If the queue overflows, requests are dropped. It smooths traffic aggressively, which is useful for downstream protection but can feel punitive on login endpoints where legitimate users expect instant responses.
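To see why a leaky bucket feels punitive at login, here is a minimal sketch of the queue variant: instead of rejecting a burst outright, it tells each request how long it must wait before being processed, and drops it only when the queue is full. The class and parameter names are illustrative, and the caller is assumed to delay the request by the returned amount.

```python
from collections import deque
from typing import Deque, Optional


class LeakyBucketQueue:
    """Leaky bucket as a queue: requests leave one per fixed interval."""

    def __init__(self, capacity: int, interval_sec: float) -> None:
        self.capacity = capacity              # how many requests may wait
        self.interval = interval_sec          # fixed drain interval
        self.waiting: Deque[float] = deque()  # release times of queued requests
        self.last_release = float("-inf")

    def submit(self, now: float) -> Optional[float]:
        """Return how long this request must wait, or None if it is dropped."""
        # Requests whose release time has passed have already drained.
        while self.waiting and self.waiting[0] <= now:
            self.waiting.popleft()
        if len(self.waiting) >= self.capacity:
            return None  # bucket overflow: drop the request
        # Each request leaves no sooner than one interval after the previous one.
        release = max(now, self.last_release + self.interval)
        self.last_release = release
        if release > now:
            self.waiting.append(release)
        return release - now
```

Three simultaneous attempts against a bucket that drains one request per second get delays of 0, 1, and 2 seconds, so a legitimate user who double-submits the form simply waits.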

Sliding Window Log and Sliding Window Counter

Sliding window algorithms track request counts over a moving time window. The sliding window log stores every request timestamp (accurate but expensive), while the sliding window counter approximates the log by keeping counts for the current and previous fixed windows and weighting the previous count by how much of that window still overlaps the sliding window. For authentication, the sliding window counter is usually the sweet spot: accurate enough to catch slow drip attacks, cheap enough to run at millions of requests per second.
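The counter variant needs only two integers per key. A minimal single-process sketch, with illustrative names, that estimates the number of requests in the last `window_sec` seconds by weighting the previous fixed window's count:

```python
class SlidingWindowCounter:
    """Sliding window counter for one key: two fixed-window counts plus a weight."""

    def __init__(self, limit: int, window_sec: float) -> None:
        self.limit = limit
        self.window = window_sec
        self.index = 0      # index of the current fixed window
        self.current = 0    # requests counted in the current fixed window
        self.previous = 0   # requests counted in the window before it

    def allow(self, now: float) -> bool:
        index = int(now // self.window)
        if index != self.index:
            # Roll forward; if a whole window was skipped, the previous count is 0.
            self.previous = self.current if index == self.index + 1 else 0
            self.current = 0
            self.index = index
        # Fraction of the previous window still covered by the sliding window.
        overlap = 1.0 - (now - index * self.window) / self.window
        estimate = self.previous * overlap + self.current
        if estimate + 1 <= self.limit:
            self.current += 1
            return True
        return False
```

With a limit of 5 per 60 seconds, five attempts early in one minute still count as 2.5 attempts halfway through the next minute, so a slow drip cannot reset its budget simply by waiting for the fixed window boundary.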

For most login endpoints, we recommend a hybrid: token bucket for short-term burst control, layered with sliding window counters for longer-horizon limits (e.g., 5 attempts per minute, 20 per hour, 100 per day).
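One way to layer these limits is sketched below under stated assumptions: the 5-per-minute limit is modeled as a token bucket (capacity 5, refilling 5 tokens per minute) and the hourly and daily limits as sliding window counters. Only accepted attempts are counted here; many deployments count rejected ones too. All class names are illustrative.

```python
from typing import List


class Bucket:
    """Token bucket with a separate check and record step, so layers compose."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float) -> None:
        self.capacity, self.rate = capacity, refill_per_sec
        self.tokens, self.last = capacity, now

    def has_room(self, now: float) -> bool:
        self.tokens = min(self.capacity, self.tokens + max(0.0, now - self.last) * self.rate)
        self.last = now
        return self.tokens >= 1.0

    def record(self) -> None:
        self.tokens -= 1.0


class Window:
    """Sliding window counter with a separate check and record step."""

    def __init__(self, limit: int, window_sec: float) -> None:
        self.limit, self.window = limit, window_sec
        self.index, self.current, self.previous = 0, 0, 0

    def has_room(self, now: float) -> bool:
        index = int(now // self.window)
        if index != self.index:
            self.previous = self.current if index == self.index + 1 else 0
            self.current, self.index = 0, index
        overlap = 1.0 - (now - index * self.window) / self.window
        return self.previous * overlap + self.current + 1 <= self.limit

    def record(self) -> None:
        self.current += 1


class LoginLimiter:
    """Hypothetical policy: 5/minute via token bucket, plus 20/hour and 100/day."""

    def __init__(self, now: float) -> None:
        self.burst = Bucket(capacity=5, refill_per_sec=5 / 60, now=now)
        self.windows: List[Window] = [Window(20, 3600), Window(100, 86400)]

    def allow(self, now: float) -> bool:
        # Check every layer first, then consume from all of them, so one
        # rejecting layer does not silently drain the others.
        checks = [self.burst.has_room(now)] + [w.has_room(now) for w in self.windows]
        if not all(checks):
            return False
        self.burst.record()
        for w in self.windows:
            w.record()
        return True
```

The burst layer refills within minutes, so a patient attacker eventually runs into the hourly cap instead: after 20 accepted attempts in an hour, further attempts are refused even with a full token bucket.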

Keying Strategy: Per IP, Per Account, Per Device

The algorithm is only half the story. What you key on determines what kind of attack you can stop.

Per IP Rate Limiting

The default. Key the counter on the client IP address. This stops naive single source brute force instantly. It is also the strategy most likely to cause false positives, because large numbers of legitimate users share IPs through corporate NAT, carrier grade NAT (CGNAT), VPN exit nodes, and mobile networks. A single CGNAT gateway can front hundreds of thousands of subscribers.

Mitigations:

Maintain an allowlist or softer threshold for known shared infrastructure (mobile carrier ranges, enterprise egress).
Combine IP with user agent and ASN into a finer-grained key, so distinct clients behind one shared address do not exhaust a single counter.
Set per-IP limits high and rely on per-account keys for the real enforcement.
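The first two mitigations can be sketched with the standard `ipaddress` module. The ranges below are documentation prefixes (RFC 5737 and RFC 3849) standing in for a real list of carrier and enterprise egress ranges, and both limits are hypothetical numbers, not recommendations.

```python
import hashlib
import ipaddress
from typing import List, Union

IPNetwork = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Placeholder shared-egress ranges; a real list would come from data you maintain.
SHARED_EGRESS: List[IPNetwork] = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]
DEFAULT_IP_LIMIT = 100   # hypothetical attempts per hour for an ordinary IP
SHARED_IP_LIMIT = 2000   # hypothetical, softer threshold for shared egress


def ip_limit(client_ip: str) -> int:
    """Pick a per-IP threshold; known shared infrastructure gets a softer one."""
    addr = ipaddress.ip_address(client_ip)
    # Membership tests across IPv4/IPv6 simply return False.
    if any(addr in net for net in SHARED_EGRESS):
        return SHARED_IP_LIMIT
    return DEFAULT_IP_LIMIT


def ip_key(client_ip: str, user_agent: str) -> str:
    """Key on IP plus a user-agent digest, splitting clients behind one NAT IP."""
    ua_digest = hashlib.sha256(user_agent.encode("utf-8")).hexdigest()[:16]
    return f"ip:{client_ip}|ua:{ua_digest}"
```

Because the ASN is determined by the IP, adding it to the key changes nothing; in this sketch it is better used to pick the threshold, as in the ASN-aware patterns for shared infrastructure.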

Per Account Rate Limiting

Key the counter on the username or email being attempted, not the source IP. This is the only effective defense against credential stuffing from distributed residential proxy networks, where each request may come from a different IP. Per account limits should be strict: 5-10 failed attempts per hour before escalation.

Watch out for username enumeration: if rate limits or lockouts apply only to accounts that exist, the difference in behavior reveals which accounts are real. Apply the same limit to requests for nonexistent accounts, and return identical error responses.
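A minimal sketch of enumeration-safe per-account limiting, assuming a hypothetical `verify` callable that returns False both for a wrong password and for an unknown account. The limit is checked before any account lookup, and the counter is keyed on a normalized identifier, so real and nonexistent accounts behave identically. For brevity it uses a sliding window log rather than a counter.

```python
import hashlib
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple

GENERIC_FAILURE = "Invalid username or password."
RATE_LIMITED = "Too many attempts. Try again later."
LIMIT = 5              # hypothetical: failed attempts per window
WINDOW_SEC = 3600.0


def account_key(identifier: str) -> str:
    """Normalize so 'Alice@Example.com ' and 'alice@example.com' share one counter."""
    normalized = identifier.strip().casefold()
    return "acct:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class AccountLimiter:
    def __init__(self, verify: Callable[[str, str], bool]) -> None:
        self.verify = verify  # hypothetical credential check
        self.failures: Dict[str, Deque[float]] = defaultdict(deque)

    def attempt(self, identifier: str, password: str, now: float) -> Tuple[int, str]:
        recent = self.failures[account_key(identifier)]
        # Forget failures that have aged out of the window.
        while recent and recent[0] <= now - WINDOW_SEC:
            recent.popleft()
        # The limit never depends on whether the account exists.
        if len(recent) >= LIMIT:
            return 429, RATE_LIMITED
        if self.verify(identifier, password):
            return 200, "OK"
        recent.append(now)
        return 401, GENERIC_FAILURE
```

An attacker probing a real address and a made-up one sees the same sequence: five 401 responses with the same message, then 429.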

Per Device Fingerprinting

Device fingerprinting (canvas, WebGL, audio context, installed fonts, hardware concurrency) produces a relatively stable identifier that survives IP rotation. Combined with passive TLS fingerprints like JA3 and JA4, it lets you rate limit the actual attacker tool even when it cycles through thousands of proxies. This is the most powerful key for sophisticated credential stuffing, but it requires client-side instrumentation and a fingerprinting service.
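To use a fingerprint as a rate-limit key, the collected signals have to be reduced to one deterministic string. A minimal sketch, assuming a hypothetical client script that reports signals as a dict and a TLS terminator that supplies a JA3/JA4 string:

```python
import hashlib
import json
from typing import Any, Mapping


def device_key(client_signals: Mapping[str, Any], tls_fingerprint: str) -> str:
    """Derive a rate-limit key from client-side signals plus a TLS fingerprint.

    `client_signals` stands in for whatever the (hypothetical) client script
    collects, e.g. canvas and WebGL hashes; `tls_fingerprint` stands in for a
    JA3/JA4 string computed where TLS is terminated.
    """
    # Canonical JSON so key order and whitespace never change the result.
    canonical = json.dumps(dict(client_signals), sort_keys=True, separators=(",", ":"))
    material = canonical + "|" + tls_fingerprint
    return "dev:" + hashlib.sha256(material.encode("utf-8")).hexdigest()[:32]
```

The resulting key plugs into the same limiters as per-IP and per-account keys. An exact hash is brittle, though: a browser update changes it, which is why fingerprinting services typically match signals more loosely.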

Handling NAT, CGNAT, and Shared Infrastructure

Mobile networks and corporate egress gateways are the hardest cases. A single public IP may represent an entire office or an entire city block. Hard blocking that IP takes down thousands of users.

Recommended patterns:

Tiered thresholds: Instead of one hard cutoff per IP, escalate in tiers based on what the traffic looks like: the number of distinct usernames attempted, the failure rate, and the diversity of client fingerprints. A healthy shared IP sees many users, each on their own device, mostly succeeding within a few attempts. A credential stuffing source sees many usernames, mostly failing, from a narrow set of automation signatures.
ASN awareness: Apply different limits to residential ISPs, cloud providers, and known hosting networks. Requests from AWS, Hetzner, or OVH hitting a consumer login page rarely represent legitimate users.
Challenge instead of block: When a shared IP crosses a threshold, issue a CAPTCHA or step-up challenge rather than a hard 429. This preserves legitimate access while costing automation real money.
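The three patterns can be combined into a single per-IP decision. The sketch below assumes per-IP counters are already collected over some recent window; the thresholds (a base of 50 attempts, one tenth of that for hosting networks, an 80 percent failure rate, ten usernames per fingerprint) are hypothetical. The ASNs listed are examples for AWS, Hetzner, and OVH and should be checked against a maintained ASN dataset.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    CHALLENGE = "challenge"   # CAPTCHA or step-up, not a hard 429
    BLOCK = "block"


@dataclass
class IpStats:
    """Per-IP counters over some recent window (collection not shown)."""
    attempts: int
    failures: int
    distinct_usernames: int
    distinct_fingerprints: int


# Example hosting ASNs: AWS (AS16509), Hetzner (AS24940), OVH (AS16276).
HOSTING_ASNS = {16509, 24940, 16276}


def decide(asn: int, s: IpStats, base_threshold: int = 50) -> Action:
    # ASN awareness: hosting networks get a far lower threshold.
    threshold = base_threshold // 10 if asn in HOSTING_ASNS else base_threshold
    if s.attempts <= threshold:
        return Action.ALLOW
    failure_rate = s.failures / s.attempts
    # Stuffing shape: mostly failures, many usernames, few client signatures.
    stuffing_shape = failure_rate >= 0.8 and s.distinct_fingerprints * 10 <= s.distinct_usernames
    if stuffing_shape and asn in HOSTING_ASNS:
        return Action.BLOCK
    # Shared residential or corporate IPs are challenged rather than blocked.
    return Action.CHALLENGE
```

In this sketch only hosting traffic with a clear stuffing shape is hard-blocked; a busy carrier or office IP at worst sees a challenge, which a human can pass.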

Exponential Backoff vs Hard Lockout

Once an attacker crosses a threshold, what do you do? Two schools of thought.

Hard Lockout

Lock the account or IP for a fixed duration (15 minutes, 1 hour, 24 hours). Simple, deterministic, and effective against low volume attacks. The downside is denial of service: an attacker who knows a target username can lock the victim out of their own account indefinitely by hammering the endpoint.
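A minimal hard-lockout sketch makes the denial of service problem concrete. The defaults (5 failures, a 15-minute lock) and the class name are illustrative.

```python
from typing import Dict, Tuple


class HardLockout:
    """Fixed-duration lockout after N consecutive failures (illustrative)."""

    def __init__(self, max_failures: int = 5, lockout_sec: float = 900.0) -> None:
        self.max_failures = max_failures
        self.lockout_sec = lockout_sec
        self.state: Dict[str, Tuple[int, float]] = {}  # key -> (failures, locked_until)

    def is_locked(self, key: str, now: float) -> bool:
        return now < self.state.get(key, (0, 0.0))[1]

    def record_failure(self, key: str, now: float) -> None:
        failures, locked_until = self.state.get(key, (0, 0.0))
        failures += 1
        if failures >= self.max_failures:
            # Lock for a fixed duration and start counting again afterwards.
            self.state[key] = (0, now + self.lockout_sec)
        else:
            self.state[key] = (failures, locked_until)

    def record_success(self, key: str) -> None:
        self.state.pop(key, None)
```

Note that the attacker never needs the password: five bad guesses against a known username lock the real owner out for 15 minutes, and five more right after the lock expires start the next one.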

Exponential Backoff

Double the delay with each failed attempt (1s, 2s, 4s, 8s, 16s, 32s...). The legitimate user barely notices on their first retry, but an attacker grinding credentials hits a wall within seconds. Exponential backoff softens the denial of service problem, provided the delay is capped, because a legitimate user can still log in once the current delay expires instead of waiting out a long lockout.

For account level protection, we recommend exponential backoff on failures combined with a sliding window hard cap (e.g., no more than 20 attempts per account per hour regardless of source).
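A minimal sketch of that recommendation: per-account exponential backoff on failures plus a sliding-window cap of 20 attempts per hour that counts every attempt, successful or not, from any source. The 300-second backoff ceiling is a hypothetical value.

```python
from collections import deque
from typing import Deque, Dict

HOUR = 3600.0


class BackoffGuard:
    """Per-account exponential backoff on failures plus an hourly hard cap."""

    def __init__(self, base_delay: float = 1.0, max_delay: float = 300.0,
                 hourly_cap: int = 20) -> None:
        self.base_delay = base_delay    # delay after the first failure
        self.max_delay = max_delay      # ceiling so backoff cannot grow without bound
        self.hourly_cap = hourly_cap    # attempts per account per hour, any source
        self.failures: Dict[str, int] = {}
        self.next_allowed: Dict[str, float] = {}
        self.recent: Dict[str, Deque[float]] = {}

    def wait_time(self, key: str, now: float) -> float:
        """Seconds until the next attempt is allowed (0.0 means go ahead)."""
        window = self.recent.setdefault(key, deque())
        while window and window[0] <= now - HOUR:
            window.popleft()
        # Hard cap: wait until the oldest attempt leaves the one-hour window.
        cap_wait = window[0] + HOUR - now if len(window) >= self.hourly_cap else 0.0
        backoff_wait = max(0.0, self.next_allowed.get(key, 0.0) - now)
        return max(cap_wait, backoff_wait)

    def record(self, key: str, now: float, success: bool) -> None:
        self.recent.setdefault(key, deque()).append(now)
        if success:
            self.failures.pop(key, None)
            self.next_allowed.pop(key, None)
            return
        n = self.failures.get(key, 0) + 1
        self.failures[key] = n
        # 1s, 2s, 4s, 8s, ... after the 1st, 2nd, 3rd, 4th failure, capped.
        delay = min(self.max_delay, self.base_delay * 2 ** (n - 1))
        self.next_allowed[key] = now + delay
```

A handler would call `wait_time` first and reject (or delay) when it is positive, then call `record` with the outcome. A success clears the backoff but still counts toward the hourly cap.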

Integration with Bot Scoring and Device Signals

Rate limiting in isolation cannot stop modern credential stuffing. Attackers use residential proxy networks that rotate IPs on every request, solve CAPTCHAs via human farms or ML models, and mimic browser fingerprints. Effective defense requires layered signals:

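Bot scoring: Behavioral analytics (mouse movement, typing cadence, form interaction timing) score each request by how likely it is to be automated. Rate limits tighten as bot scores rise.
TLS fingerprints: JA3/JA4 fingerprints identify the TLS client behind a request, exposing scripts built on HTTP libraries (Python requests, Go net/http, curl) that spoof a browser user agent string. Headless browsers driven by Puppeteer or Playwright usually present a genuine browser TLS fingerprint, so they have to be caught by behavioral and device signals instead.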
Credential intelligence: Check submitted credentials against known compromised password databases before accepting them. If a user is attempting a password that appears in a recent stealer log for that same email, treat every attempt as high risk. This is where services like Revealer.US plug into the authentication pipeline. See our docs for integration patterns.
CAPTCHA as a scalpel: Do not show CAPTCHA to everyone. Show it only when rate limit thresholds or bot scores indicate elevated risk. This preserves UX for 99 percent of users while forcing automation to pay a cost.
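These signals can feed one decision per login attempt. A minimal sketch, assuming a bot score between 0 (human) and 1 (bot) from a hypothetical scoring service and an `exposed_credential` flag from a hypothetical credential-exposure lookup; the 0.8 score cutoff and the "block at twice the limit" rule are illustrative choices, not recommendations.

```python
def effective_limit(base_limit: int, bot_score: float) -> int:
    """Shrink the allowed attempts as the bot score (0 = human, 1 = bot) rises."""
    score = min(1.0, max(0.0, bot_score))
    return max(1, int(base_limit * (1.0 - score)))


def login_action(attempts_in_window: int, base_limit: int, bot_score: float,
                 exposed_credential: bool) -> str:
    """Return 'allow', 'captcha', 'step_up', or 'block' for one login attempt."""
    limit = effective_limit(base_limit, bot_score)
    if attempts_in_window >= 2 * limit:
        return "block"
    # CAPTCHA only when thresholds or the bot score signal elevated risk.
    if attempts_in_window >= limit or bot_score >= 0.8:
        return "captcha"
    if exposed_credential:
        return "step_up"   # e.g. require MFA or a password reset
    return "allow"
```

A low-risk user with a clean credential sees nothing extra, while the same number of attempts from a high-scoring client quickly hits a CAPTCHA and then a block.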

Why Rate Limiting Alone Is Not Enough

The uncomfortable truth is that a well-funded credential stuffing operation will bypass any single control. A typical campaign, catalogued in MITRE ATT&CK as technique T1110.004 (Credential Stuffing), follows this chain:

Purchase combo lists from infostealer log aggregators.
Rent residential proxies across 50+ countries at less than a cent per IP.
Distribute the attack so no single IP exceeds 2-3 requests per minute.
Rotate device fingerprints and solve CAPTCHAs on the fly.
Harvest successful logins silently over days or weeks.

Against that adversary, per IP rate limiting is irrelevant. Per account limits help, but require enumeration protection. Device fingerprinting helps until the attacker swaps tools. The only durable defense is layered detection that combines all of these signals plus continuous monitoring of credential exposure outside your perimeter.

Conclusion

Rate limiting is a necessary but insufficient control. Implement it correctly and you eliminate the 95 percent of attacks that come from unsophisticated bots and single source scripts. Combine it with per account keying, exponential backoff, device fingerprinting, bot scoring, and credential intelligence, and you raise the cost of credential stuffing enough that most attackers move on to softer targets.

The goal is not to make your login endpoint uncrackable. The goal is to make it more expensive to attack than the attacker is willing to spend. Layered rate limiting, paired with real-time credential exposure monitoring, is how modern authentication security gets there.

Want to add compromised credential intelligence to your authentication pipeline? Start a free trial of Revealer.US and catch stuffing attacks before they land.