- Global rate limits are a trap.
- Use the user's JWT sub claim as a partition key and get fair, per-user throttling with the built-in ASP.NET Core middleware — plus YARP and Redis patterns for when you scale out.
## TL;DR
A single global rate limit punishes everyone for one noisy user. ASP.NET Core's `PartitionedRateLimiter` (built in since .NET 7) fixes that: use the authenticated user's ID, typically the `sub` claim on a JWT, as the partition key, and every user gets their own bucket. Pick one of the four algorithms (fixed window, sliding window, token bucket, concurrency), apply it at the API level for simple cases, or push it to a reverse proxy like YARP with a Redis backplane when you scale horizontally.
## Why global limits fail
The default examples in most tutorials create one counter for the whole app, which means the first caller to send a burst consumes the quota for every other user on the box. Microsoft even calls this out in the docs: “partitions divide the traffic into separate buckets that each get their own rate limit counters”. Without partitioning, you get four problems (the anti-pattern itself is sketched after the list):
- Noisy-neighbor collapse — one misbehaving client degrades everyone.
- No fairness — free users and enterprise tenants share the same pool.
- Weak security — per-IP limits are easy to evade with botnets and proxy rotation, and they over-throttle legitimate users who share an address behind NAT.
- No product control — you can't differentiate free vs paid plans at the traffic layer.
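For contrast, here is a minimal sketch of the anti-pattern using the built-in fixed-window limiter (the policy name is illustrative): every caller draws from the same 100-permit window.

```csharp
using Microsoft.AspNetCore.RateLimiting;

// One counter for the whole app: any single client can drain it.
builder.Services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter("Global", limiterOptions =>
    {
        limiterOptions.PermitLimit = 100;
        limiterOptions.Window = TimeSpan.FromMinutes(1);
    });
});
```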
## The partition key: user ID from JWT
The partition key is the field that answers “whose bucket is this?”. For authenticated APIs, the cleanest answer is the user's identity from the JWT. Two common ways to extract it inside the policy factory:
```csharp
// Option 1 — NameIdentifier claim (usually mapped from the JWT's sub claim)
var key = context.User.FindFirstValue(ClaimTypes.NameIdentifier) ?? "anonymous";

// Option 2 — User.Identity.Name
var key = context.User.Identity?.Name ?? "anonymous";
```

Always provide a fallback key ("anonymous" at minimum, or per-IP so unauthenticated callers don't all share one bucket) so every request is still throttled. A per-IP fallback sketch follows.
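A minimal sketch of that combined fallback, assuming a hypothetical "PerUserOrIp" policy name (see the proxy caveats below before trusting `RemoteIpAddress`):

```csharp
// Authenticated users get a per-user bucket; anonymous callers fall
// back to a per-IP bucket instead of one shared "anonymous" bucket.
options.AddPolicy("PerUserOrIp", context =>
{
    var key = context.User.FindFirstValue(ClaimTypes.NameIdentifier)
              ?? context.Connection.RemoteIpAddress?.ToString()
              ?? "anonymous";

    return RateLimitPartition.GetFixedWindowLimiter(key, _ => new FixedWindowRateLimiterOptions
    {
        PermitLimit = 100,
        Window = TimeSpan.FromMinutes(1),
        QueueLimit = 0
    });
});
```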
## Four algorithms, one partition model
All four built-in algorithms work with `PartitionedRateLimiter`. Pick based on the shape of your traffic:
| Algorithm | Bursts | Time-based | Best for |
|---|---|---|---|
| Fixed Window | Yes (boundary burst risk) | Yes | Simple quotas, background jobs |
| Sliding Window | Smoother | Yes | General-purpose API throttling |
| Token Bucket | Yes (burst until empty) | Yes | Interactive APIs, webhooks, mobile |
| Concurrency | N/A | No | Protecting scarce resources (DB pools, downstream calls) |
The fixed, sliding, and token-bucket limiters cap requests-per-period. The concurrency limiter is different: it caps simultaneous in-flight requests. Use it when the bottleneck is concurrent access to a dependency, not request volume.
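The concurrency limiter plugs into the same partition model; a minimal sketch, with an illustrative policy name and numbers:

```csharp
// Cap simultaneous in-flight requests per user, independent of time.
options.AddPolicy("PerUserConcurrency", context =>
{
    var key = context.User.FindFirstValue(ClaimTypes.NameIdentifier) ?? "anonymous";

    return RateLimitPartition.GetConcurrencyLimiter(key, _ => new ConcurrencyLimiterOptions
    {
        PermitLimit = 5,                                          // max 5 concurrent requests per user
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
        QueueLimit = 10                                           // queue the next 10 instead of rejecting
    });
});
```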
### Fixed window: 100 requests per user per minute
```csharp
using System.Security.Claims;
using System.Threading.RateLimiting;

builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    options.AddPolicy("PerUserFixed", context =>
    {
        var key = context.User.FindFirstValue(ClaimTypes.NameIdentifier) ?? "anonymous";

        return RateLimitPartition.GetFixedWindowLimiter(key, _ => new FixedWindowRateLimiterOptions
        {
            PermitLimit = 100,                    // 100 requests...
            Window = TimeSpan.FromSeconds(60),    // ...per 60-second window
            QueueLimit = 0                        // reject immediately, don't queue
        });
    });
});

app.UseRateLimiter();

app.MapGet("/me", () => "hello").RequireRateLimiting("PerUserFixed");
```

Two things worth knowing: the default rejection status is 503, so flip it to 429 Too Many Requests, which is the correct semantic. And `UseRateLimiter` must come after `UseRouting` when you use endpoint-scoped policies.
### Token bucket: allow bursts, reject sustained overload
options.AddPolicy("PerUserBucket", context =>
{
var key = context.User.FindFirstValue(ClaimTypes.NameIdentifier) ?? "anonymous";
return RateLimitPartition.GetTokenBucketLimiter(key, _ => new TokenBucketRateLimiterOptions
{
TokenLimit = 100,
ReplenishmentPeriod = TimeSpan.FromSeconds(10),
TokensPerPeriod = 20,
AutoReplenishment = true,
QueueLimit = 0
});
});Bucket holds up to 100 tokens, refills 20 every 10 seconds. A mobile client can burst through a full bucket, then settles into the sustained rate. Microsoft's samples note that Retry-After estimation works for token bucket, fixed, and sliding — but not for concurrency, since there's no time component.
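To surface that estimate to clients, the rejection callback can read the retry hint from the lease metadata; a minimal sketch following the pattern in Microsoft's samples:

```csharp
// Emit a Retry-After header whenever the limiter can estimate one
// (token bucket, fixed window, and sliding window, but not concurrency).
options.OnRejected = (context, cancellationToken) =>
{
    if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter))
    {
        context.HttpContext.Response.Headers.RetryAfter =
            ((int)retryAfter.TotalSeconds).ToString();
    }

    return ValueTask.CompletedTask;
};
```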
## Tiered plans: free vs premium
This is where rate limiting stops being a safeguard and starts being a product control. Keep the partition key as the user ID, but let the factory read the plan from claims or a cached lookup:
options.AddPolicy("PerPlan", context =>
{
var key = context.User.FindFirstValue(ClaimTypes.NameIdentifier) ?? "anonymous";
var plan = context.User.FindFirstValue("plan") ?? "free";
var limit = plan switch
{
"enterprise" => 10_000,
"pro" => 1_000,
_ => 100
};
return RateLimitPartition.GetFixedWindowLimiter(key, _ => new FixedWindowRateLimiterOptions
{
PermitLimit = limit,
Window = TimeSpan.FromMinutes(1),
QueueLimit = 0
});
});Enterprise customers negotiating custom quotas? Move plan definitions into a DB, cache them in Redis, and hydrate the factory from the cache. Keep lookups off the request hot path.
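A minimal sketch of that cached lookup, assuming a hypothetical `PlanStore` service backed by your database and `IMemoryCache` as the local cache:

```csharp
using Microsoft.Extensions.Caching.Memory;

options.AddPolicy("PerPlanCached", context =>
{
    var key = context.User.FindFirstValue(ClaimTypes.NameIdentifier) ?? "anonymous";
    var cache = context.RequestServices.GetRequiredService<IMemoryCache>();

    // Cache the resolved limit so the DB is hit at most once per user per 5 minutes.
    var limit = cache.GetOrCreate($"plan-limit:{key}", entry =>
    {
        entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5);
        var store = context.RequestServices.GetRequiredService<PlanStore>(); // hypothetical service
        return store.GetPermitLimit(key);
    });

    return RateLimitPartition.GetFixedWindowLimiter(key, _ => new FixedWindowRateLimiterOptions
    {
        PermitLimit = limit,
        Window = TimeSpan.FromMinutes(1),
        QueueLimit = 0
    });
});
```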
## Scaling out: the process-local problem
Here's the quiet failure mode teams hit in production. The built-in middleware stores counters in-memory, per process. Set a “100 requests per minute” policy, run five instances behind a load balancer, and a client can realistically make 500 requests per minute by bouncing between nodes. Local limits still protect per-node CPU and thread pool, but they are not a global quota.
Two production fixes, usually combined:
- Reverse proxy (YARP or APIM). Define named rate limiter policies in the proxy host and bind them to proxy routes. YARP sits in front of your microservices so “dumb” throttling happens once at ingress instead of being duplicated across every service. Azure API Management adds the `rate-limit-by-key` policy for subscription-plan enforcement at the edge.
- Redis backplane. Libraries like `aspnetcore-redis-rate-limiting` keep the native .NET policy/partition model but push counters to Redis. Critically, they use atomic Lua scripts so `GET`, `INCR`, and `EXPIRE` happen in a single round-trip, with no race conditions under load. Keep the `ConnectionMultiplexer` singleton-scoped. A sketch of the Redis-backed policy follows this list.
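A minimal sketch of the Redis-backed variant, assuming the `RedisRateLimiting` package from that library; the API names below follow its README, so verify them against the version you install:

```csharp
using RedisRateLimiting;
using StackExchange.Redis;

// One multiplexer for the whole app; it is designed to be shared.
var redis = ConnectionMultiplexer.Connect("localhost:6379");

builder.Services.AddRateLimiter(options =>
{
    options.AddPolicy("PerUserDistributed", context =>
    {
        var key = context.User.FindFirstValue(ClaimTypes.NameIdentifier) ?? "anonymous";

        // Counters live in Redis, so the limit holds across all instances.
        return RedisRateLimitPartition.GetFixedWindowRateLimiter(key,
            _ => new RedisFixedWindowRateLimiterOptions
            {
                ConnectionMultiplexerFactory = () => redis,
                PermitLimit = 100,
                Window = TimeSpan.FromMinutes(1)
            });
    });
});
```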
## IP gotchas when you sit behind a proxy
If your app is behind a reverse proxy and you partition by IP without reading `X-Forwarded-For` from a validated proxy chain, you'll rate-limit the proxy itself, so every user shares one bucket. Microsoft's docs also warn that partitioning on raw client IP leaves you vulnerable to source-address-spoofing DoS (see BCP 38 / RFC 2827 on ingress filtering). Stress-test with JMeter or Azure Load Testing before you ship.
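A minimal sketch of the forwarded-headers fix, assuming a single known proxy at an example address:

```csharp
using Microsoft.AspNetCore.HttpOverrides;
using System.Net;

builder.Services.Configure<ForwardedHeadersOptions>(options =>
{
    options.ForwardedHeaders = ForwardedHeaders.XForwardedFor | ForwardedHeaders.XForwardedProto;
    options.KnownProxies.Add(IPAddress.Parse("10.0.0.10")); // example proxy address
});

var app = builder.Build();

// Must run before anything that reads RemoteIpAddress, including the rate limiter.
app.UseForwardedHeaders();
app.UseRateLimiter();
```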
## The production pattern: layered limits
Senior teams don't pick one limiter — they stack three:
- Edge (Cloudflare, Azure Front Door) — absorb broad volumetric abuse, anonymous floods.
- Gateway (YARP / APIM with Redis) — enforce subscription-plan quotas and coarse caller rules.
- Service — per-user, per-tenant, per-endpoint policies aware of downstream cost.
This separation ages well operationally: the edge team tunes abuse controls, the API platform team manages shared quotas, and service teams tune fine-grained policies around expensive endpoints without touching ingress.
## What's next
The built-in middleware still ships in-memory only — no official Redis provider — so the OSS backplane pattern is likely here to stay for distributed enforcement. If you're just getting started, wire up a partitioned per-user policy today, switch the rejection status to 429, and plan the YARP/Redis move before your second production incident, not after.
Sources: Milan Jovanović, “Advanced Rate Limiting Use Cases In .NET”; Microsoft Learn, “Rate limiting middleware in ASP.NET Core”; “Mastering Distributed Rate Limiting in ASP.NET Core”.
