- Global rate limits are a trap.
- Use the user's JWT sub claim as a partition key and get fair, per-user throttling with the built-in ASP.NET Core middleware — plus YARP and Redis patterns for when you scale out.
## TL;DR
A single global rate limit punishes everyone for one noisy user. ASP.NET Core's `PartitionedRateLimiter` (built in since .NET 7) fixes that: use the authenticated user's ID, typically the `sub` claim on a JWT, as the partition key, and every user gets their own bucket. Pick one of the four algorithms (fixed window, sliding window, token bucket, concurrency), apply it at the API level for simple cases, or push it to a reverse proxy like YARP with a Redis backplane when you scale horizontally.
## Why global limits fail
The default examples in most tutorials create one counter for the whole app, which means the first caller to send a burst consumes the quota for every other user on the box. Microsoft even calls this out in the docs: “partitions divide the traffic into separate buckets that each get their own rate limit counters”. Without partitioning, you get four problems (the anti-pattern itself is sketched after the list):
- Noisy-neighbor collapse — one misbehaving client degrades everyone.
- No fairness — free users and enterprise tenants share the same pool.
- Weak security — per-IP limits are easy to evade with botnets and proxy rotation, and they over-throttle legitimate users who share an address behind NAT.
- No product control — you can't differentiate free vs paid plans at the traffic layer.
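For contrast, here is a minimal sketch of the anti-pattern using the built-in fixed-window limiter (the policy name is illustrative): every caller draws from the same 100-permit window.

```csharp
using Microsoft.AspNetCore.RateLimiting;

// One counter for the whole app: any single client can drain it.
builder.Services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter("Global", limiterOptions =>
    {
        limiterOptions.PermitLimit = 100;
        limiterOptions.Window = TimeSpan.FromMinutes(1);
    });
});
```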
## The partition key: user ID from JWT
The partition key is the field that answers “whose bucket is this?”. For authenticated APIs, the cleanest answer is the user's identity from the JWT. Two common ways to extract it inside the policy factory:
```csharp
// Option 1 — NameIdentifier claim (usually mapped from the JWT's sub claim)
var key = context.User.FindFirstValue(ClaimTypes.NameIdentifier) ?? "anonymous";

// Option 2 — User.Identity.Name
var key = context.User.Identity?.Name ?? "anonymous";
```

Always provide a fallback key ("anonymous" at minimum, or per-IP so unauthenticated callers don't all share one bucket) so every request is still throttled. A per-IP fallback sketch follows.
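A minimal sketch of that combined fallback, assuming a hypothetical "PerUserOrIp" policy name (see the proxy caveats below before trusting `RemoteIpAddress`):

```csharp
// Authenticated users get a per-user bucket; anonymous callers fall
// back to a per-IP bucket instead of one shared "anonymous" bucket.
options.AddPolicy("PerUserOrIp", context =>
{
    var key = context.User.FindFirstValue(ClaimTypes.NameIdentifier)
              ?? context.Connection.RemoteIpAddress?.ToString()
              ?? "anonymous";

    return RateLimitPartition.GetFixedWindowLimiter(key, _ => new FixedWindowRateLimiterOptions
    {
        PermitLimit = 100,
        Window = TimeSpan.FromMinutes(1),
        QueueLimit = 0
    });
});
```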
## Four algorithms, one partition model
All four built-in algorithms work with `PartitionedRateLimiter`. Pick based on the shape of your traffic:
| Algorithm | Bursts | Time-based | Best for |
|---|---|---|---|
| Fixed Window | Yes (boundary burst risk) | Yes | Simple quotas, background jobs |
| Sliding Window | Smoother | Yes | General-purpose API throttling |
| Token Bucket | Yes (burst until empty) | Yes | Interactive APIs, webhooks, mobile |
| Concurrency | N/A | No | Protecting scarce resources (DB pools, downstream calls) |
The fixed, sliding, and token-bucket limiters cap requests-per-period. The concurrency limiter is different: it caps simultaneous in-flight requests. Use it when the bottleneck is concurrent access to a dependency, not request volume.
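The concurrency limiter plugs into the same partition model; a minimal sketch, with an illustrative policy name and numbers:

```csharp
// Cap simultaneous in-flight requests per user, independent of time.
options.AddPolicy("PerUserConcurrency", context =>
{
    var key = context.User.FindFirstValue(ClaimTypes.NameIdentifier) ?? "anonymous";

    return RateLimitPartition.GetConcurrencyLimiter(key, _ => new ConcurrencyLimiterOptions
    {
        PermitLimit = 5,                                          // max 5 concurrent requests per user
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
        QueueLimit = 10                                           // queue the next 10 instead of rejecting
    });
});
```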
### Fixed window: 100 requests per user per minute
```csharp
using System.Security.Claims;
using System.Threading.RateLimiting;

builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    options.AddPolicy("PerUserFixed", context =>
    {
        var key = context.User.FindFirstValue(ClaimTypes.NameIdentifier) ?? "anonymous";

        return RateLimitPartition.GetFixedWindowLimiter(key, _ => new FixedWindowRateLimiterOptions
        {
            PermitLimit = 100,                    // 100 requests...
            Window = TimeSpan.FromSeconds(60),    // ...per 60-second window
            QueueLimit = 0                        // reject immediately, don't queue
        });
    });
});

app.UseRateLimiter();

app.MapGet("/me", () => "hello").RequireRateLimiting("PerUserFixed");
```

Two things worth knowing: the default rejection status is 503, so flip it to 429 Too Many Requests, which is the correct semantic. And `UseRateLimiter` must come after `UseRouting` when you use endpoint-scoped policies.
### Token bucket: allow bursts, reject sustained overload
options.AddPolicy("PerUserBucket", context =>
{
var key = context.User.FindFirstValue(ClaimTypes.NameIdentifier) ?? "anonymous";
return RateLimitPartition.GetTokenBucketLimiter(key, _ => new TokenBucketRateLimiterOptions
{
TokenLimit = 100,
ReplenishmentPeriod = TimeSpan.FromSeconds(10),
TokensPerPeriod = 20,
AutoReplenishment = true,
QueueLimit = 0
});
});Bucket holds up to 100 tokens, refills 20 every 10 seconds. A mobile client can burst through a full bucket, then settles into the sustained rate. Microsoft's samples note that Retry-After estimation works for token bucket, fixed, and sliding — but not for concurrency, since there's no time component.
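To surface that estimate to clients, the rejection callback can read the retry hint from the lease metadata; a minimal sketch following the pattern in Microsoft's samples:

```csharp
// Emit a Retry-After header whenever the limiter can estimate one
// (token bucket, fixed window, and sliding window, but not concurrency).
options.OnRejected = (context, cancellationToken) =>
{
    if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter))
    {
        context.HttpContext.Response.Headers.RetryAfter =
            ((int)retryAfter.TotalSeconds).ToString();
    }

    return ValueTask.CompletedTask;
};
```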
## Tiered plans: free vs premium
This is where rate limiting stops being a safeguard and starts being a product control. Keep the partition key as the user ID, but let the factory read the plan from claims or a cached lookup:
options.AddPolicy("PerPlan", context =>
{
var key = context.User.FindFirstValue(ClaimTypes.NameIdentifier) ?? "anonymous";
var plan = context.User.FindFirstValue("plan") ?? "free";
var limit = plan switch
{
"enterprise" => 10_000,
"pro" => 1_000,
_ => 100
};
return RateLimitPartition.GetFixedWindowLimiter(key, _ => new FixedWindowRateLimiterOptions
{
PermitLimit = limit,
Window = TimeSpan.FromMinutes(1),
QueueLimit = 0
});
});Enterprise customers negotiating custom quotas? Move plan definitions into a DB, cache them in Redis, and hydrate the factory from the cache. Keep lookups off the request hot path.
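A minimal sketch of that cached lookup, assuming a hypothetical `PlanStore` service backed by your database and `IMemoryCache` as the local cache:

```csharp
using Microsoft.Extensions.Caching.Memory;

options.AddPolicy("PerPlanCached", context =>
{
    var key = context.User.FindFirstValue(ClaimTypes.NameIdentifier) ?? "anonymous";
    var cache = context.RequestServices.GetRequiredService<IMemoryCache>();

    // Cache the resolved limit so the DB is hit at most once per user per 5 minutes.
    var limit = cache.GetOrCreate($"plan-limit:{key}", entry =>
    {
        entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5);
        var store = context.RequestServices.GetRequiredService<PlanStore>(); // hypothetical service
        return store.GetPermitLimit(key);
    });

    return RateLimitPartition.GetFixedWindowLimiter(key, _ => new FixedWindowRateLimiterOptions
    {
        PermitLimit = limit,
        Window = TimeSpan.FromMinutes(1),
        QueueLimit = 0
    });
});
```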
## Scaling out: the process-local problem
Here's the quiet failure mode teams hit in production. The built-in middleware stores counters in-memory, per process. Set a “100 requests per minute” policy, run five instances behind a load balancer, and a client can realistically make 500 requests per minute by bouncing between nodes. Local limits still protect per-node CPU and thread pool, but they are not a global quota.
Two production fixes, usually combined:
- Reverse proxy (YARP or APIM). Define named rate limiter policies in the proxy host and bind them to proxy routes. YARP sits in front of your microservices so “dumb” throttling happens once at ingress instead of being duplicated across every service. Azure API Management adds the `rate-limit-by-key` policy for subscription-plan enforcement at the edge.
- Redis backplane. Libraries like `aspnetcore-redis-rate-limiting` keep the native .NET policy/partition model but push counters to Redis. Critically, they use atomic Lua scripts so `GET`, `INCR`, and `EXPIRE` happen in a single round-trip, with no race conditions under load. Keep the `ConnectionMultiplexer` singleton-scoped. A sketch of the Redis-backed policy follows this list.
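A minimal sketch of the Redis-backed variant, assuming the `RedisRateLimiting` package from that library; the API names below follow its README, so verify them against the version you install:

```csharp
using RedisRateLimiting;
using StackExchange.Redis;

// One multiplexer for the whole app; it is designed to be shared.
var redis = ConnectionMultiplexer.Connect("localhost:6379");

builder.Services.AddRateLimiter(options =>
{
    options.AddPolicy("PerUserDistributed", context =>
    {
        var key = context.User.FindFirstValue(ClaimTypes.NameIdentifier) ?? "anonymous";

        // Counters live in Redis, so the limit holds across all instances.
        return RedisRateLimitPartition.GetFixedWindowRateLimiter(key,
            _ => new RedisFixedWindowRateLimiterOptions
            {
                ConnectionMultiplexerFactory = () => redis,
                PermitLimit = 100,
                Window = TimeSpan.FromMinutes(1)
            });
    });
});
```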
## IP gotchas when you sit behind a proxy
If your app is behind a reverse proxy and you partition by IP without reading `X-Forwarded-For` from a validated proxy chain, you'll rate-limit the proxy itself, so every user shares one bucket. Microsoft's docs also warn that partitioning on raw client IP leaves you vulnerable to source-address-spoofing DoS (see BCP 38 / RFC 2827 on ingress filtering). Stress-test with JMeter or Azure Load Testing before you ship.
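A minimal sketch of the forwarded-headers fix, assuming a single known proxy at an example address:

```csharp
using Microsoft.AspNetCore.HttpOverrides;
using System.Net;

builder.Services.Configure<ForwardedHeadersOptions>(options =>
{
    options.ForwardedHeaders = ForwardedHeaders.XForwardedFor | ForwardedHeaders.XForwardedProto;
    options.KnownProxies.Add(IPAddress.Parse("10.0.0.10")); // example proxy address
});

var app = builder.Build();

// Must run before anything that reads RemoteIpAddress, including the rate limiter.
app.UseForwardedHeaders();
app.UseRateLimiter();
```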
## The production pattern: layered limits
Senior teams don't pick one limiter — they stack three:
- Edge (Cloudflare, Azure Front Door) — absorb broad volumetric abuse, anonymous floods.
- Gateway (YARP / APIM with Redis) — enforce subscription-plan quotas and coarse caller rules.
- Service — per-user, per-tenant, per-endpoint policies aware of downstream cost.
This separation ages well operationally: the edge team tunes abuse controls, the API platform team manages shared quotas, and service teams tune fine-grained policies around expensive endpoints without touching ingress.
## What's next
The built-in middleware still ships in-memory only — no official Redis provider — so the OSS backplane pattern is likely here to stay for distributed enforcement. If you're just getting started, wire up a partitioned per-user policy today, switch the rejection status to 429, and plan the YARP/Redis move before your second production incident, not after.
Sources: Milan Jovanović, “Advanced Rate Limiting Use Cases In .NET”; Microsoft Learn, “Rate limiting middleware in ASP.NET Core”; “Mastering Distributed Rate Limiting in ASP.NET Core”.
