- One entry point between clients and a fleet of microservices.
- Why Netflix runs 80+ Zuul clusters at 1M+ requests/second, how the pattern differs from a load balancer, and when skipping the gateway is actually the right call.
TL;DR
An API Gateway is a single entry point that sits between clients and backend services. It centralizes authentication, rate limiting, SSL termination, routing, and response aggregation so clients don't have to deal with a dozen microservice URLs. Netflix runs 80+ Zuul 2 clusters handling over 1M requests/second. Kong benchmarks 50,000 TPS per node. Used everywhere in cloud-native — but not free: extra latency hop, single point of failure if misdeployed, and real operational cost. Skip it for a single-service app.
The problem: direct client-to-service communication
Imagine a mobile app that needs auth, profile, payments, notifications, and search. If the client calls each service directly, things fall apart fast:
- Client manages many base URLs and retries per service
- Authentication and token refresh logic gets duplicated in every client
- Versioning becomes a cross-team coordination nightmare
- Internal services are exposed to the public internet
- Monitoring is fragmented across N backends
Every new service ships pain to every client team. This is the pattern the API Gateway exists to kill.
What an API Gateway actually does
At its core, the gateway is a Layer 7 reverse proxy that speaks HTTP (not just IPs and ports). One public endpoint fronts many internal services, and it handles cross-cutting concerns in one place:
- Authentication & JWT validation — verify once, pass identity downstream
- SSL/TLS termination — certificates live at the edge, not on every service
- Routing — path/header-based routing to the right backend
- Rate limiting & throttling — protect services from abusive clients and DDoS
- Request aggregation — fan out to multiple services, return one response
- Response caching — shield backends from repeat reads
- Protocol translation — HTTP to gRPC, WebSocket, or internal RPC
- API versioning — `/v1` vs `/v2` routed to different backend stacks
- Observability — central logs, traces, and metrics
The gateway becomes the traffic control layer — the one place where platform-wide policy is enforced.
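The cross-cutting concerns above can be sketched as one request-handling decision: authenticate, rate-limit, then route. This is a minimal illustration, not any real gateway's code — the route table, service hostnames, and `handle` function are all hypothetical:

```python
import time

# Hypothetical route table: path prefix -> internal service base URL.
# Names and URLs are illustrative, not from any real deployment.
ROUTES = {
    "/auth": "http://auth-service:8080",
    "/profile": "http://profile-service:8080",
    "/payments": "http://payments-service:8080",
}

class TokenBucket:
    """Per-client token bucket: refills `rate` tokens/sec, bursts up to `capacity`."""
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

def route(path):
    """Longest-prefix match against the route table; None if no backend owns the path."""
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path.startswith(prefix):
            return ROUTES[prefix] + path[len(prefix):]
    return None

def handle(path, token_valid, bucket):
    """Gateway decision for one request: auth first, then rate limit, then routing."""
    if not token_valid:
        return (401, None)          # authentication enforced once, at the edge
    if not bucket.allow():
        return (429, None)          # throttled before it ever reaches a backend
    target = route(path)
    if target is None:
        return (404, None)
    return (200, target)            # a real gateway would now proxy to `target`
```

For example, `handle("/profile/me", True, TokenBucket(5, 2))` resolves to `(200, "http://profile-service:8080/me")` — the client never sees the internal hostname. Real gateways add TLS termination, header rewriting, and observability around this same core loop.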
Gateway vs Load Balancer — don't confuse them
This trips up a lot of engineers. They solve different problems:
| Aspect | Load Balancer | API Gateway |
|---|---|---|
| OSI Layer | L4 (IP/port) or basic L7 | L7 (application, HTTP-aware) |
| Primary job | Distribute traffic across identical instances | Apply API policy & routing |
| Knows about | Hosts and ports | Endpoints, auth, quotas, versions |
| Typical features | Health checks, failover, round-robin | JWT auth, rate limits, transforms, aggregation |
In production you usually run both:
Client → L4 Load Balancer → API Gateway cluster → Microservices

The load balancer gives the gateway high availability. The gateway does the API work. Kong, for example, runs on top of NGINX — NGINX is the engine, Kong is the policy layer.
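That topology can be expressed in a few lines of NGINX `stream` (L4) configuration — a sketch with placeholder hostnames and ports, not a production config:

```nginx
# L4 (TCP) load balancing in front of a clustered gateway.
# Hostnames and ports are placeholders.
stream {
    upstream gateway_cluster {
        server gateway-1.internal:8000;
        server gateway-2.internal:8000;
        server gateway-3.internal:8000;
    }
    server {
        listen 443;
        proxy_pass gateway_cluster;   # round-robin across gateway nodes
    }
}
```

The `stream` block knows nothing about HTTP — it just spreads connections across identical gateway nodes, which is exactly the division of labor the table above describes.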
Technical facts & numbers
- Netflix Zuul 2: 80+ clusters, ~100 backend service clusters, over 1 million requests per second.
- Kong 3.6: >50,000 TPS per node on AWS `c6g` instances.
- Latency tax: a well-provisioned gateway adds ~1–10ms per request.
- Netflix BFF split: three API surfaces — Signup API (non-members), Discovery API (search/recommendations), Play API (streaming/licensing) — each a Backend-for-Frontend tuned for its clients.
- AWS API Gateway: priced per-million-requests; high-TPS public APIs can run into thousands of USD/month.
Real-world examples
A food-delivery homepage needs user profile, nearby restaurants, active coupons, cart status, and recommendations. Five backend services. Without a gateway, the client fires five parallel requests, handles five error states, and wastes battery on cellular. With a gateway aggregating them, the client fires one — cleaner UX, fewer round trips, same backends.
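The aggregation step is a concurrent fan-out with graceful degradation. A minimal sketch using `asyncio` — the service names match the example above, but the `fetch` stub stands in for real HTTP calls:

```python
import asyncio

# Stub standing in for a real HTTP request to an internal service.
async def fetch(service: str) -> dict:
    await asyncio.sleep(0.01)  # simulate network latency
    return {"service": service, "data": f"{service}-payload"}

async def homepage(user_id: str) -> dict:
    """Fan out to the five backends concurrently, return one aggregated response."""
    services = ["profile", "restaurants", "coupons", "cart", "recommendations"]
    results = await asyncio.gather(*(fetch(s) for s in services),
                                   return_exceptions=True)
    response = {"user_id": user_id}
    for name, result in zip(services, results):
        # Degrade gracefully: a failed backend yields a null field, not a failed page.
        response[name] = None if isinstance(result, Exception) else result["data"]
    return response
```

One round trip for the client; the five parallel calls happen inside the data center, where latency is cheap. `return_exceptions=True` is what keeps one slow or dead backend from failing the whole homepage.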
Companies using the pattern at scale:
- Netflix — Zuul fronts device APIs with a BFF per client form factor
- Amazon — internal API layers across the service mesh; AWS API Gateway as a managed product
- Uber — edge gateway handling auth and regional routing across microservices
- Stripe — strict versioning and idempotency enforced at the API boundary
Limitations & when NOT to use one
The gateway solves real problems but adds its own:
- Single point of failure if deployed as one node — must run clustered behind an L4 LB
- Bottleneck risk — under-provisioned gateways saturate before services do
- Blast radius — a bad config pushes bad behavior to all traffic instantly
- Scope creep — business logic leaking into the gateway recreates a distributed monolith
- Ops burden — self-hosted (Kong/Envoy) needs a team; managed (AWS/Apigee) gets expensive
Skip it when: you have one service, low traffic, or no team capacity to operate it. Add it when ≥3–4 backend services, multiple client types, or public API surface creates duplicate auth/routing logic across clients.
Popular tools
Open source: Kong, NGINX (with modules), Envoy, Traefik, Spring Cloud Gateway, Zuul.
Managed: AWS API Gateway, Google Apigee, Azure API Management.
The golden rule
Architecture should match scale. If you're a one-service startup, a gateway is ceremony. If you're running eight backends behind three client types, not having one means you're paying the complexity tax in every client codebase — forever.
Sources: microservices.io, Azure Architecture Center, Netflix/zuul, Kong, IBM.