- One entry point between clients and a fleet of microservices.
- Why Netflix runs 80+ Zuul clusters at 1M+ requests/second, how the pattern differs from a load balancer, and when skipping the gateway is actually the right call.
TL;DR
An API Gateway is a single entry point that sits between clients and backend services. It centralizes authentication, rate limiting, SSL termination, routing, and response aggregation so clients don't have to deal with a dozen microservice URLs. Netflix runs 80+ Zuul 2 clusters handling over 1M requests/second. Kong benchmarks 50,000 TPS per node. Used everywhere in cloud-native — but not free: extra latency hop, single point of failure if misdeployed, and real operational cost. Skip it for a single-service app.
The problem: direct client-to-service communication
Imagine a mobile app that needs auth, profile, payments, notifications, and search. If the client calls each service directly, things fall apart fast:
- Client manages many base URLs and retries per service
- Authentication and token refresh logic gets duplicated in every client
- Versioning becomes a cross-team coordination nightmare
- Internal services are exposed to the public internet
- Monitoring is fragmented across N backends
Every new service ships pain to every client team. This is the pattern the API Gateway exists to kill.
What an API Gateway actually does
At its core, the gateway is a Layer 7 reverse proxy that speaks HTTP (not just IPs and ports). One public endpoint fronts many internal services, and it handles cross-cutting concerns in one place:
- Authentication & JWT validation — verify once, pass identity downstream
- SSL/TLS termination — certificates live at the edge, not on every service
- Routing — path/header-based routing to the right backend
- Rate limiting & throttling — protect services from abusive clients and DDoS
- Request aggregation — fan out to multiple services, return one response
- Response caching — shield backends from repeat reads
- Protocol translation — HTTP to gRPC, WebSocket, or internal RPC
- API versioning — `/v1` vs `/v2` routed to different backend stacks
- Observability — central logs, traces, and metrics
The gateway becomes the traffic control layer — the one place where platform-wide policy is enforced.
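The cross-cutting concerns above can be sketched as one request-handling decision: authenticate, rate-limit, then route. This is a minimal illustration, not any real gateway's code — the route table, service hostnames, and `handle` function are all hypothetical:

```python
import time

# Hypothetical route table: path prefix -> internal service base URL.
# Names and URLs are illustrative, not from any real deployment.
ROUTES = {
    "/auth": "http://auth-service:8080",
    "/profile": "http://profile-service:8080",
    "/payments": "http://payments-service:8080",
}

class TokenBucket:
    """Per-client token bucket: refills `rate` tokens/sec, bursts up to `capacity`."""
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

def route(path):
    """Longest-prefix match against the route table; None if no backend owns the path."""
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path.startswith(prefix):
            return ROUTES[prefix] + path[len(prefix):]
    return None

def handle(path, token_valid, bucket):
    """Gateway decision for one request: auth first, then rate limit, then routing."""
    if not token_valid:
        return (401, None)          # authentication enforced once, at the edge
    if not bucket.allow():
        return (429, None)          # throttled before it ever reaches a backend
    target = route(path)
    if target is None:
        return (404, None)
    return (200, target)            # a real gateway would now proxy to `target`
```

For example, `handle("/profile/me", True, TokenBucket(5, 2))` resolves to `(200, "http://profile-service:8080/me")` — the client never sees the internal hostname. Real gateways add TLS termination, header rewriting, and observability around this same core loop.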
Gateway vs Load Balancer — don't confuse them
This trips up a lot of engineers. They solve different problems:
| Aspect | Load Balancer | API Gateway |
|---|---|---|
| OSI Layer | L4 (IP/port) or basic L7 | L7 (application, HTTP-aware) |
| Primary job | Distribute traffic across identical instances | Apply API policy & routing |
| Knows about | Hosts and ports | Endpoints, auth, quotas, versions |
| Typical features | Health checks, failover, round-robin | JWT auth, rate limits, transforms, aggregation |
In production you usually run both:
Client → L4 Load Balancer → API Gateway cluster → Microservices

The load balancer gives the gateway high availability. The gateway does the API work. Kong, for example, runs on top of NGINX — NGINX is the engine, Kong is the policy layer.
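That topology can be expressed in a few lines of NGINX `stream` (L4) configuration — a sketch with placeholder hostnames and ports, not a production config:

```nginx
# L4 (TCP) load balancing in front of a clustered gateway.
# Hostnames and ports are placeholders.
stream {
    upstream gateway_cluster {
        server gateway-1.internal:8000;
        server gateway-2.internal:8000;
        server gateway-3.internal:8000;
    }
    server {
        listen 443;
        proxy_pass gateway_cluster;   # round-robin across gateway nodes
    }
}
```

The `stream` block knows nothing about HTTP — it just spreads connections across identical gateway nodes, which is exactly the division of labor the table above describes.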
Technical facts & numbers
- Netflix Zuul 2: 80+ clusters, ~100 backend service clusters, over 1 million requests per second.
- Kong 3.6: >50,000 TPS per node on AWS `c6g` instances.
- Latency tax: a well-provisioned gateway adds ~1–10ms per request.
- Netflix BFF split: three API surfaces — Signup API (non-members), Discovery API (search/recommendations), Play API (streaming/licensing) — each a Backend-for-Frontend tuned for its clients.
- AWS API Gateway: priced per-million-requests; high-TPS public APIs can run into thousands of USD/month.
Real-world examples
A food-delivery homepage needs user profile, nearby restaurants, active coupons, cart status, and recommendations. Five backend services. Without a gateway, the client fires five parallel requests, handles five error states, and wastes battery on cellular. With a gateway aggregating them, the client fires one — cleaner UX, fewer round trips, same backends.
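The aggregation step is a concurrent fan-out with graceful degradation. A minimal sketch using `asyncio` — the service names match the example above, but the `fetch` stub stands in for real HTTP calls:

```python
import asyncio

# Stub standing in for a real HTTP request to an internal service.
async def fetch(service: str) -> dict:
    await asyncio.sleep(0.01)  # simulate network latency
    return {"service": service, "data": f"{service}-payload"}

async def homepage(user_id: str) -> dict:
    """Fan out to the five backends concurrently, return one aggregated response."""
    services = ["profile", "restaurants", "coupons", "cart", "recommendations"]
    results = await asyncio.gather(*(fetch(s) for s in services),
                                   return_exceptions=True)
    response = {"user_id": user_id}
    for name, result in zip(services, results):
        # Degrade gracefully: a failed backend yields a null field, not a failed page.
        response[name] = None if isinstance(result, Exception) else result["data"]
    return response
```

One round trip for the client; the five parallel calls happen inside the data center, where latency is cheap. `return_exceptions=True` is what keeps one slow or dead backend from failing the whole homepage.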
Companies using the pattern at scale:
- Netflix — Zuul fronts device APIs with a BFF per client form factor
- Amazon — internal API layers across the service mesh; AWS API Gateway as a managed product
- Uber — edge gateway handling auth and regional routing across microservices
- Stripe — strict versioning and idempotency enforced at the API boundary
Limitations & when NOT to use one
The gateway solves real problems but adds its own:
- Single point of failure if deployed as one node — must run clustered behind an L4 LB
- Bottleneck risk — under-provisioned gateways saturate before services do
- Blast radius — a bad config pushes bad behavior to all traffic instantly
- Scope creep — business logic leaking into the gateway recreates a distributed monolith
- Ops burden — self-hosted (Kong/Envoy) needs a team; managed (AWS/Apigee) gets expensive
Skip it when: you have one service, low traffic, or no team capacity to operate it. Add it when ≥3–4 backend services, multiple client types, or public API surface creates duplicate auth/routing logic across clients.
Popular tools
Open source: Kong, NGINX (with modules), Envoy, Traefik, Spring Cloud Gateway, Zuul.
Managed: AWS API Gateway, Google Apigee, Azure API Management.
The golden rule
Architecture should match scale. If you're a one-service startup, a gateway is ceremony. If you're running eight backends behind three client types, not having one means you're paying the complexity tax in every client codebase — forever.
Sources: microservices.io, Azure Architecture Center, Netflix/zuul, Kong, IBM.