TL;DR

An API Gateway is a single entry point that sits between clients and backend services. It centralizes authentication, rate limiting, SSL termination, routing, and response aggregation so clients don't have to deal with a dozen microservice URLs. Netflix runs 80+ Zuul 2 clusters handling over 1M requests/second. Kong benchmarks 50,000 TPS per node. Used everywhere in cloud-native — but not free: extra latency hop, single point of failure if misdeployed, and real operational cost. Skip it for a single-service app.

The problem: direct client-to-service communication

Imagine a mobile app that needs auth, profile, payments, notifications, and search. If the client calls each service directly, things fall apart fast:

  • Client manages many base URLs and retries per service
  • Authentication and token refresh logic gets duplicated in every client
  • Versioning becomes a cross-team coordination nightmare
  • Internal services are exposed to the public internet
  • Monitoring is fragmented across N backends

Every new service ships pain to every client team. This is the pattern the API Gateway exists to kill.

What an API Gateway actually does

At its core, the gateway is a Layer 7 reverse proxy that speaks HTTP (not just IPs and ports). One public endpoint fronts many internal services, and it handles cross-cutting concerns in one place:

  • Authentication & JWT validation — verify once, pass identity downstream
  • SSL/TLS termination — certificates live at the edge, not on every service
  • Routing — path/header-based routing to the right backend
  • Rate limiting & throttling — protect services from abusive clients and DDoS
  • Request aggregation — fan out to multiple services, return one response
  • Response caching — shield backends from repeat reads
  • Protocol translation — HTTP to gRPC, WebSocket, or internal RPC
  • API versioning — /v1 vs /v2 routed to different backend stacks
  • Observability — central logs, traces, and metrics

The gateway becomes the traffic control layer — the one place where platform-wide policy is enforced.

Gateway vs Load Balancer — don't confuse them

This trips up a lot of engineers. They solve different problems:

Aspect           | Load Balancer                                 | API Gateway
-----------------|-----------------------------------------------|------------------------------------------------
OSI layer        | L4 (IP/port) or basic L7                      | L7 (application, HTTP-aware)
Primary job      | Distribute traffic across identical instances | Apply API policy and routing
Knows about      | Hosts and ports                               | Endpoints, auth, quotas, versions
Typical features | Health checks, failover, round-robin          | JWT auth, rate limits, transforms, aggregation

In production you usually run both:

Client → L4 Load Balancer → API Gateway cluster → Microservices

The load balancer gives the gateway high availability. The gateway does the API work. Kong, for example, runs on top of NGINX — NGINX is the engine, Kong is the policy layer.
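One concrete piece of that policy layer is rate limiting, which most gateways implement as some variant of a token bucket: clients accumulate tokens at a steady rate up to a burst cap, and requests that find the bucket empty get rejected (typically with HTTP 429). A hedged sketch — the numbers and class name are illustrative, not any vendor's implementation:

```python
import time

class TokenBucket:
    """Per-client token bucket: refills at `rate` tokens/sec,
    allows bursts up to `capacity`. A gateway would keep one
    bucket per API key or client IP."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full: allow an initial burst
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0          # spend one token for this request
            return True
        return False                    # empty bucket → reject with 429
```

The same structure generalizes: swap the in-memory counter for a shared store (e.g. Redis) and the bucket survives across a clustered gateway deployment.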

Technical facts & numbers

  • Netflix Zuul 2: 80+ clusters, ~100 backend service clusters, over 1 million requests per second.
  • Kong 3.6: >50,000 TPS per node on AWS c6g instances.
  • Latency tax: a well-provisioned gateway adds ~1–10ms per request.
  • Netflix BFF split: three API surfaces — Signup API (non-members), Discovery API (search/recommendations), Play API (streaming/licensing) — each a Backend-for-Frontend tuned for its clients.
  • AWS API Gateway: priced per-million-requests; high-TPS public APIs can run into thousands of USD/month.

Real-world examples

A food-delivery homepage needs user profile, nearby restaurants, active coupons, cart status, and recommendations. Five backend services. Without a gateway, the client fires five parallel requests, handles five error states, and wastes battery on cellular. With a gateway aggregating them, the client fires one — cleaner UX, fewer round trips, same backends.

Companies using the pattern at scale:

  • Netflix — Zuul fronts device APIs with a BFF per client form factor
  • Amazon — internal API layers across the service mesh; AWS API Gateway as a managed product
  • Uber — edge gateway handling auth and regional routing across microservices
  • Stripe — strict versioning and idempotency enforced at the API boundary

Limitations & when NOT to use one

The gateway solves real problems but adds its own:

  • Single point of failure if deployed as one node — must run clustered behind an L4 LB
  • Bottleneck risk — under-provisioned gateways saturate before services do
  • Blast radius — a bad config pushes bad behavior to all traffic instantly
  • Scope creep — business logic leaking into the gateway recreates a distributed monolith
  • Ops burden — self-hosted (Kong/Envoy) needs a team; managed (AWS/Apigee) gets expensive

Skip it when you have one service, low traffic, or no team capacity to operate it. Add one when three or more backend services, multiple client types, or a public API surface start pushing duplicate auth and routing logic into every client.

Popular tools

Open source: Kong, NGINX (with modules), Envoy, Traefik, Spring Cloud Gateway, Zuul.
Managed: AWS API Gateway, Google Apigee, Azure API Management.

The golden rule

Architecture should match scale. If you're a one-service startup, a gateway is ceremony. If you're running eight backends behind three client types, not having one means you're paying the complexity tax in every client codebase — forever.

Sources: microservices.io, Azure Architecture Center, Netflix/zuul, Kong, IBM.