Chapter 2: Core Building Blocks of System Design

2.1 Networking & Communication

Networking Fundamentals

Every system you will ever design depends on computers talking to each other over a network.

Before you can make smart decisions about load balancers, CDNs, or caching strategies, you need to understand how that communication actually works at a fundamental level.

**The OSI Model and TCP/IP Stack**

When two computers exchange data, that data passes through several layers of processing.

The OSI (Open Systems Interconnection) model describes seven of those layers, from the physical cables at the bottom to the application you interact with at the top.

In practice, most engineers work with a simplified version called the TCP/IP model, which condenses those seven layers into four.

| TCP/IP Layer | What It Does | Protocols | OSI Equivalent |
| --- | --- | --- | --- |
| Application | Handles user-facing communication: web pages, emails, file transfers | HTTP, HTTPS, FTP, SMTP, DNS | Layers 5, 6, 7 |
| Transport | Manages end-to-end delivery, breaks data into segments, ensures reliability or speed | TCP, UDP | Layer 4 |
| Internet | Routes packets across networks using IP addresses | IP, ICMP | Layer 3 |
| Network Access | Moves raw bits across physical media: cables, Wi-Fi, fiber | Ethernet, Wi-Fi | Layers 1, 2 |

You do not need to memorize every protocol at every layer. What you need is a mental model of how data flows.

When a user opens your app and taps "send message," that message travels down these layers on the sender's device, crosses the network as packets, and travels back up the layers on the receiving server. Each layer adds its own headers and handles its own responsibilities.

For system design, two layers matter most.

The transport layer is where you choose between TCP (reliable, ordered delivery) and UDP (fast, no guarantees).

The application layer is where you choose between HTTP, WebSockets, gRPC, and other protocols that define how your services communicate. Almost every design decision in this chapter lives at one of these two layers.

**DNS: How Domain Name Resolution Works**

When you type "example.com" into a browser, your computer does not know where to send that request. It only understands IP addresses like 93.184.216.34. DNS (Domain Name System) is the mechanism that translates human-readable domain names into those IP addresses.

The resolution process works in steps. Your browser first checks its own cache.

If it does not have the answer, it asks your operating system's cache. If that misses too, the request goes to a DNS resolver, usually operated by your internet service provider or a public resolver like Google (8.8.8.8) or Cloudflare (1.1.1.1).

The resolver then walks through the DNS hierarchy. It asks a root name server, which points it to the top-level domain server for ".com." That server points it to the authoritative name server for "example.com."

The authoritative server finally returns the actual IP address.

The resolver caches the result and sends it back to your browser.

This entire process usually takes 20 to 80 milliseconds. But because of aggressive caching at every level, most DNS lookups for popular domains resolve from cache in under 5 milliseconds.

**DNS Record Types**

DNS does not just map names to IP addresses. Different record types serve different purposes.

| Record Type | What It Does | Example |
| --- | --- | --- |
| A | Maps a domain to an IPv4 address | example.com → 93.184.216.34 |
| AAAA | Maps a domain to an IPv6 address | example.com → 2606:2800:220:1:... |
| CNAME | Creates an alias pointing to another domain name | www.example.com → example.com |
| MX | Specifies which mail servers handle email for the domain | example.com → mail.example.com |
| NS | Identifies the authoritative name servers for the domain | example.com → ns1.example.com |
| TXT | Stores arbitrary text, often used for domain verification and email security (SPF, DKIM) | example.com → "v=spf1 include:..." |

In system design, you will encounter DNS most often when discussing how traffic reaches your servers. Load balancing can happen at the DNS level by returning different IP addresses for the same domain. Failover can use DNS by changing records to point away from unhealthy servers. CDNs use DNS to route users to the nearest edge server.

**DNS Caching and TTL (Time to Live)**

Every DNS record comes with a TTL value measured in seconds. The TTL tells resolvers and clients how long they can cache that record before checking again.

A TTL of 3600 means the record can be cached for one hour. During that hour, no matter how many times someone looks up your domain, the cached answer gets returned instantly. After the hour expires, the next lookup goes back through the full resolution process.

Choosing the right TTL is a trade-off.

A long TTL (hours or days) reduces DNS lookup latency and lowers the load on your authoritative name servers. But it also means changes propagate slowly.

If you update your IP address and your TTL is 24 hours, some users will keep hitting the old address for up to a day.

A short TTL (30 to 60 seconds) lets changes propagate quickly, which is useful for failover scenarios. But it increases the load on DNS servers and adds a small amount of latency to every lookup once the cache expires.

Most production systems use a moderate TTL of 300 seconds (5 minutes) as a baseline.

Before a planned migration or failover, engineers often lower the TTL to 60 seconds a day or two in advance, so that when the actual switch happens, the old cached records expire quickly.
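The caching behavior described above can be sketched in a few lines. This is a toy in-process cache, not a real resolver: `DnsCache` and its `resolve_fn` callback are illustrative names, and the upstream lookup is stubbed out so the TTL logic is visible on its own.

```python
import time

class DnsCache:
    """Toy resolver-side cache that honors per-record TTLs (illustrative sketch)."""

    def __init__(self, resolve_fn):
        self._resolve = resolve_fn      # upstream lookup: domain -> (ip, ttl_seconds)
        self._cache = {}                # domain -> (ip, expires_at)
        self.upstream_queries = 0

    def lookup(self, domain, now=None):
        now = time.monotonic() if now is None else now
        entry = self._cache.get(domain)
        if entry and now < entry[1]:    # fresh cache hit: answer instantly
            return entry[0]
        self.upstream_queries += 1      # miss or expired: full resolution
        ip, ttl = self._resolve(domain)
        self._cache[domain] = (ip, now + ttl)
        return ip

cache = DnsCache(lambda d: ("93.184.216.34", 300))   # every record carries TTL 300
cache.lookup("example.com", now=0)
cache.lookup("example.com", now=299)    # within the TTL: served from cache
cache.lookup("example.com", now=301)    # TTL expired: re-resolves upstream
print(cache.upstream_queries)           # 2
```

Lowering the TTL before a migration simply shrinks the window during which the second lookup above would still return the stale entry.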

_DNS Resolution_

**Beginner Mistake to Avoid**

New engineers sometimes forget that DNS propagation is not instant.

You cannot change a DNS record and expect every user on the planet to see the new address immediately. Old cached entries will persist until their TTL expires, and some ISP resolvers are known to ignore TTLs and cache records longer than specified.

Always plan for a propagation window when making DNS changes.

Interview-Style Question

> Q: A user reports that your website loads slowly. The first request takes 500ms, but subsequent requests take only 50ms. What might explain this?

> A: The most likely explanation is DNS resolution and connection setup overhead on the first request. The initial request requires a DNS lookup (which could take 50 to 100ms), a TCP handshake (one round trip), and a TLS handshake for HTTPS (one to two additional round trips). Subsequent requests reuse the established connection and the cached DNS result, skipping all that overhead. This is why persistent connections (HTTP keep-alive) and DNS pre-fetching matter for performance.

**KEY TAKEAWAYS**

* The TCP/IP model has four layers. For system design, the transport layer (TCP vs. UDP) and application layer (HTTP, WebSocket, gRPC) are where most of your decisions live.
* DNS translates domain names into IP addresses through a hierarchical lookup process that relies heavily on caching at every level.
* Different DNS record types serve different purposes. A records map to IPv4 addresses, CNAME records create aliases, MX records route email.

* TTL controls how long DNS records stay cached. Long TTLs reduce latency but slow down changes. Short TTLs enable fast failover but increase DNS traffic.
* DNS is not instant. Always account for propagation delays when making DNS changes in production.

## Communication Protocols

Now that you understand how computers find each other on a network, the next question is: how do they actually talk?

The protocol they use determines everything from reliability to speed to whether communication flows in one direction or both.

**HTTP/HTTPS: Request/Response Model, Verbs, Status Codes**

HTTP (Hypertext Transfer Protocol) is the foundation of almost all web communication. It follows a simple request/response model: a client sends a request, the server processes it, and sends back a response. Every time you load a web page, submit a form, or call an API, HTTP is doing the work underneath.

HTTPS is HTTP with encryption added via TLS (Transport Layer Security). The data traveling between client and server is encrypted so that anyone intercepting the traffic cannot read it. In 2026, there is no good reason to use plain HTTP in production. HTTPS is the default.

HTTP requests use verbs (also called methods) that indicate the intended action.

| Verb | Purpose | Idempotent? |
| --- | --- | --- |
| GET | Retrieve data | Yes |
| POST | Create a new resource | No |
| PUT | Replace an existing resource entirely | Yes |
| PATCH | Partially update a resource | No (generally) |
| DELETE | Remove a resource | Yes |

HTTP responses come with status codes that tell the client what happened.

| Range | Meaning | Common Examples |
| --- | --- | --- |
| 2xx | Success | 200 OK, 201 Created, 204 No Content |
| 3xx | Redirect | 301 Moved Permanently, 304 Not Modified |
| 4xx | Client error | 400 Bad Request, 401 Unauthorized, 404 Not Found, 429 Too Many Requests |
| 5xx | Server error | 500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable |

Knowing these codes matters because they shape how clients handle responses. A 429 tells the client to slow down. A 503 tells it to retry later. A 401 tells it to re-authenticate.
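A client's reaction to each status range can be sketched as a small dispatch function. The function name and return strings here are made up for illustration; real clients wire these decisions into their retry and auth layers.

```python
def retry_action(status, retry_after=None):
    """Map an HTTP status code to a client-side reaction (illustrative sketch)."""
    if 200 <= status < 300:
        return "success"
    if 300 <= status < 400:
        return "follow redirect"
    if status == 401:
        return "re-authenticate"
    if status == 429:
        return f"back off {retry_after or 1}s"   # honor Retry-After when the server sends it
    if 400 <= status < 500:
        return "fix request"        # other client errors: retrying will not help
    return "retry with backoff"     # 5xx: transient server-side failure

print(retry_action(429, retry_after=30))   # back off 30s
print(retry_action(503))                   # retry with backoff
```

Note the asymmetry: 4xx responses (except 429) mean the request itself is wrong, so blind retries only add load, while 5xx responses are worth retrying with backoff.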

**HTTP/2 and HTTP/3 (QUIC): Multiplexing and Performance Gains**

HTTP/1.1 has a fundamental limitation: each TCP connection handles one request at a time. If a web page needs 50 resources (images, scripts, stylesheets), the browser either opens many parallel TCP connections or waits for each request to finish before sending the next one. Both approaches are wasteful.

HTTP/2 solves this with multiplexing. A single TCP connection can carry multiple requests and responses simultaneously, interleaved as binary frames. The result is faster page loads and more efficient use of network resources. HTTP/2 also introduces header compression, which reduces the overhead of sending repetitive headers with every request.

HTTP/3 takes this a step further by replacing TCP with QUIC, a protocol built on top of UDP. QUIC eliminates a problem called head-of-line blocking, where a single lost packet in TCP stalls all the multiplexed streams on that connection. With QUIC, each stream is independent, so a lost packet only affects its own stream. QUIC also combines the transport and TLS handshakes into a single round trip, making initial connections faster.

For system design, the key takeaway is this: HTTP/2 and HTTP/3 reduce latency and improve connection efficiency. If you are designing a system that serves many small resources or handles many concurrent API calls, these protocols give you meaningful performance gains without changing your application code.

**TCP vs. UDP: Reliability vs. Speed Trade-offs**

TCP and UDP are the two transport-layer protocols you will encounter in every system design discussion.

TCP (Transmission Control Protocol) guarantees that data arrives in order, without duplicates, and without corruption. It does this through mechanisms like acknowledgments, retransmissions, and sequence numbers. The cost is overhead: establishing a connection requires a three-way handshake, and lost packets trigger retransmission delays.

UDP (User Datagram Protocol) makes no such guarantees. It sends packets and forgets about them. No handshake, no acknowledgments, no retransmissions.

The benefit is speed and simplicity. The cost is that packets can arrive out of order, get duplicated, or disappear entirely.

| Aspect | TCP | UDP |
| --- | --- | --- |
| Reliability | Guaranteed delivery, ordered | Best-effort, no guarantees |
| Connection | Requires handshake | Connectionless |
| Speed | Slower due to overhead | Faster, minimal overhead |
| Use cases | Web traffic, APIs, file transfers, email | Video streaming, online gaming, DNS lookups, VoIP |

The choice depends on what your system values more.

If you are building an API or a payment system, data integrity is non-negotiable, so you use TCP. If you are building a live video stream where a dropped frame is better than a delayed frame, UDP is the right choice.
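The difference is visible even in a few lines of socket code. This sketch fires one UDP datagram over loopback: no handshake, no acknowledgment, and no retransmission. On loopback the datagram arrives; on a real network it could simply vanish, and nothing in this code would notice.

```python
import socket

# UDP is connectionless: bind a receiver, send a datagram at it, read it back.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
port = receiver.getsockname()[1]
receiver.settimeout(2)

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"frame-1", ("127.0.0.1", port))   # no connect(), no ACK expected

data, addr = receiver.recvfrom(1024)     # arrives on loopback; a real network may drop it
print(data)
sender.close()
receiver.close()
```

The TCP equivalent would need `listen()`, `accept()`, and `connect()` before any data moves, which is exactly the handshake overhead the table above refers to.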

**WebSockets: Full-Duplex Real-Time Communication**

Standard HTTP is a one-way street initiated by the client.

The client sends a request, the server responds.

If the server has new data for the client, it has to wait until the client asks for it. This works fine for loading web pages, but it fails for real-time applications where the server needs to push updates immediately.

WebSockets solve this by establishing a persistent, full-duplex connection between client and server. After an initial HTTP handshake that "upgrades" the connection, both sides can send messages to each other at any time without waiting for a request.

The connection stays open until either side closes it.

This makes WebSockets ideal for chat applications, live dashboards, collaborative editing, multiplayer games, and any scenario where both the client and server need to send data unpredictably.

The trade-off is that WebSocket connections are stateful. Each connection consumes server resources (memory, file descriptors) for as long as it stays open.

A server handling 100,000 concurrent WebSocket connections needs significantly more resources than one handling 100,000 short-lived HTTP requests per minute.

You need to plan for connection management, heartbeats to detect dead connections, and reconnection logic on the client side.

**Server-Sent Events (SSE): One-Way Push from Server**

Sometimes you need the server to push updates to the client, but you do not need the client to send data back through the same channel.

Server-Sent Events handle this case.

SSE uses a standard HTTP connection that stays open.

The server streams events to the client as they happen.

The client listens passively. If the connection drops, the browser automatically reconnects and can resume from where it left off using a last-event-ID mechanism.

SSE works well for live feeds, stock tickers, notification streams, and progress updates. It is simpler than WebSockets because it uses plain HTTP (no protocol upgrade needed), works through most firewalls and proxies without issues, and has built-in reconnection.

The limitation is the one-way nature.

If your application needs bidirectional communication, WebSockets are the right choice. If you only need server-to-client updates, SSE is simpler and often sufficient.
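The SSE wire format itself is simple: a few `field: value` lines terminated by a blank line. Here is a minimal framing helper; the function name is illustrative, and a real endpoint would stream these frames over a held-open HTTP response with `Content-Type: text/event-stream`.

```python
def format_sse(data, event_id=None, event=None):
    """Frame one Server-Sent Event: field lines, then a blank line terminator."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")      # lets the browser resume via Last-Event-ID
    if event is not None:
        lines.append(f"event: {event}")
    for chunk in str(data).splitlines() or [""]:
        lines.append(f"data: {chunk}")       # multi-line payloads become repeated data: fields
    return "\n".join(lines) + "\n\n"         # blank line marks the end of the event

print(format_sse("price=101.5", event_id=7, event="tick"))
```

The `id:` field is what powers the automatic resume behavior described above: after a reconnect, the browser sends the last ID it saw in a `Last-Event-ID` header.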

**Long Polling vs. Short Polling**

Before WebSockets and SSE existed, engineers used polling to simulate real-time updates.

Short polling means the client sends a request to the server at fixed intervals, say every 3 seconds, asking "do you have anything new for me?"

If the server has new data, it sends it.

If not, it sends an empty response.

The problem is obvious: most of those requests return nothing, wasting bandwidth and server resources.

And there is always a delay between when data becomes available and when the next poll happens.

Long polling is a smarter version. The client sends a request, and the server holds that request open until it has new data to send back. Once it responds, the client immediately sends another request, and the cycle repeats. This reduces the number of empty responses and gets closer to real-time behavior.

| Aspect | Short Polling | Long Polling | WebSockets | SSE |
| --- | --- | --- | --- | --- |
| Direction | Client to server | Client to server | Bidirectional | Server to client |
| Latency | Depends on poll interval | Near real-time | Real-time | Real-time |
| Connection | New connection per poll | Held open, reconnects | Persistent | Persistent |
| Complexity | Low | Medium | Higher | Low |
| Best for | Simple, low-frequency updates | Near-real-time without WebSocket support | Chat, gaming, collaboration | Live feeds, notifications |

Long polling still has a place in systems where WebSocket support is limited, but for most modern applications, WebSockets or SSE are better choices.
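Server-side long polling can be approximated with a blocking queue: the handler waits up to a timeout instead of answering immediately. A sketch, assuming a thread-per-request server; all names here are illustrative.

```python
import queue
import threading

def long_poll(inbox, timeout=30):
    """Hold the request open until data arrives or the timeout expires."""
    try:
        return inbox.get(timeout=timeout)   # blocks instead of returning an empty response
    except queue.Empty:
        return None                         # timed out: client immediately re-polls

inbox = queue.Queue()
# Simulate an event arriving 0.1s after the client's request is already held open.
threading.Timer(0.1, inbox.put, args=("new message",)).start()

result = long_poll(inbox, timeout=5)
print(result)   # delivered the moment it arrived, not at the next poll interval
```

Contrast with short polling, where the same event would sit unseen until the next fixed-interval request happened to come in.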

**gRPC and Protocol Buffers: High-Performance RPC**

gRPC is a framework built by Google for service-to-service communication. Instead of sending JSON over HTTP like a REST API, gRPC uses Protocol Buffers (protobuf) as its serialization format and HTTP/2 as its transport.

Protocol Buffers are a binary serialization format. You define your data structure in a `.proto` file, and the protobuf compiler generates code in your chosen language to serialize and deserialize that data. Because it is binary rather than text-based like JSON, protobuf messages are significantly smaller and faster to parse.

gRPC supports four communication patterns: unary (one request, one response, like REST), server streaming (one request, multiple responses), client streaming (multiple requests, one response), and bidirectional streaming (both sides send multiple messages).

This makes gRPC excellent for internal microservice communication where performance matters and both sides are systems you control.

The trade-off is that gRPC is harder to debug than REST (you cannot just read the messages in a browser), requires protobuf schema management, and has limited browser support without a proxy layer like gRPC-Web.

Interview-Style Question

> Q: You are building a real-time notifications system. Users should see notifications appear instantly without refreshing. Which communication protocol would you choose and why?

> A: For a notification system where the server pushes updates and the client mostly listens, Server-Sent Events would be the first choice. SSE is simpler than WebSockets, works over standard HTTP, and has built-in reconnection logic. If the system also needs the client to acknowledge notifications or send read receipts through the same connection, WebSockets become the better option because of their bidirectional nature. Long polling would be a fallback for environments where neither SSE nor WebSockets are supported.

**KEY TAKEAWAYS**

* HTTP is request/response. Know your verbs (GET, POST, PUT, DELETE) and status codes (2xx success, 4xx client error, 5xx server error).
* HTTP/2 adds multiplexing over a single connection. HTTP/3 (QUIC) eliminates head-of-line blocking by using UDP instead of TCP.

* TCP guarantees delivery and order but costs speed. UDP is fast but unreliable. Choose based on whether your system prioritizes data integrity or low latency.
* WebSockets provide persistent, bidirectional, real-time communication. SSE provides one-way server-to-client push. Choose based on whether you need two-way communication.
* gRPC with Protocol Buffers is ideal for high-performance service-to-service communication. REST is better for public-facing APIs and browser clients.

## API Design Paradigms

APIs are how your system's components talk to each other and how external clients interact with your services.

The API design choices you make will affect performance, developer experience, and how easily your system evolves over time.

**REST: Principles, Resource Modeling, HATEOAS**

REST (Representational State Transfer) is the most widely used API paradigm on the web. It models everything as resources, each identified by a URL, and uses standard HTTP verbs to operate on those resources.

A well-designed REST API for a bookstore might look like this:

| Operation | Method | Endpoint |
| --- | --- | --- |
| List all books | GET | /api/books |
| Get a specific book | GET | /api/books/42 |
| Create a new book | POST | /api/books |
| Update a book | PUT | /api/books/42 |
| Delete a book | DELETE | /api/books/42 |

REST has a few guiding principles. It is stateless, meaning each request contains all the information the server needs to process it.

There is no session stored on the server between requests.

Resources are represented in a standard format, usually JSON.

And the uniform interface (using HTTP verbs consistently) makes APIs predictable.

HATEOAS (Hypermedia As The Engine Of Application State) is a REST principle where the server includes links in its responses that tell the client what actions are available next. In practice, very few APIs implement full HATEOAS.

Most production REST APIs follow the resource-and-verbs convention without the hypermedia links, and that works fine.

**GraphQL: Schema-First, Query Flexibility, Over-Fetching Solutions**

GraphQL, developed by Facebook, takes a fundamentally different approach. Instead of multiple endpoints for different resources, GraphQL exposes a single endpoint. The client sends a query describing exactly what data it wants, and the server returns exactly that data.

With REST, if you need a user's name and their last three orders, you might need two API calls: one to `/api/users/42` and another to `/api/users/42/orders?limit=3`. The user endpoint might return 30 fields when you only need the name. That extra data is called over-fetching, and it wastes bandwidth.

With GraphQL, you send a single query that requests only the fields you need, and you get one response with exactly that data. No over-fetching. No multiple round trips.

The trade-off is complexity on the server side. GraphQL requires a schema definition, a resolver layer, and careful attention to performance because clients can craft expensive queries that join many nested resources. Without safeguards like query depth limits and complexity scoring, a single malicious or careless query can bring your server to its knees.
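A depth limit is straightforward to sketch if you model a parsed selection set as nested dicts. This is a simplification of a real GraphQL AST, and the names are illustrative; production servers typically use their framework's validation hooks for this.

```python
def query_depth(selection):
    """Depth of a GraphQL-style selection set, modeled here as nested dicts."""
    if not selection:
        return 0
    return 1 + max(query_depth(sub) for sub in selection.values())

def allow_query(selection, max_depth=5):
    """Reject queries that nest deeper than the configured limit."""
    return query_depth(selection) <= max_depth

# { user { orders { items { product { reviews } } } } }
deep = {"user": {"orders": {"items": {"product": {"reviews": {}}}}}}
print(query_depth(deep))          # 5
print(allow_query(deep, max_depth=4))   # False: too deep, reject before resolving
```

Complexity scoring works the same way but assigns weights per field (a list field costs more than a scalar) instead of counting levels.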

**RPC (Remote Procedure Call): gRPC, Thrift, Avro**

RPC-style APIs take yet another approach. Instead of thinking in terms of resources and verbs, RPC thinks in terms of actions. You call a function on a remote server as if it were a local function call.

gRPC (covered earlier in this chapter) is the most popular modern RPC framework. It uses Protocol Buffers for serialization and HTTP/2 for transport. Apache Thrift (developed by Facebook) and Avro (developed within the Hadoop ecosystem) are alternatives with similar goals but different serialization formats and ecosystem strengths.

RPC excels at internal service-to-service communication where both sides share a schema and performance matters. The function-call model feels natural for engineers building backend systems. But RPC is less intuitive for public-facing APIs where REST's resource model maps more naturally to how external developers think about your system.

**REST vs. GraphQL vs. RPC: When to Use Which**

| Criteria | REST | GraphQL | RPC (gRPC) |
| --- | --- | --- | --- |
| Best for | Public APIs, CRUD operations, web clients | Mobile apps, complex data needs, varied clients | Internal microservice communication |
| Data fetching | Fixed response structure per endpoint | Client specifies exact fields needed | Defined by protobuf schema |
| Performance | Good, JSON over HTTP | Varies, can be expensive without safeguards | Excellent, binary + HTTP/2 |
| Learning curve | Low, familiar to most developers | Medium, requires schema and resolver setup | Medium, requires protobuf tooling |
| Browser support | Excellent | Excellent | Limited, needs gRPC-Web proxy |
| Caching | Easy (HTTP caching works naturally) | Harder (single endpoint, POST requests) | Requires custom implementation |

Many large systems use all three. REST for the public API that third-party developers consume.

GraphQL for the mobile and web clients that need flexible data fetching. gRPC for internal communication between backend services where speed matters.

**API Versioning Strategies**

APIs evolve. You will add new fields, deprecate old endpoints, and change response formats. The question is how you handle these changes without breaking existing clients.

Three common strategies exist. URI versioning puts the version in the URL: `/api/v1/users` and `/api/v2/users`. This is the simplest and most visible approach.

Header versioning uses a custom header like `API-Version: 2` to specify the version, keeping URLs clean. Query parameter versioning adds `?version=2` to the request.

URI versioning is the most widely adopted in practice because it is explicit and easy to understand. The trade-off is URL proliferation: you end up maintaining multiple versions of endpoints simultaneously, and routing logic can get messy as versions accumulate.

Whatever strategy you choose, have a deprecation policy.

Give clients a timeline (usually 6 to 12 months) to migrate to the new version.

Log which clients still use old versions.

Send deprecation warnings in response headers.

Do not surprise developers by removing endpoints overnight.

**API Pagination: Cursor-Based, Offset-Based, Keyset**

When an API endpoint returns a large collection (think: all posts in a feed, all products in a catalog), you cannot return everything in one response. Pagination breaks the results into manageable pages.

Offset-based pagination is the simplest: `GET /api/posts?offset=20&limit=10` returns posts 21 through 30. The problem is performance. To get offset 10,000, the database has to scan and skip 10,000 rows. For large datasets, this gets slow.

Cursor-based pagination uses an opaque token (the cursor) that points to a specific position in the result set: `GET /api/posts?cursor=abc123&limit=10`. The server knows how to resume from that cursor without scanning from the beginning. This is efficient and consistent even when new items are added, but cursors cannot easily jump to an arbitrary page.

Keyset pagination is similar to cursor-based but uses actual column values instead of opaque tokens: `GET /api/posts?created_after=2026-03-25T10:00:00&limit=10`. This works well when results are sorted by a unique, indexed column.

| Strategy | Pros | Cons | Best For |
| --- | --- | --- | --- |
| Offset-based | Simple, supports "jump to page N" | Slow for large offsets, inconsistent with inserts | Small datasets, admin panels |
| Cursor-based | Efficient, consistent | Cannot jump to arbitrary pages | Infinite scroll, feeds, timelines |
| Keyset | Efficient, uses natural sort keys | Requires a unique, sortable column | Time-sorted data, logs |

**Idempotency and Retry Semantics**

When a client does not get a response, it often retries the request. But what if the server actually processed the first request and the response was lost?

Without idempotency, the retry could create a duplicate: a user gets charged twice, a message gets sent twice, an order gets placed twice.

An idempotent operation produces the same result no matter how many times you execute it. GET and DELETE are naturally idempotent. POST is not. If you POST to create an order and retry, you might create two orders.

The standard solution is an idempotency key.

The client generates a unique key (usually a UUID) and sends it with the request.

The server stores the key and its result.

If the same key arrives again, the server returns the stored result instead of processing the request again.

This pattern is essential for any operation involving money, inventory, or state changes where duplicates cause real harm.
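The pattern can be sketched with an in-memory key store. A real system would persist the keys with an expiry and handle concurrent duplicates; `PaymentService` and its fields are made-up names for illustration.

```python
import uuid

class PaymentService:
    """Sketch of server-side idempotency: replay the stored result for a seen key."""

    def __init__(self):
        self._results = {}          # idempotency_key -> stored response
        self.charges = 0            # counts the real side effect

    def charge(self, idempotency_key, amount):
        if idempotency_key in self._results:
            return self._results[idempotency_key]   # duplicate: no second charge
        self.charges += 1                           # the actual money movement
        result = {"status": "charged", "amount": amount}
        self._results[idempotency_key] = result
        return result

svc = PaymentService()
key = str(uuid.uuid4())     # client-generated, reused verbatim on retry
svc.charge(key, 999)
svc.charge(key, 999)        # network retry of the same logical request
print(svc.charges)          # 1: the retry was deduplicated
```

The key property: the retry returns the same stored response the first attempt produced, so the client cannot tell (and does not need to know) that its first request actually succeeded.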

**API Rate Limiting and Throttling**

Rate limiting controls how many requests a client can make within a given time window. Without it, a single misbehaving client (or an attacker) can overwhelm your servers and degrade the experience for everyone else.

Common rate limiting strategies include fixed window (100 requests per minute, resetting at the top of each minute), sliding window (100 requests in any rolling 60-second period), and token bucket (tokens are added at a fixed rate; each request consumes a token; when tokens run out, requests are rejected).

When a client exceeds the limit, the server responds with HTTP 429 (Too Many Requests) and typically includes a `Retry-After` header telling the client when it can try again.

Rate limits serve multiple purposes: protecting your infrastructure from overload, ensuring fair usage across clients, preventing abuse and denial-of-service attacks, and managing costs for downstream services you pay per-call.
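A token bucket is only a few lines of state. This sketch passes the clock in explicitly to keep the refill arithmetic visible; a production limiter would read a monotonic clock itself and handle concurrent callers.

```python
class TokenBucket:
    """Token-bucket rate limiter sketch: refill at a fixed rate, spend one per request."""

    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)   # start full, allowing an initial burst
        self.last = 0.0

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False        # caller should respond 429 with a Retry-After header

bucket = TokenBucket(rate_per_sec=1, capacity=3)
burst = [bucket.allow(now=0) for _ in range(4)]
print(burst)                        # [True, True, True, False]: burst of 3, then rejected
refilled = bucket.allow(now=2)
print(refilled)                     # True: two seconds refilled two tokens
```

This is why token bucket is often preferred over fixed windows: it permits short bursts up to `capacity` while still enforcing the long-run rate.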

Interview-Style Question

> Q: You are designing a public API for a social media platform. Should you use REST, GraphQL, or gRPC?

> A: For a public API consumed by third-party developers, REST is the strongest choice. It uses familiar HTTP conventions, works natively in every programming language and browser, benefits from standard HTTP caching, and has a low learning curve. Developers expect public APIs to be RESTful. GraphQL could be offered as an additional option for mobile clients that need flexible data fetching, but it would not replace the REST API. gRPC would not be suitable here because of limited browser support and the requirement for protobuf tooling, which creates friction for external developers.

_API Paradigm Decision Tree_

**KEY TAKEAWAYS**

* REST models data as resources with HTTP verbs. It is the default for public-facing APIs because of its simplicity and widespread support.
* GraphQL lets clients request exactly the data they need, eliminating over-fetching. It adds server-side complexity and requires safeguards against expensive queries.
* gRPC with Protocol Buffers excels at internal microservice communication where performance and strong typing matter.
* Use cursor-based or keyset pagination for large datasets. Offset-based pagination degrades badly at scale.
* Idempotency keys prevent duplicate operations when clients retry failed requests. They are essential for any operation involving money or state changes.
* Rate limiting protects your system from abuse and overload. Always return HTTP 429 with a Retry-After header when limits are exceeded.

## API Gateway

As your system grows beyond a single service, managing how clients interact with your backend becomes a problem on its own.

An API gateway sits between your clients and your services, acting as the single entry point for all external requests.

**What an API Gateway Does**

An API gateway handles the cross-cutting concerns that every service needs but no individual service should implement on its own.

Routing: When a client sends a request to `/api/users/42`, the gateway knows which backend service handles user-related requests and forwards the call there. When the same client requests `/api/orders`, the gateway routes that to a different service. The client does not need to know which services exist or where they live. It talks to one address, and the gateway figures out the rest.
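Prefix-based routing can be sketched as a lookup table. The service URLs below are hypothetical, and a real gateway would also rewrite headers, proxy the request, and handle timeouts rather than just compute the target.

```python
ROUTES = {                      # longest-prefix route table (hypothetical addresses)
    "/api/users":  "http://user-service:8080",
    "/api/orders": "http://order-service:8080",
}

def route(path):
    """Pick the backend for a request path by longest matching prefix."""
    for prefix in sorted(ROUTES, key=len, reverse=True):   # most specific first
        if path.startswith(prefix):
            return ROUTES[prefix] + path
    return None                 # no route: the gateway would return 404

print(route("/api/users/42"))   # http://user-service:8080/api/users/42
print(route("/api/unknown"))    # None
```

Real gateways use more careful matchers (path segments, methods, host headers), but the client-facing property is the same: one address in, many services behind it.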

Authentication and authorization: Instead of every service independently verifying JWT tokens or API keys, the gateway handles authentication once at the edge. It validates the token, extracts user identity, and passes that information downstream. The backend services trust the gateway and focus on their business logic. This centralizes security policy in one place rather than scattering it across dozens of services.

Rate limiting: The gateway enforces rate limits before requests even reach your services. If a client exceeds their allowed QPS, the gateway rejects the request with a 429 status code. Your backend services never see the excess traffic.

Request and response transformation: Sometimes the client needs the data in a different format than the service produces. The gateway can transform responses, aggregate data from multiple services into a single response, or translate between protocols (for example, accepting a REST call from a browser and converting it to a gRPC call to an internal service).

Monitoring and logging: Because all traffic flows through the gateway, it becomes a natural place to collect metrics: request count, latency, error rates, and traffic patterns per service, per client, per endpoint. This gives you a unified view of how your entire API surface is performing.

_API Gateway_

### API Gateway Patterns: BFF (Backend for Frontend)

Not all clients need the same data or the same response format.

A mobile app on a slow cellular connection needs a compact, minimal response.

A web dashboard needs richer data with nested relationships. A third-party integration needs a stable, versioned API.

The Backend for Frontend (BFF) pattern addresses this by creating separate gateway layers for different client types.

You might have one BFF for your mobile clients that aggregates data from three services into a single lightweight response, another BFF for your web application that returns more detailed data, and a third BFF for your public API that follows strict versioning.

Each BFF is tailored to the needs of its client.

The mobile BFF might strip unnecessary fields and compress images.

The web BFF might include more metadata and support pagination differently. This keeps each gateway layer focused and prevents a single gateway from becoming a bloated translation layer trying to serve every client type simultaneously.

The trade-off is more code to maintain. Three BFFs mean three codebases that need updates when backend services change.

Teams typically mitigate this by having the same team that builds a client also own its BFF, since they best understand what that client needs.

**Popular API Gateways**

Several battle-tested API gateways exist, each with strengths in different areas.

Kong is an open-source gateway built on Nginx and OpenResty. It has a plugin architecture that makes it extensible for authentication, rate limiting, logging, and request transformation. Kong runs well on-premises and in cloud environments, and its plugin ecosystem is one of the largest.

AWS API Gateway is a fully managed service for teams running on Amazon Web Services. It handles provisioning, scaling, and monitoring automatically. It integrates tightly with other AWS services like Lambda, IAM, and CloudWatch. The trade-off is vendor lock-in: moving away from AWS API Gateway later requires significant rework.

Nginx has been used as a reverse proxy and basic API gateway for decades. Whether in its open-source form or as the commercial NGINX Plus, it handles routing, SSL termination, and load balancing efficiently. It is lightweight and fast but requires more manual configuration than purpose-built gateways like Kong.

Envoy is a high-performance proxy originally built by Lyft. It is commonly used as the data plane in service mesh architectures (like Istio) but also functions well as a standalone API gateway. Its strengths are observability, advanced load balancing, and support for both HTTP and gRPC traffic.

| Gateway | Type | Strengths | Best For |
| --- | --- | --- | --- |
| Kong | Open source / commercial | Plugin ecosystem, extensibility | Multi-cloud, on-premises |
| AWS API Gateway | Managed service | Zero ops, AWS integration | AWS-native architectures |
| Nginx | Open source / commercial | Performance, maturity, flexibility | Teams with strong ops experience |
| Envoy | Open source | Observability, gRPC support, service mesh | Microservice architectures |

**Beginner Mistake to Avoid**

New engineers sometimes make the API gateway too smart. They put business logic in the gateway: data validation, complex aggregation, conditional routing based on request body content. This turns the gateway into a monolith by another name.

The gateway should handle cross-cutting infrastructure concerns like auth, rate limiting, and routing.

Business logic belongs in the services behind it.

If you find yourself writing if/else chains in your gateway configuration, you have gone too far.

**Interview-Style Question**

> Q: Your system has 15 microservices. Mobile clients currently call 4 different services to load the home screen. How would you improve this?

> A: Introduce an API gateway with a BFF layer for mobile clients. The mobile BFF accepts a single request from the client, fans out to the four backend services in parallel, aggregates the responses, strips any unnecessary data to minimize payload size, and returns a single consolidated response. This reduces the number of round trips from the mobile device (which may be on a slow or unstable connection) from four to one, significantly improving load time and reducing battery usage from multiple network calls.
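The fan-out-and-trim behavior in that answer can be sketched concretely. Everything here is hypothetical: the four `fetch_*` functions stand in for real service calls, and `MOBILE_FIELDS` is an invented allowlist of what the mobile client actually renders.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical backend calls; real versions would be network requests.
def fetch_profile(uid):
    return {"name": "Ada", "avatar_url": "a.png", "internal_flags": [1, 2]}

def fetch_feed(uid):
    return {"items": ["post-1", "post-2"], "debug": "trace-id-123"}

def fetch_notifications(uid):
    return {"unread": 3, "raw_events": ["..."]}

def fetch_recommendations(uid):
    return {"suggested": ["user-9"], "model_version": "v42"}

# Invented allowlist: only what the mobile home screen actually renders.
MOBILE_FIELDS = {
    "profile": ("name", "avatar_url"),
    "feed": ("items",),
    "notifications": ("unread",),
    "recommendations": ("suggested",),
}

def mobile_home_screen(uid: str) -> dict:
    """One client round trip: parallel fan-out, aggregate, then trim."""
    calls = {
        "profile": fetch_profile,
        "feed": fetch_feed,
        "notifications": fetch_notifications,
        "recommendations": fetch_recommendations,
    }
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {name: pool.submit(fn, uid) for name, fn in calls.items()}
        raw = {name: f.result() for name, f in futures.items()}
    # Strip fields the mobile client does not need to keep the payload small.
    return {
        name: {k: v for k, v in raw[name].items() if k in MOBILE_FIELDS[name]}
        for name in raw
    }
```

The client makes one request instead of four, the slow calls overlap instead of serializing, and debug or internal fields never cross the cellular link.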

**KEY TAKEAWAYS**

* An API gateway is the single entry point for all client requests. It handles routing, authentication, rate limiting, transformation, and monitoring.

* Centralizing cross-cutting concerns in the gateway keeps individual services focused on business logic and prevents scattered security implementations.

* The BFF (Backend for Frontend) pattern creates separate gateway layers for different client types, optimizing each for its specific needs.

* Choose your gateway based on your infrastructure: managed services like AWS API Gateway for cloud-native teams, Kong or Envoy for multi-cloud or on-premises deployments, Nginx for teams that want lightweight control.

* Keep the gateway thin. Route traffic and enforce policies there. Put business logic in the services, not the gateway.