Chapter 3: System Design Properties & Architecture

3.4 System Architecture Patterns

## Monolithic Architecture

A monolith is a single application where all functionality lives in one codebase, runs in one process, and deploys as one unit.

The user interface, business logic, data access, background jobs, and API endpoints are all bundled together.

When you deploy, you deploy everything. When you scale, you scale everything.

**When Monoliths Make Sense**

Monoliths get a bad reputation in an era obsessed with microservices, but they remain the right choice for many situations.

A small team building a new product should almost always start with a monolith.

When you have three to ten engineers, one codebase means everyone understands the entire system.

Debugging is simple because you can trace a request through the code without crossing network boundaries.

Deployments are a single artifact. Refactoring is straightforward because all the code is in one place, and your IDE can rename a function across the entire project in seconds.

Monoliths also make sense when your domain is not well understood yet. If you are still figuring out where the boundaries between features lie, splitting into services too early means you will draw the boundaries in the wrong places. Redrawing service boundaries after the fact requires migrating data between databases, rewriting API contracts, and redeploying multiple services. Redrawing module boundaries within a monolith requires moving files between folders.

The problems with monoliths appear at scale.

A codebase with 200 engineers making changes simultaneously becomes a coordination nightmare.

Merge conflicts multiply. Deployment frequency slows because one team's broken feature blocks everyone else's release.

A bug in the notification module can crash the payment processing module because they share the same process. Build times stretch into tens of minutes, and test suites take hours.

But these problems appear only when the monolith has grown large and the team has grown with it. For most startups and small teams, a well-structured monolith outperforms a poorly implemented microservices architecture every single time.

**Modular Monolith: The Middle Ground**

A modular monolith is a single deployable application that is internally organized into well-defined, loosely coupled modules. Each module owns a specific business domain (users, orders, payments, notifications) with clear interfaces between them. Modules do not reach into each other's internals. They communicate through defined APIs, function calls, or event dispatches within the same process.
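A module boundary like this can be sketched in code as a class that exposes a narrow public API plus an in-process event hook. The names below (`OrdersModule`, `place_order`) are illustrative, not from any particular framework:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class OrdersModule:
    """Owns the orders domain; other modules see only its public methods."""
    _orders: dict = field(default_factory=dict)       # internal state, private
    _subscribers: list = field(default_factory=list)  # in-process event hooks

    def place_order(self, order_id: str, items: list) -> None:
        self._orders[order_id] = {"items": items, "status": "placed"}
        for notify in self._subscribers:              # event dispatch, same process
            notify({"type": "OrderPlaced", "order_id": order_id})

    def on_event(self, handler: Callable[[dict], None]) -> None:
        self._subscribers.append(handler)


# A notifications module reacts through the defined event channel,
# never by reading OrdersModule._orders directly.
sent = []
orders = OrdersModule()
orders.on_event(lambda event: sent.append(event["order_id"]))
orders.place_order("o-1", ["book"])
```

Because the notification logic only touches the public API, extracting either module later means swapping the in-process dispatch for a broker, not untangling shared state.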

The modular monolith gives you the simplicity of a single deployment with the organizational clarity of service boundaries. Each team can own a module without worrying about network calls, distributed debugging, or independent deployment pipelines.

If you eventually need to extract a module into a separate service, the clean boundaries make the extraction far easier than pulling apart a tangled monolith.

Shopify is a well-known example. They run one of the largest Ruby on Rails monoliths in the world but organize it into strictly enforced modules with clear dependency rules.

This approach lets them move fast with hundreds of engineers while avoiding the operational overhead of hundreds of microservices.

The modular monolith is the architecture pattern that does not get enough attention.

If you are outgrowing a simple monolith but not yet at the scale where microservices are justified, the modular monolith is almost always the right next step.

**Monolith to Microservices Migration Strategies**

When a monolith genuinely becomes a bottleneck (deployment takes hours, teams block each other constantly, a single module needs to scale independently), migration to microservices becomes necessary. But the migration should be incremental, not a big-bang rewrite.

Strangler fig pattern. New features are built as separate services outside the monolith. Existing functionality is gradually extracted, one module at a time. A routing layer (API gateway or reverse proxy) directs traffic to the monolith for old features and to new services for migrated features. Over time, the monolith shrinks as more functionality moves into services. The name comes from a strangler fig tree that grows around a host tree, eventually replacing it entirely.
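The routing layer at the heart of this pattern can be sketched as a path-prefix lookup; the prefixes and service names below are hypothetical:

```python
# Strangler fig routing sketch: migrated prefixes go to new services,
# everything else falls through to the monolith.
MIGRATED_ROUTES = {
    "/search": "search-service",
    "/reviews": "reviews-service",
}
MONOLITH = "legacy-monolith"


def route(path: str) -> str:
    """Return the backend that should handle this request path."""
    for prefix, service in MIGRATED_ROUTES.items():
        if path.startswith(prefix):
            return service
    return MONOLITH
```

As modules are extracted, entries are added to the map; the migration is complete when the fallthrough to the monolith is never taken.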

Branch by abstraction. Within the monolith, introduce an abstraction layer (an interface) in front of the module you want to extract. The monolith calls the interface, which initially delegates to the existing internal implementation. Build the new service. Once it is ready, switch the interface to call the external service instead of the internal implementation. If the new service fails, you can switch back to the internal implementation instantly.
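A sketch of the abstraction seam, with both implementations stubbed and a flag acting as the instant rollback switch (all names are illustrative):

```python
from typing import Protocol


class Payments(Protocol):
    """The abstraction the monolith calls; both implementations satisfy it."""
    def charge(self, order_id: str, cents: int) -> str: ...


class InternalPayments:
    """The existing in-process implementation."""
    def charge(self, order_id: str, cents: int) -> str:
        return f"internal:{order_id}:{cents}"


class ExternalPaymentsService:
    """Client for the newly extracted service (stubbed here)."""
    def charge(self, order_id: str, cents: int) -> str:
        return f"external:{order_id}:{cents}"


USE_EXTERNAL = False  # flipping this flag is the instant rollback switch


def get_payments() -> Payments:
    return ExternalPaymentsService() if USE_EXTERNAL else InternalPayments()
```

All call sites in the monolith depend on `get_payments()`, so cutting over (or rolling back) touches one line instead of every caller.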

Database-first decomposition. Before extracting a service, separate its data. Create a new database for the module's data and migrate reads and writes to the new database while the code still lives in the monolith. Once the data is cleanly separated, extracting the code into a separate service is much simpler because there are no shared database tables to untangle.

The worst migration strategy is the ground-up rewrite.

"Let's rebuild the whole thing as microservices from scratch" sounds appealing but almost always fails.

The new system takes years to reach feature parity with the old one.

Business requirements continue evolving during the rewrite, so the new system is targeting a moving goal. And the old monolith still needs maintenance, meaning you are now maintaining two systems simultaneously.

**Interview-Style Question**

> Q: Your company's monolithic application has grown to 500,000 lines of code with 80 engineers. Deployments take 45 minutes, and a bug in the search feature recently crashed the checkout flow. Leadership wants to move to microservices. How do you approach this?

> A: Start by identifying the modules that cause the most pain. The search and checkout services should be separated first since a search bug should never crash checkout. Use the strangler fig pattern: put an API gateway in front of the monolith. Extract search into its own service first because it has the clearest boundary (separate data, separate concerns) and it directly caused the recent outage. Route search traffic to the new service and everything else to the monolith. Next, extract checkout and payment for the same isolation reasons. Leave less problematic modules in the monolith until there is a concrete reason to extract them. Do not set a goal of "zero monolith." Some functionality may remain in the monolith indefinitely, and that is fine.

_3-stage Migration with Strangler Fig Pattern_

### KEY TAKEAWAYS

* Monoliths are the right starting architecture for small teams and new products. They are simpler to build, debug, deploy, and reason about.

* Modular monoliths provide service-like organization within a single deployable unit. They are the underappreciated middle ground between monoliths and microservices.

* Migrate to microservices incrementally using the strangler fig pattern or branch by abstraction. Never attempt a full ground-up rewrite.

* Extract modules that cause the most organizational or technical pain first. Leave stable, low-risk modules in the monolith until there is a concrete reason to move them.

## Microservices Architecture

Microservices architecture structures an application as a collection of small, independently deployable services, each responsible for a specific business capability. Each service runs in its own process, communicates with other services over the network, and can be developed, deployed, and scaled independently.

**Principles of Microservices**

Microservices follow a set of principles that distinguish them from simply "splitting a monolith into smaller pieces."

Single responsibility. Each service does one thing and does it well. The order service handles order lifecycle. The payment service handles charging customers. The notification service handles sending emails and push notifications. If you cannot describe what a service does in one sentence, it is probably doing too much.

Independent deployability. Changing and deploying one service should not require changing or deploying any other service. This is the property that enables large organizations to move fast: fifteen teams can deploy fifteen services in a single day without coordinating with each other.

Decentralized data ownership. Each service owns its data and exposes it only through its API. No other service reads from or writes to another service's database directly. This eliminates the tight coupling that shared databases create.

Designed for failure. In a microservices architecture, network calls replace function calls. Networks are unreliable. Services crash. Latency spikes happen. Every service must handle the failure of its dependencies gracefully through timeouts, retries, circuit breakers, and fallbacks.

Technology heterogeneity. Different services can use different programming languages, frameworks, and databases. The payment service can use Java with PostgreSQL while the recommendation service uses Python with Redis. Each team picks the best tools for their specific problem.

**Service Boundaries and Domain-Driven Design (DDD)**

The hardest question in microservices is where to draw the lines between services. Draw them in the wrong place, and you end up with services that need to call each other for every operation, defeating the purpose of independence.

Domain-Driven Design provides a framework for answering this question. DDD identifies bounded contexts, areas of the business where a specific model applies and has clear meaning. Within a bounded context, terms have precise definitions. Across bounded contexts, the same word might mean different things.

In an e-commerce platform, "order" means one thing in the ordering context (a customer's purchase with line items and shipping address) and something different in the fulfillment context (a set of items to pick, pack, and ship from a warehouse). These are two bounded contexts, and they make natural service boundaries.

Bounded contexts align with teams. The ordering team owns the ordering service and its data model. The fulfillment team owns the fulfillment service and its data model. They communicate through well-defined events or APIs, and each team can evolve its internal model without affecting the other.

Good service boundaries have high cohesion internally (everything within the service is closely related) and low coupling externally (the service depends minimally on other services). If two services constantly need to call each other to complete any operation, they are probably one bounded context that was incorrectly split.

**Inter-Service Communication: Sync vs. Async**

Services need to talk to each other. The communication pattern they use shapes the system's performance, reliability, and coupling characteristics.

Synchronous communication (HTTP/REST, gRPC) means the calling service sends a request and waits for a response. This is simple to implement and reason about. The calling service gets an immediate answer. But it creates temporal coupling: both services must be running at the same time. If the downstream service is slow or down, the caller is blocked or must handle the failure.

Asynchronous communication (message queues, event streaming) means the calling service sends a message and moves on without waiting for a response. The receiving service processes the message whenever it is ready. This decouples services in time: the producer and consumer do not need to be running simultaneously. It handles failures more gracefully because messages persist in the queue until a consumer processes them.

The practical guideline: use synchronous communication when the caller needs an immediate response to proceed (like checking inventory before confirming an order). Use asynchronous communication when the caller does not need to wait for the result (like sending a confirmation email after an order is placed).

Most microservice systems use both. The user-facing request path uses synchronous calls for the operations the user needs to see immediately. Everything else goes through message queues or event streams.

**Service Discovery: Consul, etcd, ZooKeeper, DNS-Based**

In a microservices architecture, services run on multiple instances that come and go dynamically. Auto-scaling adds new instances. Deployments replace old instances with new ones. Failures remove instances. How does Service A know the current network addresses of Service B's instances?

Service discovery solves this problem. It maintains a registry of available service instances and their addresses. Services register themselves when they start and deregister when they stop. Clients query the registry to find healthy instances of the service they want to call.

Client-side discovery means the client queries the service registry directly and chooses which instance to call. The client handles load balancing. Netflix's Eureka follows this pattern.

Server-side discovery means the client sends its request to a load balancer or router, which queries the registry and forwards the request to an available instance. AWS ALB and Kubernetes Services follow this pattern.

DNS-based discovery uses DNS records to resolve a service name to one or more IP addresses. Kubernetes provides this natively: the service name `payment-service.default.svc.cluster.local` resolves to the current IP addresses of healthy payment service pods. DNS-based discovery is simple and widely compatible but updates are limited by DNS TTL.
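The mechanism can be sketched with a plain DNS lookup. Here we resolve `localhost` only to show the call; a real cluster name like the Kubernetes example above would not resolve outside the cluster:

```python
import socket


def discover(service_name: str, port: int) -> list[str]:
    """Return the IP addresses currently behind a DNS service name."""
    infos = socket.getaddrinfo(service_name, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address. Deduplicate and sort for stability.
    return sorted({info[4][0] for info in infos})
```

A client would pick one of the returned addresses (round-robin or random) and retry against another on failure, since DNS itself does no health checking beyond what the registry publishes.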

Consul, etcd, and ZooKeeper provide more sophisticated discovery with health checking, key-value configuration, and real-time notifications when service instances change. Consul is the most purpose-built for service discovery, with HTTP and DNS interfaces, built-in health checks, and multi-datacenter support.

**API Composition and Aggregation**

When a client needs data from multiple services (a product page showing product details from the catalog service, pricing from the pricing service, reviews from the reviews service, and recommendations from the recommendation service), someone needs to aggregate those responses.

API gateway aggregation means the API gateway (or a BFF layer, covered in Part II, Lesson 1) calls the downstream services, combines their responses, and returns a single unified response to the client. This reduces the number of round trips the client makes and keeps aggregation logic out of the client code.

API composition service is a dedicated service whose only job is aggregating data from other services for specific use cases. A "product page composer" calls the catalog, pricing, reviews, and recommendations services in parallel, merges the results, and returns the composite response.

The challenge with API composition is latency. The composite response is only as fast as the slowest downstream call. Mitigations include calling downstream services in parallel rather than sequentially, setting aggressive timeouts so one slow service does not hold up the entire page, and using fallbacks (show the page without recommendations if the recommendation service is slow).
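These mitigations can be sketched with `asyncio`: downstream calls run in parallel, each wrapped in a timeout with a fallback. The services here are stubbed coroutines, and the slow one is simulated with a sleep:

```python
import asyncio


async def call_with_fallback(coro, timeout: float, fallback):
    """Await a downstream call; substitute a fallback on timeout or error."""
    try:
        return await asyncio.wait_for(coro, timeout)
    except Exception:
        return fallback


async def product_page(product_id: str) -> dict:
    async def catalog():              # stubbed fast downstream service
        return {"name": "Widget"}

    async def recommendations():      # stubbed slow downstream service
        await asyncio.sleep(1)
        return ["w-2", "w-3"]

    # Fan out to both services in parallel; degrade gracefully if one is slow.
    details, recs = await asyncio.gather(
        call_with_fallback(catalog(), 0.05, {}),
        call_with_fallback(recommendations(), 0.05, []),
    )
    return {"product": details, "recommendations": recs}


page = asyncio.run(product_page("w-1"))
```

The page renders with product details but empty recommendations, which is exactly the "show the page without recommendations" fallback described above.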

**Data Management in Microservices: Database-Per-Service**

The database-per-service pattern gives each microservice its own dedicated database.

The order service has its own PostgreSQL instance.

The product catalog has its own MongoDB instance.

The session service has its own Redis instance.

This pattern is foundational to microservice independence.

If two services share a database, changes to the schema by one team can break the other team's service. Scaling one service's data independently becomes impossible. And every service becomes coupled through the database, even if their code is separate.

The trade-off is that queries spanning multiple services become harder. In a monolith, "show all orders with their product names" is a single SQL join.

In microservices with database-per-service, you either denormalize product names into the order service's database, call the product service's API to look up names (which adds latency), or use an event-driven approach where the product service publishes name changes and the order service updates its local copy.

Cross-service reporting and analytics are typically handled by an event-driven pipeline. Each service publishes its data changes as events.

A centralized data warehouse or analytics service consumes all events and builds a unified view for reporting. This keeps individual services independent while still enabling cross-domain analysis.

**Interview-Style Question**

> Q: You are designing the backend for a food delivery app. What services would you create and how would they communicate?

> A: Start by identifying bounded contexts. Core services: User Service (profiles, authentication), Restaurant Service (menus, hours, locations), Order Service (order lifecycle, status), Payment Service (charging, refunds), Delivery Service (driver matching, tracking, routing), and Notification Service (push, email, SMS). Communication: the order flow uses synchronous calls for the critical path. When a user places an order, the Order Service synchronously calls the Payment Service to charge the card (the user must know immediately if payment succeeded). After payment succeeds, the Order Service publishes an "OrderPlaced" event asynchronously. The Restaurant Service consumes this event to start preparing food. The Delivery Service consumes it to find a driver. The Notification Service consumes it to send a confirmation. Each service owns its own database. The Order Service stores orders in PostgreSQL. The Delivery Service stores driver locations in Redis with geospatial indexing. The Restaurant Service stores menus in MongoDB for schema flexibility.

_Microservices Architecture for Food Delivery App_

### KEY TAKEAWAYS

* Microservices are independently deployable services, each owning a specific business capability and its own data.

* Draw service boundaries along bounded contexts from DDD. High cohesion within, low coupling between.

* Use synchronous communication when the caller needs an immediate answer. Use asynchronous communication for everything else.

* Service discovery keeps track of dynamic service instances. Consul, etcd, and DNS-based discovery are the common approaches.

* Database-per-service is essential for independence but makes cross-service queries harder. Use events and denormalization to bridge the gap.

* API composition aggregates data from multiple services. Call downstream services in parallel and set aggressive timeouts.

## Event-Driven Architecture

Event-driven architecture (EDA) structures communication around events: facts about things that happened. Instead of services calling each other directly ("hey, process this payment"), they announce what happened ("an order was placed") and let interested services react independently.

**Event Producers, Consumers, and Brokers**

An event-driven system has three roles. Producers generate events when something significant happens in their domain. The order service produces an "OrderPlaced" event. The payment service produces a "PaymentCompleted" event. The user service produces a "UserRegistered" event.

Consumers listen for events they care about and react accordingly. The notification service consumes "OrderPlaced" to send a confirmation email. The analytics service consumes "PaymentCompleted" to update revenue dashboards. The recommendation engine consumes "UserRegistered" to initialize a new user's profile.

Brokers (Kafka, RabbitMQ, Amazon EventBridge, Amazon SNS/SQS) sit between producers and consumers, receiving events from producers and delivering them to consumers. The broker handles persistence (so events survive restarts), delivery guarantees (at-least-once, or exactly-once within a limited scope), and routing (delivering events only to interested consumers).

Producers and consumers are fully decoupled. The order service does not know which services consume its events. You can add a new consumer (a fraud detection service) without changing the order service. You can remove a consumer (an old analytics system) without affecting anything else. This decoupling is the central advantage of event-driven architecture.
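The decoupling can be sketched with a toy in-process broker. A real broker (Kafka, RabbitMQ) adds persistence and delivery guarantees, but the topology is the same:

```python
from collections import defaultdict


class Broker:
    """Routes published events to every subscriber of that event type."""

    def __init__(self):
        self._consumers = defaultdict(list)

    def subscribe(self, event_type: str, handler):
        self._consumers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict):
        for handler in self._consumers[event_type]:
            handler(payload)


broker = Broker()
emails, dashboards = [], []

# Two consumers react to the same event; the producer knows neither.
broker.subscribe("OrderPlaced", lambda e: emails.append(e["order_id"]))
broker.subscribe("OrderPlaced", lambda e: dashboards.append(e["order_id"]))

broker.publish("OrderPlaced", {"order_id": "o-42"})
```

Adding a third consumer (say, fraud detection) is one more `subscribe` call; the publishing code never changes.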

**Event Sourcing and CQRS**

Event sourcing and CQRS are patterns that pair naturally with event-driven architecture.

In event sourcing, the event log is the source of truth. Every state change is recorded as an immutable event. Current state is derived by replaying the event log. Combined with event-driven architecture, the same events that represent state changes also drive inter-service communication.

The order service stores an "OrderPlaced" event in its event log and publishes it to the broker simultaneously.

The event serves dual purposes: internal state management and external communication.

CQRS separates the write model (which processes commands and produces events) from the read model (which consumes events and builds query-optimized views).

In an event-driven system, the write side publishes events through the broker, and one or more read-side consumers build materialized views tailored to specific query patterns.

The product catalog's write side handles product updates. Its read side builds a search-optimized Elasticsearch index, a mobile-friendly summary view, and a category browsing view, each as a separate consumer.
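A minimal CQRS sketch, assuming a single in-memory event log and one read-side view (all names are illustrative):

```python
events = []  # write model: an append-only event log


def update_product(product_id: str, name: str, category: str):
    """Write side: record the change as an immutable event."""
    events.append({"type": "ProductUpdated", "id": product_id,
                   "name": name, "category": category})


def build_category_view(log):
    """Read side: fold the log into a category-browsing view."""
    view = {}
    for e in log:
        if e["type"] == "ProductUpdated":
            view.setdefault(e["category"], set()).add(e["id"])
    return view


update_product("p-1", "Mug", "kitchen")
update_product("p-2", "Pan", "kitchen")
view = build_category_view(events)
```

Each additional read model (search index, mobile summary) is just another fold over the same log, consuming the same events independently.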

Not every event-driven system needs event sourcing or CQRS. These patterns add complexity that is only justified when you need a complete audit trail (event sourcing) or when your read and write patterns are dramatically different (CQRS).

Most event-driven systems use simple events for inter-service communication without event sourcing or CQRS.

**Event-Driven APIs in Microservices**

In a request-driven API, the client calls the server and gets a response. In an event-driven API, the client subscribes to events and receives notifications when something relevant happens.

WebSockets and Server-Sent Events (SSE) enable event-driven APIs between clients and servers.

A real-time dashboard subscribes to "MetricsUpdated" events and refreshes whenever new metrics are available.

A delivery tracking page subscribes to "DriverLocationUpdated" events and moves the pin on the map in real time.

Between microservices, event-driven APIs use the message broker.

Instead of the delivery service polling the order service every second to check for new orders, the delivery service subscribes to "OrderPlaced" events and reacts immediately when one arrives. This eliminates polling overhead and reduces latency.

**Error Handling in Event-Driven Systems**

Error handling in event-driven systems is harder than in synchronous systems because there is no direct response channel to report failures.

When a consumer fails to process an event, the standard approach is retry with backoff. The event stays in the queue or stream, and the consumer retries after a delay. If it fails again, the delay increases (exponential backoff). After a configured number of retries, the event moves to a dead letter queue for manual inspection.
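A sketch of this consumer loop, with the backoff delays computed rather than slept so the example stays fast (in production each retry would wait before the next attempt):

```python
dead_letter_queue = []


def consume(event: dict, handler, max_retries: int = 3, base_delay: float = 0.1):
    """Try the handler; back off exponentially; park the event on failure."""
    delay = base_delay
    for attempt in range(max_retries + 1):
        try:
            return handler(event)
        except Exception:
            if attempt == max_retries:
                dead_letter_queue.append(event)  # park for manual inspection
                return None
            # in production: time.sleep(delay) before the next attempt
            delay *= 2                            # exponential backoff


def always_fails(event):
    raise RuntimeError("downstream unavailable")


consume({"id": "e-1"}, always_fails)
```

After the final retry the event lands on the dead letter queue, which is why alerting on DLQ depth matters: nothing else in the system will ever look at it.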

For events that trigger multi-step workflows, the Saga pattern handles compensations when a step fails.

If the "OrderPlaced" event triggers payment processing and the payment fails, a "PaymentFailed" event is published, and downstream consumers react by cancelling inventory reservations and notifying the user.

Monitoring is essential. In a synchronous system, a failed request returns an error code immediately.

In an event-driven system, a failed event sits silently in a dead letter queue until someone looks at it.

Alert on DLQ depth, consumer lag, and event processing error rates.

**Event-Driven vs. Request-Driven Microservices**

The choice between event-driven and request-driven communication is not binary. Most systems use both, each where it fits best.

Request-driven (synchronous) communication works when the caller needs an immediate answer, when the operation is simple and fast, when strong consistency is required, and when the request-response pattern maps naturally to the interaction.

Event-driven (asynchronous) communication works when the caller does not need to wait, when multiple services need to react to the same trigger, when services should be decoupled from each other's availability, and when operations can be processed with a short delay.

| Factor | Request-Driven | Event-Driven |
| --- | --- | --- |
| Response needed? | Yes, immediately | No, or with acceptable delay |
| Coupling | Caller knows the callee | Producer does not know consumers |
| Failure handling | Caller handles it directly | Broker retains event for retry |
| Scaling | Caller and callee scale together | Each consumer scales independently |
| Debugging | Trace a single request path | Trace events across multiple consumers |

**Message-Driven vs. Event-Driven Architecture**

These terms are often used interchangeably, but they describe subtly different communication styles.

Message-driven architecture sends directed messages to a specific destination. Service A sends a message to Service B's queue: "process this payment." The message has a specific recipient in mind. Messages are commands, instructions to do something.

Event-driven architecture broadcasts facts about what happened. The order service announces "an order was placed." It does not direct the message to any specific service. Any interested service can subscribe. Events are notifications, statements that something occurred.

The practical difference is in coupling and intent.

Message-driven systems still couple the sender to the receiver (Service A knows about Service B). Event-driven systems decouple them completely (the order service does not know or care who consumes its events).

Many systems combine both.

Commands flow through message-driven channels (the order orchestrator sends a "charge this card" command to the payment service's queue). Events flow through event-driven channels (the payment service publishes "PaymentCompleted" for any interested subscriber).
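The difference can be sketched side by side: a command lands on one named queue, while an event fans out to every subscriber (all names are illustrative):

```python
from collections import defaultdict

queues = defaultdict(list)         # message-driven: one queue per named recipient
subscriptions = defaultdict(list)  # event-driven: any number of consumers per topic


def send_command(service: str, command: dict):
    """The sender names the recipient: Service A knows about Service B."""
    queues[service].append(command)


def publish_event(topic: str, event: dict):
    """The producer knows nothing about who, if anyone, is listening."""
    for handler in subscriptions[topic]:
        handler(event)


heard = []
subscriptions["PaymentCompleted"].append(lambda e: heard.append(("analytics", e["order"])))
subscriptions["PaymentCompleted"].append(lambda e: heard.append(("email", e["order"])))

send_command("payment-service", {"action": "charge", "order": "o-7"})
publish_event("PaymentCompleted", {"order": "o-7"})
```

The command has exactly one consumer by design; the event reaches two today and could reach three tomorrow without the producer changing.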

**Interview-Style Question**

> Q: Your e-commerce system needs to update the search index, send a confirmation email, update analytics, and notify the warehouse when an order is placed. Should these be synchronous API calls or events?

> A: Events. The user does not need to wait for the search index update, the email, the analytics write, or the warehouse notification. The order service publishes a single "OrderPlaced" event to a Kafka topic. Four independent consumers subscribe: the search indexer updates Elasticsearch, the notification service sends the email, the analytics service records the event, and the warehouse service queues the fulfillment. Each consumer processes the event at its own pace. If the analytics service is down, the event waits in Kafka until it recovers. If you later need a fifth reaction (like updating a loyalty points service), you add a new consumer without touching the order service.

### KEY TAKEAWAYS

* Event-driven architecture decouples services by communicating through events (facts about what happened) rather than direct calls.

* Producers publish events, consumers react to them, and brokers handle delivery and persistence. Adding or removing consumers does not affect producers.

* Event sourcing and CQRS pair naturally with EDA but are not required. Use them when you need audit trails or dramatically different read/write patterns.

* Error handling in event-driven systems relies on retries, dead letter queues, and monitoring. Silent failures are the biggest risk.

* Use request-driven communication when an immediate answer is needed. Use event-driven communication when multiple services should react independently.

* Messages are directed commands to specific recipients. Events are broadcast facts that any interested service can consume. Most systems use both.

## Serverless Architecture

Serverless computing lets you run code without managing servers. You write a function, deploy it, and the cloud provider handles provisioning, scaling, and infrastructure management. You pay only for the compute time your function actually uses, not for idle servers waiting for requests.

**Function as a Service (FaaS): AWS Lambda, Azure Functions, GCP Cloud Functions**

FaaS is the core of serverless architecture. You write a function that handles a specific trigger (an HTTP request, a message queue event, a file upload, a database change, a scheduled timer).

The cloud provider runs your function in response to that trigger, scales it automatically to match demand, and shuts it down when there is no traffic.

AWS Lambda is the most widely adopted FaaS platform. It supports multiple languages (Python, Node.js, Java, Go, .NET, Rust), integrates with virtually every AWS service, and can scale to thousands of concurrent executions in seconds. Maximum execution time is 15 minutes per invocation.

Azure Functions offers similar capabilities within the Microsoft Azure ecosystem. It supports a broader range of hosting models, including a dedicated plan where functions run on reserved instances (useful for avoiding cold starts).

GCP Cloud Functions (and Cloud Run for containerized workloads) provide Google Cloud's FaaS offering. Cloud Run blurs the line between FaaS and containers by running container images that scale to zero.

The defining characteristic of FaaS is that you do not think about servers. You do not provision instances, configure auto-scaling, manage operating system updates, or worry about capacity planning. The provider handles all of it.

**Cold Starts and Performance Considerations**

A cold start occurs when the platform needs to initialize a new instance of your function from scratch. This involves provisioning a container, loading your runtime (Node.js, Python, Java), loading your code and dependencies, and running any initialization logic. Cold starts typically add 100ms to 2 seconds of latency, depending on the runtime and the size of your deployment package.

Java and .NET functions tend to have longer cold starts than Python and Node.js due to their heavier runtimes.

Cold starts happen when a function has not been invoked recently and the platform has reclaimed its resources, when traffic spikes and the platform needs to create new instances beyond the currently warm pool, or on the first invocation after a deployment.

Strategies to mitigate cold starts include keeping deployment packages small (fewer dependencies means faster initialization), using lightweight runtimes (Python and Node.js start faster than Java), provisioned concurrency (AWS Lambda lets you pre-warm a specified number of instances that stay ready), and the "ping" pattern (invoking the function periodically to keep instances warm, though this is a workaround rather than a solution).
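One of these strategies is visible directly in code: anything initialized at module scope runs once per cold start and is reused by every warm invocation. The sketch below follows AWS Lambda's Python handler shape, with the expensive client stubbed by a counter:

```python
init_count = 0


def make_db_client():
    """Stand-in for an expensive setup step (e.g., opening a DB connection)."""
    global init_count
    init_count += 1
    return {"connected": True}


DB = make_db_client()  # module scope: runs during the cold start only


def handler(event, context=None):
    """Warm invocations reuse DB instead of reconnecting each time."""
    return {"ok": DB["connected"], "order": event.get("order_id")}


first = handler({"order_id": "o-1"})
second = handler({"order_id": "o-2"})
```

Two invocations, one initialization: the cold start pays the setup cost, and every warm request afterward skips it.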

For most workloads (API backends, event processing, webhooks), cold start latency of 100 to 500ms is acceptable. For latency-sensitive workloads (real-time APIs, interactive applications), cold starts can be a deal-breaker without provisioned concurrency.

**When Serverless Works and When It Does Not**

Serverless excels in specific scenarios and fails in others. Knowing the boundary saves you from forcing the wrong tool onto the wrong problem.

Serverless works well for: event-driven processing (file uploads trigger image resizing, database changes trigger notifications), APIs with variable traffic (an endpoint that gets 10 requests per hour at night and 10,000 per hour during the day), scheduled tasks (cron jobs that run for a few seconds or minutes), webhook receivers (processing incoming webhooks from third-party services), and prototyping (deploy a working API in minutes without infrastructure setup).

Serverless struggles with: long-running processes (Lambda has a 15-minute limit; batch jobs that run for hours need containers or VMs), consistent low-latency requirements (cold starts add unpredictable latency spikes), stateful applications (functions are ephemeral; state must live in external databases or caches), high-throughput steady-state workloads (if your function runs 24/7 at full capacity, a dedicated server is significantly cheaper), and complex orchestration (workflows involving many sequential function invocations accumulate latency and become difficult to debug).

**Serverless Design Patterns**

Single-function API. One Lambda function handles one API endpoint. Clean and simple, but the number of functions multiplies quickly. 50 endpoints mean 50 functions to deploy and manage.

Fat function (monolithic Lambda). One Lambda function handles all endpoints for a service, using an internal router (like Express.js) to dispatch requests. This reduces cold start frequency (any request warms the instance for all endpoints) but makes the deployment package larger.
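A minimal sketch of the fat-function idea, assuming API-Gateway-style events that carry `httpMethod` and `path` fields; the route table and handler functions are invented for illustration:

```python
def list_users(event):
    return {"status": 200, "body": ["alice", "bob"]}

def create_user(event):
    return {"status": 201, "body": event.get("body")}

# One deployment unit, many endpoints: any request warms the instance
# for every route, at the cost of a larger package.
ROUTES = {
    ("GET", "/users"): list_users,
    ("POST", "/users"): create_user,
}

def handler(event, context=None):
    route = ROUTES.get((event["httpMethod"], event["path"]))
    if route is None:
        return {"status": 404, "body": "not found"}
    return route(event)
```

In practice the internal router is often a full framework (Express.js, FastAPI behind an adapter), but the dispatch shape is the same.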

Event-driven pipeline. Functions are chained through events. An S3 upload triggers a Lambda that resizes the image, which publishes an event that triggers another Lambda for content moderation, which triggers another for updating the database. Each function handles one step.
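One step of such a pipeline might look like the sketch below. The incoming event shape follows the S3 notification format (`Records[].s3.object.key`), while `publish` is a hypothetical stub standing in for an SNS or EventBridge client that would trigger the next function in the chain:

```python
EMITTED = []

def publish(topic, message):
    # Stand-in for a real broker client (SNS, EventBridge).
    EMITTED.append((topic, message))

def resize_handler(event, context=None):
    # Triggered by an S3 upload notification; processes each record,
    # then emits an event that the next pipeline stage subscribes to.
    for record in event["Records"]:
        key = record["s3"]["object"]["key"]
        # ... fetch the object and resize the image here ...
        publish("image.resized", {"key": key})
    return len(event["Records"])
```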

Scheduled processing. CloudWatch Events (or EventBridge) triggers a Lambda on a schedule. Daily report generation, hourly data aggregation, or periodic cleanup tasks.

Fan-out/fan-in. A coordinator function distributes work to many parallel worker functions (fan-out), then a final function collects and aggregates the results (fan-in). AWS Step Functions orchestrates this pattern with built-in error handling and retry logic.
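The fan-out/fan-in shape can be sketched locally with a thread pool standing in for parallel function invocations; in a real deployment, Step Functions would invoke separate Lambda workers and collect their results:

```python
from concurrent.futures import ThreadPoolExecutor

def worker(chunk):
    # Each parallel worker processes one slice of the job (fan-out).
    return sum(chunk)

def coordinator(data, num_workers=4):
    # Split the work, dispatch to workers in parallel, aggregate (fan-in).
    chunks = [data[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(worker, chunks))
    return sum(partials)
```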

**Interview-Style Question**

> Q: Your team is deciding between running a notification service on EC2 instances with auto-scaling or as AWS Lambda functions. The service processes 50 to 5,000 notification events per minute depending on time of day. Which approach would you recommend?

> A: Lambda is the better fit here. Traffic varies 100x between quiet and busy periods. With EC2, you would pay for mostly idle instances during quiet hours, and your auto-scaling configuration would still lag behind sudden spikes. Lambda scales instantly to match the event rate, handles each notification independently, and costs close to nothing during quiet periods. Each notification event (fetch recipient, format message, call the delivery API) completes in a few seconds, well within Lambda's limits. The one concern is whether the notification service needs to maintain persistent connections (like a WebSocket to a push notification provider). In that case, a container running on ECS or Fargate with auto-scaling may be more appropriate, because Lambda functions are short-lived and cannot hold persistent connections across invocations.

**KEY TAKEAWAYS**

* Serverless (FaaS) lets you run code without managing servers. The provider handles provisioning, scaling, and infrastructure. You pay only for execution time.

* Cold starts add 100ms to 2 seconds of latency on the first invocation. Use lightweight runtimes and small deployment packages to minimize them. Use provisioned concurrency for latency-sensitive workloads.

* Serverless excels for event-driven processing, variable-traffic APIs, and scheduled tasks. It struggles with long-running processes, steady-state high-throughput workloads, and stateful applications.

* Common patterns include single-function APIs, fat functions, event-driven pipelines, and fan-out/fan-in with Step Functions.

* If your workload runs at consistent high capacity 24/7, a dedicated server or container is almost always cheaper than serverless.

## Other Architecture Patterns

Beyond the major patterns covered above, several other architectural approaches appear in system design discussions, interviews, and resources like Grokking the System Design Interview. Understanding them broadens your vocabulary and helps you recognize the right pattern for unusual requirements.

**Service-Oriented Architecture (SOA)**

SOA is the predecessor of microservices. It organizes applications into services that communicate over a network, typically through an Enterprise Service Bus (ESB). The ESB handles routing, message transformation, protocol translation, and orchestration between services.

The key difference between SOA and microservices is the ESB. In SOA, significant business logic lives in the ESB, making it a complex, centralized component. In microservices, the communication infrastructure is "dumb pipes" (message brokers, HTTP calls), and all business logic lives in the services themselves.

SOA was popular in large enterprises in the 2000s and early 2010s. Many organizations still run SOA-based systems. If you encounter an ESB-heavy architecture in an interview or at a company, recognizing it as SOA helps you understand the system's structure and its migration path toward modern patterns.

**Stateful vs. Stateless Architecture**

A stateless service stores no data between requests. Every request carries all the information the server needs. Stateless services scale effortlessly by adding instances behind a load balancer because any instance can handle any request.

A stateful service maintains data between requests. A WebSocket server maintains an open connection per user. A gaming server tracks active game state in memory. A session server stores authentication tokens locally.

Stateful services are harder to scale because requests must be routed to the specific instance holding the relevant state (sticky sessions) or the state must be replicated across instances (adding complexity and latency).

The general principle (covered in Part III, Lesson 1) is to make services stateless wherever possible by externalizing state to databases, caches, or object stores. Reserve stateful architecture for use cases where in-memory state provides a critical performance advantage, like real-time gaming or active WebSocket connection management.
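A sketch of the externalized-state principle, with a plain dict standing in for an external store such as Redis; the function and store names are illustrative:

```python
# All session state lives in the external store, never in the process
# handling the request, so any instance behind a load balancer can serve
# any request. (A dict stands in for Redis or a database here.)
SESSION_STORE = {}

def handle_request(session_id, action):
    session = SESSION_STORE.get(session_id, {"count": 0})
    if action == "increment":
        session["count"] += 1
    SESSION_STORE[session_id] = session  # write state back to the store
    return session["count"]
```

Because every request reads and writes the store, two different instances handling the same session in sequence produce the same result as one instance handling both, which is exactly the property that makes horizontal scaling trivial.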

**Peer-to-Peer Architecture**

In peer-to-peer (P2P) architecture, there is no central server. Every node is both a client and a server, providing and consuming resources simultaneously. Each node communicates directly with other nodes.

P2P is used in file sharing (BitTorrent, where each downloader also uploads chunks to other downloaders), blockchain networks (every node validates transactions and maintains a copy of the ledger), WebRTC (browser-to-browser video and audio communication without a central server), and distributed computing (SETI@Home, Folding@Home).

The advantage of P2P is resilience. There is no single point of failure. The system gets stronger as more nodes join because each node contributes resources. The disadvantage is coordination complexity. Without a central authority, consensus is harder, discovery requires mechanisms like DHT (Distributed Hash Tables), and quality of service is unpredictable because it depends on volunteer participants.

P2P rarely appears as the primary architecture for commercial web applications. But its principles inform the design of CDNs (which distribute content across many nodes) and some distributed databases (like Cassandra, where every node is a peer).

**Publish/Subscribe Architecture**

Publish/subscribe (pub/sub) is a messaging pattern (covered in detail in Part II, Lesson 7) that can serve as the backbone of an entire architecture.

In a pub/sub architecture, all inter-component communication flows through a message broker. Components publish events to topics and subscribe to topics they care about. No component communicates directly with another.

This creates extreme decoupling. Any component can be replaced, added, or removed without affecting others. The broker is the central nervous system. Google Cloud Pub/Sub, Amazon SNS/SQS, and Apache Kafka can all serve as the backbone of a pub/sub architecture.
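The broker-centric shape can be illustrated with a minimal in-memory broker, a stand-in for Kafka, SNS, or Cloud Pub/Sub; the class and topic names are invented for illustration:

```python
from collections import defaultdict

class Broker:
    """Minimal in-memory stand-in for a message broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Publishers never call subscribers directly: the broker is the
        # only coupling point between components.
        for callback in self._subscribers[topic]:
            callback(message)

# Usage: two independent components react to the same event without
# knowing about each other or about the publisher.
broker = Broker()
received = []
broker.subscribe("order.placed", lambda m: received.append(("email", m)))
broker.subscribe("order.placed", lambda m: received.append(("analytics", m)))
broker.publish("order.placed", {"order_id": 42})
```

Adding a third subscriber (say, a fraud check) requires no change to the publisher or the existing subscribers, which is the decoupling the pattern buys.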

The trade-off is that every interaction involves the broker, which adds latency and creates a dependency on the broker's availability and performance. The broker becomes the one component that, if it goes down, takes down all communication.

**Layered (N-Tier) Architecture**

Layered architecture organizes an application into horizontal layers, each responsible for a specific concern. The most common form is three-tier architecture: presentation layer (UI), business logic layer (application), and data access layer (database).

Each layer communicates only with the layer directly below it. The presentation layer calls the business logic layer. The business logic layer calls the data access layer. No layer skips a level.

Layered architecture provides clear separation of concerns and is the default structure for most web frameworks (Rails, Django, Spring MVC). It works well for CRUD applications and straightforward business logic.
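A compact sketch of strict layering, with invented class and method names: each layer holds a reference only to the layer directly below it, and the presentation layer never touches the data layer.

```python
class DataAccessLayer:
    """Bottom tier: owns all database access (a dict stands in here)."""
    def __init__(self):
        self._rows = {1: {"id": 1, "name": "Widget", "price": 9.99}}

    def find_product(self, product_id):
        return self._rows.get(product_id)

class BusinessLogicLayer:
    """Middle tier: business rules; talks only to the data layer."""
    def __init__(self, dal):
        self._dal = dal

    def display_price(self, product_id):
        product = self._dal.find_product(product_id)
        if product is None:
            raise KeyError(product_id)
        return round(product["price"] * 1.2, 2)  # e.g. add 20% tax

class PresentationLayer:
    """Top tier: rendering; talks only to the business logic layer."""
    def __init__(self, logic):
        self._logic = logic

    def render(self, product_id):
        return f"Price: ${self._logic.display_price(product_id)}"

ui = PresentationLayer(BusinessLogicLayer(DataAccessLayer()))
```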

The limitation is that strictly enforced layers can lead to unnecessary pass-through code where a layer adds no value but exists because the architecture demands it.

And the horizontal split does not align well with team structures: in a large organization, you want teams organized around business domains (orders, payments, users), not around technical layers (all UI developers, all backend developers, all database developers).

**Hexagonal Architecture (Ports and Adapters)**

Hexagonal architecture places the core business logic at the center, completely independent of external concerns like databases, APIs, user interfaces, and message queues. The core exposes "ports" (interfaces) that define how external systems interact with it. "Adapters" implement those interfaces for specific technologies.

For example, the core business logic defines a port called `OrderRepository` with methods like `save(order)` and `findById(id)`. An adapter implements this port for PostgreSQL. Another adapter implements it for DynamoDB.

The core does not know which database is being used. It only knows about the port interface.

This makes the business logic testable without any infrastructure. You can test order processing with an in-memory adapter instead of a real database. Swapping technologies (moving from PostgreSQL to DynamoDB) means writing a new adapter, not rewriting business logic.
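Using the `OrderRepository` port from the example above, a minimal Python sketch might look like this; the `place_order` core function and the in-memory adapter are illustrative, and a PostgreSQL or DynamoDB adapter would implement the same port against a real database:

```python
from abc import ABC, abstractmethod

class OrderRepository(ABC):
    """The port: an interface the core defines, with no database knowledge."""

    @abstractmethod
    def save(self, order): ...

    @abstractmethod
    def findById(self, order_id): ...

class InMemoryOrderRepository(OrderRepository):
    """A test adapter; swapping databases means writing a new adapter,
    never touching the core logic."""

    def __init__(self):
        self._orders = {}

    def save(self, order):
        self._orders[order["id"]] = order

    def findById(self, order_id):
        return self._orders.get(order_id)

def place_order(repo: OrderRepository, order):
    # Core business logic depends only on the port, never on a concrete DB.
    order = {**order, "status": "placed"}
    repo.save(order)
    return order
```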

Hexagonal architecture is particularly valuable in domain-heavy applications (financial systems, healthcare, logistics) where the business rules are complex and must be protected from infrastructure changes.

**Cell-Based Architecture**

Cell-based architecture is a relatively recent pattern used by companies operating at massive scale (AWS, Slack, and others). The system is divided into independent cells, each running a complete copy of the application stack and serving a subset of users.

Each cell is a self-contained unit: its own load balancers, application servers, databases, caches, and queues. A routing layer directs each user to their assigned cell based on a partition key (like user ID or tenant ID). Cells do not communicate with each other during normal operation.
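A minimal sketch of the routing layer's partition logic, using a hash of the tenant ID; real systems typically use an explicit mapping table instead of a pure hash so tenants can be migrated between cells, and the names here are illustrative:

```python
import hashlib

NUM_CELLS = 20

def cell_for(tenant_id: str) -> int:
    # Deterministic routing: the same tenant always lands in the same
    # cell, so a failure in one cell touches only that cell's tenants.
    digest = hashlib.sha256(tenant_id.encode()).hexdigest()
    return int(digest, 16) % NUM_CELLS
```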

The critical benefit is blast radius reduction. If a cell experiences a failure (bad deployment, hardware issue, software bug), only the users assigned to that cell are affected. The failure cannot propagate to other cells because they are completely isolated. If cell 3 out of 20 cells has a bad deployment, only 5% of users are impacted.

Cell-based architecture also simplifies scaling (add more cells) and testing (deploy changes to one cell first, observe, then roll out to others). It is the architecture pattern behind AWS's regional infrastructure and Slack's workspace distribution.

The trade-off is significant infrastructure overhead. Each cell is a complete stack, so you multiply your infrastructure by the number of cells. Cross-cell operations (like searching across all users) require a separate aggregation layer. Cell-based architecture makes sense at very large scale where blast radius control is worth the infrastructure cost.

| Pattern | Key Idea | Strengths | Best For |
| --- | --- | --- | --- |
| SOA | Services + Enterprise Service Bus | Integration of heterogeneous enterprise systems | Legacy enterprise environments |
| Stateless | No state between requests | Effortless horizontal scaling | Web services, APIs |
| Peer-to-Peer | Every node is client and server | No single point of failure, resilient | File sharing, blockchain, WebRTC |
| Pub/Sub | All communication through message broker | Extreme decoupling | Event-driven systems, integrations |
| N-Tier | Horizontal layers (UI, logic, data) | Clear separation of concerns | CRUD applications, traditional web apps |
| Hexagonal | Core logic isolated from infrastructure | Testable, swappable infrastructure | Domain-heavy, complex business logic |
| Cell-Based | Independent cells serving user subsets | Blast radius isolation, safe deployments | Very large scale, multi-tenant platforms |

**Beginner Mistake to Avoid**

New engineers sometimes pick an architecture pattern because it sounds impressive or because a famous company uses it.

Cell-based architecture works for AWS because they operate at a scale where blast radius control saves millions of dollars. For a startup with 1,000 users, it is absurd overhead.

Hexagonal architecture shines when business logic is genuinely complex and infrastructure changes are likely.

For a simple CRUD API, it adds layers of abstraction that slow development without proportional benefit. Always match the pattern to the problem, not to the company you admire.

**Interview-Style Question**

> Q: You are designing a multi-tenant SaaS platform where each tenant (company) should be isolated from others so that one tenant's traffic spike or failure does not affect others. Which architecture pattern would you consider?

> A: Cell-based architecture is the strongest fit for strict tenant isolation. Assign each tenant (or group of small tenants) to a cell. Each cell runs a complete, independent stack. A tenant's traffic spike is contained within their cell's resources. A bug affecting one cell does not propagate to others. Deployments can be rolled out cell-by-cell to limit blast radius. For smaller scale (tens of tenants), a simpler approach works: a single shared infrastructure with logical isolation (separate databases or schemas per tenant, resource quotas, and rate limiting). The cell-based approach becomes justified when the number of tenants is large enough that the isolation benefits outweigh the infrastructure multiplication cost.

**KEY TAKEAWAYS**

* SOA is the precursor to microservices, distinguished by the centralized Enterprise Service Bus. Most modern systems prefer microservices with dumb pipes.

* Make services stateless wherever possible. Reserve stateful architecture for use cases where in-memory state provides a critical performance advantage.

* Peer-to-peer eliminates single points of failure but adds coordination complexity. It is most relevant for file sharing, blockchain, and real-time communication.

* Pub/sub architecture decouples all components through a message broker. The broker becomes the critical dependency.

* N-tier (layered) architecture provides clean separation of concerns and is the default for traditional web applications.

* Hexagonal architecture protects business logic from infrastructure changes by isolating it behind ports and adapters. It is ideal for complex domains.

* Cell-based architecture isolates failures to a subset of users by running independent, complete stacks per cell. It is justified only at very large scale.

* Match the architecture to the problem. Impressive-sounding patterns are expensive if they do not solve a real constraint.

> Up Next: You now have a comprehensive understanding of system properties (scalability, availability, consistency) and architecture patterns (monoliths, microservices, event-driven, serverless, and more). Part IV takes you into advanced topics that appear in senior-level system design interviews and production systems: search systems, data processing, unique ID generation, rate limiting, security, distributed system patterns, and performance optimization. These are the concepts that separate competent engineers from exceptional ones.