These summaries connect the concepts from this handbook to real production systems, highlighting the architectural patterns that power some of the world's largest platforms.
How Netflix Handles 250+ Million Subscribers
Netflix's architecture is a masterclass in microservices at scale.
The streaming service runs hundreds of microservices on AWS, each independently deployed and scaled.
The content delivery network (Open Connect) consists of thousands of custom-built servers deployed in ISPs worldwide, caching popular content at the edge so that the majority of streaming traffic never crosses the broader internet.
Netflix pioneered chaos engineering with Chaos Monkey and the Simian Army, continuously testing resilience by injecting failures in production.
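The core idea behind Chaos Monkey is simple enough to sketch: with some probability, pick a running instance from a group and terminate it. This is a minimal illustrative sketch, not Netflix's actual implementation; the instance model and `pick_victim` helper are invented for the example.

```python
import random

def pick_victim(instances, probability=0.1, rng=None):
    """Chaos-Monkey-style selection: with some probability, pick one
    running instance from a group to terminate. Returns None when the
    group is spared this round or has too few healthy instances."""
    rng = rng or random.Random()
    healthy = [i for i in instances if i["state"] == "running"]
    if len(healthy) < 2:            # never take down the last healthy instance
        return None
    if rng.random() >= probability:
        return None                 # this group is spared in this round
    return rng.choice(healthy)

group = [{"id": "i-1", "state": "running"},
         {"id": "i-2", "state": "running"},
         {"id": "i-3", "state": "stopped"}]
victim = pick_victim(group, probability=1.0, rng=random.Random(42))
```

The real tool runs on a schedule against auto-scaling groups and respects opt-outs; the point is that termination is random, continuous, and happens in production, so teams are forced to build services that tolerate instance loss.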
Their data pipeline processes billions of events per day for recommendation algorithms, A/B testing, and personalization.
Every aspect of the Netflix UI (artwork, row ordering, content selection) is personalized per user through ML models.
The platform uses a combination of Cassandra (for scalable data storage), EVCache (their customized Memcached layer), and Zuul (API gateway) for traffic routing.
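A cache tier like EVCache typically sits in front of the durable store in a cache-aside arrangement: read the cache first, fall back to the database on a miss, then populate the cache. The sketch below illustrates that read path only; the class names and in-memory `DictCache` are invented for the example, not Netflix APIs.

```python
class DictCache:
    """Stand-in for a Memcached-style cache with get/set and a TTL."""
    def __init__(self):
        self.data = {}
    def get(self, key):
        return self.data.get(key)
    def set(self, key, value, ttl):
        self.data[key] = value      # TTL ignored in this toy version

class CacheAsideStore:
    """Cache-aside read path: try the cache tier first, fall back to the
    durable store (e.g. Cassandra) on a miss, then populate the cache."""
    def __init__(self, cache, db, ttl=300):
        self.cache, self.db, self.ttl = cache, db, ttl

    def get(self, key):
        value = self.cache.get(key)
        if value is not None:
            return value                     # cache hit: no database read
        value = self.db.get(key)             # miss: read the source of truth
        if value is not None:
            self.cache.set(key, value, self.ttl)
        return value
```

The payoff is that hot keys (popular titles, user profiles) are served from memory, and the database only sees traffic for misses and writes.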
How Twitter Serves 500+ Million Tweets Per Day
Twitter's timeline architecture evolved from a simple pull model (query all followed accounts on every feed load) to a sophisticated fan-out system.
The timeline service precomputes each user's home timeline by fanning out new tweets to followers' cached timelines (Redis).
For users with millions of followers (celebrities), fan-out is skipped; their tweets are instead merged into followers' timelines at read time, avoiding write amplification.
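The hybrid of the two paths can be sketched in a few lines. The threshold value and data structures here are illustrative, not Twitter's real numbers or storage layout: ordinary tweets are pushed to each follower's cached timeline at write time, while celebrity tweets are stored once and merged in at read time.

```python
CELEBRITY_THRESHOLD = 10_000  # illustrative cutoff, not Twitter's real number

def post_tweet(author, tweet_id, followers, timelines, celebrity_tweets):
    """Write path: fan out to follower timelines, unless the author has
    too many followers, in which case store once for read-time merge."""
    if len(followers[author]) >= CELEBRITY_THRESHOLD:
        celebrity_tweets.setdefault(author, []).append(tweet_id)
    else:
        for follower in followers[author]:
            timelines.setdefault(follower, []).append(tweet_id)

def home_timeline(user, follows, timelines, celebrity_tweets):
    """Read path: the precomputed timeline, plus tweets from any
    followed celebrities merged in on the fly."""
    merged = list(timelines.get(user, []))
    for followed in follows[user]:
        merged.extend(celebrity_tweets.get(followed, []))
    return sorted(merged)  # assumes IDs are time-ordered (Snowflake-style)
```

The trade-off is explicit: ordinary posts cost O(followers) writes but O(1) reads, while celebrity posts cost O(1) writes but push the merge work to read time.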
Twitter uses Manhattan (a distributed key-value store built in-house), FlockDB (a graph database for social relationships), and Snowflake (the ID generation system discussed in Chapter IV).
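Snowflake's published layout packs a 64-bit ID from a 41-bit millisecond timestamp, a 10-bit worker ID, and a 12-bit per-millisecond sequence, which is why sorting by ID roughly sorts by creation time. The sketch below follows that bit layout; the class itself is illustrative, not Twitter's released code.

```python
import threading
import time

class SnowflakeGenerator:
    """Snowflake-style 64-bit IDs: 41-bit millisecond timestamp,
    10-bit worker id, 12-bit per-millisecond sequence."""
    EPOCH = 1288834974657  # Twitter's custom epoch (Nov 2010), per the open-source release

    def __init__(self, worker_id):
        assert 0 <= worker_id < 1024
        self.worker_id = worker_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            ms = int(time.time() * 1000)
            if ms == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:       # 4096 ids this ms: wait for the next one
                    while ms <= self.last_ms:
                        ms = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = ms
            return ((ms - self.EPOCH) << 22) | (self.worker_id << 12) | self.sequence
```

Because no coordination is needed between workers, every machine can mint IDs locally at high throughput while IDs stay globally unique and roughly time-ordered.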
The search infrastructure uses Earlybird, a real-time inverted index that makes tweets searchable within seconds of posting.
How Uber Processes Millions of Rides in Real Time
Uber's architecture centers on geospatial processing and real-time matching.
The location service ingests millions of GPS updates per second from active drivers.
A geospatial index (using Google's S2 geometry library for cell-based spatial indexing) enables sub-second queries for nearby available drivers.
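The reason cell-based indexing is fast is that a nearby-driver query only has to scan a handful of cells instead of every active driver. The sketch below uses a naive lat/lon grid as a stand-in for S2 cell IDs (S2 actually maps the sphere onto cube-face quadtree cells); the class and cell size are invented for illustration.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01  # ~1 km grid cells at mid latitudes; a stand-in for S2 cell ids

def cell_of(lat, lon):
    return (int(lat // CELL_DEG), int(lon // CELL_DEG))

class DriverIndex:
    """Bucket drivers by grid cell; a nearby query only scans the
    rider's cell and its 8 neighbors, not the whole driver fleet."""
    def __init__(self):
        self.cells = defaultdict(dict)      # cell -> {driver_id: (lat, lon)}
        self.where = {}                     # driver_id -> current cell

    def update(self, driver_id, lat, lon):
        old, new = self.where.get(driver_id), cell_of(lat, lon)
        if old is not None and old != new:
            self.cells[old].pop(driver_id, None)   # driver moved cells
        self.cells[new][driver_id] = (lat, lon)
        self.where[driver_id] = new

    def nearby(self, lat, lon):
        cx, cy = cell_of(lat, lon)
        found = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for did, (dlat, dlon) in self.cells[(cx + dx, cy + dy)].items():
                    found.append((math.hypot(dlat - lat, dlon - lon), did))
        return [driver for _, driver in sorted(found)]
```

Each GPS update is a constant-time bucket move, and each query touches at most nine buckets, which is what keeps both paths sub-second at millions of updates per second.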
The matching algorithm considers distance, estimated time of arrival, driver rating, and trip type.
Uber's pricing service computes dynamic surge pricing based on real-time supply and demand ratios per geographic cell.
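A per-cell surge computation reduces to a function of the demand/supply ratio in that cell. The function below is purely illustrative (the shape, sensitivity, and cap are invented, and Uber's real pricing model is far more involved), but it shows the mechanism: the multiplier rises as open requests outstrip available drivers and is clamped to a cap.

```python
def surge_multiplier(open_requests, available_drivers,
                     base=1.0, cap=3.0, sensitivity=0.5):
    """Toy surge model for one geographic cell: price scales with the
    demand/supply ratio above 1.0 and is clamped to a cap."""
    if available_drivers == 0:
        return cap                          # no supply at all: max surge
    ratio = open_requests / available_drivers
    surge = base + sensitivity * max(0.0, ratio - 1.0)
    return round(min(surge, cap), 2)
```

Because the computation is per-cell, surge in a stadium district after a game does not raise prices across the rest of the city.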
The platform uses Kafka for event streaming, Cassandra for trip data, MySQL with Vitess (a sharding middleware) for relational data, and a custom-built distributed database called Docstore.
How Slack Built a Messaging Platform at Scale
Slack routes all real-time messages through WebSocket connections to channel servers. Each channel server manages a set of channels and their connected members.
When a message is posted, the channel server broadcasts it to all connected members of that channel.
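The broadcast step can be sketched as a membership map plus a loop over connections. This is a minimal sketch with invented names (`ChannelServer`, `FakeConn`), standing in for Slack's real channel servers and WebSocket connections.

```python
class ChannelServer:
    """Sketch of the broadcast step: track which connections are
    subscribed to each channel, and write every posted message to all
    of them except the sender."""
    def __init__(self):
        self.channels = {}   # channel_id -> set of subscribed connections

    def join(self, channel_id, conn):
        self.channels.setdefault(channel_id, set()).add(conn)

    def post(self, channel_id, sender, text):
        for conn in self.channels.get(channel_id, ()):
            if conn is not sender:           # sender already has the message
                conn.send({"channel": channel_id, "text": text})

class FakeConn:
    """Stand-in for a WebSocket connection; records sent frames."""
    def __init__(self):
        self.inbox = []
    def send(self, frame):
        self.inbox.append(frame)
```

In production the hard parts are everything around this loop: reconnects, backfill of missed messages, and spreading channels across many such servers.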
Slack uses a cell-based architecture for isolation.
Each workspace (company) is assigned to a cell, and cells operate independently.
A failure in one cell does not affect other cells.
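The routing half of a cell architecture is a deterministic mapping from workspace to cell. The hash-only sketch below is an assumption for illustration; Slack's real placement also has to weigh cell capacity and support migrations, but the key property is the same: every request for a workspace lands in the same cell.

```python
import hashlib

def assign_cell(workspace_id, cells):
    """Deterministically pin a workspace to one cell by hashing its id.
    All traffic for the workspace then routes to that cell, so a failure
    in any other cell cannot affect it."""
    digest = hashlib.sha256(workspace_id.encode()).hexdigest()
    return cells[int(digest, 16) % len(cells)]
```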
The search infrastructure indexes every message in Elasticsearch, partitioned by workspace.
Slack migrated from a monolithic PHP application to a service-oriented architecture as it grew, using the strangler fig pattern to carve services out of the monolith incrementally rather than rewriting it all at once.
How Stripe Processes Billions in Payments
Stripe's architecture is built around reliability and idempotency.
Every API request includes an idempotency key so that retries never create duplicate charges.
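The idempotency mechanism can be sketched as a lookup table keyed by the client-supplied key: the first request executes the charge and records the response, and any retry with the same key replays the recorded response instead of charging again. The class and `charge_fn` callback below are invented for illustration; Stripe's real implementation also persists the keys, scopes them per account, and expires them.

```python
class IdempotentCharges:
    """First request with a given idempotency key executes the charge
    and saves the response; a retry with the same key returns the saved
    response instead of creating a duplicate charge."""
    def __init__(self, charge_fn):
        self.charge_fn = charge_fn
        self.seen = {}        # idempotency key -> saved response

    def create_charge(self, idempotency_key, amount_cents, currency):
        if idempotency_key in self.seen:
            return self.seen[idempotency_key]       # replay, do not re-charge
        response = self.charge_fn(amount_cents, currency)
        self.seen[idempotency_key] = response
        return response
```

This is what makes client retries safe after a timeout: the client cannot know whether the first attempt succeeded, but retrying with the same key is harmless either way.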
The payment flow uses a saga-like pattern: authorize the charge, capture the funds, and settle with the bank, with compensating transactions for each step.
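A saga of this shape runs the steps in order and, on failure, runs the compensations of the already-completed steps in reverse. This is a generic sketch of the pattern, not Stripe's code; the step and compensation names are placeholders.

```python
def run_payment_saga(steps):
    """Run steps such as authorize -> capture -> settle in order.
    If any step fails, run the compensating transactions of the
    completed steps in reverse order, then report failure."""
    completed = []
    for name, action, compensate in steps:
        try:
            action()
            completed.append((name, compensate))
        except Exception:
            for _, undo in reversed(completed):
                undo()                      # compensating transaction
            return False
    return True
```

The reverse order matters: a captured charge must be refunded before the authorization is voided, mirroring how the forward steps built on each other.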
Stripe processes sensitive payment data within a PCI-compliant environment, isolating card handling in a separate, heavily audited subsystem.
The API serves millions of requests per day with sub-200ms latency. Stripe uses Ruby (historically), Go (for performance-critical services), and a custom-built distributed database for financial ledger operations that guarantees double-entry accounting consistency.
How Google Search Indexes the Entire Web
Google's search infrastructure crawls hundreds of billions of web pages, builds an inverted index distributed across thousands of machines, and serves queries in under 500ms.
The crawling infrastructure (Googlebot) continuously discovers and re-crawls pages, prioritizing frequently updated sites.
The index is sharded by document and replicated for throughput and fault tolerance.
Each query is scattered to all shards in parallel, and results are merged by a coordinator.
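The scatter-gather step above can be sketched directly: fan the query out to every shard in parallel, then merge the per-shard hits by score at the coordinator. The shard interface (a callable returning scored hits) and field names are assumptions for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def scatter_gather(shards, query, top_k=10):
    """Send the query to every index shard in parallel, then merge the
    per-shard hits by score at the coordinator and keep the top k."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        per_shard = pool.map(lambda shard: shard(query), shards)
        hits = [hit for shard_hits in per_shard for hit in shard_hits]
    return sorted(hits, key=lambda h: h["score"], reverse=True)[:top_k]
```

Because every shard works in parallel, overall latency is set by the slowest shard rather than the sum, which is why tail-latency control on individual shards matters so much at this scale.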
Ranking uses hundreds of signals, including PageRank, content relevance (scored in part by BERT-based neural models), page quality, freshness, user engagement, and mobile-friendliness.
The serving infrastructure uses custom hardware and software optimized for the specific access patterns of search.
How Discord Handles Millions of Concurrent Voice/Text Users
Discord supports real-time text messaging and voice communication for millions of concurrent users.
Text messages are stored in Cassandra (later migrated to ScyllaDB for better tail latency). Each message is partitioned by channel and sorted by timestamp.
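That (channel, time-ordered ID) layout is what makes the hot query, "latest N messages in this channel," a single-partition range read. The in-memory sketch below mimics the access pattern with invented names; the real stores keep each channel's rows clustered on disk in ID order.

```python
from bisect import insort

class MessageStore:
    """Sketch of the (channel_id, message_id) layout: messages are
    grouped by channel partition and kept sorted by a time-ordered id,
    so 'latest N in this channel' never touches other channels."""
    def __init__(self):
        self.partitions = {}   # channel_id -> sorted list of (msg_id, text)

    def append(self, channel_id, msg_id, text):
        insort(self.partitions.setdefault(channel_id, []), (msg_id, text))

    def latest(self, channel_id, n):
        return self.partitions.get(channel_id, [])[-n:]
```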
Voice communication uses a selective forwarding unit (SFU) architecture rather than peer-to-peer: audio streams from each participant are sent to a central server, which selectively forwards the relevant streams to other participants.
This is more efficient than full mesh peer-to-peer for groups larger than a few people.
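The forwarding loop at the heart of an SFU is small enough to sketch. This toy version (invented class, audio frames as plain values) shows the key property: each client uploads one stream regardless of room size, and the server does the per-recipient fan-out, unlike full-mesh P2P where each client's upload count grows with the number of participants.

```python
class SelectiveForwardingUnit:
    """SFU sketch: every participant uploads one stream to the server,
    and the server forwards each incoming audio frame to all other
    participants in the room."""
    def __init__(self):
        self.participants = {}   # name -> outbox of (sender, frame) pairs

    def join(self, name):
        self.participants[name] = []

    def on_audio(self, sender, frame):
        for name, outbox in self.participants.items():
            if name != sender:               # don't echo back to the speaker
                outbox.append((sender, frame))
```

A production SFU would also drop or deprioritize silent streams (the "selective" part) and handle codecs, jitter, and congestion; none of that is modeled here.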
Discord uses Elixir (built on the Erlang VM) for real-time services because of Erlang's concurrency model, which handles millions of lightweight processes efficiently.