## The System Design Mindset
The single most valuable shift you can make as an engineer is realizing that there are no perfect solutions in system design.
There are only trade-offs.
Every decision you make gives you something and takes something away.
Pick a relational database for its strong consistency guarantees, and you pay with slower write speeds at massive scale.
Choose a NoSQL document store for flexibility, and you lose the ability to run complex joins across your data.
Go with microservices for independent deployability, and you inherit the headache of distributed debugging and network failures between services.
Beginners often search for the "best" database, the "best" architecture, the "best" caching strategy.
Experienced engineers ask a different question: what am I willing to give up?
That question changes everything. It moves you from chasing a mythical perfect answer to evaluating real options against real constraints.
When you sit in a system design interview, the interviewer is not checking whether you picked the "right" answer. They are watching whether you can articulate the trade-offs behind your choices.
"I chose Cassandra here because we need high write throughput and can tolerate eventual consistency" is a sentence that gets people hired.
"I chose Cassandra because it is a good database" is a sentence that does not.
**Requirements-Driven Design vs. Technology-Driven Design**
There are two ways engineers approach a new system.
The first starts with the problem: what does this system need to do, for how many users, under what constraints?
The second starts with the technology: I want to use Kafka, Kubernetes, and Redis, now let me figure out where they fit.
Requirements-driven design works. Technology-driven design almost always leads to over-engineering.
When you start with requirements, you discover that maybe you do not need Kafka at all.
Maybe a simple task queue handles your volume just fine.
Maybe you do not need a distributed cache because your database is fast enough with proper indexing.
Requirements keep you grounded. Technology preferences push you toward complexity you have not earned.
This does not mean you should ignore technology choices. It means you should pick technologies because they solve a specific requirement, not because they look impressive on an architecture diagram.
**Designing for the Future: Anticipating Growth and Change**
A system that handles today's traffic but collapses under next year's growth is a system with an expiration date.
Good system design accounts for where the product is going, not just where it is now.
But there is a trap here.
Some engineers try to build for a billion users on day one. They architect for a scale they may never reach and burn months of engineering time on infrastructure that sits idle.
The skill is in finding the middle ground: design your system so it can grow without a full rewrite, but do not build the growth infrastructure until you actually need it.
A practical way to think about this is to design for 10x your current scale.
If you have 10,000 users, make sure your architecture can handle 100,000 without a fundamental redesign. When you actually approach 100,000, plan for a million.
This gives you breathing room without wasting resources on hypothetical problems.
**Iterative Design: Starting Simple and Evolving**
No system is designed perfectly in one pass.
Every production system you admire went through dozens of iterations.
Version one was probably ugly. Version two fixed the worst problems.
Version ten is the one people write blog posts about.
Start with the simplest design that meets your current requirements.
A single server, a single database, a straightforward request-response flow. Then identify the first real bottleneck.
Maybe your database reads are too slow, so you add a cache.
Maybe your single server cannot handle the traffic, so you put a load balancer in front of two servers. Each addition solves a specific, observed problem.
This approach has a name in engineering: evolutionary architecture.
You do not predict every future need.
You build a system that is easy to change when those needs arrive.
Loose coupling between components, clear API boundaries, and stateless services all make iteration cheaper.
If changing one part of your system forces you to rewrite three other parts, your architecture is too rigid.
The best system designers are not the ones who get it right on the first try. They are the ones who make it easy to get it right on the second, third, and tenth try.
**Interview-Style Question**
> Q: How do you decide between consistency and availability when designing a distributed system?
> A: It depends on what the system does. A banking application needs strong consistency because showing a wrong account balance is unacceptable. A social media feed can tolerate eventual consistency because seeing a post two seconds late is a minor inconvenience, not a crisis. The decision comes from the requirements, not from a personal preference for one over the other. In the interview, explaining why you chose one trade-off over another matters more than which one you chose.
### KEY TAKEAWAYS
* There are no perfect solutions in system design, only trade-offs. Your job is to pick the trade-off that hurts least for your specific situation.
* Start with requirements, not technologies. Let the problem guide your architecture, not the other way around.
* Design for 10x your current scale, not 1000x. Build growth headroom without over-engineering for hypothetical traffic.
* Iterate relentlessly. Start simple, observe real bottlenecks, and evolve your design one problem at a time.
## Functional vs. Non-Functional Requirements
Every system design starts with a question that sounds simple but is deceptively hard to answer well: what does this system actually need to do?
That question splits into two categories.
Functional requirements describe what the system does.
Non-functional requirements describe how well the system does it. Miss either category, and your design will have serious blind spots.
**Defining Functional Requirements: Core Features and Use Cases**
Functional requirements are the features your users interact with directly.
If you are designing a messaging application, the functional requirements might include: users can send text messages to other users, users can create group conversations, users can see when their message has been delivered and read, users can share images and files.
These are concrete, testable capabilities.
You can look at each one and say "yes, the system does this" or "no, it does not."
There is no gray area.
The mistake beginners make is listing too many functional requirements at once.
In an interview, you have 35 to 45 minutes. You cannot design a system that handles messaging, voice calls, video calls, stories, payments, and a marketplace all at once. Scope is everything.
Pick the three or four most critical features and design those well. You can always mention additional features and explain how you would extend the design later.
When gathering functional requirements, ask yourself: if this feature did not exist, would the product still make sense? If the answer is no, it is a core requirement. If the answer is yes, it can wait.
**Defining Non-Functional Requirements: Performance, Scalability, Reliability, Security**
Non-functional requirements are the invisible expectations.
Your users will never say "I want the system to have 99.99% availability."
But they will absolutely notice when the app is down during a big product launch.
The most common non-functional requirements you will deal with are performance (how fast should responses be?), scalability (how many users or requests should the system support?), reliability (how often can the system fail before it becomes a problem?), and security (what data needs protection and from whom?).
Each of these needs a number attached to it.
"The system should be fast" is not a requirement.
"The system should return search results in under 200 milliseconds for 95% of requests" is a requirement. Numbers turn vague wishes into engineering targets you can actually design for.
Other non-functional requirements include data durability (can we afford to lose any data?), consistency (do all users need to see the same data at the same time?), and maintainability (can a new engineer understand and modify this system without wanting to quit?).
**Prioritizing and Resolving Conflicting Requirements**
Here is where things get tricky. Non-functional requirements often fight each other.
You want strong consistency and high availability?
The CAP theorem says you cannot have both during a network partition. You want ultra-low latency and complete data durability?
Writing to disk is slower than writing to memory. You want maximum security and frictionless user experience?
Every security layer adds friction.
Your job as a system designer is to decide which requirements win when they conflict. And that decision always comes from the use case.
A stock trading platform prioritizes consistency and low latency over everything else because a stale price can cost millions.
A social media timeline prioritizes availability and scalability because showing a slightly outdated feed is better than showing an error page.
Document these priority decisions explicitly. In an interview, say them out loud.
"For this system, I am prioritizing availability over strong consistency because a user seeing a post one second late is acceptable, but a user seeing an error page is not." That kind of clarity signals mature engineering thinking.
**Translating Business Goals into Technical Requirements**
Business stakeholders rarely speak in technical terms. They say things like "we need to handle Black Friday traffic" or "our users in Asia are complaining about slow load times" or "we cannot afford any data loss."
Your job is to translate those statements into numbers.
"Handle Black Friday traffic" becomes "support 50x normal QPS for a 6-hour window."
"Slow load times in Asia" becomes "deploy to an Asia-Pacific region and serve static content from a CDN with edge nodes in Tokyo, Singapore, and Mumbai."
"No data loss" becomes "replicate all writes to at least three nodes across two availability zones before acknowledging success."
This translation skill is one of the most valuable things you can develop.
Engineers who can sit in a room with business people, listen to their goals, and walk out with a clear set of technical requirements are the ones who end up leading projects.
**Interview-Style Question**
> Q: You are designing a file-sharing service. The product team wants instant uploads, zero data loss, and global availability. How do you prioritize these requirements?
> A: You cannot optimize for all three equally, so you rank them. Zero data loss is non-negotiable for a file-sharing service, so data durability is the top priority. Global availability comes second because users in different regions need reliable access. Instant uploads come third because you can use techniques like chunked uploads and progress indicators to make uploads feel fast even if the actual transfer takes time. You would acknowledge writes only after replication confirms durability, serve reads from the nearest CDN edge, and stream upload progress to the client so the perceived speed stays high.
### KEY TAKEAWAYS
* Functional requirements define what the system does. Non-functional requirements define how well it does it. You need both.
* Scope aggressively. In interviews and in real projects, fewer well-designed features beat a sprawling half-baked system.
* Attach numbers to every non-functional requirement. "Fast" is not a requirement. "Under 200ms at p95" is.
* When requirements conflict, let the use case decide which one wins. Document and communicate that priority clearly.
* Learn to translate business language into technical specs. That skill will define your career more than any framework or tool.
## Back-of-the-Envelope Estimation
One of the fastest ways to separate an experienced system designer from a beginner is to ask them to estimate.
How much storage will this system need in a year?
How many requests per second should the database handle?
How much bandwidth will the video streaming feature consume?
You do not need exact answers.
You need estimates that are close enough to guide your design decisions.
Being off by 2x is fine.
Being off by 100x means you will pick the wrong architecture entirely.
**Powers of Two Table and Common Data Size References**
Every estimation starts with knowing your units. Memorize these, or at least keep them where you can glance at them quickly.
| Power | Exact Value | Approximate | Unit |
|---|---|---|---|
| 2^10 | 1,024 | 1 Thousand | 1 KB |
| 2^20 | 1,048,576 | 1 Million | 1 MB |
| 2^30 | 1,073,741,824 | 1 Billion | 1 GB |
| 2^40 | 1,099,511,627,776 | 1 Trillion | 1 TB |
| 2^50 | 1,125,899,906,842,624 | 1 Quadrillion | 1 PB |
Some useful reference sizes: a single English character is 1 byte, a typical tweet-length text message is about 250 bytes, a typical JSON API response is 1 to 10 KB, a high-resolution photo is 2 to 5 MB, a minute of HD video is roughly 100 to 150 MB.
These reference points let you anchor your estimates.
If someone tells you the system stores 500 million photos, you can immediately estimate: 500 million times 3 MB average equals 1.5 petabytes.
That single number tells you that you are in "distributed object storage" territory, not "single database server" territory.
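That anchor-style estimate is just multiplication. A quick sketch of the photo example, using decimal units:

```python
# Anchor estimate: 500 million photos at ~3 MB average.
# Decimal units (1 PB = 10^15 bytes) keep the arithmetic simple.
PB = 1000 ** 5
MB = 1000 ** 2

photos = 500_000_000
avg_photo_bytes = 3 * MB

total_bytes = photos * avg_photo_bytes
print(total_bytes / PB)  # 1.5 -> petabytes: distributed object storage territory
```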
**Latency Numbers Every Programmer Should Know**
Not all operations are created equal.
The time it takes to read from memory versus reading from disk versus making a network call varies by orders of magnitude.
Having a rough sense of these numbers helps you spot bottlenecks before you build them.
| Operation | Approximate Latency |
|---|---|
| L1 cache reference | 0.5 nanoseconds |
| L2 cache reference | 7 nanoseconds |
| RAM reference | 100 nanoseconds |
| SSD random read | 150 microseconds |
| HDD random read | 10 milliseconds |
| Network round trip (same datacenter) | 0.5 milliseconds |
| Network round trip (cross-continent) | 150 milliseconds |
The key insight from this table: reading from memory is roughly 100,000 times faster than reading from a spinning hard drive.
A network call within the same data center is about 5,000 times slower than reading from RAM. These gaps are why caching exists.
They are why CDNs exist.
They are why every major system puts frequently accessed data as close to the user as physically possible.
**Estimating QPS (Queries Per Second) and Throughput**
QPS estimation usually starts with the number of users and works backward.
Here is a general approach.
Say you are designing a system with 10 million daily active users.
Each user performs an average of 20 actions per day. That gives you 200 million requests per day.
Divide by 86,400 (the number of seconds in a day) and you get roughly 2,300 requests per second on average.
But average QPS is not the whole story.
Traffic is never evenly distributed.
Peak hours might see 3 to 5 times the average. So your system needs to handle somewhere between 7,000 and 12,000 requests per second at peak. That is your design target.
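This back-of-the-envelope flow fits in a small helper. The 4x peak factor below is one reasonable pick from the 3x-to-5x range mentioned above:

```python
def estimate_qps(daily_active_users, actions_per_user, peak_factor=4):
    """Return (average QPS, peak QPS) from daily usage numbers."""
    average = daily_active_users * actions_per_user / 86_400  # seconds per day
    return average, average * peak_factor

# The worked example from the text: 10M DAU, 20 actions each per day.
avg, peak = estimate_qps(10_000_000, 20)
print(round(avg), round(peak))  # 2315 9259
```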
For read-heavy systems like social media feeds, you might see a 10:1 read-to-write ratio.
For write-heavy systems like logging platforms, the ratio flips. Knowing this ratio determines whether you optimize your database for reads (add replicas, add caches) or writes (use append-only storage, use write-optimized engines).
**Estimating Storage, Bandwidth, and Memory Requirements**
Storage estimation follows a simple formula: number of items multiplied by average item size multiplied by the retention period.
If your messaging app stores 1 billion messages per day, each message averages 200 bytes, and you keep messages for 5 years, your total storage is: 1 billion times 200 bytes times 365 days times 5 years, which equals about 365 terabytes.
That tells you immediately that you need a distributed storage solution with sharding.
Bandwidth estimation works similarly.
If your system serves 100,000 requests per second and each response is 5 KB, your outbound bandwidth is 500 MB per second, or about 4 gigabits per second. That number matters when you are choosing your network infrastructure and CDN strategy.
Memory estimation is most relevant for caching.
If you want to cache the hottest 20% of your data, and your total dataset is 1 TB, you need about 200 GB of cache memory.
That might mean a cluster of cache nodes rather than a single machine.
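All three formulas reduce to one-line functions. The constants below mirror the examples above (decimal units; the 20% hot-data fraction is the illustrative figure from the text, not a universal rule):

```python
TB = 1000 ** 4  # decimal terabyte in bytes

def storage_bytes(items_per_day, item_bytes, retention_days):
    """Storage = count x size x retention."""
    return items_per_day * item_bytes * retention_days

def bandwidth_bytes_per_sec(qps, response_bytes):
    """Outbound bandwidth = request rate x response size."""
    return qps * response_bytes

def cache_bytes(total_dataset_bytes, hot_fraction=0.2):
    """Cache memory needed to hold the hottest slice of the data."""
    return total_dataset_bytes * hot_fraction

# Messaging example: 1B messages/day, 200 B each, kept 5 years -> 365 TB.
print(storage_bytes(1_000_000_000, 200, 365 * 5) / TB)        # 365.0
# Bandwidth example: 100k QPS at 5 KB per response -> 500 MB/s.
print(bandwidth_bytes_per_sec(100_000, 5_000))                # 500000000
# Cache example: hottest 20% of a 1 TB dataset -> 200 GB.
print(cache_bytes(1 * TB) / 1000 ** 3)                        # 200.0
```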
**Capacity Planning and Resource Estimation Techniques**
Capacity planning ties all these estimates together into a resource plan.
How many servers, how much storage, how much cache, how much bandwidth?
A practical approach: start with your peak QPS.
If a single application server handles 500 requests per second (a reasonable estimate for a typical web server), and your peak QPS is 10,000, you need at least 20 application servers.
Add a few extra for redundancy and you might plan for 25 to 30.
Do the same calculation for your database, your cache, and your storage.
Layer in replication factors (usually 3x for critical data).
Account for growth over the next year or two.
The result is a concrete infrastructure plan that you can present in an interview or hand to an operations team.
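The server-count arithmetic can be sketched as follows. The 1.3x redundancy factor is an assumption standing in for "a few extra," not a fixed rule:

```python
import math

def servers_needed(peak_qps, qps_per_server, redundancy_factor=1.3):
    """Servers required at peak, padded for redundancy (assumed 1.3x)."""
    base = math.ceil(peak_qps / qps_per_server)
    return math.ceil(base * redundancy_factor)

# The example from the text: 10,000 peak QPS, 500 QPS per server.
print(servers_needed(10_000, 500))  # 26
```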
Do not worry about being precise.
The goal of estimation is to land in the right order of magnitude so your architecture decisions hold up.
If your estimate says you need 50 servers and the real number turns out to be 70, your design is still valid.
If your estimate says you need 5 servers and the real number is 500, your entire architecture is wrong.
**Interview-Style Question**
> Q: Estimate the storage needed for a service that stores 100 million user profiles, where each profile contains a name, email, bio, and profile picture.
> A: Text data per profile (name, email, bio) is roughly 1 KB. A compressed profile picture averages about 200 KB. Total per profile is about 201 KB. For 100 million profiles: 100 million times 201 KB equals roughly 20 TB. With a replication factor of 3 for durability, you need about 60 TB of raw storage. That is well within what a distributed object store like S3 handles comfortably, with the text metadata stored separately in a database.
### KEY TAKEAWAYS
* Memorize the powers of two and common data sizes. They are the foundation of every estimate you will ever make.
* Know the latency hierarchy: memory is fast, disk is slow, network is slower, cross-continent is slowest. This hierarchy drives caching and CDN decisions.
* Estimate QPS by starting with daily active users and actions per user, then multiply by a peak factor of 3x to 5x.
* Storage, bandwidth, and memory all follow the same pattern: count multiplied by size multiplied by time. Simple arithmetic, powerful results.
* Aim for the right order of magnitude, not precision. Close enough guides good architecture. Way off leads to disaster.
## The System Design Life Cycle
Designing a system is not a single event. It is a process that loops through several phases, and each phase feeds information back into the others.
Understanding this cycle helps you know where you are at any point in a project and what kind of thinking each moment requires.
**Requirements Gathering and Scoping**
Every system starts with questions.
What problem are we solving?
Who are the users?
How many of them will there be in six months?
In two years?
What operations are most frequent?
What data do we need to store and for how long?
What happens if the system goes down for five minutes?
In a real engineering project, you gather these answers from product managers, business stakeholders, and existing users.
In a system design interview, you gather them by asking the interviewer. Either way, this phase is not optional.
Skipping it is the most common reason designs fail, both in interviews and in production.
Scoping is the discipline of drawing a boundary around what you will and will not build.
A messaging system does not need to handle video calls in version one.
An e-commerce platform does not need a recommendation engine on launch day.
Scope creep kills projects.
The best engineers are the ones who can say "that is a great feature, and we will add it in phase two" without feeling like they are cutting corners.
A useful framework: list every feature the system could have, then split that list into three buckets.
Must-have features go in the first bucket, those are your core functional requirements.
Should-have features go in the second bucket, those get designed but deprioritized.
Nice-to-have features go in the third bucket, those get mentioned but not designed in detail.
**High-Level Design (HLD) vs. Low-Level Design (LLD)**
Once you know your requirements, the design process splits into two altitudes.
High-level design is the bird's-eye view.
You identify the major components of the system, how they connect, and how data flows between them. At this altitude, you are drawing boxes and arrows: clients, load balancers, application servers, databases, caches, message queues. You are deciding which components exist and how they talk to each other. You are not worrying about database schema or API parameter names yet.
Low-level design zooms in on individual components.
How does the database schema look?
What are the exact API endpoints and their request/response formats?
What algorithm does the cache use for eviction?
How does the notification service decide which delivery channel to use?
LLD is where the abstract architecture becomes concrete, buildable code.
In an interview, you typically spend 60 to 70 percent of your time on HLD and 30 to 40 percent on LLD, usually diving deep into one or two components that the interviewer finds most relevant.
In a real project, HLD happens first during the planning phase, and LLD happens iteratively as teams pick up individual components to build.
The relationship between HLD and LLD is not strictly sequential.
As you dig into low-level details, you sometimes discover that your high-level design has a flaw.
Maybe the data model does not support a key query pattern you need.
Maybe the communication pattern between two services creates a circular dependency. Good designers move fluidly between the two altitudes, refining both as understanding deepens.
**Prototyping, Implementation, and Iteration**
A design document is a hypothesis. It predicts how the system will behave under certain conditions. Prototyping and implementation test that hypothesis against reality.
Prototyping does not mean building the entire system. It means building the riskiest part first.
If you are unsure whether your database schema can handle the query patterns you need at scale, prototype that.
If you are not sure whether WebSockets will hold up under 100,000 concurrent connections, prototype that. Target your uncertainty.
Implementation follows the same iterative philosophy from section 2.1.
Build the simplest working version.
Measure its behavior under realistic conditions.
Identify the first bottleneck.
Fix it. Repeat.
Teams that try to build the entire production-ready system in one pass almost always end up reworking large portions of it because assumptions made in month one turned out to be wrong by month three.
One pattern that works well in practice: define a "walking skeleton," a minimal end-to-end flow that touches every major component but only implements the simplest possible version of each.
For a messaging app, that might be: client sends a text message, server receives it, stores it in a database, delivers it to the recipient.
No group chats, no file attachments, no read receipts.
Just the core flow.
Once that skeleton works, you add features and scale incrementally.
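A walking skeleton for that core flow might look like the sketch below. The in-memory dictionary stands in for the real database, and all class and method names are illustrative:

```python
from collections import defaultdict

class MessageServer:
    """Walking skeleton: receive a message, store it, deliver it.
    An in-memory dict plays the role of the database for now."""

    def __init__(self):
        self.inbox = defaultdict(list)  # recipient -> undelivered messages

    def send(self, sender, recipient, text):
        """Store the message for later delivery."""
        self.inbox[recipient].append({"from": sender, "text": text})

    def fetch(self, recipient):
        """Deliver and clear the recipient's pending messages."""
        messages, self.inbox[recipient] = self.inbox[recipient], []
        return messages

server = MessageServer()
server.send("alice", "bob", "hello")
print(server.fetch("bob"))  # [{'from': 'alice', 'text': 'hello'}]
```

Every later feature, such as group chats, attachments, or read receipts, extends this skeleton rather than replacing it.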
**Monitoring, Evaluation, and Continuous Improvement**
A system does not stop needing design attention after it ships.
In many ways, the most valuable design decisions happen after launch, when you finally have real data about how the system behaves.
Monitoring gives you that data.
You track metrics like request latency at various percentiles (p50, p95, p99), error rates, CPU and memory utilization, database query performance, cache hit ratios, and queue depths. Each metric tells you something about the health of a specific component.
A rising p99 latency might reveal a slow database query.
A dropping cache hit ratio might mean your eviction policy needs tuning. A growing queue depth might signal that your consumers cannot keep up with producers.
Evaluation means comparing your system's actual performance against the non-functional requirements you defined in the design phase.
Did you promise 99.9% availability?
Check your uptime metrics.
Did you target sub-200ms response times?
Look at your latency percentiles.
If reality does not match the promise, you have a design problem to solve.
Continuous improvement is the recognition that a shipped system is not a finished system.
Traffic patterns change.
User behavior evolves.
New features create new bottlenecks.
The database that handled your load comfortably in year one might struggle in year two.
Teams that build monitoring into their systems from day one catch these problems early, when they are cheap to fix.
Teams that skip monitoring discover them during outages, when they are expensive and stressful.
The design life cycle is a loop, not a line.
Requirements inform design.
Design informs implementation.
Implementation reveals new requirements.
Monitoring surfaces new design challenges.
The best systems are the ones whose teams treat this loop as a feature, not a burden.
**Interview-Style Question**
> Q: You have just shipped a new service and the p99 latency is three times higher than expected. How do you approach this?
> A: Start by identifying where the latency is coming from. Add instrumented tracing across the request path to see which component contributes the most time. Common culprits are unindexed database queries, missing caches, or network calls to external services. Once you find the bottleneck, evaluate your options: add an index, introduce a cache layer, batch network calls, or move data closer to the application. Fix the biggest contributor first, measure again, and repeat. Do not guess. Measure first, then act.
### KEY TAKEAWAYS
* Always start with requirements. Skipping this step is the single most common reason system designs fail in interviews and in production.
* Scope ruthlessly. Split features into must-have, should-have, and nice-to-have. Design the must-haves well instead of designing everything poorly.
* High-level design defines the components and their connections. Low-level design defines the internals of each component. Move fluidly between both.
* Prototype the riskiest parts of your design first. Build a walking skeleton, then iterate.
* Monitoring is not an afterthought. Build it from day one so you catch problems when they are cheap to fix, not during a 3 AM outage.