7.5 Blockchain & Decentralized Systems - The System Design Interview Handbook

Consensus Mechanisms: PoW, PoS, BFT

Every distributed system needs a way for nodes to agree on the state of the data.

In traditional systems, consensus algorithms like Raft and Paxos operate under the assumption that nodes may fail but do not actively lie or cheat.

Blockchain operates in a harsher environment: nodes are untrusted, anonymous, and potentially malicious. The consensus mechanism must produce agreement even when some participants are actively trying to deceive the system.

This is the Byzantine Generals Problem.

Multiple generals need to agree on a battle plan, but some of them are traitors sending false messages.

How do the loyal generals reach agreement despite the traitors?

Blockchain consensus mechanisms are different answers to this question.

Proof of Work (PoW)

Proof of Work requires participants (called miners) to solve a computationally difficult puzzle before they can add a new block to the chain.

The puzzle involves finding a number (a nonce) that, when combined with the block's data and hashed, produces a hash below a target threshold.

Finding this nonce requires brute-force computation: trying billions of random values until one works.

The first miner to solve the puzzle broadcasts the block to the network.

Other nodes verify the solution (verification is trivial, just one hash computation) and accept the block.
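The asymmetry between mining and verifying can be sketched in a few lines. This is a toy model, not Bitcoin's actual block format: the `block_data` bytes, the single SHA-256 pass, and the `difficulty_bits` parameter are simplifying assumptions made for illustration.

```python
import hashlib

def block_hash(block_data: bytes, nonce: int) -> int:
    """Hash the block data together with a nonce and return it as an integer."""
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")

def mine(block_data: bytes, difficulty_bits: int) -> int:
    """Brute-force search: try nonces until the hash falls below the target."""
    target = 1 << (256 - difficulty_bits)  # a smaller target means a harder puzzle
    nonce = 0
    while block_hash(block_data, nonce) >= target:
        nonce += 1
    return nonce

def verify(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Verification costs one hash and one comparison."""
    return block_hash(block_data, nonce) < (1 << (256 - difficulty_bits))

# Hypothetical block contents; 16 difficulty bits needs about 65,000 hashes on average.
data = b"prev_hash|transactions|timestamp"
nonce = mine(data, difficulty_bits=16)
print(nonce, verify(data, nonce, 16))
```

Each extra difficulty bit doubles the expected mining work, while verification stays at a single hash.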

The winning miner receives a reward (newly minted cryptocurrency plus transaction fees).

PoW's security comes from its economic cost.

Attacking the network (rewriting history, double-spending) requires controlling more than 50% of the total computational power, which is prohibitively expensive for established networks.

Bitcoin's PoW network consumes more electricity than many countries, making a 51% attack astronomically expensive.

The trade-off is energy consumption and speed.

PoW is deliberately slow (Bitcoin produces one block roughly every 10 minutes) and consumes enormous amounts of electricity because the computational work has no purpose beyond securing the network.

The work itself is pure waste, by design.

Proof of Stake (PoS)

Proof of Stake replaces computational work with economic stake.

Instead of competing to solve puzzles, participants (called validators) lock up cryptocurrency as collateral.

The protocol selects a validator to propose the next block, typically with probability proportional to their stake (more stake means higher chance of being selected).
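A minimal sketch of stake-weighted selection follows. The validator names and stake amounts are made up, and the seeded `random.Random` is only a stand-in: a real protocol has to derive its randomness from a source that validators cannot cheaply manipulate.

```python
import random

def select_proposer(stakes: dict, rng: random.Random) -> str:
    """Pick one validator with probability proportional to its stake."""
    validators = list(stakes)
    weights = [stakes[v] for v in validators]
    return rng.choices(validators, weights=weights, k=1)[0]

# Hypothetical validator set: carol holds 62.5% of the total stake.
stakes = {"alice": 32.0, "bob": 64.0, "carol": 160.0}
rng = random.Random(7)  # fixed seed so the simulation is repeatable

counts = {v: 0 for v in stakes}
for _ in range(10_000):
    counts[select_proposer(stakes, rng)] += 1
print(counts)
```

Over many rounds, each validator's share of proposals approaches its share of the total stake.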

Other validators attest to the block's validity.

If a validator proposes a fraudulent block or behaves maliciously, their staked funds are partially or fully destroyed (slashed). This economic penalty replaces PoW's energy cost as the security mechanism.

Attacking the network requires controlling a majority of the staked value, which for a large network is billions of dollars, and the attacker's own stake would be slashed in the process.

PoS is dramatically more energy-efficient than PoW, using over 99.9% less energy.

Ethereum transitioned from PoW to PoS in September 2022 ("The Merge"), reducing its energy consumption by approximately 99.95%.

PoS also allows shorter, more regular block times (Ethereum produces a block every 12 seconds). Higher transaction throughput depends on the rest of the protocol design rather than on PoS alone.

The trade-off is that PoS can centralize toward wealthy validators (those with more stake earn more rewards and can afford to stake even more), creating a "rich get richer" dynamic.

Various PoS implementations address this through delegation (smaller holders delegate their stake to validators and share rewards), random validator selection with anti-concentration measures, and minimum stake requirements low enough for broader participation.

Byzantine Fault Tolerance (BFT)

BFT consensus algorithms (like PBFT, Practical Byzantine Fault Tolerance) take a different approach entirely. Instead of open participation by anonymous nodes (like PoW and PoS), BFT systems operate with a known, fixed set of validators.

The validators exchange messages in rounds to agree on each block.

BFT tolerates fewer than one-third of validators being malicious (the standard Byzantine fault tolerance threshold): a network of n validators stays safe with f faulty ones only if n is at least 3f + 1.
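The threshold is usually written as n ≥ 3f + 1, where n is the number of validators and f the number of Byzantine ones. The sketch below computes the largest tolerable f and a commit quorum. The "more than two-thirds of validators" quorum rule is the one used by Tendermint-style protocols and is assumed here; the function names are illustrative.

```python
def max_faulty(n: int) -> int:
    """Largest f that satisfies n >= 3f + 1."""
    return (n - 1) // 3

def quorum(n: int) -> int:
    """Smallest vote count that is strictly more than two-thirds of n."""
    return (2 * n) // 3 + 1

for n in (4, 7, 20, 100):
    print(f"n={n}: tolerates f={max_faulty(n)}, quorum={quorum(n)}")
# n=4: tolerates f=1, quorum=3
# n=7: tolerates f=2, quorum=5
# n=20: tolerates f=6, quorum=14
# n=100: tolerates f=33, quorum=67
```

With 20 validators, 6 can misbehave and 14 matching votes are still enough to finalize a block.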

BFT produces fast finality: once a block is agreed upon, it is final.

There is no chance of chain reorganization (which can happen in PoW, and in PoS designs until a block is finalized, because finality there is probabilistic or delayed).

This makes BFT suitable for permissioned blockchains (where participants are known and vetted) and enterprise use cases where speed and finality matter more than open participation.

Classic BFT protocols such as PBFT do not scale well to large numbers of validators because the message complexity grows quadratically (each validator communicates with every other validator in each round).

Most BFT-based systems limit the validator set to tens or low hundreds of nodes.

This is fine for permissioned networks but unsuitable for open, permissionless blockchains with thousands of participants.
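The quadratic growth is easy to see by counting messages. This sketch assumes one all-to-all exchange per round, which is a simplification of how PBFT-style phases work.

```python
def all_to_all_messages(n: int) -> int:
    """Messages in one round where each of n validators sends to every other one."""
    return n * (n - 1)

for n in (4, 20, 100, 1000):
    print(n, all_to_all_messages(n))
# 4 12
# 20 380
# 100 9900
# 1000 999000
```

Going from 20 to 1,000 validators multiplies the validator count by 50 but the per-round message count by roughly 2,600.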

Tendermint (used by Cosmos) and HotStuff (the basis for Meta's former Libra/Diem project) are modern BFT-derived consensus protocols that improve on classic PBFT with better performance characteristics.

Mechanism	Security Model	Energy	Speed	Finality	Decentralization
PoW	Computational cost	Extremely high	Slow (minutes per block)	Probabilistic	High (anyone can mine)
PoS	Economic stake	Very low	Moderate (seconds per block)	Probabilistic (fast with checkpoints)	Moderate (stake-weighted)
BFT	Known validators, message rounds	Very low	Fast (sub-second possible)	Immediate (deterministic)	Low (fixed validator set)

Interview-Style Question

Q: Your company is building a supply chain tracking system where 20 known partners need to share data with each other. No anonymous participation is needed. Which consensus mechanism would you recommend?

A: BFT (specifically a Tendermint or HotStuff variant). With only 20 known participants, BFT's validator set limitation is not a constraint. BFT provides immediate finality (supply chain records should not be subject to reorganization), fast consensus (partners need to see updates in seconds, not minutes), and energy efficiency (no wasted computation). PoW is overkill and wasteful for a permissioned network. PoS adds unnecessary complexity when participants are already known and trusted. A permissioned blockchain using BFT consensus, or potentially a simpler shared database with cryptographic audit trails, would serve this use case well.

Distributed Ledger Architecture

A blockchain is a specific type of distributed ledger: a shared, append-only record of transactions maintained across multiple nodes without a central authority.

Understanding its architecture helps you recognize both its strengths and its inherent limitations.

Blocks and Chains

Data is organized into blocks.

Each block contains a set of transactions, a timestamp, a reference (hash) to the previous block, and the result of the consensus process (the proof of work, the validator signatures, etc.).

The reference to the previous block creates a chain: Block 5 points to Block 4, which points to Block 3, all the way back to the genesis block (Block 0).

This chain structure makes the ledger tamper-evident.

If someone modifies a transaction in Block 3, the hash of Block 3 changes.

Block 4's reference to Block 3's hash no longer matches. Block 5's reference to Block 4 no longer matches.

The entire chain from the modification point forward is invalidated.

Tampering with historical data requires re-computing the entire chain from the modified block onward, which in PoW means re-doing all the computational work, and in PoS means convincing a majority of validators to accept the altered chain.

For established networks, this is effectively impossible.
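The tamper-evidence property can be demonstrated with a toy hash chain. The block layout (an index, a list of transaction strings, and a `prev_hash` field serialized as JSON) is an assumption made for this sketch and does not follow any real chain's format.

```python
import hashlib
import json
from typing import Optional

def hash_block(block: dict) -> str:
    """Hash a block's full contents, including its pointer to the parent."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def build_chain(tx_batches: list) -> list:
    chain = []
    prev_hash = "0" * 64  # placeholder parent for the genesis block
    for index, txs in enumerate(tx_batches):
        block = {"index": index, "transactions": list(txs), "prev_hash": prev_hash}
        chain.append(block)
        prev_hash = hash_block(block)
    return chain

def first_broken_link(chain: list) -> Optional[int]:
    """Return the index of the first block whose prev_hash no longer matches."""
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != hash_block(chain[i - 1]):
            return i
    return None

chain = build_chain([["genesis"], ["a->b 5"], ["b->c 2"], ["c->a 1"], ["a->c 3"], ["b->a 4"]])
print(first_broken_link(chain))            # None: the untouched chain is consistent
chain[3]["transactions"][0] = "c->a 100"   # tamper with Block 3
print(first_broken_link(chain))            # 4: Block 4 no longer points at Block 3's hash
```

To hide the change, an attacker would have to rewrite Block 4's `prev_hash`, which changes Block 4's own hash and breaks Block 5, and so on to the tip of the chain.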

Nodes and Network

Every full node in the network stores a complete copy of the ledger.

When a new block is produced, it is broadcast to all nodes, each of which independently verifies the block (checking that transactions are valid, that the consensus rules were followed) and appends it to their local copy.

This full replication is what eliminates the need for a central authority.

There is no single server that holds the "real" copy of the data. Every node has an equal copy, and disagreements are resolved by the consensus mechanism.

If a node goes offline, the network continues without it. When it comes back, it syncs the blocks it missed.

The cost of full replication is storage and bandwidth.

Every node stores every transaction ever recorded. Bitcoin's blockchain is over 500 GB, and an Ethereum archive node, which also retains all historical state, is larger still. This limits transaction throughput because every transaction must be processed and stored by every full node.

Scaling solutions (sharding, layer-2 networks, rollups) address this by reducing the data each node must process and store.

Merkle Trees

Transactions within a block are organized into a Merkle tree (a binary hash tree).

Each leaf node is the hash of a transaction.

Each parent node is the hash of its two children.

The root of the tree (the Merkle root) is stored in the block header.

Merkle trees enable efficient verification.

To prove that a specific transaction is included in a block, you only need the Merkle root and a path of hashes from the transaction to the root (called a Merkle proof). You do not need to download the entire block.

This enables lightweight clients (like mobile wallets) that verify transactions without storing the full blockchain.
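A compact sketch of building a Merkle tree, producing a proof, and checking it is shown below. Duplicating the last node on odd-sized levels is an assumption borrowed from Bitcoin-style trees, and the single SHA-256 pass and transaction strings are illustrative.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_levels(leaves: list) -> list:
    """Return every level of the tree, from the leaf hashes up to the root."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # pair an odd node with itself
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def merkle_proof(levels: list, index: int) -> list:
    """Collect (sibling_hash, sibling_is_left) pairs from leaf to root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # odd node was paired with itself
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof

def verify_proof(leaf: bytes, proof: list, root: bytes) -> bool:
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

txs = [b"tx0", b"tx1", b"tx2", b"tx3", b"tx4"]
levels = merkle_levels(txs)
root = levels[-1][0]
proof = merkle_proof(levels, 2)
print(len(proof), verify_proof(b"tx2", proof, root))   # 3 True
print(verify_proof(b"tx2-forged", proof, root))        # False
```

The proof length grows with the logarithm of the number of transactions, which is why a light client can check inclusion with only a handful of hashes.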

Permissioned vs. Permissionless

Permissionless blockchains (Bitcoin, Ethereum) let anyone join the network, run a node, and participate in consensus. They are fully decentralized and censorship-resistant. The trade-off is lower throughput and higher latency because consensus must work across thousands of anonymous, potentially adversarial participants.
Permissioned blockchains (Hyperledger Fabric, R3 Corda) restrict participation to known, authorized entities. They provide higher throughput and lower latency (because consensus among a small set of known validators, typically BFT or a crash-fault-tolerant protocol such as Raft, is faster) but sacrifice decentralization. Permissioned blockchains are used in enterprise settings where the participants are known (banks in a payment network, companies in a supply chain) and the goal is a shared ledger with cryptographic integrity, not full decentralization.

Type	Participation	Consensus	Throughput	Decentralization	Use Cases
Permissionless	Anyone	PoW, PoS	Lower (10-1000 TPS)	High	Cryptocurrency, DeFi, public records
Permissioned	Authorized entities	BFT	Higher (1000-10000+ TPS)	Lower	Supply chain, banking, healthcare records

Smart Contracts and Decentralized Applications (DApps)

A smart contract is a program stored on the blockchain that executes automatically when predetermined conditions are met. It is not a legal contract. It is code that runs in a deterministic, tamper-proof environment where the execution and results are verified by every node in the network.

How Smart Contracts Work

A developer writes a smart contract (typically in Solidity for Ethereum) and deploys it to the blockchain.

The contract's code and state are stored on-chain.

Users interact with the contract by sending transactions that call its functions.

Each function call is a transaction that gets included in a block, executed by every node, and the resulting state change is recorded permanently.

A simple escrow contract might work like this: a buyer sends funds to the contract, the seller delivers the goods, and when both parties confirm (or a timeout expires), the contract releases the funds to the seller or returns them to the buyer. No intermediary (bank, escrow service) is needed.

The contract enforces the rules automatically.
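The escrow flow can be modeled as a small state machine. The sketch below is plain Python rather than Solidity and only imitates contract behavior. The `deadline` field, the method names, and the choice to refund the buyer on timeout are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum

class State(Enum):
    AWAITING_PAYMENT = 1
    FUNDED = 2
    RELEASED = 3
    REFUNDED = 4

@dataclass
class Escrow:
    buyer: str
    seller: str
    price: int
    deadline: int  # hypothetical timestamp after which the buyer may reclaim funds
    state: State = State.AWAITING_PAYMENT
    confirmations: set = field(default_factory=set)
    payouts: dict = field(default_factory=dict)

    def deposit(self, sender: str, amount: int) -> None:
        if self.state is not State.AWAITING_PAYMENT or sender != self.buyer or amount != self.price:
            raise ValueError("invalid deposit")
        self.state = State.FUNDED

    def confirm(self, sender: str) -> None:
        if self.state is not State.FUNDED or sender not in (self.buyer, self.seller):
            raise ValueError("invalid confirmation")
        self.confirmations.add(sender)
        if self.confirmations == {self.buyer, self.seller}:
            self.payouts[self.seller] = self.price  # both confirmed: pay the seller
            self.state = State.RELEASED

    def refund_after_timeout(self, now: int) -> None:
        if self.state is not State.FUNDED or now < self.deadline:
            raise ValueError("refund not available")
        self.payouts[self.buyer] = self.price  # deadline passed: return the funds
        self.state = State.REFUNDED

escrow = Escrow(buyer="buyer", seller="seller", price=100, deadline=1_000)
escrow.deposit("buyer", 100)
escrow.confirm("buyer")
escrow.confirm("seller")
print(escrow.state, escrow.payouts)
```

Every state transition is guarded by an explicit check, which is the property that lets the contract stand in for a human escrow agent.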

Smart contracts are deterministic: given the same input, every node produces the same output. They are immutable once deployed: the code cannot be changed (though patterns exist for upgrading contracts through proxy contracts). They are also transparent: anyone can read the contract's code and verify what it does.

Gas and Execution Costs

Running a smart contract consumes computational resources on every node in the network.

To prevent abuse (infinite loops, spam), Ethereum meters every operation a contract performs in units called gas, and the sender pays a fee proportional to the gas consumed.

Simple operations (adding two numbers) cost little gas. Complex operations (writing to storage, creating new contracts) cost more. The user pays gas fees to compensate the validators for the computation.
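A back-of-the-envelope fee estimate shows why storage writes dominate cost. The gas figures below are approximate and illustrative, since the real schedule changes across protocol upgrades. The single flat `gas_price_gwei` is also a simplification of Ethereum's current base-fee-plus-tip pricing.

```python
GWEI = 10**9          # wei per gwei
WEI_PER_ETH = 10**18

# Illustrative gas costs per operation (assumed values, not an exact schedule).
GAS_COST = {
    "base_tx": 21_000,            # fixed cost of any transaction
    "add": 3,                     # simple arithmetic
    "storage_write_new": 20_000,  # writing a new storage slot
}

def tx_fee_wei(operations: list, gas_price_gwei: int) -> int:
    """Fee = gas used * gas price, computed in integer wei to avoid float error."""
    gas_used = GAS_COST["base_tx"] + sum(GAS_COST[op] for op in operations)
    return gas_used * gas_price_gwei * GWEI

# Hypothetical contract call: ten additions and three new storage writes.
ops = ["add"] * 10 + ["storage_write_new"] * 3
fee = tx_fee_wei(ops, gas_price_gwei=30)
print(fee / WEI_PER_ETH, "ETH")
```

In this example the three storage writes account for 60,000 of the 81,030 gas units, while the ten additions account for 30.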

Gas creates a natural incentive to write efficient smart contracts.

A poorly optimized contract that wastes computation costs its users more in fees.

In system design, gas costs are a material concern: a DApp that requires multiple expensive contract interactions per user action may be economically unviable if gas prices are high.

Decentralized Applications (DApps)

A DApp is an application built on top of a blockchain, using smart contracts for its backend logic and typically a traditional web frontend for the user interface.

The frontend communicates with the blockchain through a provider (like MetaMask, which manages the user's wallet and signs transactions).

DApp architecture typically combines an on-chain component (smart contracts that handle trust-critical logic: asset ownership, transfers, governance votes), an off-chain component (a traditional backend that handles operations that do not need blockchain's guarantees: user profiles, caching, search, notifications), and a frontend (a web or mobile application that connects to both the on-chain and off-chain components).

Not everything belongs on-chain.

Storing large files on Ethereum is prohibitively expensive.

Decentralized storage systems like IPFS (InterPlanetary File System) and Arweave store the data off-chain, and the blockchain stores only a hash (a fingerprint) of the data, proving that a specific file existed at a specific time without storing the file itself.
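The "store only a fingerprint on-chain" pattern can be sketched as follows. `HashRegistry` is a hypothetical in-memory stand-in for an on-chain contract, and the plain SHA-256 hex digest is a simplification, since IPFS uses its own content-identifier format.

```python
import hashlib
from typing import Optional

def fingerprint(data: bytes) -> str:
    """Fixed-size digest that changes if any byte of the file changes."""
    return hashlib.sha256(data).hexdigest()

class HashRegistry:
    """Stand-in for an on-chain contract that records hashes, never file contents."""

    def __init__(self) -> None:
        self._records = {}

    def anchor(self, content_hash: str, block_time: int) -> None:
        # The first write wins, so the earliest timestamp cannot be overwritten.
        self._records.setdefault(content_hash, block_time)

    def anchored_at(self, content_hash: str) -> Optional[int]:
        return self._records.get(content_hash)

registry = HashRegistry()
document = b"example file contents kept in off-chain storage"
registry.anchor(fingerprint(document), block_time=1_700_000_000)

print(registry.anchored_at(fingerprint(document)))           # 1700000000
print(registry.anchored_at(fingerprint(document + b"!")))    # None
```

Anyone holding the original file can recompute its hash and look it up, while a modified file produces a different hash with no record.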

When Blockchain Is and Is Not the Right Solution

This is the most practical section of this chapter.

Blockchain is a powerful technology, but it is applied far more often than it should be.

In system design interviews and in real engineering decisions, recognizing when blockchain adds value versus when it adds unnecessary complexity is a sign of mature thinking.

When Blockchain Makes Sense

Blockchain earns its place when several conditions are true simultaneously.

Multiple parties who do not fully trust each other need to share data: If all parties trust a single central authority, that authority can maintain the database. Blockchain is for situations where no single party should have unilateral control over the data.

The data needs to be tamper-evident: Participants need assurance that historical records have not been altered. Examples include a supply chain tracking system where manufacturers, shippers, and retailers need to verify that records have not been changed, and a voting system where election results must be verifiable and immutable.

Intermediary elimination provides real value: Blockchain removes the need for a trusted middleman. In cross-border payments, multiple banks act as intermediaries, each adding fees and delays. A blockchain-based payment network can settle transactions directly between parties. But if there is already a trusted intermediary that works well and is cost-effective, adding blockchain adds complexity without proportional benefit.

Censorship resistance is required: If the system must operate even when powerful entities (governments, corporations) try to shut it down or control it, a permissionless blockchain provides resilience that centralized systems cannot match.

Real Blockchain Use Cases

Cryptocurrency and digital assets: Bitcoin for decentralized money. Ethereum for programmable digital assets. Stablecoins (USDC, USDT) for blockchain-based dollars. This is the original and most validated blockchain use case.
Decentralized finance (DeFi): Lending, borrowing, and trading without traditional financial intermediaries. Smart contracts enforce the rules that banks and brokers traditionally manage.
Supply chain provenance: Tracking goods from manufacturer to consumer across multiple organizations. Each participant records their step in the supply chain on a shared ledger that no single party controls.
Digital identity and credentials: Self-sovereign identity systems where users control their own identity data rather than relying on centralized identity providers.
Cross-border payments and remittances: Reducing the cost and time of international money transfers by eliminating intermediary banks.

When Blockchain Does Not Make Sense

Single organization controls the data: If one company owns the database and all users trust that company, a traditional database (PostgreSQL, DynamoDB) is simpler, faster, cheaper, and more mature. Blockchain's value comes from decentralization. If you do not need decentralization, you do not need blockchain.
Performance matters more than trust: A traditional database handles thousands to millions of transactions per second. Most permissionless blockchains handle tens to hundreds at the base layer. If your system needs high throughput and low latency (an e-commerce platform, a social media feed), blockchain cannot compete with a well-designed traditional architecture.
Data needs to be modified or deleted: Blockchain is append-only by design. GDPR's "right to be forgotten" (delete my data on request) conflicts fundamentally with blockchain's immutability. If your system needs to delete or modify historical records, blockchain works against you.
The "blockchain" is just a shared database: Many enterprise "blockchain" projects are permissioned networks with a handful of known participants, managed by a central consortium, with admin privileges to modify the chain. At that point, a shared database with cryptographic audit logging and access controls provides the same guarantees with far less complexity.
All participants already trust each other: If a group of banks already trust each other through legal contracts and regulatory oversight, adding blockchain does not add trust. It adds infrastructure complexity.

The Decision Framework

Ask these four questions before choosing blockchain:

Do multiple mutually distrusting parties need to share data without a central authority? If no, use a traditional database.

Is tamper-evidence of historical records a hard requirement? If no, a traditional database is enough. Where some auditability is still wanted, cryptographic audit logs on that database provide it with less overhead.

Does eliminating intermediaries provide meaningful value (cost, speed, access)? If no, the intermediary is doing its job, and blockchain adds complexity without proportional benefit.

Can the system tolerate blockchain's throughput and latency limitations? If no, blockchain cannot meet your performance requirements regardless of its other benefits.

If the answer to all four questions is yes, blockchain is likely the right choice. If any answer is no, a traditional architecture is almost certainly better.
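The four questions translate directly into a checklist function. The field names and return strings below are invented for this sketch. The logic simply encodes the rule that a single "no" points to a traditional architecture.

```python
from dataclasses import dataclass

@dataclass
class Requirements:
    distrusting_parties_without_central_authority: bool
    tamper_evidence_is_hard_requirement: bool
    removing_intermediary_adds_value: bool
    tolerates_throughput_and_latency_limits: bool

def recommend(req: Requirements) -> tuple:
    """Return (recommendation, list of questions that were answered 'no')."""
    failed = []
    if not req.distrusting_parties_without_central_authority:
        failed.append("no mutually distrusting parties: use a traditional database")
    if not req.tamper_evidence_is_hard_requirement:
        failed.append("tamper-evidence not required: audit logs are enough")
    if not req.removing_intermediary_adds_value:
        failed.append("intermediary is doing its job")
    if not req.tolerates_throughput_and_latency_limits:
        failed.append("performance requirements cannot be met")
    if failed:
        return ("traditional architecture", failed)
    return ("blockchain is likely appropriate", [])

# Two rows from the scenario table, encoded as hypothetical inputs.
inventory = Requirements(False, False, False, True)
supply_chain = Requirements(True, True, True, True)
print(recommend(inventory)[0])      # traditional architecture
print(recommend(supply_chain)[0])   # blockchain is likely appropriate
```

Returning the failed questions alongside the verdict mirrors what an interviewer wants to hear: not just the answer, but which requirement ruled blockchain out.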

Scenario	Blockchain?	Why or Why Not
Cross-border payments between banks	Possibly	Reduces intermediaries and settlement time, but banks may prefer private networks
Internal company inventory system	No	Single organization, no trust issue, traditional DB is simpler and faster
Multi-company supply chain tracking	Yes	Multiple distrusting parties, tamper-evidence needed, no single authority
Social media platform	No	Single company controls data, needs high throughput, needs content deletion
Digital voting system	Possibly	Tamper-evidence and transparency are critical, but scalability and privacy are challenges
Decentralized lending/borrowing	Yes	Eliminates intermediary, trustless execution via smart contracts, value in decentralization

Beginner Mistake to Avoid

New engineers sometimes propose blockchain in system design interviews because it sounds technically impressive.

"We should use blockchain for our user profile data" is a red flag, not a good answer.

Blockchain adds latency, reduces throughput, increases infrastructure complexity, and solves a specific problem (trustless decentralization) that most applications do not have.

If the interviewer asks about blockchain, demonstrate that you understand both its strengths and its limitations. Showing that you can identify when not to use a technology is as valuable as showing that you can design with it.

Interview-Style Question

Q: A healthcare consortium of 15 hospitals wants to share patient records so that any hospital in the network can access a patient's history when they arrive for treatment. They propose using blockchain. Is this the right approach?

A: It depends on the trust model, but blockchain is likely not the best primary architecture here. The core question is whether the 15 hospitals trust a central authority to manage the shared database. If the consortium can agree on a governing body (or a rotating administrator), a traditional shared database with cryptographic audit logging, strong access controls, and HIPAA-compliant encryption provides higher throughput, lower latency (critical for emergency room scenarios), and support for amending and, where regulations require it, deleting data. Where blockchain might add value is as a secondary audit layer: record hashes of access logs and data modifications on a permissioned blockchain so that no single hospital can tamper with the audit trail. This gives you blockchain's tamper-evidence for the audit trail while keeping the primary data access on a performant, traditional system. Full blockchain for the patient records themselves is impractical: patients have a right under HIPAA to request amendments to their records, other privacy regulations (GDPR, for example) can require deletion, blockchain is append-only, and the throughput of blockchain systems is orders of magnitude lower than what a hospital network needs for real-time patient data access.

KEY TAKEAWAYS

PoW secures the network through computational cost but wastes enormous energy. PoS secures through economic stake with over 99.9% less energy. BFT provides fast, final consensus for known validator sets.
Blockchain is an append-only distributed ledger maintained across nodes without a central authority. Merkle trees enable efficient verification. Full replication ensures no single point of failure.
Smart contracts are deterministic programs on the blockchain that execute automatically. DApps combine on-chain smart contracts with off-chain backends and traditional frontends.
Blockchain makes sense when multiple distrusting parties need shared, tamper-evident data without a central authority. It does not make sense for single-organization systems, high-throughput requirements, or data that needs modification or deletion.
Permissioned blockchains sacrifice decentralization for performance and are appropriate for enterprise consortiums. Permissionless blockchains provide full decentralization for public, censorship-resistant applications.
The strongest signal of system design maturity is knowing when not to use a technology. Blockchain solves a narrow class of problems extremely well. For everything else, traditional architectures are simpler, faster, and cheaper.