Chapter 8: System Design Interview Mastery

8.20 Design a social graph (LinkedIn connections)

1. Problem Definition and Scope

We are designing the backend system for the "Connections" feature on LinkedIn. This system manages the professional social graph, tracking who is connected to whom.

Unlike Twitter (where relationships are one-way "follows"), LinkedIn connections are bidirectional (A connects to B means B connects to A) and require mutual approval.

Main User Groups:

  • Professionals: Sending invites, accepting requests, and browsing their network.

  • Downstream Systems: The Feed, Search, and Recommendation engines need to query this graph (e.g., "Show posts from my connections").

Scope:

  • In Scope: Sending/Accepting/Ignoring requests, viewing the connection list, removing connections, and checking connection status (e.g., for profile buttons).

  • Out of Scope: The Feed generation, Messaging, User Profiles (bio, work history), and complex "People You May Know" algorithms (2nd/3rd degree traversal).

2. Clarify functional requirements

Must Have:

  • Send Invite: User A can send a connection request to User B.

  • Accept/Ignore: User B can respond to the request.

  • List Connections: User A can view their connections, sorted by "most recently added."

  • Check Status: The system must quickly return the relationship status (Connected, Pending, None) for a target user (critical for profile rendering).

  • Disconnect: User A can remove User B from their network.

Nice to Have:

  • Connection Cap: Enforce a limit (e.g., 30,000 connections) to prevent abuse.

  • Mutual Connection Count: Show "5 mutual connections" on profiles.


3. Clarify non-functional requirements

  • Users: 1 Billion total users. ~300 Million Daily Active Users (DAU).
  • Scale:
    • Average connections: 500.
    • Max connections: 30,000.
    • Total edges: ~500 Billion directed entries (1B users × 500 avg). Each connection is counted once per endpoint, so this is ~250 Billion unique connections.
  • Read vs. Write: Extremely Read Heavy.
    • Writes: Occasional (sending/accepting invites).
    • Reads: Every profile view, feed load, and search result requires checking connection status. Ratio is likely 1000:1.
  • Latency:
    • Reads: < 20ms (p99) to ensure pages load fast.
    • Writes: < 200ms is acceptable.
  • Consistency:
    • Strong Consistency for the relationship status (if I accept, we both see "Connected" immediately).
    • Eventual Consistency is fine for aggregate counts or "friends of friends" queries.


4. Back of the envelope estimates

Traffic (QPS):

  • Reads: Assume 300M DAU. Each user loads 50 pages/day (feed, profiles, search). Each page checks status for ~20 people.
    • $300M \times 50 \times 20 = 300 \text{ Billion checks/day}$.
    • $300B / 86,400 \approx 3.5 \text{ Million QPS}$.
    • Conclusion: We need aggressive caching; a database cannot handle this raw load.
  • Writes: Assume 10M connections/invites per day.
    • $10M / 86,400 \approx 115 \text{ Writes/sec}$.
    • Conclusion: Write throughput is negligible.
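These estimates are easy to sanity-check in code; the sketch below simply replays the arithmetic with the assumptions stated above:

```python
# Back-of-the-envelope traffic check, using the assumptions from the text.
dau = 300_000_000          # daily active users
pages_per_day = 50         # feed, profile, and search loads per user
checks_per_page = 20       # connection-status checks per page
seconds_per_day = 86_400

read_checks_per_day = dau * pages_per_day * checks_per_page
read_qps = read_checks_per_day / seconds_per_day

writes_per_day = 10_000_000  # invites/accepts per day

print(f"{read_checks_per_day:,} checks/day")            # 300,000,000,000 checks/day
print(f"~{read_qps / 1e6:.1f}M read QPS")               # ~3.5M read QPS
print(f"~{writes_per_day // seconds_per_day} writes/sec")  # ~115 writes/sec
```

The six-orders-of-magnitude gap between read and write QPS is what justifies the cache-first architecture in the rest of the design.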

Storage:

  • Total Edges: ~500 Billion directed entries (each connection counted once per endpoint, i.e., ~250 Billion unique connections).

  • Strategy: To optimize reads, we store each edge from both sides (A→B and B→A). This is exactly the ~500 Billion directed rows above.

  • Row Size: user_id (8B) + target_id (8B) + status (1B) + timestamp (8B) + overhead ≈ 50 Bytes.

  • Total Storage: $500 \text{ Billion} \times 50 \text{ Bytes} \approx 25 \text{ TB}$ of raw data; with index overhead and growth headroom, budget ~50 TB.

  • Conclusion: We must use Database Sharding.


5. API design

We will use a REST API with JSON responses.

1. Send Connection Request

  • POST /v1/connections/requests
  • Body: { "target_id": "u456", "note": "Hi, let's connect." }
  • Response: 201 Created { "request_id": "req_1", "status": "PENDING" }

2. Respond to Request

  • PUT /v1/connections/requests/{request_id}
  • Body: { "action": "ACCEPT" } (or "IGNORE")
  • Response: 200 OK { "status": "CONNECTED" }

3. List Connections

  • GET /v1/users/{user_id}/connections
  • Query Params: page_token=abc, limit=20
  • Response:

    {
      "connections": [
        { "user_id": "u999", "connected_at": 1700000000 },
        { "user_id": "u888", "connected_at": 1690000000 }
      ],
      "next_page_token": "def"
    }

4. Check Status (Batch)

  • POST /v1/connections/status/batch
  • Body: { "target_ids": ["u1", "u2", "u3"] }
  • Response: { "u1": "CONNECTED", "u2": "NONE", "u3": "PENDING_OUTGOING" }
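The `next_page_token` in the list API is best treated as an opaque cursor. A minimal sketch of one possible encoding (hypothetical; the real token format is an internal detail) packs the last returned row's sort key into a URL-safe string, so the next query can resume with `WHERE created_at < ?`:

```python
import base64
import json

def encode_page_token(connected_at: int, user_id: str) -> str:
    """Encode the last row's sort key as an opaque, URL-safe cursor."""
    raw = json.dumps({"ts": connected_at, "id": user_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_page_token(token: str) -> tuple[int, str]:
    """Decode a cursor back into the (connected_at, user_id) resume point."""
    data = json.loads(base64.urlsafe_b64decode(token.encode()))
    return data["ts"], data["id"]

token = encode_page_token(1690000000, "u888")
assert decode_page_token(token) == (1690000000, "u888")
```

Cursor tokens keep pagination stable when new connections are added, which offset-based pagination does not.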

6. High level architecture

Component Roles:

  • Connection Service: Stateless Go/Java service. Handles validation, limits, and privacy checks.
  • Redis Cluster: Caches the "Adjacency List" (friend list) for active users to serve the 3.5M read QPS.
  • Sharded SQL: Stores the 50TB of truth data.
  • Kafka: Decouples the "Accept" action from side effects (e.g., notifying User B, updating the Search Index).


7. Data model

We will use Sharded MySQL/Postgres.

We use the Adjacency List pattern. To ensure fast O(1) lookups for "My Connections," we denormalize by storing every relationship twice: once for User A and once for User B.

Table: edges

  • user_id (BigInt, Partition Key)
  • target_id (BigInt)
  • status (Enum: PENDING, ACCEPTED)
  • created_at (Timestamp)
  • Primary Key: (user_id, target_id)
  • Index: (user_id, status, created_at) -> Optimized for "Show my recent connections."

Sharding Strategy:

  • We shard by user_id.
  • This means all of User A's connections live on Shard X.
  • Querying "Get A's connections" is always a single-shard query.
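As a sketch, shard routing can be as simple as hashing the partition key. Plain modulo hashing is used here purely for illustration; the Directory Service approach described in the scaling section allows rebalancing without rehashing every user:

```python
import hashlib

NUM_SHARDS = 1024  # assumption: fixed logical shards mapped onto physical hosts

def shard_for(user_id: int) -> int:
    """All rows with this user_id as partition key live on one shard,
    so "get A's connections" is always a single-shard query."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Routing is deterministic: the same user always maps to the same shard.
assert shard_for(42) == shard_for(42)
assert 0 <= shard_for(42) < NUM_SHARDS
```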

8. Core flows end to end

In a high-scale social graph, the complexity lies in maintaining consistency across shards during writes and ensuring low latency during reads. We will detail the two most critical flows: Accepting a Connection (Write) and Viewing the Network (Read).

Flow 1: Accepting a Request (The Distributed Write Path)

Since we shard by user_id, User A (the requester) and User B (the accepter) likely reside on different database shards. We cannot use a standard ACID transaction across shards without incurring a massive performance penalty (Two-Phase Commit). Instead, we use a Saga Pattern relying on local transactions and asynchronous event propagation.

Scenario: User B clicks "Accept" on User A's connection request.

  1. Request Handling:
    • The client sends PUT /connections/requests/{req_id} with action ACCEPT.
    • The Connection Service receives the request.
  2. Local Transaction (Primary Action):
    • The service identifies User B's shard (using the Directory Service/consistent hashing).
    • It executes a Local ACID Transaction on Shard B:
      • UPDATE edges SET status = 'ACCEPTED' WHERE user_id = B AND target_id = A;
      • If this fails, the error is returned immediately to the user.
  3. Event Publication (Async Propagation):
    • Upon successful commit to Shard B, the service publishes an event to Kafka:
      • Topic: connection-events
      • Payload: { "type": "ACCEPTED", "from_user": B, "to_user": A, "timestamp": 171000... }
    • User Experience: We return 200 OK to User B immediately. The UI updates to show "Connected." We do not wait for User A's shard to update.
  4. Event Consumption (Secondary Action):
    • A Connection Worker consumes the event from Kafka.
    • It identifies User A's shard.
    • It executes a Local ACID Transaction on Shard A:
      • UPDATE edges SET status = 'ACCEPTED' WHERE user_id = A AND target_id = B;
  5. Cache Invalidation:
    • The worker deletes (or updates) the Redis keys connections:A and connections:B to force a refresh on the next read.
  6. Side Effects:
    • The Notification Service consumes the same Kafka event to send a push notification to User A ("User B accepted your request").
    • The Search Service consumes the event to update the search index (allowing A to find B in "My Connections" search).

Edge Case - Consistency Repair: If Step 4 fails (e.g., Shard A is down), the message remains in Kafka and is retried. If it fails permanently (poison pill), it goes to a Dead Letter Queue (DLQ). The nightly "Anti-Entropy" job acts as the final safety net to ensure A->B matches B->A.
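The accept saga above can be sketched end to end. The following is a deliberately simplified in-memory simulation (dicts stand in for the two DB shards, a deque for the Kafka topic, a dict for Redis); the real services would be the stateless Go/Java components described earlier:

```python
from collections import deque

# In-memory stand-ins: one dict per DB shard, a deque as the Kafka topic,
# and a dict of cached adjacency lists as the Redis cluster.
shard_of_A = {("A", "B"): "PENDING_OUTGOING"}   # A's view: invite sent to B
shard_of_B = {("B", "A"): "PENDING_INCOMING"}   # B's view: invite received from A
kafka = deque()
redis_cache = {"connections:A": ["stale"], "connections:B": ["stale"]}

def accept_request(accepter: str, requester: str) -> str:
    """Steps 2-3: local ACID write on the accepter's shard, then publish."""
    shard_of_B[(accepter, requester)] = "ACCEPTED"
    kafka.append({"type": "ACCEPTED", "from_user": accepter, "to_user": requester})
    return "200 OK"   # returned before the requester's shard is touched

def connection_worker() -> None:
    """Steps 4-5: async consumer updates the other shard and busts both caches."""
    event = kafka.popleft()
    requester, accepter = event["to_user"], event["from_user"]
    shard_of_A[(requester, accepter)] = "ACCEPTED"   # idempotent: retries are safe
    redis_cache.pop(f"connections:{requester}", None)
    redis_cache.pop(f"connections:{accepter}", None)

accept_request("B", "A")
assert shard_of_A[("A", "B")] == "PENDING_OUTGOING"   # not yet propagated
connection_worker()
assert shard_of_A[("A", "B")] == "ACCEPTED"           # now consistent
```

Note that the window between the two asserts is exactly the inconsistency window the anti-entropy job exists to bound.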


Flow 2: Viewing Connections (The Read-Aside Path)

This flow is extremely read-heavy (3.5M QPS). We rely heavily on the Redis Cluster using the Cache Aside pattern.

Scenario: User A opens their "My Network" page.

  1. Check Cache:
    • The Connection Service checks Redis: ZREVRANGE connections:A 0 19 WITHSCORES (indices are inclusive, so this returns the 20 most recent).
    • Hit: If the list exists, we get a list of target_ids (e.g., [u101, u205, u300]). Proceed to Step 4.
  2. Cache Miss (Fallback to DB):
    • If Redis returns nil, the service queries the database.
    • Since we shard by user_id, this is a Single-Shard Query (highly efficient):

        SELECT target_id, created_at
        FROM edges
        WHERE user_id = 'A' AND status = 'ACCEPTED'
        ORDER BY created_at DESC
        LIMIT 500;

    • Note: We fetch up to 500 IDs to warm the cache, even if the user only requested 20.
  3. Populate Cache:
    • The service pushes these IDs into a Redis Sorted Set (ZADD connections:A).
    • TTL: Set a Time-To-Live (e.g., 24 hours) to evict inactive users' graphs from memory.
  4. Data Hydration (The N+1 Solution):
    • The Connection Service now has a list of raw IDs: [u101, u205, u300]. The UI needs Names and Avatar URLs.
    • The service calls the Profile Service via a Batch/Bulk API: POST /profiles/batch { ids: [u101, u205, u300] }.
    • Optimization: This is a parallelized RPC call (or uses a dedicated Memcached layer for profiles).
  5. Response:
    • The service aggregates the connection data with the profile data and returns the JSON response to the client.


Flow 3: Bulk Status Check (Profile Rendering)

This flow runs every time a user views a feed or search results. We need to know: "Is the author of this post my connection?"

Scenario: User A views a Search Result page with 10 different people.

  1. Batch Request: The client (or Search Aggregator) sends a list of 10 IDs to the Connection Service.
  2. Pipeline Cache Lookup:
    • The service uses Redis Pipelining to execute 10 commands in one network round-trip:
      • ZSCORE connections:A {user_1}
      • ZSCORE connections:A {user_2}
      • ...
  3. Interpretation:
    • If ZSCORE returns a timestamp -> Status: CONNECTED.
    • If nil -> Check pending_requests:A cache (small secondary cache).
    • If not found -> Status: NOT_CONNECTED.
  4. Result: This entire batch check completes in sub-millisecond time due to the in-memory O(1) nature of Redis ZSETs.
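The interpretation step can be written as a pure function over the pipeline results. In this sketch, the `zscore_results` dict stands in for the pipelined ZSCORE replies and `pending_outgoing` for the secondary pending-requests cache:

```python
def interpret_statuses(zscore_results: dict, pending_outgoing: set) -> dict:
    """Map pipelined ZSCORE replies to statuses per the rules above.

    zscore_results: target_id -> timestamp score, or None on a miss
    pending_outgoing: target_ids with an outstanding invite from this user
    """
    statuses = {}
    for target, score in zscore_results.items():
        if score is not None:                       # ZSCORE hit -> connected
            statuses[target] = "CONNECTED"
        elif target in pending_outgoing:            # secondary cache hit
            statuses[target] = "PENDING_OUTGOING"
        else:
            statuses[target] = "NOT_CONNECTED"
    return statuses

result = interpret_statuses(
    {"u1": 1700000000, "u2": None, "u3": None},
    pending_outgoing={"u3"},
)
assert result == {"u1": "CONNECTED", "u2": "NOT_CONNECTED", "u3": "PENDING_OUTGOING"}
```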


9. Caching and read performance

Cache Structure:

We use Redis Sorted Sets (ZSET).

  • Key: connections:{user_id}
  • Member: target_id
  • Score: timestamp
  • Why ZSET?
    • Pagination: We can use ZREVRANGE to get the top 20 connections efficiently.
    • Existence: We can use ZSCORE to check "Is A connected to B?" in O(1).

Batch Status Check Strategy:

For POST /v1/connections/status/batch (checking 20 users at once):

  1. Pipeline: Use Redis Pipelining to check ZSCORE for all 20 pairs in one network round trip.
  2. Result:
    • If ZSCORE returns a number: Status is CONNECTED.
    • If null: Check the pending_requests cache or DB.

10. Storage, indexing and media

Storage:

  • Database: PostgreSQL on high-performance SSDs.
  • Replication: Each Shard has 1 Primary (Writes) and 3 Replicas (Reads).
  • Indexes: The primary-key index on (user_id, target_id) makes existence checks and targeted deletes fast. (In MySQL/InnoDB this key is automatically the clustered index; PostgreSQL does not maintain clustered indexes, so it relies on the B-tree index alone.)

Media:

  • We do not store images here. We store only user_id. The client resolves images via the Profile Service / CDN.

11. Scaling strategies

1. Database Sharding:

  • We use a Directory Service (like ZooKeeper) to map user_id to Shard ID.
  • As we grow from 1B to 2B users, we add more shards and run a background migration to rebalance users.

2. Handling "Hot" Users:

  • A "Super Connector" (max 30k connections) is not a storage hotspot (30k rows is small).
  • The hotspot is Reading. If a celebrity is viewed 10,000 times/sec, we rely on the Redis Cache.
  • If Redis is overwhelmed, we can use Local Caching (in-memory inside the Connection Service) for the hottest keys with a short TTL (5 seconds).
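A minimal sketch of such a local cache, with the clock injected so the example is deterministic (the 5-second TTL is the figure from the text; everything else is illustrative):

```python
import time

class LocalTTLCache:
    """Tiny in-process cache for the hottest keys (e.g., celebrity graphs).
    A short TTL bounds staleness while shielding Redis from read spikes."""

    def __init__(self, ttl_seconds: float = 5.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:   # expired: treat as a miss
            del self._store[key]
            return None
        return value

    def put(self, key: str, value) -> None:
        self._store[key] = (self.clock() + self.ttl, value)

# Fake clock keeps the example deterministic.
now = [0.0]
cache = LocalTTLCache(ttl_seconds=5.0, clock=lambda: now[0])
cache.put("connections:celebrity", ["u1", "u2"])
assert cache.get("connections:celebrity") == ["u1", "u2"]
now[0] = 6.0                             # 6 seconds later: entry has expired
assert cache.get("connections:celebrity") is None
```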

3. Anti-Entropy (Consistency Repair):

  • Since we use double-writes (A->B and B->A), data can drift if a write fails halfway.
  • We run a nightly Repair Job that scans the DB: If A->B exists, B->A must exist. If one is missing, it repairs the data.
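The core of the repair job is a reciprocity check. Here is a sketch over in-memory sets of directed edges; a real job would scan shard by shard and consult timestamps or tombstones to decide whether to insert the missing row or delete the orphan (e.g., after a half-completed disconnect):

```python
def find_missing_reciprocals(edges: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Return the directed rows to insert so every A->B has a matching B->A."""
    return {(b, a) for (a, b) in edges if (b, a) not in edges}

def repair(edges: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """One repair pass: add every missing mirror edge."""
    return edges | find_missing_reciprocals(edges)

# C->D lost its mirror when the async worker's write failed.
edges = {("A", "B"), ("B", "A"), ("C", "D")}
assert find_missing_reciprocals(edges) == {("D", "C")}
assert repair(edges) == {("A", "B"), ("B", "A"), ("C", "D"), ("D", "C")}
```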

12. Reliability, failure handling and backpressure

  • Circuit Breakers: If a DB Shard is slow, the API wraps calls in a circuit breaker. If it trips, we return a degraded response (e.g., empty connection list) instead of hanging.

  • Throttling: We rate-limit writes per user (e.g., max 100 invites/day) to prevent spam and write spikes.

  • Graceful Degradation: If the Redis cluster fails, we can fall back to the DB Replicas, but we strictly rate-limit this fallback to prevent crashing the DBs.
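The circuit-breaker behavior can be sketched with a minimal count-based breaker (thresholds are illustrative; production breakers add a half-open state with a cooldown before retrying the shard):

```python
class CircuitBreaker:
    """Count-based breaker: after `threshold` consecutive failures, fail fast
    and serve a degraded response instead of hanging on a slow shard."""

    def __init__(self, threshold: int = 5):
        self.threshold = threshold
        self.failures = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= self.threshold

    def call(self, fn, fallback):
        if self.is_open:
            return fallback()        # breaker open: skip the DB call entirely
        try:
            result = fn()
            self.failures = 0        # any success resets the count
            return result
        except Exception:
            self.failures += 1
            return fallback()

breaker = CircuitBreaker(threshold=2)

def slow_shard():
    raise TimeoutError("shard not responding")

def degraded():
    return []                        # degraded response: empty connection list

assert breaker.call(slow_shard, degraded) == []   # failure 1
assert breaker.call(slow_shard, degraded) == []   # failure 2: breaker opens
assert breaker.is_open
```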

13. Security, privacy and abuse

  • Authorization: The API must strictly validate that the user_id in the token matches the user_id in the URL for private operations.

  • Privacy Settings: Before returning the connection list of User B to User A, check User B's privacy setting (can_view_connections).

  • Abuse Prevention: If User A sends many invites that are "Ignored" or "Marked as Spam," we lower their "Reputation Score" and block them from sending further invites.

14. Bottlenecks and next steps

  • Bottleneck: 2nd Degree Queries.
    • Finding "Friends of Friends" (e.g., "How many connections do I share with Bill Gates?") is very expensive in Sharded SQL because it requires querying all of Bill Gates' friends' shards.

    • Next Step: Build a dedicated Graph Service (using Neo4j or a custom in-memory graph engine). We would ETL data from our SQL DB to this Graph DB to serve recommendation/path queries.

  • Bottleneck: N+1 Problem on Enrichment.
    • Hydrating names/avatars for the connection list involves calling the Profile Service.
    • Next Step: Denormalize a small "snapshot" (name, avatar hash) into the edges table to avoid the external call for the list view, updating it asynchronously when the user changes their profile.

Summary

  1. Architecture: Read-heavy system using Redis ZSETs for caching and Sharded SQL for storage.

  2. Data Model: Bidirectional adjacency list sharded by user_id to optimize for local reads.

  3. Scale: Handles 50TB of data and 3.5M QPS via horizontal partitioning and aggressive caching.

  4. Reliability: Uses double-writes with async repair jobs to ensure graph consistency.