Chapter 8: System Design Interview Mastery

8.6 Design a photo/video sharing platform (Instagram)

1. Restate the Problem and Pick the Scope

We are designing a photo and video sharing social platform, similar to Instagram, where users can upload media, follow other users, and browse a personalized feed of posts from the people they follow.

The system needs to handle large volumes of media uploads, serve a highly personalized home feed, and support social interactions like likes and comments.

Main user groups and actions:

  • Content creators -- upload photos or short videos with captions, and receive engagement (likes, comments) from followers.
  • Content consumers -- scroll through a personalized home feed, discover new content, and interact with posts.
  • Social participants -- follow/unfollow other users, like posts, and leave comments.

Scope decisions:

  • We will focus on three core features: media upload (photos and short videos), the home feed (timeline of posts from followed users), and basic social interactions (follow, like, comment).
  • We will NOT cover: Stories, Reels, direct messaging, live streaming, ads/monetization, Explore/recommendation algorithms, shopping features, or account verification. These are all important but can be layered on later.

2. Clarify Functional Requirements

Must-Have Features

  • A user can upload a photo (JPEG/PNG, up to 20 MB) or a short video (up to 60 seconds, up to 100 MB) with a text caption.
  • Each post is stored permanently and assigned a unique URL.
  • A user can follow and unfollow other users.
  • A user can view a home feed showing recent posts from the users they follow, sorted roughly by recency (with some ranking).
  • A user can like a post and see the total like count.
  • A user can comment on a post and view the list of comments.
  • A user has a profile page showing all their posts in reverse chronological order.

Nice-to-Have Features

  • Notifications when someone likes or comments on your post.
  • Hashtags and the ability to search posts by hashtag.
  • Multiple photos per post (carousel).

3. Clarify Non-Functional Requirements

  • Monthly active users (MAU): 500 million
  • Daily active users (DAU): 200 million
  • Read:write ratio: ~100:1 -- overwhelmingly read-heavy (feed views >> uploads)
  • Average posts per creator per day: ~0.5 (not every user posts daily; ~10% of DAU post)
  • Average follows per user: 200
  • Feed read latency: < 200 ms p99 -- users expect a snappy scroll experience
  • Upload latency: < 2 seconds p99 -- acceptable for a media upload
  • Availability: 99.99% (four nines) -- the feed is the core product
  • Consistency: eventual consistency is acceptable for the feed, like counts, and follower counts; strong consistency for the upload path (a post must be visible after upload completes)
  • Data retention: permanent for posts; media stored indefinitely

4. Back-of-the-Envelope Estimates

Write QPS (post uploads)

10% of 200M DAU post, averaging 0.5 posts/day.

Writers/day = 200M × 10% = 20 million
Posts/day = 20M × 0.5 = 10 million
Write QPS = 10M / 86,400 ≈ 116 QPS (average)
Peak QPS = 116 × 5 ≈ 580 QPS

Read QPS (feed loads)

Each DAU opens the app ~10 times/day and loads the feed.

Feed reads/day = 200M × 10 = 2 billion
Read QPS = 2B / 86,400 ≈ 23,000 QPS (average)
Peak read QPS = 23,000 × 3 ≈ 70,000 QPS

70K QPS for feed reads is substantial. Caching and pre-computation are essential.

Storage

Photos: Assume 80% of posts are photos, average 2 MB after compression and multiple resolutions.

Photo storage/day = 8M × 2 MB = 16 TB/day

Videos: 20% of posts are videos, average 30 MB after transcoding.

Video storage/day = 2M × 30 MB = 60 TB/day
Total media/day = 16 + 60 = 76 TB/day
Total media/year = 76 TB × 365 ≈ 28 PB/year

Post metadata (caption, timestamps, user ID, media URLs): ~1 KB per post.

Metadata/year = 10M/day × 365 × 1 KB ≈ 3.6 TB/year

Metadata is small compared to media. Media storage dominates costs.

Bandwidth

Serving feed images to users. Assume each feed load shows 10 posts, each with a 200 KB thumbnail.

Bandwidth per feed load = 10 × 200 KB = 2 MB
Peak bandwidth = 70,000 QPS × 2 MB = 140 GB/s

This is enormous. A CDN is absolutely critical.
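These estimates are easy to sanity-check mechanically. A quick sketch that reproduces the arithmetic above (all inputs are the assumptions stated in this section; decimal units, 1 MB = 1,000 KB):

```python
# Back-of-the-envelope estimates for the photo-sharing platform.
DAU = 200_000_000
SECONDS_PER_DAY = 86_400

# Writes: 10% of DAU post, averaging 0.5 posts/day each.
posts_per_day = DAU * 0.10 * 0.5                  # 10 million
write_qps = posts_per_day / SECONDS_PER_DAY       # ~116
peak_write_qps = write_qps * 5                    # ~580

# Reads: each DAU loads the feed ~10 times/day.
feed_reads_per_day = DAU * 10                     # 2 billion
read_qps = feed_reads_per_day / SECONDS_PER_DAY   # ~23,000
peak_read_qps = read_qps * 3                      # ~70,000

# Storage: 80% photos @ 2 MB, 20% videos @ 30 MB.
photo_tb_per_day = posts_per_day * 0.8 * 2 / 1_000_000   # MB -> TB
video_tb_per_day = posts_per_day * 0.2 * 30 / 1_000_000
media_pb_per_year = (photo_tb_per_day + video_tb_per_day) * 365 / 1_000

# Bandwidth: 10 posts x 200 KB per feed load, at peak read QPS.
peak_bandwidth_gb_s = peak_read_qps * 10 * 200 / 1_000_000  # KB -> GB

print(f"write QPS ~{write_qps:.0f}, peak ~{peak_write_qps:.0f}")
print(f"read QPS ~{read_qps:.0f}, peak ~{peak_read_qps:.0f}")
print(f"media ~{media_pb_per_year:.0f} PB/year, bandwidth ~{peak_bandwidth_gb_s:.0f} GB/s")
```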

5. API Design

5.1 Upload a Post

  • Method & path: POST /api/v1/posts
  • Request: multipart form -- media (file), caption (text), media_type ("photo" or "video")
  • Success (201 Created): { "post_id": "abc123", "media_url": "https://cdn.insta.io/...", "created_at": "..." }
  • Error codes: 400 -- invalid file type or too large; 401 -- not authenticated; 429 -- rate limited

5.2 Get Home Feed

  • Method & path: GET /api/v1/feed?cursor={cursor}&limit=20
  • Success (200 OK): { "posts": [ { "post_id", "user_id", "username", "caption", "media_url", "like_count", "comment_count", "created_at" }, ... ], "next_cursor": "..." }
  • Error codes: 401 -- not authenticated

5.3 Like a Post

  • Method & path: POST /api/v1/posts/{post_id}/likes
  • Success (201 Created): { "like_count": 4837 }
  • Error codes: 404 -- post not found; 409 -- already liked

5.4 Comment on a Post

  • Method & path: POST /api/v1/posts/{post_id}/comments
  • Request: { "text": "Nice photo!" }
  • Success (201 Created): { "comment_id": "...", "text": "...", "created_at": "..." }
  • Error codes: 400 -- empty text; 404 -- post not found

5.5 Follow a User

  • Method & path: POST /api/v1/users/{user_id}/follow
  • Success (200 OK): { "following": true }
  • Error codes: 404 -- user not found; 409 -- already following

5.6 Get User Profile and Posts

  • Method & path: GET /api/v1/users/{user_id}/posts?cursor={cursor}&limit=20
  • Success (200 OK): { "user": { "username", "bio", "follower_count", "post_count" }, "posts": [...], "next_cursor": "..." }
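Both paginated endpoints return an opaque next_cursor. One common implementation, sketched here as an assumption (the API above does not prescribe the encoding), is to base64-encode the (created_at, post_id) of the last post on the page, so the next query can resume strictly after it:

```python
import base64
import json

def encode_cursor(created_at: int, post_id: int) -> str:
    """Pack the keyset position of the last returned post into an opaque token."""
    raw = json.dumps({"t": created_at, "id": post_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(cursor: str) -> tuple:
    """Recover (created_at, post_id) to build the WHERE clause for the next page."""
    data = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return data["t"], data["id"]

cursor = encode_cursor(1700000000, 987654321)
print(decode_cursor(cursor))  # (1700000000, 987654321)
```

Keyset cursors like this stay correct even as new posts are inserted, unlike OFFSET-based paging, which skips or repeats rows when the underlying data shifts.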

6. High-Level Architecture

Component Responsibilities

  • CDN -- serves all media files (photos, videos, thumbnails) from edge locations worldwide. Handles the bulk of bandwidth (~140 GB/s peak). Without a CDN, our origin servers would collapse.
  • Load Balancer -- distributes API requests across service instances, terminates TLS, performs health checks.
  • Post Upload Service -- accepts media uploads, stores the original file in object storage, writes post metadata to the database, and enqueues a media processing job.
  • Media Processing Workers -- consume jobs from the queue. For photos: generate thumbnails at multiple resolutions (150px, 640px, 1080px). For videos: transcode to multiple bitrates (360p, 720p, 1080p) and generate a poster frame. Results are written back to object storage.
  • Post Metadata DB (PostgreSQL) -- stores post records (post_id, user_id, caption, media URLs, timestamps, counters). The source of truth for all post data.
  • Feed Service -- the hot read path. Retrieves a user's pre-computed feed from Redis. If the cache is cold, it falls back to computing the feed on the fly from the database.
  • Feed Cache (Redis) -- stores each user's home feed as a sorted list of post IDs. Updated by the fan-out service whenever someone they follow publishes a new post.
  • Social Service -- handles follow/unfollow, likes, and comments. Manages the social graph and interaction data.
  • Fan-out Service -- when a new post is created, this background service pushes the post ID into the feed cache of every follower. This is the most compute-intensive background job.

7. Data Model

Database Choice

We use a combination:

  • PostgreSQL for structured metadata (posts, users, comments) -- we benefit from ACID guarantees, rich queries, and joins.
  • Redis for the feed cache and counters -- fast sorted sets for feed lists, fast atomic increments for like/follower counts.
  • A graph-friendly store (or PostgreSQL with good indexing) for the follow relationship -- the social graph is a many-to-many relationship queried in both directions (who do I follow? who follows me?).

Table: users

  • user_id: BIGINT, primary key -- Snowflake ID
  • username: VARCHAR(30) -- UNIQUE index
  • bio: TEXT -- profile description
  • profile_pic_url: VARCHAR(500) -- points to the CDN
  • follower_count: BIGINT -- denormalized, updated asynchronously
  • following_count: BIGINT -- denormalized
  • created_at: TIMESTAMP

Table: posts

  • post_id: BIGINT, primary key -- Snowflake ID, also used for time ordering
  • user_id: BIGINT -- indexed; powers the user profile page query
  • caption: TEXT
  • media_type: ENUM -- 'photo' or 'video'
  • media_urls: JSONB -- { "thumb": "...", "medium": "...", "original": "..." }
  • like_count: BIGINT -- denormalized counter
  • comment_count: BIGINT -- denormalized counter
  • created_at: TIMESTAMP -- indexed; powers time-based queries
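Both tables use Snowflake-style IDs. The key property is that the high bits are a timestamp, so sorting by post_id also sorts by creation time -- no separate sort column needed in the feed cache. A minimal sketch (the bit layout and epoch here are illustrative assumptions, not a spec):

```python
import threading
import time

class SnowflakeGenerator:
    """Sketch of a Snowflake-style 64-bit ID: 41 bits of milliseconds since a
    custom epoch, 10 bits of worker ID, 12 bits of per-millisecond sequence."""

    EPOCH_MS = 1_600_000_000_000  # arbitrary custom epoch (assumption)

    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024          # must fit in 10 bits
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000) - self.EPOCH_MS
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit wrap
                if self.sequence == 0:
                    # Exhausted 4096 IDs this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000) - self.EPOCH_MS
            else:
                self.sequence = 0
            self.last_ms = now
            return (now << 22) | (self.worker_id << 12) | self.sequence

gen = SnowflakeGenerator(worker_id=7)
a, b = gen.next_id(), gen.next_id()
print(a < b)  # IDs from one generator are strictly increasing -> time-ordered
```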

Table: follows

  • follower_id: BIGINT -- composite primary key (follower_id, followee_id)
  • followee_id: BIGINT -- index on followee_id for "who follows me?"
  • created_at: TIMESTAMP

Table: likes

  • user_id: BIGINT -- composite primary key (user_id, post_id); prevents double likes
  • post_id: BIGINT -- index for "who liked this post?"
  • created_at: TIMESTAMP

Table: comments

  • comment_id: BIGINT, primary key -- Snowflake ID
  • post_id: BIGINT -- indexed; fetch comments for a post
  • user_id: BIGINT -- who commented
  • text: TEXT
  • created_at: TIMESTAMP

Feed Cache in Redis

Each user's home feed is a Redis sorted set:

Key: feed:{user_id}
Value: sorted set of post_ids, scored by timestamp

The feed service reads the top N post IDs from this sorted set, then fetches the full post metadata from PostgreSQL (or a post cache).

8. Core Flows - End to End

Flow 1: Upload a Photo Post

This is what happens when a user takes a photo, writes a caption, and taps "Share."

  • Step 1 -- Client uploads the media. The mobile app sends a multipart POST to /api/v1/posts containing the photo file and the caption. The request hits the load balancer, which routes it to a Post Upload Service instance.

  • Step 2 -- Upload the original photo to object storage. The service immediately streams the raw photo file to S3 (object storage) at a key like originals/{post_id}.jpg. This happens before any processing. For a 5 MB photo, this takes about 200-500 ms. The service now has a URL for the original file.

    Why upload the original first? We want to persist the user's data as quickly as possible. All resizing and thumbnail generation happens asynchronously. The user should not wait for image processing.

  • Step 3 -- Write post metadata to PostgreSQL. The service generates a Snowflake post_id, inserts a row into the posts table with the caption, user_id, media_type, and the original media URL. The media_urls field initially contains only the original URL; thumbnail URLs are added after processing.

  • Step 4 -- Enqueue a media processing job. The service publishes a message to the media processing queue (Kafka or SQS): { post_id, original_url, media_type: "photo" }. This is a fire-and-forget write.

  • Step 5 -- Return success to the client. The service responds with HTTP 201, including the post_id and a temporary media URL (the original, served through the CDN). The user sees their post appear on their profile immediately. Total time: ~500 ms to 1 second.

  • Step 6 -- Media workers process the photo (async). A pool of media processing workers consumes the job from the queue. Each worker downloads the original photo from S3, generates thumbnails at three resolutions (150px square, 640px wide, 1080px wide), compresses them, and uploads them back to S3 at keys like thumbnails/{post_id}_150.jpg. The worker then updates the media_urls JSONB field in the posts table with the new thumbnail URLs. This takes 2-10 seconds but happens entirely in the background.

  • Step 7 -- Fan-out to followers' feeds (async). After the post metadata is written, the fan-out service picks up the event. It looks up all followers of the post's author from the follows table. For each follower, it adds the post_id (scored by timestamp) to that follower's feed sorted set in Redis: ZADD feed:{follower_id} {timestamp} {post_id}. It also trims the sorted set to keep only the latest 500 posts to bound memory usage.

    Fan-out on write vs. fan-out on read: We use fan-out on write (push model) for most users. When a user with 500 followers posts, we make 500 Redis writes. This is fast and means the feed is pre-computed when followers open the app. However, for celebrity users with millions of followers, we use fan-out on read instead (see Section 11).

  • What the user sees: The post appears on their profile within 1 second. Followers see it in their feed within a few seconds (after fan-out completes). Initially the photo may be slightly lower quality (the original, not the optimized thumbnail), but the CDN starts serving the processed thumbnails within 10-15 seconds.
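Step 7 can be sketched in a few lines. Here a plain dict of sorted lists stands in for the Redis sorted sets; in production each insert would be a ZADD plus a ZREMRANGEBYRANK trim (keeping the 500 highest-scored entries), issued through a pipeline:

```python
import bisect

FEED_MAX = 500  # keep only the latest 500 posts per feed

# In-memory stand-in for Redis: feed:{user_id} -> list of (timestamp, post_id),
# kept sorted ascending so the tail is the newest.
feeds = {}

def get_followers(author_id):
    # Stand-in for: SELECT follower_id FROM follows WHERE followee_id = ?
    return follower_table.get(author_id, [])

def fan_out(author_id, post_id, timestamp):
    """Push a new post into every follower's feed (fan-out on write)."""
    for follower_id in get_followers(author_id):
        feed = feeds.setdefault(follower_id, [])
        bisect.insort(feed, (timestamp, post_id))   # ZADD equivalent
        if len(feed) > FEED_MAX:                    # ZREMRANGEBYRANK equivalent
            del feed[: len(feed) - FEED_MAX]        # drop the oldest entries

follower_table = {42: [1, 2, 3]}    # user 42 has followers 1, 2, 3
fan_out(author_id=42, post_id=900, timestamp=1000)
fan_out(author_id=42, post_id=901, timestamp=1001)
print(feeds[1])  # [(1000, 900), (1001, 901)]
```

The work is O(followers) per post, which is exactly why this path breaks down for celebrity accounts (Section 11).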

Flow 2: Load the Home Feed (the Hot Path)

This is the most frequent and performance-critical operation. A user opens the app and scrolls their feed.

  • Step 1 -- Client requests the feed. The app sends GET /api/v1/feed?limit=20 to the load balancer, which routes to a Feed Service instance.

  • Step 2 -- Fetch post IDs from the feed cache. The service reads from Redis: ZREVRANGE feed:{user_id} 0 19 (top 20 post IDs by timestamp, newest first). This returns an ordered list of post_ids in under 1 ms.

    • Cache hit (common case): The sorted set exists and has data. Proceed to Step 3.
    • Cache miss (cold start): If the user's feed is not in Redis (new user, cache eviction, or restart), the service falls back to computing the feed on the fly. It queries the follows table for the user's followee list, then queries the posts table: SELECT * FROM posts WHERE user_id IN (...followees) ORDER BY created_at DESC LIMIT 20. This is slower (50-200 ms) but correct. The result is written back to Redis for future requests.
  • Step 3 -- Hydrate the post IDs into full post objects. The feed cache stores only post_ids (to save memory). The service now needs the full post data (caption, media URLs, username, like count). It does a multi-get from a post cache (Redis hash or Memcached): MGET post:{id1} post:{id2} ... post:{id20}. For cache hits, this returns full post JSON in under 1 ms. For misses, the service batch-reads from PostgreSQL: SELECT * FROM posts WHERE post_id IN (...), populates the cache, and merges the results.

  • Step 4 -- Assemble and return the response. The service builds the feed response JSON with all 20 posts (including media CDN URLs, captions, like/comment counts, and the author's username and profile picture). It returns this to the client with a next_cursor for pagination.

  • Step 5 -- Client renders the feed. The app receives the JSON and starts rendering. For each post, it requests the thumbnail image from the CDN URL. The CDN either serves it from an edge cache (fast, ~10-20 ms) or pulls it from S3 origin on the first request. The user sees posts appearing on screen within 100-200 ms of opening the app. Images load progressively as they scroll.

  • Total latency breakdown:

    • Feed cache lookup: ~1 ms
    • Post hydration (cache): ~2-5 ms
    • Network to client: ~20-50 ms
    • Image loading (CDN): ~30-100 ms per image (parallel)
    • Total perceived: ~150-300 ms for the feed to appear with images
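Steps 2-3 reduce to a short routine. In this sketch, feed_cache and post_cache are dict stand-ins for the Redis structures, and db_fetch stands in for the batched PostgreSQL read:

```python
def db_fetch(db, post_ids):
    # Stand-in for: SELECT * FROM posts WHERE post_id IN (...)
    return {pid: db[pid] for pid in post_ids if pid in db}

def load_feed(user_id, limit, feed_cache, post_cache, db):
    """Read the top-N post IDs from the feed cache, then hydrate them."""
    # Step 2: ZREVRANGE feed:{user_id} 0 limit-1 -- newest first.
    entries = sorted(feed_cache.get(user_id, []), reverse=True)[:limit]
    post_ids = [post_id for _ts, post_id in entries]

    # Step 3: MGET post:{id} ...; batch-read any misses from the database.
    posts, misses = {}, []
    for pid in post_ids:
        if pid in post_cache:
            posts[pid] = post_cache[pid]
        else:
            misses.append(pid)
    for pid, row in db_fetch(db, misses).items():
        post_cache[pid] = row           # populate the cache for next time
        posts[pid] = row

    # Step 4: assemble the response in feed order.
    return [posts[pid] for pid in post_ids]

# Usage: one cache hit (901) and one miss (900) hydrated from the "database".
db = {900: {"caption": "sunset"}, 901: {"caption": "coffee"}}
feed_cache = {1: [(1000, 900), (1001, 901)]}
post_cache = {901: {"caption": "coffee"}}
feed = load_feed(1, 20, feed_cache, post_cache, db)
print([p["caption"] for p in feed])  # ['coffee', 'sunset']
```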

Flow 3: Like a Post

This is a simple but high-frequency interaction.

  • Step 1 -- Client sends the like request. The user double-taps a photo. The app sends POST /api/v1/posts/{post_id}/likes to the load balancer, routed to the Social Service.

  • Step 2 -- Optimistic UI update. Before waiting for the server response, the app immediately shows the heart animation and increments the like count locally. This makes the interaction feel instant. If the server later fails, the app rolls back silently.

  • Step 3 -- Check for duplicate likes. The service checks the likes table: SELECT 1 FROM likes WHERE user_id = ? AND post_id = ?. The composite primary key makes this a fast index lookup. If the like already exists, return 409. Otherwise proceed.

  • Step 4 -- Write the like and increment the counter. In a single transaction (or two fast operations):

    • Insert into likes table: INSERT INTO likes (user_id, post_id, created_at) VALUES (...).

    • Increment the denormalized counter: UPDATE posts SET like_count = like_count + 1 WHERE post_id = ?.

      For very high-traffic posts (celebrity posts getting thousands of likes per second), incrementing a single row becomes a hot spot. We address this with batched counter updates (see Section 11).

  • Step 5 -- Invalidate/update the post cache. The service updates the cached post object in Redis to reflect the new like count: HINCRBY post:{post_id} like_count 1. This ensures the next feed load shows the correct count without a database read.

  • Step 6 -- (Async) Enqueue a notification. The service publishes a notification event to a queue: { type: "like", from_user: ..., post_id: ..., post_owner: ... }. A notification worker picks this up and delivers a push notification to the post owner. This is entirely async and never blocks the like response.

  • Step 7 -- Return success. The service responds with HTTP 201 and the new like count. Total server time: ~10-20 ms. The user already saw the heart animation in Step 2, so the experience feels instant.
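Steps 3-5 fit in one small routine. This sketch uses in-memory stand-ins for the likes table, the denormalized counter column, and the Redis post hash:

```python
class LikeService:
    """Sketch of the like path: dedupe, write, bump the denormalized counter."""

    def __init__(self):
        self.likes = set()        # stand-in for the likes table (composite PK)
        self.like_counts = {}     # stand-in for posts.like_count
        self.post_cache = {}      # stand-in for the Redis post hash

    def like(self, user_id, post_id):
        # Step 3: the composite PK (user_id, post_id) makes this a fast lookup.
        if (user_id, post_id) in self.likes:
            return 409, self.like_counts.get(post_id, 0)   # already liked
        # Step 4: insert the like row and increment the counter.
        self.likes.add((user_id, post_id))
        self.like_counts[post_id] = self.like_counts.get(post_id, 0) + 1
        # Step 5: keep the cached post object in sync (HINCRBY equivalent).
        if post_id in self.post_cache:
            self.post_cache[post_id]["like_count"] = self.like_counts[post_id]
        return 201, self.like_counts[post_id]

svc = LikeService()
print(svc.like(7, 900))  # (201, 1)
print(svc.like(7, 900))  # (409, 1) -- idempotent, no double count
```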

9. Caching and Read Performance

What We Cache

  • Home feed (Redis sorted set): feed:{user_id} -- a list of post_ids for each user's personalized feed. This is the most critical cache. Without it, every feed load would require querying the follows table and then the posts table.
  • Post objects (Redis hash or string): post:{post_id} -- the full post metadata (caption, media URLs, counts, author info). Hydrates the feed.
  • User profiles (Redis hash): user:{user_id} -- username, profile pic URL, follower/following counts. Used when rendering posts in the feed.

Where the Cache Sits

Redis sits between the Feed/Social services and PostgreSQL. For the feed path: Feed Service checks Redis first, falls back to Postgres, then populates Redis.

Cache Keys and Value Shape

feed:{user_id} -> Sorted set of post_ids (scored by timestamp)
Max 500-800 entries per user (trimmed)

post:{post_id} -> Hash { user_id, caption, media_urls, like_count,
comment_count, created_at }
TTL: 24 hours (refreshed on access)

user:{user_id} -> Hash { username, profile_pic_url, follower_count }
TTL: 1 hour

Cache Update and Invalidation

  • Feed cache: Updated by the fan-out service on every new post (ZADD). Trimmed to 500 entries to cap memory. Never explicitly invalidated; entries age out naturally as newer posts push them out.
  • Post cache: Updated on like/comment (HINCRBY). Populated lazily on cache miss. TTL-based expiry (24 hours). When a post is updated (e.g., caption edit), the service deletes the cache entry and lets the next read repopulate it.
  • User cache: Populated lazily. Invalidated on profile update.

Eviction Policy

LRU (Least Recently Used) on the Redis instance level. Active users' feeds stay warm. Inactive users' feeds get evicted and recomputed on their next login. This is acceptable because the recomputation cost is a one-time penalty of ~100-200 ms.

10. Storage, Indexing, and Media

Primary Data Storage

  • PostgreSQL for all structured data (users, posts, follows, likes, comments). Sharded by user_id at scale (see Section 11).
  • Redis for feed cache, post cache, and counters.
  • S3 (Object Storage) for all media files -- originals, thumbnails, transcoded videos.

Indexes

  • posts table: Index on (user_id, created_at DESC) -- powers the user profile page (all posts by a user, newest first). Index on post_id (primary key) -- powers single post lookup.
  • follows table: Composite primary key (follower_id, followee_id) -- powers "am I following this user?" and "who do I follow?". Index on followee_id -- powers "who follows me?" (needed for fan-out).
  • likes table: Composite primary key (user_id, post_id) -- prevents duplicates and answers "did I like this post?". Index on post_id -- powers "who liked this post?".
  • comments table: Index on (post_id, created_at) -- fetches comments for a post in order.

Media Storage and CDN

Upload path: Original media goes directly to S3. Media workers generate thumbnails/transcodes and store them in S3 under predictable keys.

Serving path: All media URLs in the API response point to the CDN (e.g., https://cdn.insta.io/thumbnails/abc123_640.jpg). The CDN operates in pull-based mode: on the first request for an image, the CDN fetches it from S3, caches it at the edge, and serves all subsequent requests from cache. Popular images stay cached for days. For videos, the CDN supports byte-range requests for adaptive streaming.

Storage tiers: After 90 days, original high-resolution photos and source videos can be moved to cheaper infrequent-access storage (S3-IA or Glacier) since they are rarely accessed. Thumbnails stay in standard storage because they are always needed for the feed.

Trade-offs

  • Cost: S3 storage at ~28 PB/year is the largest cost center. CDN bandwidth at ~140 GB/s peak is the second. These are inherent to a media-heavy platform.
  • Write load on S3: ~580 QPS peak for original uploads plus worker writes. S3 handles this easily.
  • Read latency: CDN serves images in 10-50 ms from edge. S3 origin fetch is ~50-100 ms (only on CDN cache miss, which is rare for popular content).

11. Scaling Strategies

Version 1: Simple Setup

For the first deployment serving tens of thousands of users:

  • A single PostgreSQL instance for all tables.
  • A single Redis instance for feed cache.
  • A single S3 bucket for media.
  • Two app server instances (combined services) behind a load balancer.
  • A small pool of media processing workers.

Growing the System

Database replication: Add PostgreSQL read replicas. The feed service and profile page queries read from replicas. Writes (new posts, likes) go to the primary. Replication lag of a few hundred milliseconds is acceptable for feed reads.

Database sharding: At hundreds of millions of users, a single PostgreSQL primary cannot handle all writes. Shard by user_id:

  • The posts table is sharded by the post author's user_id. All posts by user X live on the same shard, which makes the profile page query efficient.
  • The follows table is sharded by follower_id. "Who do I follow?" is a single-shard query.
  • Routing: hash(user_id) % N_shards. A lightweight metadata service or consistent hashing ring maps user_ids to shards.
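The modulo router is one line, but it reshuffles almost every key whenever N_shards changes, which is why the consistent hashing ring is the more durable choice. A sketch of both (shard names and the virtual-node count are illustrative):

```python
import bisect
import hashlib

def mod_shard(user_id, n_shards):
    # Simple routing: fine until you need to change n_shards.
    return user_id % n_shards

class HashRing:
    """Minimal consistent-hashing ring with virtual nodes, so adding or
    removing a shard only remaps ~1/N of the keys."""

    def __init__(self, shards, vnodes=100):
        self.ring = sorted(
            (self._hash(f"{s}:{v}"), s) for s in shards for v in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def shard_for(self, user_id):
        # Walk clockwise to the first virtual node at or after the key's hash.
        i = bisect.bisect(self.keys, self._hash(str(user_id))) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["shard-a", "shard-b", "shard-c"])
print(ring.shard_for(12345))  # deterministic: same user always maps to the same shard
```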

Fan-out optimization for celebrities: A user with 50 million followers cannot use fan-out on write (pushing a post to 50 million feed caches would take minutes and spike Redis writes). Instead, we use a hybrid approach:

  • Normal users (< 10K followers): Fan-out on write. When they post, push the post_id to all followers' feeds in Redis.
  • Celebrity users (> 10K followers): Fan-out on read. Do NOT push their posts to followers' feeds. Instead, when a user loads their feed, the feed service merges their pre-computed feed (from normal follows) with the latest posts from any celebrities they follow (queried on the fly). This adds ~10-20 ms to the feed read but avoids writing to millions of feed caches.
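The read-time merge for celebrity follows is a standard k-way merge. A sketch, assuming both the pushed feed and each celebrity's recent posts arrive as (timestamp, post_id) lists sorted newest-first, as they would from ZREVRANGE and an indexed posts query:

```python
import heapq

def merged_feed(precomputed, celebrity_recent, limit=20):
    """Merge the pushed feed with pulled celebrity posts, newest first.

    precomputed: list of (timestamp, post_id) from the Redis feed cache.
    celebrity_recent: one such list per followed celebrity.
    """
    streams = [precomputed] + celebrity_recent
    merged = heapq.merge(*streams, reverse=True)  # k-way merge, descending
    seen, out = set(), []
    for ts, post_id in merged:
        if post_id not in seen:          # guard against duplicates across paths
            seen.add(post_id)
            out.append((ts, post_id))
        if len(out) == limit:
            break
    return out

normal = [(1005, 51), (1001, 50)]                  # from the Redis feed cache
celebs = [[(1004, 90)], [(1003, 80), (1002, 81)]]  # pulled per celebrity
print(merged_feed(normal, celebs, limit=4))
# [(1005, 51), (1004, 90), (1003, 80), (1002, 81)]
```

Because every input is already sorted, the merge is O(limit × log k) for k streams, which is what keeps the read-path overhead down to the ~10-20 ms mentioned above.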

Separating read and write paths: The feed read path and the upload path have very different scaling needs (70K vs 580 QPS). We scale them independently -- many feed service instances, fewer upload service instances.

Handling Bursts

  • Media processing queue (Kafka): Absorbs spikes in uploads. If a million users post at midnight on New Year's Eve, the queue buffers the processing jobs and workers consume them at a steady rate. Uploads succeed immediately; thumbnails appear a few seconds later.
  • Like counter batching: For posts receiving thousands of likes per second, we batch counter updates. Instead of one DB write per like, a counter service accumulates likes in memory for 5 seconds, then issues a single UPDATE posts SET like_count = like_count + 347 WHERE post_id = ?. This reduces DB write load by 100x for hot posts.
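The batching logic itself is small. In this sketch, maybe_flush returns the per-post deltas that a real counter service would turn into single UPDATE statements:

```python
import time

class BatchedCounter:
    """Accumulate like increments in memory; flush one update per post."""

    def __init__(self, flush_interval_s=5.0):
        self.flush_interval_s = flush_interval_s
        self.pending = {}                 # post_id -> accumulated delta
        self.last_flush = time.monotonic()

    def increment(self, post_id, delta=1):
        self.pending[post_id] = self.pending.get(post_id, 0) + delta

    def maybe_flush(self):
        if time.monotonic() - self.last_flush < self.flush_interval_s:
            return {}
        deltas, self.pending = self.pending, {}
        self.last_flush = time.monotonic()
        # Real code: one "UPDATE posts SET like_count = like_count + ?" per post.
        return deltas

counter = BatchedCounter(flush_interval_s=0)  # flush immediately for the demo
for _ in range(347):
    counter.increment(post_id=900)
print(counter.maybe_flush())  # {900: 347} -- one write instead of 347
```

The trade-off is bounded staleness: a crash loses at most one interval's worth of increments, and displayed counts can lag by up to the flush interval.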

12. Reliability, Failure Handling, and Backpressure

Removing Single Points of Failure

  • App servers: Multiple instances of each service behind the load balancer. Health checks remove unhealthy nodes.
  • PostgreSQL: Primary with synchronous streaming replication to a standby in a different AZ. Automatic failover via Patroni or managed service (RDS Multi-AZ).
  • Redis: Redis Cluster with replicas for automatic failover. If Redis is entirely down, the feed service falls back to on-the-fly computation from the database (slower, but the system keeps working).
  • S3: Provides 99.999999999% durability. It is replicated across AZs by default.
  • Kafka: Multi-broker cluster with replication factor 3. Can tolerate broker failures without losing messages.
  • Load balancer: Managed cloud ALB, inherently redundant across AZs.

Timeouts, Retries, and Idempotency

  • Timeouts: Feed service: 10 ms timeout on Redis, 200 ms on Postgres reads. Upload service: 5-second timeout on S3 uploads.
  • Retries with backoff: S3 upload retries up to 3 times (500 ms, 1s, 2s). Kafka produce retries are built in.
  • Idempotency: Post uploads use a client-generated idempotency key (sent in a header). If the client retries, the server checks if a post with that key already exists and returns the existing post instead of creating a duplicate. Likes use the composite primary key (user_id, post_id) as a natural idempotency mechanism.
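Retries plus the idempotency key are what make client retries safe together. A sketch with the S3 call and the server-side key store injected as stand-ins (hypothetical names; the 0.5s/1s/2s delays match the schedule above, and sleep is injectable so the demo runs instantly):

```python
def upload_with_retry(do_upload, idempotency_key, seen_keys,
                      delays=(0.5, 1.0, 2.0), sleep=lambda s: None):
    """Retry an upload with backoff; dedupe repeats on the idempotency key.

    do_upload and seen_keys stand in for the S3 call and the server-side
    idempotency-key store.
    """
    if idempotency_key in seen_keys:
        # Client retry of a request that already succeeded: return the
        # existing post instead of creating a duplicate.
        return seen_keys[idempotency_key]
    last_err = None
    for delay in (0.0,) + tuple(delays):      # initial attempt + 3 retries
        if delay:
            sleep(delay)                      # back off before retrying
        try:
            result = do_upload()
            seen_keys[idempotency_key] = result
            return result
        except IOError as err:
            last_err = err
    raise last_err

# Usage: fails twice, succeeds on the third attempt; a later retry is deduped.
attempts = {"n": 0}
def flaky_upload():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise IOError("transient S3 error")
    return {"post_id": "abc123"}

seen = {}
print(upload_with_retry(flaky_upload, "client-key-1", seen))  # {'post_id': 'abc123'}
print(upload_with_retry(flaky_upload, "client-key-1", seen))  # same post, no re-upload
```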

Circuit Breakers

If the media processing pipeline falls behind (e.g., a surge of video uploads), a circuit breaker on the upload service stops enqueueing new processing jobs and returns a "processing delayed" status. The upload still succeeds (the original is in S3), but thumbnail generation is deferred.

Behavior Under Overload

  • Rate limiting: Per-user rate limits on uploads (e.g., 50 posts/day), likes (e.g., 500/hour), and comments. Prevents spam and protects the write path.
  • Shed non-essential work: Under extreme load, disable view-count tracking, notification delivery, and hashtag indexing. Feed reads and post uploads (the core experience) are never shed.
  • Degrade feed quality: If Redis and Postgres are both overloaded, serve a cached stale feed (even hours old) rather than returning an error. Users prefer slightly stale content over an error screen.
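Per-user rate limits like those above are typically enforced with a token bucket. A minimal sketch with an explicit clock so the refill arithmetic is visible (the capacity and rate here are illustrative, not the production limits):

```python
class TokenBucket:
    """Token-bucket rate limiter, e.g. 500 likes/hour per user."""

    def __init__(self, capacity, refill_per_second, now=0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)   # start full: allows an initial burst
        self.last = now

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_second)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                    # over the limit: respond 429

# Capacity of 2 requests burst, refilling 1 token per second.
bucket = TokenBucket(capacity=2, refill_per_second=1.0)
print(bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0))  # True True False
print(bucket.allow(1.0))  # True -- one token refilled after a second
```

In practice the state (tokens, last) lives in Redis keyed by user ID, updated atomically with a small Lua script so concurrent requests cannot double-spend a token.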

13. Security, Privacy, and Abuse

Authentication and Authorization

  • All API requests require a valid session token or JWT.
  • Token is issued after login (username/password or OAuth).
  • Authorization checks: only the post author can delete their post; only the account owner can modify their profile.

Encryption

  • In transit: HTTPS everywhere (TLS 1.3). Terminated at the load balancer.
  • At rest: S3 server-side encryption for all media. PostgreSQL disk encryption via managed service. Redis in-memory data is not encrypted at rest (acceptable for non-sensitive cached data).

Handling Sensitive Data

  • User passwords are hashed with bcrypt (high work factor).
  • Email addresses and phone numbers are stored encrypted and access-logged.
  • GDPR/privacy compliance: users can request account deletion, which triggers a cascade delete of all their posts, likes, comments, follows, and media from S3.

Abuse Protection

  • Rate limiting: Per-IP and per-user rate limits on all write endpoints.
  • Spam detection: Machine learning models scan captions and comments for spam, hate speech, and policy violations. Runs asynchronously after creation -- flagged content is hidden pending review.
  • Image/video scanning: Uploaded media is scanned (using services like AWS Rekognition or PhotoDNA) for prohibited content (CSAM, extreme violence). This runs in the media processing pipeline before thumbnails are made publicly accessible.
  • Report mechanism: Users can report posts and accounts. High-report-volume content is auto-hidden pending human review.

14. Bottlenecks and Next Steps

Main Bottlenecks and Risks

  • Fan-out for celebrities: A celebrity posting to 50 million followers is the hardest scalability challenge. Mitigation (already in design): Hybrid fan-out: push for normal users, pull for celebrities. Next step: Tune the threshold dynamically based on system load, not just follower count.

  • Media storage costs: 28 PB/year of media storage is extremely expensive. Mitigation: Tiered storage (move old originals to Glacier after 90 days). Next step: Intelligent compression, deduplication of identical uploads, and aggressive CDN caching to reduce origin reads.

  • Hot post counters: A viral post receiving millions of likes per minute creates a write hot spot on the like_count column. Mitigation (already in design): Batched counter updates via a counter service. Next step: Use Redis as the primary counter store for hot posts and sync to Postgres periodically.

  • Feed ranking complexity: Our current feed is sorted by timestamp, but users expect a ranked feed (showing the "best" posts first). Next step: Add a lightweight ranking service between the feed cache and the API. It re-ranks the cached post IDs using signals like engagement rate, recency, and user affinity. This can start as a simple heuristic and evolve into an ML-based system.

  • Database sharding complexity: Sharding by user_id makes cross-user queries (e.g., "all comments on this post" when comments come from users on different shards) harder. Next step: Use a separate comments service with its own database sharded by post_id, so all comments for a post live on one shard.

Design Summary

  • Media storage -- S3 + CDN, separate from the metadata DB. Trade-off: effectively unlimited, cheap storage, but adds a network hop (mitigated by CDN caching).
  • Feed delivery -- pre-computed feeds in Redis via fan-out on write. Trade-off: fast reads (~1 ms), but expensive for high-follower users (mitigated by hybrid fan-out).
  • Fan-out strategy -- hybrid: push for normal users, pull for celebrities. Trade-off: balances write amplification against read latency.
  • Media processing -- asynchronous via message queue and workers. Trade-off: the user sees the post immediately; thumbnails appear seconds later.
  • Like/comment counters -- denormalized in the posts table, with batched updates for hot posts. Trade-off: eventually consistent counts, but no DB write hot spots.
  • Scaling path -- start with a single DB; add replicas, then shard by user_id. Trade-off: avoids premature complexity while keeping a clear growth roadmap.

This design is built around two core principles: the feed must be fast (pre-computed and cached), and media must be cheap to store and fast to serve (object storage plus CDN). By separating the media pipeline from the metadata path, using a hybrid fan-out strategy, and caching aggressively at every layer, we build a system that can serve hundreds of millions of users while keeping the scroll experience snappy and responsive.