## What Is a CDN and How It Works
A Content Delivery Network is a geographically distributed group of servers that stores copies of your content closer to your users.
Instead of every request traveling from Tokyo to your origin server in Virginia, the CDN serves a cached copy from a server in Tokyo.
The user gets their response faster, and your origin server handles less traffic.
Think about what happens without a CDN. A user in Mumbai loads your website. Their browser sends a request that crosses undersea cables, traverses multiple network hops, reaches your server in the US, gets processed, and the response makes the return trip.
That round trip adds 200 to 300 milliseconds of network latency before the server even starts doing any work. For a page with 50 assets (images, scripts, stylesheets), those round trips stack up fast.
With a CDN, that same user in Mumbai hits a CDN edge server located in Mumbai or nearby.
The edge server already has a cached copy of most of those assets. The response arrives in 10 to 20 milliseconds.
The origin server in the US never even knows the request happened.
CDNs operate through a network of Points of Presence (PoPs), which are data centers spread across major cities worldwide.
Large CDN providers maintain hundreds of PoPs.
When a user makes a request, DNS resolution directs them to the nearest PoP, and the edge server at that PoP handles the response.
The first time an edge server receives a request for content it does not have, it fetches that content from your origin server, caches it, and returns it to the user.
Subsequent requests for the same content from users in that region are served directly from the edge cache. This is why the first request to a cold edge is slightly slower, but every request after it is dramatically faster.
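The pull-through behavior described above can be sketched as a tiny in-memory cache. This is an illustrative model, not any provider's implementation: `origin_fetch` stands in for the HTTP request to your origin, and the TTL logic mirrors how an edge decides whether a cached entry is still fresh.

```python
import time

class EdgeCache:
    """Minimal pull-through edge cache: serve from cache while fresh,
    otherwise fetch from the origin, store the result, and serve it."""

    def __init__(self, origin_fetch, default_ttl=300):
        self.origin_fetch = origin_fetch   # callable: path -> response body
        self.default_ttl = default_ttl     # freshness window, in seconds
        self.store = {}                    # path -> (body, expires_at)
        self.origin_hits = 0               # how often we bothered the origin

    def get(self, path, now=None):
        now = time.time() if now is None else now
        entry = self.store.get(path)
        if entry and entry[1] > now:       # cache hit, still fresh
            return entry[0]
        body = self.origin_fetch(path)     # cache miss: pull from origin
        self.origin_hits += 1
        self.store[path] = (body, now + self.default_ttl)
        return body

cache = EdgeCache(lambda path: f"origin body for {path}")
cache.get("/logo.png")   # cold edge: triggers an origin fetch
cache.get("/logo.png")   # warm edge: served from cache, origin untouched
```

The second `get` never reaches the origin, which is exactly why a warm edge is dramatically faster than a cold one.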
_Figure: Content Delivery Network_
## Push CDNs vs. Pull CDNs
CDNs get content to their edge servers in one of two ways, and the approach you choose affects how much control you have and how much work you do.
**Pull CDNs**
A pull CDN fetches content from your origin server on demand.
When the first user in a region requests a file, the edge server does not have it yet, so it pulls the file from your origin, caches it, and serves it.
Every subsequent request for that file in that region is served from cache until the TTL expires.
Pull CDNs require minimal setup. You point the CDN at your origin server, and it handles the rest. You do not need to upload files to the CDN or manage what is cached where.
The CDN figures it out based on actual user traffic.
The downside is that the first request for each file in each region is slow because it triggers an origin fetch.
For content that is rarely accessed in certain regions, the cache may never warm up, and users in those regions consistently experience slower responses.
Pull CDNs also put unpredictable load on your origin server because cache misses trigger origin fetches.
Most CDN providers operate as pull CDNs by default. Cloudflare, CloudFront, and Fastly all use the pull model as their primary approach.
**Push CDNs**
A push CDN requires you to upload content directly to the CDN's storage. You are responsible for deciding what content goes to which edge locations and when.
The CDN does not fetch from your origin on a miss because there is no origin in the traditional sense. The CDN's storage is the source.
Push CDNs give you complete control over what is cached and when it is updated. You upload a new version of a file, and it is available immediately without waiting for cache expiration.
This is ideal for large files that rarely change, like video libraries, software downloads, or firmware updates.
The trade-off is operational overhead. You manage the upload process, track which versions are deployed, and handle the storage costs on the CDN side. If your content changes frequently, keeping the CDN in sync with your origin becomes a constant maintenance task.
| Aspect | Pull CDN | Push CDN |
|---|---|---|
| Content delivery | Fetched from origin on first request | Uploaded by you to CDN storage |
| Setup effort | Minimal (point CDN at your origin) | Higher (manage uploads and versions) |
| First request latency | Slower (origin fetch on cache miss) | Fast (content already on CDN) |
| Best for | Websites, APIs, frequently changing content | Large static files, video libraries, software downloads |
| Origin server load | Unpredictable (spikes on cache misses) | Minimal (CDN serves from its own storage) |
Many production systems use a hybrid approach.
Static assets like JavaScript bundles and CSS files are served through a pull CDN because they change with every deployment and the CDN handles cache invalidation automatically.
Large media files like videos are pushed to CDN storage because they are accessed heavily and should never require an origin fetch.
## CDN Caching Strategies and TTL Management
The effectiveness of your CDN depends almost entirely on how well you configure caching.
Cache too aggressively and users see stale content.
Cache too conservatively and your origin server handles traffic that the CDN should be absorbing.
**How CDN Caching Works**
CDN edge servers cache content based on HTTP cache headers that your origin server sends with each response. The two most relevant headers are `Cache-Control` and `Expires`.
`Cache-Control: max-age=86400` tells the edge server to cache the response for 86,400 seconds (24 hours). During that window, the edge serves the cached version without contacting your origin.
`Cache-Control: no-cache` tells the edge server to revalidate with the origin before serving the cached version. The edge sends a conditional request (using `If-Modified-Since` or `If-None-Match`), and the origin either confirms the cached version is still valid (304 Not Modified) or sends the new version. This adds a round trip but ensures freshness.
`Cache-Control: no-store` tells the edge not to cache the response at all. Every request goes to the origin. Use this for highly personalized or sensitive content that should never be stored on third-party servers.
**TTL Management Strategies**
Choosing the right TTL for each type of content is a balancing act between performance and freshness.
Long TTLs (hours to days) work for content that rarely changes: company logos, font files, third-party library scripts. Set `max-age=2592000` (30 days) and forget about it. If the content does change, use cache-busting (covered in the invalidation section below) to force an update.
Medium TTLs (minutes to hours) work for content that changes periodically: product listings, blog posts, search results pages. A TTL of 300 seconds (5 minutes) means users see a version that is at most 5 minutes stale, which is acceptable for most content sites.
Short TTLs (seconds) work for content that changes frequently but still benefits from even brief caching. An API endpoint returning trending topics might use a TTL of 10 seconds. That 10-second cache still absorbs thousands of redundant requests during a traffic spike.
Zero TTL with revalidation works when you need guaranteed freshness but still want to avoid transferring unchanged data. The edge checks with the origin every time, but if the content has not changed, the origin responds with a tiny 304 response instead of re-sending the full payload.
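The revalidation exchange can be sketched as origin-side logic: the edge presents the ETag it has via `If-None-Match`, and the origin answers 304 (nothing to transfer) or 200 with the full body. A simplified model, assuming a strong ETag derived from the response bytes:

```python
import hashlib

def make_etag(body: bytes) -> str:
    """Derive a strong ETag from the response content."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def origin_respond(body: bytes, if_none_match=None):
    """Origin-side conditional logic: 304 Not Modified if the edge's
    ETag still matches, otherwise 200 with the full payload."""
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, etag, b""          # tiny response, no payload re-sent
    return 200, etag, body

body = b"<html>home page v1</html>"
status, etag, payload = origin_respond(body, None)    # first fetch: full body
status2, _, payload2 = origin_respond(body, etag)     # revalidation: 304
```

The 304 path carries only headers, which is why zero-TTL revalidation still saves bandwidth even though it costs a round trip.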
| Content Type | Recommended TTL | Cache-Control Example |
|---|---|---|
| Versioned static assets (app.v3.js) | 1 year | `max-age=31536000, immutable` |
| Unversioned static assets (logo.png) | 1 to 30 days | `max-age=2592000` |
| Semi-dynamic pages (product listing) | 5 to 60 minutes | `max-age=300` |
| Frequently changing API responses | 5 to 60 seconds | `max-age=10` |
| Personalized or sensitive data | No cache | `no-store` |
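The table above can be encoded as a small policy helper on the origin side. The content-class names here are this sketch's own invention; the header values mirror the table.

```python
# Map content classes to Cache-Control headers, mirroring the table above.
# The class names are illustrative, not a standard taxonomy.
CACHE_POLICIES = {
    "versioned_static": "public, max-age=31536000, immutable",
    "static":           "public, max-age=2592000",
    "semi_dynamic":     "public, max-age=300",
    "fast_api":         "public, max-age=10",
    "personalized":     "no-store",
}

def cache_control_for(content_class: str) -> str:
    # Default to no-store: failing closed means an unclassified
    # response is never accidentally cached on third-party servers.
    return CACHE_POLICIES.get(content_class, "no-store")
```

Defaulting to `no-store` is the safe direction: an over-cached personalized response is a data leak, while an under-cached static asset is merely slow.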
A powerful pattern for static assets is content hashing. Instead of serving `app.js` and hoping the cache updates when you deploy, serve `app.a3f8b2c1.js` where the hash changes whenever the file content changes.
Set the TTL to one year.
When you deploy new code, the HTML references a new filename with a new hash, and the CDN fetches the new file on the first request.
The old cached file naturally ages out. This gives you the best of both worlds: aggressive caching with instant updates on deploy.
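Generating the hashed filename is typically a build-step concern. A minimal sketch, assuming a SHA-256 digest truncated to eight hex characters (the exact hash and length vary by bundler):

```python
import hashlib

def hashed_filename(name: str, content: bytes, hash_len: int = 8) -> str:
    """Derive a content-addressed filename like app.a3f8b2c1.js: the
    hash changes whenever the bytes change, so a one-year TTL is safe."""
    digest = hashlib.sha256(content).hexdigest()[:hash_len]
    stem, dot, ext = name.rpartition(".")
    return f"{stem}.{digest}.{ext}" if dot else f"{name}.{digest}"

v1 = hashed_filename("app.js", b"console.log('v1');")
v2 = hashed_filename("app.js", b"console.log('v2');")
# Different content -> different URL -> the old cached copy is never served.
```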
## Edge Computing and Edge Functions
Traditional CDNs only serve cached content. If the request needs computation, it gets forwarded to your origin server.
Edge computing changes that by letting you run code directly on CDN edge servers, bringing computation closer to the user alongside the cached data.
**What Edge Functions Do**
Edge functions are small pieces of code that execute at the CDN edge, typically in response to an incoming request. They run in lightweight runtimes (often V8 isolates or WebAssembly sandboxes) that start in microseconds rather than the milliseconds or seconds that traditional server cold starts require.
Common use cases include:
* Modifying request or response headers (adding security headers, rewriting URLs)
* Performing authentication and authorization checks before a request reaches your origin
* Personalizing cached content (serving a different language version based on the user's location)
* A/B testing (routing a percentage of traffic to a variant without origin involvement)
* Redirecting users based on geography or device type
**When Edge Functions Make Sense**
Edge functions shine when the computation is lightweight and the result benefits from proximity to the user.
Checking whether a JWT is valid takes microseconds and can reject unauthorized requests at the edge without wasting an origin round trip. Rewriting URLs for localization is trivial compute that benefits from edge execution.
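The edge-side auth check can be sketched in plain Python. This is not any provider's API: the HMAC-signed token is a simplified stand-in for JWT signature verification, and `SECRET` is an assumed shared key between the auth server and the edge.

```python
import base64
import hashlib
import hmac

SECRET = b"shared-signing-key"  # assumption: known to both auth server and edge

def sign(payload: str) -> str:
    """Issue a token: payload plus an HMAC signature over it."""
    mac = hmac.new(SECRET, payload.encode(), hashlib.sha256).digest()
    return payload + "." + base64.urlsafe_b64encode(mac).decode()

def edge_handler(token):
    """Reject bad tokens at the edge (401) with no origin round trip;
    valid requests would be forwarded (represented here by a 200)."""
    if not token or "." not in token:
        return 401
    payload, _, sig = token.rpartition(".")
    expected = sign(payload).rpartition(".")[2]
    if not hmac.compare_digest(sig, expected):
        return 401
    return 200  # in a real edge function, forward to origin here

edge_handler(sign("user=42"))    # valid signature: passes through
edge_handler("user=42.forged")   # tampered token: rejected at the edge
```

`hmac.compare_digest` is used instead of `==` to avoid timing side channels in the signature comparison.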
Edge functions are not suited for heavy computation, database queries, or operations that require access to state stored on your origin.
If an edge function needs to call your database, the latency of that call negates the benefit of running at the edge in the first place.
Cloudflare Workers, AWS Lambda@Edge and CloudFront Functions, Fastly Compute, and Vercel Edge Functions are the major edge computing platforms. They differ in runtime support, execution time limits, and pricing, but the core concept is the same: run small, fast functions at the CDN edge.
## CDN Invalidation and Purging
Cached content eventually becomes stale.
When you update a product price, fix a bug in your JavaScript, or publish a new blog post, you need the CDN to stop serving the old version. This is the invalidation problem.
**Why Invalidation Is Hard**
CDN edge servers are distributed across hundreds of locations.
When you invalidate content, that invalidation command must propagate to every edge server worldwide. Some providers complete this in seconds.
Others take minutes. During the propagation window, some users get the new version while others still see the old one.
**Invalidation Strategies**
TTL expiration is the passive approach. You set a TTL, and the old content naturally ages out. No manual action needed, but you wait for the TTL to expire. If your TTL is 24 hours, users could see stale content for up to 24 hours after an update.
Purge by URL lets you explicitly remove a specific cached object from all edge servers. Most CDN providers offer an API for this: send a purge request for `https://example.com/images/product-42.jpg`, and every edge server deletes its cached copy. The next request triggers a fresh fetch from the origin.
Purge by tag or surrogate key is more powerful. You tag cached objects with custom labels when they are cached. A product page might be tagged with `product-42` and `category-electronics`. When product 42's price changes, you purge everything tagged `product-42`, which invalidates the product page, the category listing, and any API responses that include that product's data. Fastly's surrogate keys and Cloudflare's Cache-Tag purge support this pattern.
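The tag index that makes this work can be sketched as a cache that maintains a reverse mapping from tags to URLs. An illustrative model, not a provider's implementation:

```python
from collections import defaultdict

class TaggedCache:
    """Cache that indexes objects by surrogate-key-style tags, so one
    purge call can invalidate every related object at once."""

    def __init__(self):
        self.objects = {}                 # url -> cached body
        self.by_tag = defaultdict(set)    # tag -> set of urls carrying it

    def put(self, url, body, tags=()):
        self.objects[url] = body
        for tag in tags:
            self.by_tag[tag].add(url)

    def purge_tag(self, tag):
        # Drop every object that was cached under this tag.
        for url in self.by_tag.pop(tag, set()):
            self.objects.pop(url, None)

cache = TaggedCache()
cache.put("/products/42", "...", tags=["product-42", "category-electronics"])
cache.put("/categories/electronics", "...", tags=["category-electronics"])
cache.purge_tag("product-42")   # product page gone; category page still cached
```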
Cache busting through versioned filenames avoids the invalidation problem entirely for static assets. Instead of purging `app.js`, you deploy `app.v2.js` (or `app.abc123.js` using a content hash). The old file remains cached but no new HTML references it. The new file gets fetched and cached fresh. This is the most reliable strategy for JavaScript, CSS, and images that change with deployments.
| Strategy | Speed | Scope | Best For |
|---|---|---|---|
| TTL expiration | Passive (wait for TTL) | Automatic | Content that can tolerate brief staleness |
| Purge by URL | Seconds to minutes | Single object | Urgent updates to specific files |
| Purge by tag | Seconds to minutes | All objects with the tag | Related content updates (product changes) |
| Cache busting (versioned filenames) | Instant (new URL = no old cache) | Per asset | Static assets tied to deployments |
**Beginner Mistake to Avoid**
New engineers sometimes set long TTLs on unversioned URLs and then panic when they need to push an urgent fix.
The CDN serves the old version, the purge takes minutes to propagate, and users see broken content. Avoid this by using content-hashed filenames for all static assets.
Reserve URL-based purging for the rare cases where you cannot control the filename (like an API endpoint or an HTML page).
## Multi-CDN Strategies
A single CDN provider handles most production workloads perfectly well. But for very large-scale systems, mission-critical applications, or global platforms with stringent latency requirements, using multiple CDN providers simultaneously offers advantages that a single provider cannot match.
**Why Use Multiple CDNs**
**Reliability.** CDN providers have outages. Cloudflare went down globally in 2022. AWS CloudFront has had regional disruptions. If your entire content delivery depends on one provider and they go down, your users see failures. With multiple CDNs, you fail over to a healthy provider and maintain availability.
**Performance optimization.** Different CDN providers have stronger networks in different regions. Provider A might have excellent coverage in Asia but mediocre coverage in South America. Provider B might be the opposite. By routing traffic to the provider with the best performance for each user's region, you get better global latency than any single provider could offer.
**Cost negotiation.** When you have the ability to shift traffic between providers, you have leverage in pricing negotiations. Providers compete for your traffic, and you can optimize costs by routing different traffic types to whichever provider offers the best rates for that category.
**How Multi-CDN Works**
A multi-CDN setup typically uses DNS-based traffic management. A service like NS1, Route 53, or Cedexis monitors the real-time performance of each CDN provider from various global locations. When a user resolves your domain, the DNS layer returns the IP address of the CDN provider that is currently performing best for that user's region.
Some implementations are simpler: primary/failover, where all traffic goes to CDN A unless health checks detect a failure, at which point DNS switches to CDN B. Others are more dynamic, continuously routing traffic to the best-performing provider for each individual request.
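The routing decision at the DNS layer can be sketched as: among healthy providers, pick the one with the lowest measured latency for the user's region. The latency numbers and provider names below are made up for the sketch.

```python
def pick_cdn(providers, region):
    """Latency-aware selection with built-in failover: unhealthy
    providers are excluded, then the lowest-latency survivor wins."""
    healthy = [p for p in providers if p["healthy"]]
    if not healthy:
        return None  # total outage: nothing to route to
    return min(healthy, key=lambda p: p["latency_ms"][region])["name"]

providers = [
    {"name": "cdn-a", "healthy": True, "latency_ms": {"apac": 35, "sa": 120}},
    {"name": "cdn-b", "healthy": True, "latency_ms": {"apac": 80, "sa": 40}},
]
pick_cdn(providers, "apac")   # cdn-a: stronger network in Asia
pick_cdn(providers, "sa")     # cdn-b: stronger network in South America
```

Marking `cdn-a` unhealthy flips all of its regions over to `cdn-b`, which is exactly the primary/failover behavior described above.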
The trade-off is complexity. You maintain configurations, SSL certificates, and origin settings across multiple CDN providers.
Cache invalidation must happen on all providers simultaneously. Monitoring becomes more complex because you track performance across multiple systems.
Multi-CDN makes sense for large-scale platforms where the reliability and performance gains justify the operational overhead.
For most applications, a single well-configured CDN is sufficient.
## CDN for Static vs. Dynamic Content
CDNs were originally built to serve static content: files that are the same for every user and change infrequently. But modern CDNs can handle dynamic content too, with different strategies for each.
**Static Content**
Static content is the CDN's sweet spot. Images, videos, fonts, JavaScript bundles, CSS files, PDFs, and software downloads are identical for every user. They change only when you explicitly deploy a new version.
CDN caching works perfectly here because the same cached copy serves millions of users.
For static content, set aggressive TTLs (days to months), use content-hashed filenames for cache busting, and let the CDN absorb as close to 100% of the traffic as possible. Your origin server should rarely be touched for static assets in a well-configured setup.
**Dynamic Content**
Dynamic content varies by user, request parameters, or time.
An API response that returns a user's personalized feed, a search results page, or a shopping cart summary is different for every request.
CDNs can still help with dynamic content, but the strategies differ. First, you can cache dynamic content with very short TTLs. Even caching an API response for 5 seconds absorbs enormous amounts of traffic during a spike.
If 10,000 users hit the same endpoint within 5 seconds, 9,999 of those requests are served from the CDN edge cache.
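The arithmetic behind that claim is easy to check with a small simulation: spread the requests evenly over the window and count how many find the cache cold or expired. This is an idealized model (one edge, uniform arrivals), meant only to show the scale of the effect.

```python
def simulate_spike(num_requests, window_s, ttl_s):
    """Count origin fetches when num_requests arrive evenly across
    window_s seconds and each response is cached for ttl_s seconds."""
    origin_hits = 0
    expires_at = -1.0                       # cache starts cold
    for i in range(num_requests):
        t = i * (window_s / num_requests)   # evenly spaced arrival time
        if t >= expires_at:                 # cache cold or expired
            origin_hits += 1
            expires_at = t + ttl_s
    return origin_hits

simulate_spike(10_000, window_s=5, ttl_s=5)    # 1 origin hit, 9,999 edge hits
simulate_spike(10_000, window_s=60, ttl_s=5)   # roughly one hit per 5 seconds
```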
Second, edge functions can personalize cached content at the edge. You cache a generic version of a page and use an edge function to inject user-specific data (like the user's name or notification count) before serving it. The heavy lifting (the page structure, layout, and most of the content) comes from cache, and only the personalized bits require computation.
Third, CDNs optimize the network path for dynamic content even when they cannot cache it. Requests from a user in Sydney to your origin in Virginia benefit from the CDN's optimized backbone network, which is often faster and more reliable than the public internet.
CloudFront's Origin Shield and similar features route all cache misses through a single intermediate cache layer, reducing the number of origin fetches even further.
| Content Type | CDN Strategy | TTL | Origin Load |
|---|---|---|---|
| Versioned static assets | Aggressive caching, content-hashed filenames | Months to years | Nearly zero |
| Unversioned static assets | Standard caching | Hours to days | Low |
| Semi-dynamic content | Short TTL caching | Seconds to minutes | Moderate |
| Fully personalized content | Edge functions or pass-through with network optimization | No cache | High (but network-optimized) |
## Popular CDN Providers
Four providers dominate the CDN market, each with distinct strengths that make them better suited for different use cases.
**CloudFront**
CloudFront is Amazon's CDN, deeply integrated with the AWS ecosystem. If your infrastructure runs on AWS, CloudFront connects to S3, EC2, ALB, and Lambda@Edge with minimal configuration and no data transfer charges between AWS services and CloudFront.
CloudFront offers Origin Shield, a centralized caching layer between edge servers and your origin that reduces origin load by consolidating cache misses. It supports both Lambda@Edge (full Lambda functions at the edge with access to other AWS services) and CloudFront Functions (lightweight, ultra-fast functions for simple transformations).
The trade-off is that CloudFront's configuration is more complex than competitors, and its performance outside of AWS's own network can lag behind dedicated CDN providers in certain regions.
Pricing is usage-based and can be harder to predict.
**Cloudflare**
Cloudflare operates one of the largest CDN networks in the world with over 300 PoPs. It combines CDN functionality with a comprehensive security suite: DDoS protection, WAF (Web Application Firewall), bot management, and DNS.
Cloudflare Workers is its edge computing platform, running on every PoP with a generous free tier.
Cloudflare's R2 object storage is S3-compatible with zero egress fees, which is a significant cost advantage for bandwidth-heavy applications.
Cloudflare's strength is the combination of performance, security, and simplicity. Its free tier is genuinely useful for small projects.
The trade-off is less flexibility for complex origin configurations compared to CloudFront, and its analytics are less granular than some competitors.
**Akamai**
Akamai is the oldest and largest CDN provider, with over 4,100 PoPs in more than 130 countries. It carries a significant portion of global internet traffic and has the deepest edge network presence, including locations that other providers do not reach.
Akamai's strengths are in enterprise-grade media delivery, large-scale software distribution, and markets like gaming and broadcast media where edge coverage and reliability are paramount. Its security portfolio includes Prolexic (DDoS mitigation), Kona (WAF), and Bot Manager.
The trade-off is cost and complexity.
Akamai is the most expensive option and its configuration requires more expertise than newer competitors. Its developer experience is less modern than Cloudflare's or Fastly's.
Akamai is the choice for large enterprises with global reach requirements and the budget to match.
**Fastly**
Fastly differentiates itself through real-time control and programmability. Its edge platform (Fastly Compute, built on WebAssembly) gives developers more power at the edge than any competitor.
Cache invalidation on Fastly is near-instant (typically under 150 milliseconds globally) through its surrogate key system, which is significantly faster than other providers.
Fastly's VCL (Varnish Configuration Language) gives fine-grained control over caching behavior, routing, and request handling. Its real-time logging and analytics stream lets you monitor CDN performance with near-zero delay.
Fastly is favored by engineering teams that need maximum control over their CDN behavior and fast, reliable cache invalidation.
The trade-off is a smaller network footprint than Akamai or Cloudflare, and its configuration language has a steeper learning curve.
| Provider | PoPs | Edge Computing | Invalidation Speed | Strengths | Best For |
|---|---|---|---|---|---|
| CloudFront | 600+ | Lambda@Edge, CloudFront Functions | Minutes | AWS integration, Origin Shield | AWS-native architectures |
| Cloudflare | 300+ | Workers, Pages | Seconds | Security, simplicity, free tier | Most web applications, security-first |
| Akamai | 4,100+ | EdgeWorkers | Minutes | Global reach, enterprise reliability | Large enterprises, media delivery |
| Fastly | 90+ | Fastly Compute (Wasm) | Sub-second (<150 ms) | Programmability, instant purge | Engineering teams needing fine control |
**Interview-Style Question**
> Q: You are building a global e-commerce platform. Product images rarely change, but product prices update multiple times per day. How would you use a CDN to serve both?
> A: Serve product images through the CDN with long TTLs (30 days) using content-hashed filenames. When an image is updated, the new filename with a new hash gets cached fresh, and the old version ages out naturally. No active invalidation needed. For product prices, use a short TTL (30 to 60 seconds) on the API responses that include pricing data. This means prices can be at most 60 seconds stale, which is acceptable for a browsing experience. For the actual checkout flow, bypass the CDN and hit the origin directly to guarantee the user pays the current price. Use surrogate key tagging (on Fastly) or Cache-Tag purging (on Cloudflare) so that when a specific product's price changes, you can selectively purge all cached responses that include that product without invalidating unrelated content.
**KEY TAKEAWAYS**
* A CDN caches content at edge servers worldwide, reducing latency by serving responses from locations geographically close to users.
* Pull CDNs fetch content from your origin on demand. Push CDNs require you to upload content to CDN storage. Most systems use pull CDNs with push for large media files.
* Set TTLs based on how stale each content type can be. Use content-hashed filenames for static assets to enable aggressive caching with instant updates on deploy.
* Edge functions let you run lightweight code at the CDN edge for personalization, authentication, and request manipulation without round trips to the origin.
* Invalidate by TTL expiration for routine updates, by URL or surrogate key for urgent updates, and by cache busting for static assets. Always have an invalidation plan before setting a long TTL.
* Multi-CDN strategies improve reliability and global performance but add operational complexity. Most applications do well with a single provider.
* CDNs handle static content natively. For dynamic content, use short TTLs, edge functions, and network path optimization.
* Choose your CDN based on your ecosystem (CloudFront for AWS), security needs (Cloudflare), global enterprise reach (Akamai), or programmability and instant purge (Fastly).
> Up Next: CDNs sit between your users and your servers. But what about the layer that sits directly in front of your backend services, handling security, compression, and traffic management? That is the reverse proxy. Part II, Lesson 6 covers the difference between forward and reverse proxies, when you need a reverse proxy versus a load balancer, and how the sidecar proxy pattern powers modern service meshes.