Chapter 2: Core Building Blocks of System Design

2.7 Message Queues & Asynchronous Processing

## 7.1 Asynchronism Fundamentals

Up to this point, most of the communication patterns we have discussed are synchronous.

A client sends a request, waits for the server to process it, and receives a response. That works for a lot of scenarios.

But it breaks down when the processing takes a long time, when the downstream service is temporarily unavailable, or when you need to fan out work to multiple systems at once.

**Synchronous vs. Asynchronous Processing**

In synchronous processing, the caller sends a request and blocks until it gets a response.

The caller cannot do anything else while it waits.

If the operation takes 5 seconds, the caller is frozen for 5 seconds. If the downstream service is down, the caller gets an error immediately and has to decide what to do about it.

In asynchronous processing, the caller sends a message and moves on without waiting for a response.

The message goes into a queue or a stream, and a separate process picks it up and handles it whenever it is ready.

The caller does not know or care when the work actually gets done.

Consider what happens when a user places an order on an e-commerce platform.

The system needs to charge the payment, reduce inventory, send a confirmation email, notify the warehouse, update the analytics dashboard, and trigger a recommendation engine refresh. In a synchronous model, the user waits while all six operations complete. If the email service is slow, the user stares at a spinner.

If the analytics service is down, the entire order fails.

In an asynchronous model, the system processes the payment and reduces inventory synchronously (because the user needs to know the order succeeded), then publishes an "order placed" event to a message queue.

The confirmation email, warehouse notification, analytics update, and recommendation refresh all happen asynchronously. The user gets their confirmation in 200 milliseconds.

The email arrives 3 seconds later. The analytics dashboard updates 10 seconds later.

If the analytics service is temporarily down, the message waits in the queue and gets processed when the service recovers.

Nothing fails. Nothing blocks.
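The flow above can be sketched in a few lines. This is a minimal illustration, not a real API: the function names are invented, and `queue.Queue` stands in for a durable message queue.

```python
import queue

events = queue.Queue()  # stands in for a durable message queue

def charge_payment(order_id):
    pass  # placeholder for the real payment call

def reserve_inventory(order_id):
    pass  # placeholder for the real inventory call

def place_order(order_id):
    # Synchronous critical path: the user must know these succeeded.
    charge_payment(order_id)
    reserve_inventory(order_id)
    # Everything else fans out asynchronously via one published event.
    events.put({"type": "order_placed", "order_id": order_id})
    return {"order_id": order_id, "status": "confirmed"}

def handle_order_placed(event):
    # A background worker runs this later: email, warehouse, analytics,
    # recommendations -- none of it delays place_order().
    return f"email sent for order {event['order_id']}"

result = place_order(42)   # returns immediately
pending = events.get()     # a worker picks the event up whenever it is ready
```

The key property: `place_order` returns as soon as the critical work and the publish complete, and a slow or down consumer only delays its own side effect.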

**When to Use Async: Decoupling, Buffering, Resilience**

Asynchronous processing gives you three distinct advantages that synchronous communication cannot match.

**Decoupling.** When service A calls service B synchronously, A depends on B being available, being fast, and accepting the exact request format A sends. Change any of those, and A breaks. When service A publishes a message to a queue instead, A does not know or care which services consume that message. You can add new consumers, remove old ones, or replace them entirely without touching A. The queue is the boundary that isolates services from each other.

**Buffering.** Traffic does not arrive evenly. A flash sale might generate 50x your normal order volume in an hour. If your payment service processes 100 orders per second and 5,000 arrive in the same second, synchronous calls would overwhelm it. A message queue absorbs the spike. Orders pile up in the queue, and the payment service processes them at its own pace. The queue acts as a buffer between unpredictable demand and finite processing capacity.

**Resilience.** When a synchronous call fails, the caller must handle the failure immediately: retry, show an error, or give up. When an asynchronous message fails to process, it stays in the queue. The consumer can retry later. If the consumer crashes, another instance picks up the message. If the entire consumer service goes down for an hour, the messages accumulate in the queue and get processed when the service comes back. No data is lost. No user sees an error.

| Aspect | Synchronous | Asynchronous |
| --- | --- | --- |
| Caller behavior | Waits for response | Sends message and moves on |
| Coupling | Tight (caller depends on receiver) | Loose (queue decouples them) |
| Failure handling | Immediate (retry or error) | Deferred (message stays in queue) |
| Traffic spikes | Overwhelms downstream services | Queue absorbs the spike |
| Latency | Bounded by slowest operation | User-facing latency decoupled from processing time |
| Complexity | Simple request/response | Requires queue infrastructure, idempotent consumers |

The trade-off is complexity.

Asynchronous systems are harder to debug because the path from input to output is no longer a straight line. You need monitoring on queue depths, consumer lag, and processing rates.

You need to handle duplicate messages (because at-least-once delivery means a message might arrive more than once).

And you need to accept that operations are eventually processed rather than immediately completed.

**Interview-Style Question**

> Q: A user uploads a video to your platform. The system needs to transcode the video into multiple resolutions, generate thumbnails, run content moderation, and update the search index. Should this be synchronous or asynchronous?

> A: Asynchronous, without question. Video transcoding alone takes minutes. Making the user wait for all four operations to complete would create an unusable experience. Accept the upload synchronously and return a "processing" status to the user immediately. Publish an "upload completed" event to a message queue. Separate consumers handle transcoding, thumbnail generation, content moderation, and search indexing independently and in parallel. The user sees a "your video is processing" status and gets notified when everything is ready. If the content moderation service is slow, it does not delay the transcoding or thumbnails.

_Synchronous vs. Asynchronous Architecture_

### KEY TAKEAWAYS

* Synchronous processing blocks the caller until a response arrives. Asynchronous processing lets the caller move on immediately.

* Use async for decoupling services, buffering traffic spikes, and building resilience against downstream failures.

* Async adds complexity: you need queue infrastructure, idempotent consumers, and monitoring for queue health.

* The general rule: if the user does not need to see the result immediately, process it asynchronously.

## 7.2 Message Queues

A message queue is the infrastructure that makes asynchronous processing possible. It sits between the service that produces a message and the service that consumes it, holding messages safely until consumers are ready to process them.

**Message Queue Architecture and Semantics**

A message queue has three participants. A producer creates a message and sends it to the queue. The queue stores the message durably. A consumer pulls the message from the queue and processes it.

The lifecycle of a message typically follows this sequence: the producer publishes a message, the queue stores it and acknowledges receipt to the producer, a consumer fetches the message, the consumer processes it successfully, and the consumer acknowledges completion to the queue. Once the queue receives the acknowledgment, it removes the message permanently.

If the consumer crashes before acknowledging, the queue assumes the message was not processed and makes it available to another consumer (or the same consumer after it restarts). This is how message queues provide reliability. Messages are not lost just because a consumer failed.

Most queues also support a visibility timeout (or acknowledgment deadline). When a consumer fetches a message, that message becomes invisible to other consumers for a configurable period.

If the consumer does not acknowledge within that period, the message reappears in the queue and another consumer can pick it up. This prevents two consumers from processing the same message simultaneously, while still allowing recovery from consumer failures.
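The visibility-timeout mechanics above can be shown with a toy in-memory queue. The class and method names are invented for illustration; real queues such as SQS implement the same idea server-side.

```python
import time

class VisibilityQueue:
    """Toy queue: fetched messages are hidden until acked or timed out."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.messages = []  # list of [message, invisible_until]

    def publish(self, message):
        self.messages.append([message, 0.0])  # immediately visible

    def fetch(self, now=None):
        now = time.monotonic() if now is None else now
        for entry in self.messages:
            if entry[1] <= now:                 # visible to consumers
                entry[1] = now + self.timeout   # hide while processing
                return entry[0]
        return None

    def ack(self, message):
        # Acknowledged: remove permanently.
        self.messages = [e for e in self.messages if e[0] != message]

q = VisibilityQueue(timeout_seconds=30)
q.publish("resize-image-17")

m1 = q.fetch(now=0.0)    # consumer A takes the message
m2 = q.fetch(now=10.0)   # consumer B sees nothing: message is invisible
m3 = q.fetch(now=31.0)   # A never acked, so the message reappears
```

If consumer A crashes mid-processing, nothing special has to happen: the timeout expires and the message becomes available again.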

**At-Least-Once, At-Most-Once, Exactly-Once Delivery**

Delivery guarantees define what happens when things go wrong. They are one of the most misunderstood concepts in distributed systems.

At-most-once delivery means the system tries to deliver each message once but will not retry if delivery fails. If the consumer crashes before processing the message, that message is lost. This is the simplest and fastest approach but only works when losing occasional messages is acceptable, like non-critical logging or metrics sampling.

At-least-once delivery means the system guarantees every message is delivered, but some messages may arrive more than once. If the consumer processes a message but the acknowledgment is lost (network blip between consumer and queue), the queue assumes the message was not processed and re-delivers it. The consumer sees it twice. This is the most common guarantee in production systems because losing messages is usually worse than processing one twice.

Exactly-once delivery means every message is processed exactly one time. No losses, no duplicates. This sounds ideal, but in a distributed system it is extremely difficult (some argue impossible) to achieve in the general case. Systems that claim exactly-once semantics (like Kafka with its transactional API) achieve it within a specific scope: exactly-once processing within the Kafka ecosystem, for example, but not across external systems.

| Guarantee | Messages Lost? | Duplicates? | Complexity | Use Cases |
| --- | --- | --- | --- | --- |
| At-most-once | Possible | No | Low | Non-critical logs, metrics sampling |
| At-least-once | No | Possible | Medium | Most production workloads |
| Exactly-once | No | No | Very high | Financial transactions (within specific systems) |

The practical takeaway: design your consumers to be idempotent. If processing the same message twice produces the same result as processing it once, duplicates do not matter. Use at-least-once delivery (which is reliable and widely supported) and make your consumers safe to retry. This gives you the reliability of at-least-once with the practical effect of exactly-once.
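Here is a minimal sketch of that advice: track processed message IDs so an at-least-once redelivery becomes a no-op. In production the "seen" set would live in a database or Redis rather than process memory; the message shape here is illustrative.

```python
processed_ids = set()          # in production: a durable store
balance = {"user-7": 0}

def handle_charge(message):
    # The producer assigns each message a unique ID.
    if message["id"] in processed_ids:
        return "duplicate-skipped"   # safe to ack and move on
    balance[message["user"]] -= message["amount"]
    processed_ids.add(message["id"])  # record together with the write
    return "processed"

msg = {"id": "charge-123", "user": "user-7", "amount": 50}
first = handle_charge(msg)    # processed normally
second = handle_charge(msg)   # redelivery: detected and skipped
```

The user is charged once no matter how many times the queue delivers the message, which is the "practical effect of exactly-once" described above.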

**Point-to-Point vs. Publish/Subscribe Models**

Message queues support two fundamental communication patterns.

Point-to-point (also called a work queue) sends each message to exactly one consumer. If you have 10 consumer instances listening on the same queue, each message is processed by only one of them. This is perfect for distributing work across a pool of workers. If you have 1,000 images to resize, 10 workers each process roughly 100 images. No image is resized twice.

Publish/subscribe (pub/sub) sends each message to every subscriber. When a producer publishes an "order placed" event, the email service, the analytics service, and the warehouse service each receive their own copy of that message. Each subscriber processes the message independently. Adding a new subscriber does not affect existing ones.

Many systems use both patterns. An order event might go through pub/sub to fan out to multiple services. Within each service, workers might pull from a point-to-point queue to distribute the processing load across multiple instances.

| Pattern | Message Goes To | Use Case |
| --- | --- | --- |
| Point-to-point | One consumer (work distribution) | Image processing, email sending, payment processing |
| Publish/subscribe | All subscribers (event fan-out) | Order events, user activity, system notifications |
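The difference between the two patterns can be shown side by side. This is an illustrative toy, not a broker API: point-to-point hands each task to exactly one worker, pub/sub copies each event to every subscriber.

```python
import itertools

# Point-to-point: each message goes to exactly one worker (round robin).
workers = ["worker-a", "worker-b", "worker-c"]
assignment = {}
rr = itertools.cycle(workers)
for task in ["img-1", "img-2", "img-3", "img-4"]:
    assignment[task] = next(rr)   # no image is resized twice

# Pub/sub: every subscriber receives its own copy of each event.
subscribers = {"email": [], "analytics": [], "warehouse": []}

def publish(event):
    for inbox in subscribers.values():
        inbox.append(event)

publish({"type": "order_placed", "order_id": 1})
```

Four tasks spread over three workers in the first pattern; one event, three independent copies in the second.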

**Dead Letter Queues and Poison Messages**

Sometimes a message cannot be processed.

The payload is malformed.

The consumer encounters an unrecoverable error.

The message references a record that no longer exists.

Without intervention, the consumer retries the same broken message forever, wasting resources and blocking the queue.

A dead letter queue (DLQ) is a secondary queue where unprocessable messages are sent after a configured number of failed attempts. If a message fails processing three times, the queue moves it to the DLQ. The main queue keeps flowing, and an engineer can inspect the dead letters later to understand what went wrong.

A poison message is a message that crashes or hangs the consumer every time it is processed. Without a DLQ, a poison message creates a loop: the consumer picks up the message, crashes, restarts, picks up the same message, crashes again, forever. The DLQ breaks this cycle by quarantining the message after a few failed attempts.

Every production message queue should have a DLQ configured. Monitor the DLQ size. A sudden spike in dead letters usually signals a bug in your consumer code or a change in the message format that your consumer does not handle.
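A compact sketch of the retry-then-quarantine flow, with invented names: after a fixed number of failed attempts the message moves to the DLQ and the main queue keeps flowing.

```python
MAX_ATTEMPTS = 3
main_queue = [{"body": "good"}, {"body": "poison"}]
dead_letters = []

def process(message):
    # A poison message fails every time it is processed.
    if message["body"] == "poison":
        raise ValueError("unparseable payload")
    return "ok"

def drain(messages):
    results = []
    for message in messages:
        for attempt in range(1, MAX_ATTEMPTS + 1):
            try:
                results.append(process(message))
                break
            except ValueError:
                if attempt == MAX_ATTEMPTS:
                    dead_letters.append(message)  # quarantine for inspection
    return results

results = drain(main_queue)
```

The healthy message is processed normally; the poison message ends up in `dead_letters` after three attempts instead of looping forever.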

**Message Ordering and FIFO Guarantees**

Standard message queues do not guarantee order. If you publish messages A, B, and C, consumers might process them in the order B, A, C. For many workloads, this is fine. Resizing 1,000 images does not require any particular order.

But some workloads need ordering. If a user updates their profile three times in rapid succession, processing those updates out of order could leave the profile in a stale state. Event-driven systems often need causal ordering: a "user created" event must be processed before a "user updated" event.

FIFO (First In, First Out) queues guarantee that messages are delivered and processed in the order they were published. Amazon SQS FIFO queues provide this guarantee, as does RabbitMQ when a queue is consumed through its single active consumer feature.

FIFO comes at a cost.

Maintaining order limits parallelism because messages in the same order group must be processed sequentially. You cannot spread a single FIFO queue across 50 consumers without breaking the ordering guarantee. The compromise is to use message groups (or partition keys): messages within the same group maintain order, but different groups are processed in parallel.

A user-events queue might use user ID as the group key, ensuring events for the same user are ordered but events for different users are processed concurrently.
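That grouping scheme boils down to a stable hash of the group key. A minimal sketch, with an invented `publish` helper (real systems like SQS FIFO message groups or Kafka partition keys do the same thing inside the broker):

```python
import zlib

NUM_PARTITIONS = 4
partitions = [[] for _ in range(NUM_PARTITIONS)]

def partition_for(group_key):
    # Stable hash: the same key always maps to the same partition,
    # so events for one user stay in one ordered log.
    return zlib.crc32(group_key.encode()) % NUM_PARTITIONS

def publish(group_key, event):
    partitions[partition_for(group_key)].append(event)

publish("user-42", "profile-update-1")
publish("user-99", "profile-update-A")   # may land on a different partition
publish("user-42", "profile-update-2")   # same partition as update-1
```

Events for `user-42` are guaranteed to sit in one partition in publish order, while other users' events can be processed on other partitions in parallel.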

**Popular Message Queues: RabbitMQ, Amazon SQS, ActiveMQ**

RabbitMQ is the most widely used open-source message broker. It implements the AMQP (Advanced Message Queuing Protocol) standard and supports both point-to-point and pub/sub patterns through its exchange-and-queue routing model. RabbitMQ is flexible: direct exchanges route by key, topic exchanges route by pattern matching, and fanout exchanges broadcast to all queues. It provides strong delivery guarantees through publisher confirms and consumer acknowledgments, and ships with a management UI for monitoring queue health. RabbitMQ works well for moderate-throughput workloads (tens of thousands of messages per second per cluster).

Amazon SQS is a fully managed queue service. You create a queue, send messages, and receive messages. AWS handles the scaling, durability, and availability. SQS offers two flavors: standard queues (nearly unlimited throughput, at-least-once delivery, best-effort ordering) and FIFO queues (exactly-once processing within SQS, strict ordering, limited to 3,000 messages per second with batching). SQS is the simplest option if you are running on AWS and do not need the routing sophistication of RabbitMQ or the streaming capabilities of Kafka.

ActiveMQ is a mature, Java-based message broker that supports multiple protocols (AMQP, STOMP, MQTT, OpenWire). Its successor, ActiveMQ Artemis, is a high-performance rewrite that handles higher throughput and more concurrent connections. ActiveMQ's strength is protocol flexibility, making it useful in heterogeneous environments where different systems communicate over different protocols. Amazon MQ is the managed AWS version.

| Queue | Type | Throughput | Ordering | Strengths | Best For |
| --- | --- | --- | --- | --- | --- |
| RabbitMQ | Open-source broker | Moderate (10K-50K msg/s) | Per-queue with single consumer | Flexible routing, mature, rich protocol | General-purpose messaging, complex routing |
| Amazon SQS | Managed service | High (standard), limited (FIFO) | FIFO queues available | Zero ops, AWS integration | AWS-native, simple queue needs |
| ActiveMQ | Open-source broker | Moderate | Yes (per-destination) | Multi-protocol, Java ecosystem | Enterprise, multi-protocol environments |

**Interview-Style Question**

> Q: Your payment service processes charges asynchronously via a message queue. Occasionally, the same charge is processed twice, resulting in double billing. How do you fix this?

> A: The root cause is at-least-once delivery combined with a non-idempotent consumer. The fix has two parts. First, make the payment consumer idempotent by using an idempotency key (from Part II, Lesson 1). Before processing a charge, check a database or Redis for the message's unique ID. If it has already been processed, skip it and acknowledge the message. Second, ensure the idempotency check and the charge processing happen within the same transaction or atomic operation so there is no window where a message is processed but the idempotency key is not recorded. This gives you the practical effect of exactly-once processing with at-least-once delivery infrastructure.

### KEY TAKEAWAYS

* Message queues store messages between producers and consumers, providing reliable asynchronous communication.

* At-least-once delivery is the practical standard. Design idempotent consumers to handle duplicates safely.

* Point-to-point distributes work across a pool of workers. Pub/sub fans out events to multiple independent subscribers.

* Dead letter queues quarantine unprocessable messages so they do not block the main queue. Always configure a DLQ in production.

* FIFO queues guarantee ordering but limit parallelism. Use message groups to maintain per-entity ordering while allowing cross-entity parallelism.

* RabbitMQ offers flexible routing, SQS offers zero-ops simplicity, and ActiveMQ offers multi-protocol support.

## 7.3 Event Streaming Platforms

Message queues solve the problem of passing individual messages between services. Event streaming platforms solve a broader problem: recording a continuous stream of events and letting multiple consumers process that stream, each at their own pace, potentially replaying past events whenever they need to.

**Apache Kafka: Architecture, Partitions, Consumer Groups**

Kafka is the dominant event streaming platform. Originally built at LinkedIn to handle their activity stream data, it is now used by thousands of companies to process trillions of events per day.

Kafka's core abstraction is the topic.

A topic is a named stream of events. You might have a `user-signups` topic, an `orders` topic, and a `page-views` topic. Producers write events to topics. Consumers read events from topics.

Each topic is divided into partitions.

A partition is an ordered, append-only log of events. When a producer writes an event, it goes to a specific partition (determined by a partition key or round robin). Within a partition, events are strictly ordered and assigned a sequential offset number.

Kafka stores events on disk and retains them for a configurable period (days, weeks, or forever), regardless of whether any consumer has read them.

This is the fundamental difference between Kafka and traditional message queues. A message queue deletes a message after a consumer processes it. Kafka keeps the event in the log. Multiple consumers can read the same events independently, and a new consumer can start from the beginning of the log and replay the entire history.

Consumer groups enable parallel consumption. A consumer group is a set of consumer instances that collaborate to read from a topic. Kafka assigns each partition to exactly one consumer within the group. If a topic has 12 partitions and a consumer group has 4 instances, each instance processes 3 partitions. If you scale to 12 instances, each gets one partition. If you add a 13th instance, it sits idle because there are no unassigned partitions.

Different consumer groups operate independently.

The orders service consumer group and the analytics consumer group each read the same `orders` topic at their own pace, tracking their own offsets. Adding a new consumer group does not affect existing ones.
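The assignment rule described above (each partition to exactly one consumer in the group, spread as evenly as possible) can be sketched as a simple round-robin; Kafka's group coordinator does something more sophisticated, but the arithmetic is the same.

```python
def assign_partitions(num_partitions, consumers):
    # Simplified stand-in for Kafka's partition assignment:
    # each partition goes to exactly one consumer in the group.
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

# 12 partitions, 4 consumers -> 3 partitions each
four = assign_partitions(12, ["c1", "c2", "c3", "c4"])

# 13 consumers, 12 partitions -> the 13th consumer sits idle
thirteen = assign_partitions(12, [f"c{i}" for i in range(1, 14)])
```

This is why partition count caps consumer parallelism within a group: with 12 partitions, the 13th instance has nothing to do.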

| Kafka Concept | What It Is | Why It Matters |
| --- | --- | --- |
| Topic | A named stream of events | Organizes events by category |
| Partition | An ordered log within a topic | Enables parallelism and ordering |
| Offset | A sequential ID for each event in a partition | Tracks consumer position in the log |
| Consumer group | A set of consumers that share partition assignments | Enables parallel consumption with automatic load distribution |
| Retention | How long events are kept | Enables replay, reprocessing, and historical analysis |

_Kafka Architecture_

**Event Sourcing Pattern**

Event sourcing is a data modeling pattern where you store every state change as an immutable event, rather than storing only the current state.

In a traditional system, if a user updates their shipping address, you overwrite the old address with the new one. The old address is gone. In an event-sourced system, you store an event: `AddressChanged { userId: 42, newAddress: "123 Oak St", timestamp: "..." }`. The current state is derived by replaying all events from the beginning.

The event log becomes the source of truth.

The current state is just a projection, a view computed from the events. You can rebuild the current state at any time by replaying the log. You can also build new projections later.

If you decide next year that you need a "user address change frequency" metric, you replay the historical events and compute it without changing any data.

Event sourcing pairs naturally with Kafka because Kafka already stores events in an immutable, ordered log with configurable retention. The event log in Kafka is the event store.
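Deriving state from the log looks like this in miniature. The event shapes follow the `AddressChanged` example above; everything else is illustrative.

```python
event_log = [
    {"type": "UserCreated", "userId": 42, "address": "9 Elm St"},
    {"type": "AddressChanged", "userId": 42, "newAddress": "123 Oak St"},
]

def project_current_state(events):
    # Current state is just a fold over the immutable event log.
    users = {}
    for e in events:
        if e["type"] == "UserCreated":
            users[e["userId"]] = {"address": e["address"]}
        elif e["type"] == "AddressChanged":
            users[e["userId"]]["address"] = e["newAddress"]
    return users

def address_change_count(events, user_id):
    # A projection added later, computed from the same historical log
    # without changing any stored data.
    return sum(1 for e in events
               if e["type"] == "AddressChanged" and e["userId"] == user_id)

state = project_current_state(event_log)
```

The old address is never overwritten; it remains in the log, which is what makes projections like `address_change_count` possible after the fact.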

The trade-offs are significant. Rebuilding state from a long event history can be slow (millions of events).

Schemas evolve over time, and old events may not match the current format. Querying the current state requires materializing a projection, which adds complexity.

Event sourcing is powerful for audit-heavy systems (financial transactions, compliance) and systems where understanding the full history of changes matters, but it is overkill for simple CRUD applications.

**Event Sourcing vs. Event Streaming**

These terms sound similar but describe different things.

Event streaming is an infrastructure pattern. It is about moving events between systems in real time using a platform like Kafka. The events flow through topics, get consumed by different services, and drive real-time processing. The focus is on data movement and reaction.

Event sourcing is a data modeling pattern. It is about storing every state change as an immutable event and deriving current state from the event log. The focus is on how you model and persist your data.

You can use event streaming without event sourcing (stream events through Kafka without using them as your source of truth). You can use event sourcing without event streaming (store events in a database rather than Kafka). Many systems combine both: events are sourced (stored as the canonical record) and streamed (published to Kafka for other services to consume).

**Stream Processing: Kafka Streams, Apache Flink, Spark Streaming**

Event streaming becomes even more powerful when you process events in real time as they flow through the system. Stream processing engines let you filter, transform, aggregate, join, and analyze events continuously without storing them in a database first.

Kafka Streams is a Java library (not a separate infrastructure component) that runs inside your application. It reads from Kafka topics, processes events, and writes results to other Kafka topics. Because it is a library and not a cluster, it is simpler to deploy and operate than standalone stream processing frameworks. Kafka Streams supports stateful operations (aggregations, joins, windowed computations) using local state stores backed by Kafka.

Apache Flink is a distributed stream processing framework that runs as a standalone cluster. It handles more complex event processing than Kafka Streams: event-time processing (handling out-of-order events based on when they actually occurred rather than when they arrived), complex event patterns (detecting sequences of events across time), and exactly-once state management. Flink is the tool of choice for use cases like real-time fraud detection, anomaly detection, and complex event-driven workflows.

Spark Streaming (now Structured Streaming) processes streams in micro-batches. Instead of processing each event individually, Spark collects events into small batches (as short as 100 milliseconds) and processes each batch as a mini Spark job. This makes it easier to integrate with Spark's batch processing ecosystem but introduces a small latency floor determined by the batch interval.

| Framework | Architecture | Latency | Complexity | Best For |
| --- | --- | --- | --- | --- |
| Kafka Streams | Library (runs in your app) | Low (true streaming) | Low | Simple to moderate stream processing within Kafka |
| Apache Flink | Distributed cluster | Very low (true streaming) | High | Complex event processing, event-time logic |
| Spark Streaming | Micro-batch on Spark cluster | Moderate (100ms+ batches) | Medium | Teams already using Spark for batch processing |

**Interview-Style Question**

> Q: You need to detect fraudulent transactions in real time. A transaction is suspicious if a user makes three purchases of over $500 each within a 10-minute window from different geographic locations. How would you implement this?

> A: Publish all transactions to a Kafka topic, partitioned by user ID so all transactions for the same user go to the same partition. Use Apache Flink to consume the stream and apply a windowed pattern detection: for each user, maintain a 10-minute sliding window of transactions over $500. When a window contains three or more transactions with distinct geographic locations, emit a fraud alert event to a separate Kafka topic. A downstream service consumes fraud alerts and triggers account review. Flink is the right choice here because it handles event-time windowing (critical when transactions arrive out of order due to network delays) and complex pattern matching natively. Kafka Streams could handle simpler versions of this, but the geographic cross-referencing and sliding window pattern are better suited to Flink's richer processing model.
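A drastically simplified, single-process version of that windowed detection logic (the thresholds come from the question; the data structures are illustrative, and a real Flink job would handle event time, state backends, and out-of-order arrival):

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 10 * 60
events = defaultdict(deque)  # user_id -> recent (timestamp, location)

def observe(user_id, amount, location, timestamp):
    if amount <= 500:
        return None                      # only large purchases count
    window = events[user_id]
    window.append((timestamp, location))
    # Evict entries older than the window (assumes roughly ordered input).
    while window and window[0][0] < timestamp - WINDOW_SECONDS:
        window.popleft()
    locations = {loc for _, loc in window}
    if len(window) >= 3 and len(locations) >= 3:
        return {"type": "fraud_alert", "user_id": user_id}
    return None

a1 = observe("u1", 600, "Berlin", 0)
a2 = observe("u1", 750, "Lagos", 120)
a3 = observe("u1", 900, "Tokyo", 300)   # third large buy, third location
```

The first two purchases accumulate silently; the third, from a third distinct location inside the window, triggers the alert.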

### KEY TAKEAWAYS

* Kafka stores events in an immutable, partitioned log that multiple consumer groups can read independently. Events are retained, not deleted after consumption.

* Partitions enable parallelism. Consumer groups distribute partitions across consumer instances. Scale consumers by adding instances up to the number of partitions.

* Event sourcing stores every state change as an immutable event. Current state is derived by replaying events. Powerful for audit trails and historical analysis, but adds complexity.

* Event streaming moves events between systems in real time. Event sourcing models state as a sequence of events. They are complementary but distinct patterns.

* Kafka Streams is a lightweight library for stream processing within Kafka. Flink handles complex, event-time-aware processing. Spark Streaming bridges batch and stream processing.

## 7.4 Task Queues & Job Scheduling

Not all background work is event-driven.

Sometimes you just need to run a specific task later: send a batch of emails at midnight, generate a weekly report, resize uploaded images, or retry a failed webhook delivery. Task queues and job schedulers handle this category of work.

**Task Queues for Background Processing**

A task queue lets your application offload work to a pool of background workers. The application creates a task (a description of work to be done), places it on the queue, and returns immediately. A worker process picks up the task, executes it, and reports the result.

The difference between a task queue and a message queue is emphasis. A message queue focuses on passing data between services. A task queue focuses on executing functions asynchronously. In practice, task queues are often built on top of message queues or key-value stores, but they add higher-level features like task retries, result storage, progress tracking, and scheduling.

Common use cases include sending emails and notifications (the user does not need to wait for the email to be delivered), processing uploads (resizing images, transcoding video, scanning for malware), generating reports (aggregating data that takes minutes to compute), and calling external APIs with retry logic (webhook delivery, third-party integrations).

**Celery, Sidekiq, Bull: Worker Queue Systems**

Celery is the dominant task queue for Python applications. It supports multiple message broker backends (RabbitMQ and Redis are the most common), handles task retries with configurable backoff strategies, provides result storage, and includes monitoring tools (Flower). Celery workers can run on multiple machines, distributing tasks across a cluster. Its configuration can be complex for advanced use cases (task routing, priority queues, rate limiting), but for standard background processing it is the established Python solution.

Sidekiq is the standard for Ruby applications, particularly Ruby on Rails. It uses Redis as its backend and processes tasks using threads, which makes it memory-efficient compared to process-based workers. Sidekiq Pro and Enterprise add features like reliable fetch (preventing message loss during worker crashes), rate limiting, and batch processing. Sidekiq's web dashboard provides real-time visibility into queues, retries, and failed jobs.

Bull (and its successor BullMQ) is the leading task queue for Node.js applications. It uses Redis as its backend and supports delayed jobs (execute this task 30 minutes from now), repeatable jobs (execute this task every hour), rate limiting, job prioritization, and concurrency control. BullMQ adds support for worker threads and improved reliability.

| System | Language | Backend | Strengths | Best For |
| --- | --- | --- | --- | --- |
| Celery | Python | RabbitMQ or Redis | Mature ecosystem, flexible routing | Python applications |
| Sidekiq | Ruby | Redis | Thread-based efficiency, Rails integration | Ruby/Rails applications |
| Bull / BullMQ | Node.js | Redis | Delayed jobs, repeatable jobs, modern API | Node.js applications |
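The "delayed job" feature mentioned for Bull/BullMQ reduces to a priority queue keyed on run-at time. A toy sketch with invented function names (real systems persist the schedule in Redis and poll it from workers):

```python
import heapq

schedule = []  # min-heap of (run_at, sequence, job_name)
_seq = 0       # tie-breaker so equal run-at times stay FIFO

def enqueue_in(delay_seconds, job_name, now):
    global _seq
    heapq.heappush(schedule, (now + delay_seconds, _seq, job_name))
    _seq += 1

def due_jobs(now):
    # Release only the jobs whose run-at time has arrived.
    ready = []
    while schedule and schedule[0][0] <= now:
        ready.append(heapq.heappop(schedule)[2])
    return ready

enqueue_in(1800, "send-reminder-email", now=0)  # 30 minutes from now
enqueue_in(60, "retry-webhook", now=0)

nothing_yet = due_jobs(now=30)    # too early for either job
first_batch = due_jobs(now=120)   # only the webhook retry is due
```

Workers would call `due_jobs` on a tick and hand each released job to the normal task queue for execution and retries.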

**Cron Jobs and Scheduled Tasks at Scale**

A cron job is a task that runs on a fixed schedule: every minute, every hour, at midnight on Sundays. On a single server, the operating system's cron daemon handles this. But in a distributed system with multiple servers, running cron jobs gets complicated.

If you have 10 application servers and each one runs the same cron job at midnight, you get 10 copies of the same task executing simultaneously.

A daily billing report gets generated 10 times.

A cleanup job deletes the same records 10 times (or, worse, fails because the first instance already deleted them).

Solutions include leader election (only one server in the cluster runs the cron job, using a distributed lock in Redis or ZooKeeper to coordinate), dedicated scheduler services (a single scheduler process triggers tasks on the appropriate queue, and workers execute them), and managed scheduling services (AWS EventBridge, Google Cloud Scheduler, or Kubernetes CronJobs handle the scheduling infrastructure so you do not have to).

The pattern that works best at scale: use a scheduler that fires an event or enqueues a task at the scheduled time, and let your existing task queue workers execute the actual work.

The scheduler is responsible only for timing. The task queue handles execution, retries, and monitoring. This separates concerns cleanly and lets you scale each piece independently.
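The leader-election approach above hinges on one atomic "set if not already held" operation. Here is a minimal in-memory sketch; the dict stands in for Redis `SET NX EX`, and all names are illustrative.

```python
locks = {}  # lock_name -> (owner, expires_at)

def try_acquire(lock_name, owner, ttl_seconds, now):
    held = locks.get(lock_name)
    if held is not None and held[1] > now:
        return False                          # another server is leader
    locks[lock_name] = (owner, now + ttl_seconds)
    return True

ran_by = []

def run_billing_report_if_leader(server, now):
    # Every server tries at midnight; only the lock winner does the work.
    if try_acquire("cron:billing-report", server, ttl_seconds=3600, now=now):
        ran_by.append(server)

for server in [f"app-{i}" for i in range(10)]:
    run_billing_report_if_leader(server, now=0)
```

Ten servers race, exactly one generates the report, and the TTL ensures the lock frees itself if the leader dies mid-run.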

**Back Pressure: Preventing Queue Overload**

What happens when producers add messages to a queue faster than consumers can process them?

The queue grows. Memory fills up.

Eventually, the queue runs out of storage and starts rejecting new messages, or worse, crashes.

Back pressure is the set of mechanisms that prevent this scenario by slowing down producers when the system is overwhelmed.

Queue-level back pressure involves setting a maximum queue size. When the queue reaches capacity, new messages are rejected, and the producer receives an error. The producer can then retry after a delay, drop the message if it is non-critical, or redirect to an overflow queue.

Consumer-level back pressure involves consumers pulling messages at their own pace rather than having messages pushed to them. Kafka uses this model by default: consumers fetch messages in batches and control how quickly they advance through the log. If a consumer falls behind, it simply has a larger lag, but it does not crash or lose messages (as long as retention covers the lag).

Producer-level back pressure involves monitoring queue depth and throttling the producer when the queue grows beyond a threshold. If the queue depth exceeds 10,000 messages, the producer slows its sending rate. This keeps the queue from growing unboundedly but requires coordination between producers and queue monitoring.

Rate limiting at the queue or task level caps how many messages or tasks are processed per time period. Sidekiq Enterprise and BullMQ both support rate-limited queues where you can say "process at most 100 tasks per second" regardless of how many are waiting.
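A token bucket is the usual mechanism behind "at most N tasks per second." A minimal sketch (not Sidekiq's or BullMQ's actual implementation); the rate and burst capacity are illustrative:

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens/second."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True      # process this task now
        return False         # defer: leave the task in the queue

bucket = TokenBucket(rate=100, capacity=5)   # burst of 5, then 100/sec
decisions = [bucket.allow() for _ in range(8)]
```

A worker that gets `False` simply leaves the task in the queue and tries again later, so the limit holds no matter how many tasks are waiting.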

The right back pressure strategy depends on your tolerance for message loss and latency.

For critical data (payments, orders), you never drop messages; you accept higher latency. For non-critical data (analytics events, metrics), you might drop or sample messages during extreme spikes to protect system stability.

**Priority Queues and Fair Scheduling**

Not all tasks are equally urgent.

A password reset email should be processed before a weekly newsletter digest. A real-time payment notification matters more than a monthly usage report.

Priority queues assign a priority level to each task. Higher-priority tasks are processed before lower-priority ones, regardless of when they entered the queue.

RabbitMQ supports priority queues natively (up to 255 priority levels). Celery, Sidekiq, and Bull all support prioritization through separate queues with different polling weights or through built-in priority mechanisms.

A common pattern is to use multiple named queues with different priorities.

A `critical` queue gets polled constantly.

A `default` queue gets polled when critical is empty.

A `low` queue gets polled only when both critical and default are empty. Workers check the highest-priority queue first.
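The polling order described above reduces to a short loop. A minimal sketch of strict-priority polling with in-memory deques standing in for broker queues; the task names are illustrative:

```python
from collections import deque

# Three named queues, highest priority first, matching the text.
queues = {
    "critical": deque(["reset-password"]),
    "default":  deque(["order-receipt"]),
    "low":      deque(["weekly-digest"]),
}

def next_task():
    """A worker always drains the highest-priority non-empty queue."""
    for name in ("critical", "default", "low"):
        if queues[name]:
            return name, queues[name].popleft()
    return None   # nothing to do

order = [next_task() for _ in range(3)]
```

Note that `low` is reached only because `critical` and `default` empty out; if they never do, `weekly-digest` never runs, which is exactly the starvation problem discussed next.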

The risk with strict priority is starvation.

If the critical queue always has work, the low-priority queue never gets processed. Fair scheduling prevents this by allocating a percentage of worker capacity to each priority level. For example, 60% of workers serve the critical queue, 30% serve default, and 10% serve low. This ensures every priority level makes progress even during high-traffic periods.

| Approach | How It Works | Risk |
| --- | --- | --- |
| Strict priority | Always process higher-priority tasks first | Low-priority tasks starve under sustained load |
| Weighted polling | Poll higher-priority queues more frequently | Tuning the weights requires monitoring |
| Dedicated workers | Assign specific workers to each priority | Under-utilization if a priority level has no work |
| Fair scheduling | Allocate a percentage of capacity per priority | All priorities make progress, but critical tasks may wait slightly longer |
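The fair-scheduling split (60% critical, 30% default, 10% low) amounts to proportional worker allocation. A minimal sketch; the weights and worker count are illustrative:

```python
# Capacity is divided by weight, so every priority level keeps making
# progress even when the critical queue never empties.
weights = {"critical": 6, "default": 3, "low": 1}

def allocate(total_workers):
    """Assign workers to each priority level proportionally to its weight."""
    total_weight = sum(weights.values())
    return {name: total_workers * w // total_weight
            for name, w in weights.items()}

assignment = allocate(total_workers=20)
```

With 20 workers this yields 12 for critical, 6 for default, and 2 for low; integer division can leave a remainder, which a real scheduler would hand to the highest-priority level.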

**Beginner Mistake to Avoid**

New engineers sometimes throw everything into a single queue and wonder why their time-sensitive tasks (like password reset emails) wait behind thousands of low-priority tasks (like analytics event processing).

Before you deploy a task queue to production, define at least two priority levels and separate your time-sensitive work from your bulk processing work. It takes five minutes to configure and saves you from an escalation at 2 AM when users complain about not receiving their password reset emails.

**Interview-Style Question**

> Q: Your application sends transactional emails (password resets, order confirmations) and marketing emails (newsletters, promotions) through the same task queue. Users complain that password reset emails take 30 minutes to arrive during marketing campaigns. How do you fix this?

> A: Separate the two workloads into different queues with different priorities. Create a `transactional` queue for time-sensitive emails (password resets, order confirmations, security alerts) and a `marketing` queue for bulk sends (newsletters, promotions). Assign dedicated workers to the transactional queue that are never borrowed for marketing work, ensuring transactional emails are processed within seconds regardless of marketing volume. Marketing workers can scale up during campaigns and scale down afterward. Additionally, add a rate limit to the marketing queue to prevent it from consuming all available email-sending capacity (which might trigger rate limits from your email provider and delay transactional emails that share the same sender).

_Task Scheduling_

### KEY TAKEAWAYS

* Task queues let your application offload work to background workers. Use them for emails, uploads, report generation, and any processing the user does not need to wait for.

* Celery (Python), Sidekiq (Ruby), and Bull/BullMQ (Node.js) are the standard task queue libraries for their respective ecosystems.

* Cron jobs in distributed systems need coordination to prevent duplicate execution. Use leader election, dedicated schedulers, or managed scheduling services.

* Back pressure prevents queue overload when producers outpace consumers. Use queue size limits, consumer-paced pulling, and producer throttling.

* Separate time-sensitive tasks from bulk processing using priority queues. Never let a marketing email campaign delay a password reset.

* Monitor queue depth, consumer lag, and dead letter queue size. A growing queue depth is an early warning that your consumers need scaling.

> Up Next: You now have every core building block of system design: networking, storage, caching, load balancing, CDNs, proxies, and message queues. Part III shifts from individual components to system-wide properties. Lesson 1 covers Scalability: how to take a system from zero to millions of users, identify bottlenecks, and design architectures that grow gracefully. You will tie together everything you have learned so far and see how these building blocks work in concert to handle real-world scale.