Continuous Integration and Continuous Delivery/Deployment
CI/CD is the automation backbone that connects code changes to running production systems.
Without it, deployments are manual, error-prone, infrequent, and terrifying.
With it, teams ship changes multiple times per day with confidence.
Continuous Integration (CI)
Continuous Integration means every developer merges their code changes into a shared branch frequently (at least daily, often multiple times per day).
Each merge triggers an automated pipeline that builds the code, runs tests, and reports the results.
If the build or tests fail, the team knows immediately and fixes the problem before it compounds.
CI catches integration problems early. If Developer A changes the user authentication module and Developer B changes the session management module, their changes might conflict.
Without CI, both developers work for a week, merge their branches, and spend two days resolving conflicts.
With CI, the conflict surfaces on the first day, when it is cheap to fix.
The core CI practices are keeping the main branch always in a deployable state, writing automated tests that run on every commit (unit tests at minimum, integration tests ideally), and fixing broken builds immediately (a broken build blocks the entire team).
Continuous Delivery vs. Continuous Deployment
These two terms sound similar and are often conflated, but they have a critical difference.
Continuous Delivery means every code change that passes the automated pipeline is ready to be deployed to production at any time.
The deployment itself is a manual decision: a product manager clicks a button, or an engineer approves the release.
The key is that the code is always in a deployable state. Deploying is a business decision, not a technical challenge.
Continuous Deployment removes the manual step entirely. Every change that passes all automated tests is deployed to production automatically.
No human approval.
No release train.
A developer merges code, the pipeline runs, tests pass, and the change is live in production within minutes.
| Practice | Automated Build? | Automated Tests? | Automated Deploy to Production? |
|---|---|---|---|
| Continuous Integration | Yes | Yes | No |
| Continuous Delivery | Yes | Yes | Manual trigger (ready anytime) |
| Continuous Deployment | Yes | Yes | Yes (fully automatic) |
Most teams practice Continuous Delivery.
Full Continuous Deployment requires a very mature test suite and robust monitoring because there is no human checkpoint before code reaches users.
Companies like GitHub, Netflix, and Etsy practice Continuous Deployment because they have invested heavily in test automation, feature flags, and rapid rollback capabilities.
Build Pipelines and Artifact Management
A build pipeline is the automated sequence of steps that code passes through from commit to deployment. Each step must succeed before the next one runs.
If any step fails, the pipeline stops and the team is notified.
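This fail-fast behavior can be sketched in a few lines of Python. The stage names and the boolean-returning steps are illustrative, not any particular CI system's API:

```python
# Minimal sketch of a fail-fast pipeline: each stage is a (name, callable)
# pair where the callable returns True on success. The run stops at the
# first failure, so later stages never execute.

def run_pipeline(stages):
    """Run stages in order; stop and report at the first failure."""
    completed = []
    for name, step in stages:
        if step():
            completed.append(name)
        else:
            return {"status": "failed", "failed_stage": name,
                    "completed": completed}
    return {"status": "passed", "completed": completed}

# Example: the test stage fails, so the deploy stage never runs.
result = run_pipeline([
    ("build", lambda: True),
    ("test", lambda: False),   # simulated test failure
    ("deploy", lambda: True),
])
```

Real pipelines add notification and retry logic, but the ordering guarantee is the same: a stage runs only if every stage before it passed.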
Typical Pipeline Stages
Source stage: Triggered by a code commit or pull request merge. The pipeline checks out the latest code from the repository.
Build stage: Compiles the code (for compiled languages), resolves dependencies, and produces a build artifact. For a Java application, this might be a JAR file. For a containerized application, this is a Docker image. The build stage should be deterministic: the same source code always produces the same artifact.
Test stage: Runs automated tests against the build artifact. This typically includes unit tests (fast, isolated, run in seconds), integration tests (verify interactions between components, take minutes), and optionally end-to-end tests (simulate real user workflows, take longer). The test stage gates the pipeline. If tests fail, the artifact is not deployed.
Security and quality stage: Static analysis tools scan for code quality issues (SonarQube, ESLint). Security scanners check for known vulnerabilities in dependencies (Snyk, Dependabot). Container image scanners check for vulnerabilities in base images (Trivy, Aqua). This stage catches problems that tests alone might miss.
Artifact storage: The verified build artifact is pushed to an artifact repository. Docker images go to a container registry (ECR, Docker Hub, GitHub Container Registry). JAR files go to a Maven or Gradle repository. npm packages go to npm registry. The artifact is tagged with a version identifier (typically the git commit hash or a semantic version).
Deployment stage: The artifact is deployed to the target environment. In Continuous Delivery, this stage deploys to staging automatically and waits for manual approval before production. In Continuous Deployment, it deploys to production automatically.
Artifact Management
The build artifact is the single source of truth for what is deployed.
The same artifact that passes tests in CI is the one that runs in staging and production. You never rebuild the code for a different environment.
You build once, test once, and promote the same artifact through environments.
Tagging artifacts with the git commit hash creates a direct link between running code and its source. If production is running image app:a3f8b2c1, you can inspect commit a3f8b2c1 in git to see exactly what code is deployed.
Retain old artifacts for a reasonable period (30 to 90 days) so you can roll back to a previous version without rebuilding.
Artifact retention policies automatically clean up old versions to manage storage costs.
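A retention policy like the one described can be sketched as a pure function. The two rules here (always keep the N most recent artifacts, and anything newer than a cutoff) are assumptions for illustration; real registries expose similar knobs under different names:

```python
# Sketch of an artifact retention policy: keep the `keep_latest` most
# recent artifacts unconditionally, plus anything newer than
# `max_age_days`. Everything else is eligible for deletion.
from datetime import datetime, timedelta

def artifacts_to_delete(artifacts, keep_latest=10, max_age_days=90, now=None):
    """artifacts: list of (tag, pushed_at datetime), in any order."""
    now = now or datetime.utcnow()
    by_age = sorted(artifacts, key=lambda a: a[1], reverse=True)
    cutoff = now - timedelta(days=max_age_days)
    # Candidates are past the first keep_latest entries AND older than cutoff.
    return [tag for tag, pushed in by_age[keep_latest:] if pushed < cutoff]
```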
Popular CI/CD Platforms
| Platform | Type | Strengths |
|---|---|---|
| GitHub Actions | Cloud-native, integrated with GitHub | Seamless GitHub integration, YAML workflows |
| GitLab CI/CD | Integrated with GitLab | All-in-one platform, built-in registry |
| Jenkins | Self-hosted, open source | Maximum flexibility, massive plugin ecosystem |
| CircleCI | Cloud-hosted | Fast builds, good parallelism |
| AWS CodePipeline | AWS-native | Deep AWS integration |
| ArgoCD | Kubernetes-native (GitOps) | Declarative deployments, Kubernetes-first |
Deployment Strategies: Rolling, Blue-Green, Canary, Shadow
How you deliver new code to production determines how much risk you take, how quickly you detect problems, and how easily you recover from bad deployments. Each strategy trades complexity for safety.
Rolling Deployment
A rolling deployment replaces instances of the old version with instances of the new version one at a time (or in small batches).
If you have 10 servers running version 1, a rolling deployment upgrades server 1 to version 2, waits for it to pass health checks, then upgrades server 2, and so on until all 10 run version 2.
During the rollout, both versions are running simultaneously. Some users hit version 1 servers, others hit version 2.
This is usually fine for stateless services but can cause problems if the two versions have incompatible API contracts or database schemas.
Rolling deployments are the default strategy in Kubernetes (handled by Deployment resources). They require no extra infrastructure (no duplicate environment).
The rollout can be paused or reversed if health checks fail.
The risk is that problems might not surface until several servers have been upgraded.
If the issue only manifests under a specific traffic pattern, it might not appear until 50% of servers are running the new version, at which point half your fleet is affected.
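The one-at-a-time upgrade loop can be sketched as follows. The server dicts and the `health_check` callable are hypothetical stand-ins for what an orchestrator like Kubernetes does internally:

```python
# Sketch of a rolling upgrade with per-server health checks. A failed
# check reverts the failing server and pauses the rollout, leaving the
# remaining servers on the old version.

def rolling_deploy(servers, old, new, health_check):
    """servers: list of dicts like {'name': 's1', 'version': old}."""
    upgraded = []
    for server in servers:
        server["version"] = new
        if not health_check(server):
            server["version"] = old          # revert just this server
            return {"status": "aborted", "upgraded": upgraded,
                    "failed_on": server["name"]}
        upgraded.append(server["name"])
    return {"status": "complete", "upgraded": upgraded}
```

Note that servers upgraded before the failure stay on the new version until you explicitly roll them back, which is exactly the partial-fleet risk described above.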
Blue-Green Deployment
Blue-green deployment maintains two identical production environments: blue (currently serving traffic) and green (idle or running the previous version).
You deploy the new version to the green environment, run smoke tests against it, and then switch the load balancer to route all traffic from blue to green in a single cut.
If the new version has problems, you switch the load balancer back to blue, and users are on the old version within seconds.
The rollback is instant because the old environment is still running.
The cost is infrastructure. You maintain two complete production environments.
During normal operation, one is idle (or handling only test traffic), doubling your compute costs. Some teams mitigate this by using the idle environment for staging or pre-production testing.
Blue-green works best for applications where zero-downtime deployment is critical and where the cost of maintaining a duplicate environment is justified by the safety of instant rollback.
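The essence of blue-green is that the router holds a single pointer, so deploy and rollback are both one atomic switch. A toy model, with environment names and versions as illustrative values:

```python
# Blue-green cutover as a sketch: 'live' is the only state the load
# balancer consults, so switching it flips all traffic at once.

class Router:
    def __init__(self):
        self.envs = {"blue": "v1", "green": None}
        self.live = "blue"                    # all traffic goes here

    def deploy_to_idle(self, version):
        idle = "green" if self.live == "blue" else "blue"
        self.envs[idle] = version             # stage the new version
        return idle

    def cut_over(self):
        self.live = "green" if self.live == "blue" else "blue"

router = Router()
router.deploy_to_idle("v2")   # green now runs v2; users still on blue/v1
router.cut_over()             # all traffic switches to green/v2
router.cut_over()             # rollback: instant switch back to blue/v1
```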
Canary Deployment
A canary deployment routes a small percentage of production traffic to the new version while the majority continues hitting the old version.
You deploy the new version to a small number of servers (the canary), route 5% of traffic to them, and monitor error rates, latency, and business metrics.
If the canary performs well after a specified observation period (15 minutes to a few hours), you gradually increase the percentage: 5% to 25% to 50% to 100%.
If the canary shows elevated errors or degraded performance, you route its traffic back to the old version immediately. Only 5% of users were ever exposed to the problem.
Canary deployments are the safest strategy for catching production-only issues.
Some bugs only appear under real production traffic patterns, and a canary catches them before they affect all users.
The cost is the infrastructure for monitoring and traffic splitting, and the additional time needed for the gradual rollout.
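The ramp-and-check loop can be sketched as below. The stage percentages and the error-rate budget mirror the numbers in the text; `error_rate_at` is a hypothetical hook into your monitoring system:

```python
# Sketch of a canary ramp: increase the traffic share stage by stage, and
# bail out the moment the canary's observed error rate exceeds the budget.

def canary_rollout(error_rate_at, stages=(5, 25, 50, 100),
                   max_error_rate=0.01):
    """error_rate_at(percent) -> canary error rate observed at that share."""
    for percent in stages:
        if error_rate_at(percent) > max_error_rate:
            return {"status": "rolled_back", "at_percent": percent}
    return {"status": "promoted", "final_percent": stages[-1]}
```

In a real system each stage would also wait out the observation period and check latency and business metrics, not just errors.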
Shadow Deployment
A shadow deployment sends a copy of production traffic to the new version without serving the new version's responses to users.
Real users always receive responses from the current version.
The new version processes the same requests in parallel, and its responses are logged and analyzed but never returned to users.
Shadow deployment is the safest possible strategy because users are never affected by the new version, regardless of how it behaves. It is ideal for testing major changes like database migrations, new recommendation algorithms, or refactored core logic where you want to compare the new version's output against the old version's output under real traffic conditions.
The cost is running duplicate processing for all shadowed traffic (doubling compute costs during the test), complexity in routing duplicate traffic, and the fact that shadow deployments only work for read operations.
You cannot shadow a write operation (like creating an order) because the shadow would create duplicate orders.
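A request handler with shadowing might look like this sketch. The `current` and `candidate` callables stand in for the two service versions, and the read-only guard reflects the write restriction just described:

```python
# Shadow traffic sketch: users only ever receive the current version's
# response. A copy of each read request also goes to the candidate, and
# mismatched responses are recorded for offline analysis.

def handle(request, current, candidate, mismatches):
    response = current(request)                # this is what users see
    if request.get("method", "GET") == "GET":  # shadow reads only, never writes
        shadow = candidate(request)
        if shadow != response:
            mismatches.append({"request": request,
                               "live": response, "shadow": shadow})
    return response
```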
| Strategy | Risk Level | Rollback Speed | Extra Infrastructure | Best For |
|---|---|---|---|---|
| Rolling | Medium | Moderate (roll forward/back) | None | Standard deployments, stateless services |
| Blue-green | Low | Instant (switch load balancer) | Full duplicate environment | Zero-downtime requirements, instant rollback |
| Canary | Lowest | Fast (reroute canary traffic) | Small canary fleet | Catching production-only bugs gradually |
| Shadow | None (users unaffected) | N/A (users never see new version) | Full duplicate processing | Testing major changes under real traffic |
Feature Flags and Progressive Delivery
Feature flags (covered in Part IV, Lesson 6) enable progressive delivery, a deployment philosophy that separates releasing code from releasing features.
With progressive delivery, new code is deployed to production but remains invisible to users.
The feature is behind a flag that is turned off. You turn it on for internal employees first.
Then for beta testers.
Then for 1% of users.
Then 10%.
Then 100%.
At each stage, you monitor metrics and decide whether to proceed or roll back.
This is fundamentally different from canary deployment.
A canary deploys a new version of the entire service to a subset of traffic.
A feature flag deploys the code to 100% of servers but controls who sees the new behavior at the application level.
Both achieve gradual rollout, but feature flags offer finer control (specific user segments, not random traffic percentages) and can be toggled without a deployment.
Progressive delivery combines feature flags with automated monitoring.
Tools like LaunchDarkly, Split.io, and Unleash can automatically roll back a feature flag if error rates exceed a threshold after the flag is enabled.
This creates a self-healing deployment process where risky changes are automatically contained.
The discipline required is cleaning up feature flags after they are fully rolled out. Every flag left in the code after full rollout is dead code that adds conditional complexity, increases testing surface, and confuses new engineers reading the codebase.
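The employees-then-beta-then-percentage rollout described above can be sketched as a flag evaluator. The ring order and the hashing scheme are assumptions for illustration, not any particular vendor's API:

```python
# Sketch of progressive-delivery flag evaluation: explicit allow-lists
# first, then a deterministic percentage bucket so the same user stays
# enabled as the rollout percentage grows.
import hashlib

def flag_enabled(flag, user):
    if not flag["on"]:
        return False
    if user["id"] in flag.get("employees", set()):
        return True
    if user["id"] in flag.get("beta", set()):
        return True
    # Stable bucket 0-99 derived from flag name + user id.
    digest = hashlib.sha256(f'{flag["name"]}:{user["id"]}'.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < flag.get("percentage", 0)
```

Because the bucket depends only on the flag name and user id, raising the percentage from 1 to 10 keeps the original 1% enabled and adds new users, rather than reshuffling everyone.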
A/B Testing Infrastructure
A/B testing uses deployment infrastructure to run controlled experiments.
Instead of deploying one version and hoping it is better, you deploy two (or more) versions simultaneously, route a percentage of users to each version, measure their behavior, and statistically determine which version performs better.
How It Works
The A/B testing system assigns each user to a variant (A or B) based on a consistent hash of their user ID. This ensures a user always sees the same variant during the experiment (no flickering between versions).
The assignment is stored so that downstream analytics can attribute behavior to the correct variant.
Variant A (the control) is the existing experience. Variant B (the treatment) includes the change you want to test. Both variants run on the same infrastructure, and the routing happens at the application level (similar to feature flags).
After enough data is collected (typically one to four weeks depending on traffic volume), a statistical analysis determines whether variant B outperforms variant A on the target metric (conversion rate, engagement, revenue) with statistical significance. If B wins, it becomes the new default.
If A wins or the difference is not significant, B is discarded.
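The consistent-hash assignment described above can be sketched in a few lines; the 50/50 split and the SHA-256 choice are illustrative assumptions:

```python
# Sketch of consistent A/B assignment: hashing the user id together with
# the experiment name gives a stable, roughly uniform split with no
# stored state, so the same user always lands in the same variant.
import hashlib

def assign_variant(experiment, user_id, split=0.5):
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10_000 / 10_000   # stable value in [0, 1)
    return "A" if bucket < split else "B"
```

Including the experiment name in the hash also decorrelates experiments: a user's variant in one experiment tells you nothing about their variant in another.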
Infrastructure Requirements
A/B testing requires consistent user assignment (users must stay in their assigned variant for the duration of the experiment), metric collection (track the target metric per variant), statistical analysis (compute confidence intervals and significance), and isolation (experiments should not interfere with each other; a user in experiment 1 should not have their results contaminated by being simultaneously in experiment 2 unless interactions are accounted for).
Managed experimentation platforms include Optimizely, LaunchDarkly Experimentation, Google Optimize (now part of GA4), and Statsig.
Many large companies build custom experimentation platforms because the interaction effects between dozens of simultaneous experiments become complex.
Rollback Strategies and Version Management
No matter how thorough your testing, some problems only appear in production.
When they do, your ability to revert to the previous version quickly determines whether users experience a brief hiccup or a prolonged outage.
Instant Rollback (Blue-Green)
In a blue-green deployment, rollback means switching the load balancer back to the previous environment. This takes seconds and is the fastest rollback mechanism.
The old environment is still running exactly as it was before the switch.
Version Rollback (Rolling/Canary)
In rolling or canary deployments, rollback means deploying the previous artifact version using the same deployment strategy.
Kubernetes makes this easy: kubectl rollout undo reverts to the previous deployment version.
The cluster replaces the new pods with pods running the old version.
This requires that the previous artifact is still available in the container registry.
If you deleted old images aggressively, you cannot roll back.
Always retain at least the last 5 to 10 production artifacts.
Database Rollback Challenges
Code rollback is usually straightforward. Database rollback is not.
If the new version applied a database migration (added a column, changed a schema, migrated data), rolling back the code does not undo the database change.
The safest approach is to make database migrations backward-compatible.
A migration that adds a new column is backward-compatible: the old code simply does not use the new column.
A migration that renames or deletes a column is not backward-compatible: the old code references the old column name and breaks.
The expand-and-contract pattern handles non-backward-compatible changes safely.
First, expand: add the new column alongside the old one, deploy code that writes to both, and backfill existing rows.
Once all code reads and writes only the new column, contract: remove the old column. Each step is independently deployable and rollback-safe.
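The expand phase's dual-write can be sketched as follows. The column names (`full_name` old, `display_name` new) are illustrative, and the rows are modeled as plain dicts rather than real database tables:

```python
# Expand-phase sketch: while both the old and new columns exist,
# application code writes both, so either the old or the new code
# version can run (and be rolled back) against the same schema.

def save_user(row, name):
    row["full_name"] = name      # old column: keeps v1 code working
    row["display_name"] = name   # new column: what v2 code reads
    return row

def read_user_v2(row):
    # New code prefers the new column but falls back during backfill.
    return row.get("display_name") or row.get("full_name")
```

Only after every reader and writer uses `display_name` is it safe to run the contract migration that drops `full_name`.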
Version Management
Semantic versioning (major.minor.patch) communicates the nature of each change.
A major version bump means breaking changes.
A minor bump means new features that are backward-compatible.
A patch bump means bug fixes.
Not all teams use semantic versioning. Many use the git commit hash as the version identifier, which provides traceability without implying backward compatibility.
In container environments, tag images with both the git hash and a human-readable version: app:v2.3.1 and app:a3f8b2c1.
The human-readable tag simplifies communication ("roll back to v2.3.0").
The git hash provides exact traceability.
GitOps and Declarative Deployments
GitOps is a deployment methodology where git is the single source of truth for both application code and infrastructure/deployment configuration.
Instead of imperatively running deployment commands ("deploy version X to cluster Y"), you declaratively define the desired state in a git repository, and an automated agent ensures the actual state matches the desired state.
How GitOps Works
You maintain a git repository (the "ops repo" or "config repo") containing the Kubernetes manifests (or Helm charts, Kustomize files) that describe every resource running in your cluster: deployments, services, config maps, ingress rules.
An agent running inside the cluster (ArgoCD, Flux) continuously watches the git repository. When it detects a change (a new image version, a config update, a new service), it applies the change to the cluster automatically.
If someone makes a manual change to the cluster that does not match the git repository, the agent reverts it.
Git is the truth; the cluster conforms to git.
GitOps Workflow
A developer merges application code.
The CI pipeline builds and pushes a new container image (app:a3f8b2c1).
The pipeline (or a separate automation) updates the ops repo with the new image tag.
The GitOps agent detects the change and updates the cluster.
The application is now running the new version.
Rollback is a git revert. Revert the commit that updated the image tag.
The agent detects the revert and rolls the cluster back to the previous version. Every deployment and rollback is a git commit with a full audit trail.
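One reconcile pass of such an agent can be sketched as below. The state is simplified to app-name-to-image-tag maps; a real agent compares full Kubernetes manifests:

```python
# Sketch of a GitOps reconcile pass: git's declared state is
# authoritative, so a new desired version and manual cluster drift are
# both corrected the same way, by converging the cluster toward git.

def reconcile(desired, cluster):
    """desired/cluster: {app_name: image_tag}. Returns the changes applied."""
    changes = {}
    for app, tag in desired.items():
        if cluster.get(app) != tag:
            changes[app] = (cluster.get(app), tag)
            cluster[app] = tag               # converge toward git
    return changes
```

Running this loop continuously is what makes a git revert sufficient for rollback: the next pass sees the reverted tag as the desired state and applies it.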
Benefits and Trade-offs
GitOps provides a complete audit trail (every change is a git commit), self-healing (manual cluster changes are automatically reverted to match git), consistent process (deployments follow the same path regardless of who initiates them), and security (cluster credentials are needed only by the agent, not by developers or CI pipelines).
The trade-off is additional infrastructure (you need the GitOps agent running in every cluster) and a learning curve around the declarative model.
Emergency fixes require going through git rather than making a quick manual change, which feels slower during incidents.
Some teams allow emergency manual overrides with a process to commit the change to git afterward.
Popular GitOps Tools
ArgoCD is the most widely adopted GitOps tool for Kubernetes. It provides a web UI for visualizing the sync state of every application, supports Helm, Kustomize, and plain manifests, and can manage multi-cluster deployments.
Flux (by Weaveworks) is another CNCF project for GitOps. It is lighter than ArgoCD and runs as a set of Kubernetes controllers.
Flux excels at automated image updates: it can watch your container registry for new tags and automatically update the git repository when a new image is pushed.
| Tool | UI | Multi-Cluster | Image Automation | Complexity |
|---|---|---|---|---|
| ArgoCD | Rich web UI and CLI | Yes | Via plugins | Medium |
| Flux | CLI-first | Yes | Built-in | Lower |
Beginner Mistake to Avoid
New engineers sometimes skip CI/CD infrastructure because it feels like overhead that delays the first feature release. You spend two days setting up a pipeline instead of writing application code. This is a false economy.
Within the first month, the pipeline saves more time than it cost to build.
Without it, every deployment is a manual, error-prone process.
A broken deployment with no automated rollback capability at 2 AM is the moment you wish you had invested those two days.
Set up CI/CD before your first production deployment, even if the pipeline is simple (build, test, deploy).
Interview-Style Question
Q: Your e-commerce platform ships releases every two weeks. The last two releases caused production incidents that took hours to resolve. Leadership wants to increase release frequency to daily while reducing risk. How do you achieve this?
A: The paradox of deployment safety is that deploying more frequently is actually safer than deploying less frequently. Smaller, more frequent changes are easier to test, easier to understand when they break, and easier to roll back. To get there: first, invest in CI with comprehensive automated tests so that broken code is caught before it reaches production. Second, adopt canary deployments so that each change is exposed to 5% of traffic before full rollout, with automated monitoring that triggers rollback if error rates spike. Third, implement feature flags so that new features can be deployed but kept invisible until the team is confident they work. Fourth, adopt GitOps with ArgoCD so that every deployment is a git commit with instant rollback via git revert. Fifth, make database migrations backward-compatible using the expand-and-contract pattern so that code rollback never conflicts with schema changes. The result: developers merge code daily, the pipeline builds and tests automatically, canary deployments catch production issues early with minimal user impact, and rollback is a 30-second git revert. Release frequency goes up while incident rate goes down.
[Diagram: CI/CD Pipeline]
KEY TAKEAWAYS
- CI ensures code is always buildable and tested. Continuous Delivery means code is always deployable. Continuous Deployment deploys every passing change automatically.
- Build pipelines automate the journey from commit to production: build, test, scan, store artifact, deploy. The same artifact progresses through every environment.
- Rolling deployments are the simplest. Blue-green provides instant rollback. Canary minimizes user exposure to bad code. Shadow deployment tests under real traffic with zero user risk.
- Feature flags enable progressive delivery: deploy code to 100% of servers but control who sees the new behavior. Clean up flags after full rollout.
- A/B testing uses deployment infrastructure to run controlled experiments, routing users to different variants and measuring which performs better.
- Rollback requires retaining previous artifacts and making database migrations backward-compatible. The expand-and-contract pattern handles schema changes safely.
- GitOps uses git as the single source of truth for deployment state. Agents (ArgoCD, Flux) ensure the cluster matches the declared state. Every deployment and rollback is a git commit.
- Deploying more frequently with smaller changes is safer than deploying less frequently with larger changes, provided you have automated testing, canary deployments, and fast rollback.