Chapter 4: System Design Advanced Topics

4.5 Security in System Design

## Authentication & Authorization

Security starts with two fundamental questions.

Who are you?

And what are you allowed to do?

These two questions define the boundary between authentication and authorization, and getting them right is the foundation that everything else in this chapter builds on.

**Authentication vs. Authorization**

Authentication verifies identity. It answers "are you who you claim to be?"

When you enter your email and password to log in, the system authenticates you by checking whether the password matches what it has on record for that email.

Authorization determines permissions. It answers "are you allowed to do this?"

After you log in, the system checks whether your account has permission to access a specific resource, perform an action, or view a page.

An authenticated user is not necessarily authorized for every operation.

The distinction matters because they are separate concerns that should be handled by separate systems.

You can be authenticated (the system knows who you are) but not authorized (you do not have permission to delete other users' data).

You can also be authorized at one level but not another (you can read your own profile but cannot modify billing settings without admin privileges).

A common beginner mistake is treating authentication and authorization as the same thing. Checking that a user is logged in does not mean they should have access to everything.

Every API endpoint needs both checks: is this request from a verified user, and does that user have permission for this specific action?
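The pair of checks can be sketched as a single request handler. `TOKENS` and `PERMISSIONS` here are toy stand-ins for a real token verifier and permission store, and the status codes show the convention: 401 for a failed authentication, 403 for a failed authorization.

```python
# Illustrative stand-ins: a real system verifies a signed token and
# loads permissions from a database, not from in-memory dicts.
TOKENS = {"tok-alice": "alice", "tok-bob": "bob"}
PERMISSIONS = {"alice": {"read", "write"}, "bob": {"read"}}

def authenticate(token):
    """Authentication: map a presented credential to an identity (or None)."""
    return TOKENS.get(token)

def handle(token, action):
    user = authenticate(token)
    if user is None:
        return 401                        # not authenticated: unknown caller
    if action not in PERMISSIONS.get(user, set()):
        return 403                        # authenticated, but not authorized
    return 200
```

Note that the two failures are distinguishable: a 401 means "we do not know who you are," a 403 means "we know who you are, and the answer is no."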

**OAuth 2.0 and OpenID Connect**

OAuth 2.0 is the industry standard protocol for authorization delegation. It lets a user grant a third-party application limited access to their resources on another service without sharing their password.

When you click "Sign in with Google" on a website, OAuth 2.0 is at work.

The website redirects you to Google.

You authenticate with Google directly (the website never sees your Google password).

Google asks if you want to grant the website access to your email and profile information.

If you agree, Google gives the website a token that grants only the permissions you approved.

OAuth 2.0 defines four grant types (authorization code, implicit, client credentials, resource owner password), but the authorization code flow with PKCE is the recommended approach for most applications.

The others have known security weaknesses or are intended for specific machine-to-machine scenarios.
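The PKCE piece of the recommended flow is concrete enough to show: the client generates a random code verifier, sends its SHA-256 challenge on the initial redirect, and reveals the verifier only at token exchange, so an intercepted authorization code is useless on its own. A minimal sketch of the RFC 7636 S256 method:

```python
import base64
import hashlib
import secrets

def make_pkce_pair():
    """Generate a PKCE code_verifier and its S256 code_challenge (RFC 7636).

    The challenge goes in the authorization redirect; the verifier is kept
    secret and sent only in the back-channel token exchange."""
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge
```

The authorization server stores the challenge with the issued code and, at token exchange, recomputes SHA-256 over the submitted verifier to confirm the same client is completing the flow.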

OpenID Connect (OIDC) is a thin identity layer built on top of OAuth 2.0. OAuth 2.0 by itself is an authorization protocol. It tells the application what the user allowed, not who the user is.

OIDC adds an ID token (a JWT containing user identity information like name and email) to the OAuth 2.0 flow, turning it into a complete authentication and authorization solution.

**JWT (JSON Web Tokens) and Session Management**

A JWT is a compact, self-contained token that carries information (claims) about a user. It consists of three parts: a header (specifying the signing algorithm), a payload (containing claims like user ID, email, roles, and expiration time), and a signature (proving the token was not tampered with).

JWTs look like this: `eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjo0Mn0.signature_here`.

The first two sections are Base64-encoded JSON.
(Strictly, Base64url: the URL-safe Base64 variant, without padding.)

The third is a cryptographic signature.

The key property of JWTs is that they are self-contained.

The server does not need to look up a session in a database to validate the token. It simply verifies the signature using a shared secret (HMAC) or a public key (RSA/ECDSA).

This makes JWTs stateless, which aligns perfectly with horizontally scaled, stateless services.
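To make the self-contained property concrete, here is a toy HS256 signer and verifier built only on the standard library. A production service should use a vetted JWT library; the claim names below are illustrative.

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> bytes:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def sign_jwt(claims: dict, secret: bytes) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    sig = b64url(hmac.new(secret, header + b"." + payload, hashlib.sha256).digest())
    return b".".join([header, payload, sig]).decode()

def verify_jwt(token: str, secret: bytes):
    """Return the claims if the signature and expiry check out, else None.

    No database lookup: the signature alone proves the token is untampered."""
    header, payload, sig = token.encode().split(b".")
    expected = b64url(hmac.new(secret, header + b"." + payload, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):      # constant-time comparison
        return None
    padded = payload + b"=" * (-len(payload) % 4)   # restore Base64 padding
    claims = json.loads(base64.urlsafe_b64decode(padded))
    if claims.get("exp", float("inf")) < time.time():
        return None                                 # expired
    return claims
```

Any server holding the shared secret can validate the token locally, which is exactly why JWTs suit horizontally scaled services.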

* Session-based authentication stores the session state on the server (or in Redis). The client holds only a session ID cookie. On each request, the server looks up the session ID to find the user's identity and permissions. Sessions are easy to revoke (delete the session from the store), but they require a shared session store that all servers can access.

* JWT-based authentication stores the state in the token itself. No server-side lookup is needed. This eliminates the session store dependency. But JWTs are harder to revoke. Once issued, a JWT is valid until it expires. If a user's account is compromised, you cannot instantly invalidate their JWT without either maintaining a blacklist (which reintroduces server-side state) or waiting for the token to expire.

| Aspect | Session-Based | JWT-Based |
| --- | --- | --- |
| State storage | Server-side (Redis, database) | In the token (client-side) |
| Validation | Lookup session by ID | Verify signature (no lookup) |
| Scalability | Requires shared session store | Stateless (no shared store) |
| Revocation | Instant (delete session) | Difficult (token valid until expiry) |
| Best for | Traditional web apps, sensitive ops | APIs, microservices, mobile clients |

Many production systems use a hybrid: short-lived JWTs (15-minute expiry) for API authentication, combined with refresh tokens stored server-side that can be revoked instantly.

When the JWT expires, the client uses the refresh token to get a new JWT.

Revoking the refresh token prevents new JWTs from being issued.
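The refresh side of the hybrid can be sketched with an in-memory dict standing in for Redis or a database; token format, TTLs, and claim names are illustrative.

```python
import secrets
import time

# Server-side store: this is the revocable state the hybrid keeps.
REFRESH_STORE = {}   # refresh token -> (user_id, expiry timestamp)

def issue_refresh_token(user_id, ttl=30 * 24 * 3600):
    token = secrets.token_urlsafe(32)
    REFRESH_STORE[token] = (user_id, time.time() + ttl)
    return token

def refresh(token):
    """Exchange a valid refresh token for claims for a new short-lived JWT."""
    record = REFRESH_STORE.get(token)
    if record is None or record[1] < time.time():
        return None                       # revoked or expired: no new JWTs
    user_id, _ = record
    return {"user_id": user_id, "exp": time.time() + 15 * 60}   # 15-minute JWT

def revoke(token):
    REFRESH_STORE.pop(token, None)        # instant revocation
```

The outstanding 15-minute JWT remains valid until it expires, but revoking the refresh token caps the exposure window at that expiry.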

**Single Sign-On (SSO) and SAML**

Single sign-on lets users authenticate once and access multiple applications without logging in again.

You log into your company's identity provider in the morning, and every internal tool (email, project management, code repository, HR system) recognizes you without a separate login.

* SAML (Security Assertion Markup Language) is the older SSO protocol, widely used in enterprise environments. It uses XML-based assertions exchanged between an identity provider (IdP, like Okta or Active Directory) and a service provider (the application). SAML is mature and deeply integrated into corporate IT infrastructure, but its XML complexity and browser-redirect-heavy flow make it cumbersome for modern applications.

* OIDC (built on OAuth 2.0) has largely replaced SAML for new applications. It uses JSON instead of XML, works naturally with REST APIs and mobile apps, and is simpler to implement. Most consumer-facing SSO ("Sign in with Google/Apple/GitHub") uses OIDC.

| Protocol | Format | Best For | Complexity |
| --- | --- | --- | --- |
| SAML | XML | Enterprise SSO, legacy systems | Higher |
| OIDC | JSON (JWT) | Modern apps, APIs, mobile | Lower |

**Multi-Factor Authentication (MFA)**

MFA requires users to provide two or more independent forms of verification: something they know (password), something they have (phone, hardware key), or something they are (biometric).

Even if an attacker steals a password, they cannot access the account without the second factor. MFA dramatically reduces the effectiveness of credential stuffing, phishing, and brute-force attacks.

Common MFA methods include:

* Time-based one-time passwords (TOTP) generated by apps like Google Authenticator: the server and the app share a secret, and both generate the same 6-digit code every 30 seconds.
* SMS codes: a 6-digit code sent via text message, less secure because of SIM-swapping attacks.
* Hardware security keys: FIDO2/WebAuthn devices like YubiKeys that use public-key cryptography, the most secure option.
* Push notifications: a prompt on the user's authenticated device asking them to approve the login.
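The TOTP method is compact enough to sketch in full. This follows the RFC 6238 / RFC 4226 algorithm: HMAC-SHA1 over a 30-second time counter, with dynamic truncation down to 6 digits.

```python
import hashlib
import hmac
import struct
import time

def totp(secret, at=None, step=30, digits=6):
    """RFC 6238 TOTP: server and authenticator app derive the same code
    from a shared secret, so no network round-trip is needed."""
    counter = int((time.time() if at is None else at) // step)
    msg = struct.pack(">Q", counter)                       # 8-byte big-endian counter
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                             # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

Because both sides compute the code independently from the clock, the server typically also accepts the codes for the adjacent time steps to tolerate clock skew.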

In system design, MFA is not optional for any system handling sensitive data, financial transactions, or administrative access.

Design your authentication flow to support MFA from the start, even if you do not enforce it for all users immediately.

**Role-Based Access Control (RBAC) and Attribute-Based Access Control (ABAC)**

After a user is authenticated, you need a model for determining what they can do. RBAC and ABAC are the two primary approaches.

RBAC assigns permissions to roles, and roles to users.

A user with the "editor" role can create and edit articles.

A user with the "admin" role can do everything an editor can, plus delete articles and manage users.

Permissions are not assigned to individual users but to roles.

This simplifies management: when a new hire joins the content team, you assign them the "editor" role, and they automatically get all editor permissions.

RBAC is straightforward and works well for most applications. It struggles when permissions depend on context.

"Editors can edit articles" is a clean RBAC rule.

"Editors can edit articles only in their department, only during business hours, and only if the article was created less than 24 hours ago" requires conditions that RBAC cannot express naturally.

ABAC evaluates permissions based on attributes of the user, the resource, the action, and the environment.

An ABAC policy might say: "Allow edit if user.department == resource.department AND time.current is between 9:00 and 17:00 AND resource.age < 24 hours."

ABAC is more flexible and expressive than RBAC but more complex to implement and manage.

| Model | Permissions Based On | Flexibility | Complexity | Best For |
| --- | --- | --- | --- | --- |
| RBAC | Roles assigned to users | Moderate | Low | Most applications, clear role hierarchies |
| ABAC | User, resource, action, and environment attributes | High | Higher | Complex access rules, context-dependent permissions |

Many systems start with RBAC and add ABAC-like conditions for specific edge cases rather than implementing full ABAC from scratch.
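That incremental path can be sketched as a plain RBAC check with ABAC-style conditions bolted on for one case. The roles, permission names, business-hours window, and 24-hour rule below are illustrative:

```python
from datetime import datetime, timedelta

ROLE_PERMISSIONS = {
    "viewer": {"article:read"},
    "editor": {"article:read", "article:edit"},
    "admin":  {"article:read", "article:edit", "article:delete", "user:manage"},
}

def can(user, action, resource, now=None):
    """RBAC first; then attribute conditions for the one case that needs them."""
    if action not in ROLE_PERMISSIONS.get(user["role"], set()):
        return False                                  # pure RBAC: role lacks permission
    if user["role"] == "editor" and action == "article:edit":
        now = now or datetime.now()
        return (user["department"] == resource["department"]   # same department
                and 9 <= now.hour < 17                         # business hours
                and now - resource["created_at"] < timedelta(hours=24))
    return True
```

The role table stays the source of truth; only the editor/edit path grows attribute checks, which is usually far cheaper than adopting a full policy engine.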

Interview-Style Question

> Q: Your SaaS application has three user types: viewer (read-only), editor (read and write), and admin (full access including user management). How do you implement access control?

> A: Use RBAC. Define three roles: viewer, editor, and admin. Define permissions for each role: viewer gets read access to all resources, editor gets read and write access, admin gets read, write, delete, and user management access. Assign one role to each user. On every API request, after authentication (verifying the JWT), extract the user's role from the token claims or from a database lookup. Before executing the operation, check if the user's role includes the required permission for that endpoint. Return 403 Forbidden if not. Store the role-to-permission mapping in a configuration that can be updated without redeploying. As the application grows, add more granular roles (like "department editor" who can only edit within their department) by extending the role set rather than building a full ABAC engine.

**KEY TAKEAWAYS**

* Authentication verifies identity. Authorization verifies permissions. Both are required on every request, and they should be handled separately.
* OAuth 2.0 handles authorization delegation. OpenID Connect adds an identity layer for authentication. Together they power "Sign in with X" flows.
* JWTs are stateless tokens ideal for APIs and microservices. Short-lived JWTs with server-side refresh tokens balance scalability with revocability.
* SSO lets users log in once for multiple applications. OIDC is the modern standard; SAML persists in enterprise environments.
* MFA is essential for any system with sensitive data. Hardware keys (FIDO2) are the most secure; TOTP apps are the most practical.

* RBAC is sufficient for most applications. ABAC adds flexibility for context-dependent permissions but at higher complexity.

## Data Security

Authentication and authorization control who can access data.

Data security controls what happens to the data itself: how it is protected in motion, at rest, and when it is no longer needed.

**Encryption in Transit: TLS/SSL**

Every byte of data traveling across a network can potentially be intercepted.

Encryption in transit ensures that even if someone captures the traffic, they cannot read it.

TLS (Transport Layer Security) is the standard protocol for encrypting network communication.

When you connect to a website over HTTPS, TLS encrypts the data between your browser and the server.

TLS works by establishing a secure channel through a handshake: the client and server agree on encryption algorithms, the server presents its certificate (proving its identity), and both sides derive session keys used to encrypt all subsequent communication.

TLS is non-negotiable for any system in production.

Every external API, every web interface, and every connection carrying user data should use TLS.

Internal service-to-service communication should also use TLS (mutual TLS in a service mesh, covered in Part II) for defense in depth.

Modern systems should use TLS 1.3, which is faster (one round-trip handshake instead of two) and more secure (removes support for weak cipher suites) than TLS 1.2.

**Encryption at Rest: AES, Key Management**

Encryption at rest protects data stored on disk.

If an attacker gains physical access to a hard drive or a stolen backup, encryption ensures the data is unreadable without the decryption key.

AES (Advanced Encryption Standard) is the most widely used symmetric encryption algorithm. AES-256 (using a 256-bit key) is the standard for encrypting data at rest.

Cloud providers offer encryption at rest as a built-in feature: AWS S3 encrypts objects with AES-256 by default, RDS supports encrypted storage, and EBS volumes can be encrypted at creation.

The encryption itself is the easy part.

Key management is where complexity lives. Who holds the encryption keys? Where are they stored? How are they rotated?

If the key is stored alongside the encrypted data, encryption provides no protection against an attacker who accesses the storage.

AWS KMS (Key Management Service), Google Cloud KMS, and Azure Key Vault provide managed key storage. Keys never leave the managed service.

Encryption and decryption operations happen inside the service, and access to keys is controlled by IAM policies. This separates the data (in your storage) from the keys (in the KMS), so compromising one does not compromise the other.

Key rotation replaces encryption keys on a regular schedule (typically annually).

Old keys are retained to decrypt data encrypted before the rotation.

New data is encrypted with the new key. Managed KMS services handle rotation automatically.

**Hashing and Salting Passwords**

Passwords should never be stored in plain text. They should never be stored encrypted either, because encryption is reversible, and anyone with the key can recover all passwords.

Instead, passwords are hashed: a one-way function transforms the password into a fixed-length string that cannot be reversed.

When a user creates a password "myP@ssw0rd", the system hashes it and stores only the hash.

When the user logs in, the system hashes the provided password and compares it to the stored hash.

If they match, the password is correct.

Salting adds a random string (the salt) to each password before hashing.

Without salts, two users with the same password produce the same hash, making it possible to identify common passwords by looking for repeated hashes.

With unique salts, identical passwords produce different hashes.

bcrypt, scrypt, and Argon2 are the recommended password hashing algorithms. They are intentionally slow (configurable work factor), making brute-force attacks computationally expensive.

Generic hash functions like SHA-256 are too fast for password hashing.

An attacker can compute billions of SHA-256 hashes per second. bcrypt with a work factor of 12 limits them to thousands per second.

Never use MD5 or SHA-1 for password hashing. Both are broken and should not be used for any security-sensitive purpose.
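A minimal sketch of salted, deliberately slow hashing using the standard library's scrypt (bcrypt or Argon2 via their own libraries follow the same store-verify shape); the cost parameters here are illustrative, not a tuning recommendation:

```python
import hashlib
import hmac
import os

def hash_password(password: str) -> bytes:
    """Salt and hash with scrypt; store salt + digest together."""
    salt = os.urandom(16)                   # unique random salt per password
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt + digest

def check_password(password: str, stored: bytes) -> bool:
    salt, digest = stored[:16], stored[16:]
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return hmac.compare_digest(candidate, digest)   # constant-time comparison
```

Because the salt is random per user, hashing the same password twice yields different stored values, which is exactly the property that defeats repeated-hash lookups.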

**Data Masking and Tokenization**

Data masking replaces sensitive data with realistic-looking but fake data.

A credit card number `4111-2222-3333-4444` becomes `4111-XXXX-XXXX-4444` in logs, support dashboards, and non-production environments.

The original data is preserved in the source system, but anyone viewing the masked data cannot reconstruct the original.
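A masking helper for the card-number example might look like this; keeping the first and last four digits is illustrative, and your masking policy should follow your own compliance requirements:

```python
def mask_card(card: str) -> str:
    """Mask all but the first and last 4 digits, preserving separators."""
    total = sum(ch.isdigit() for ch in card)
    seen = 0
    out = []
    for ch in card:
        if ch.isdigit():
            seen += 1
            # Keep leading and trailing digits, mask the middle.
            out.append(ch if seen <= 4 or seen > total - 4 else "X")
        else:
            out.append(ch)                # keep dashes/spaces as-is
    return "".join(out)
```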

Tokenization replaces sensitive data with a non-sensitive token that maps back to the original through a secure lookup.

A payment processor stores the real credit card number and gives your application a token like `tok_abc123`. Your application stores and processes the token.

When a charge is needed, the token is sent to the payment processor, which looks up the real card number. Your system never touches the actual card data.

Tokenization is the foundation of PCI DSS compliance.

By using a tokenization service (like Stripe or Braintree), your system never stores, processes, or transmits actual card numbers, which dramatically reduces your compliance scope.
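Mechanically, tokenization reduces to a lookup table that only the processor holds. A toy sketch (real token formats, storage, and access controls are the processor's concern):

```python
import secrets

class TokenVault:
    """Toy stand-in for a payment processor's token vault."""
    def __init__(self):
        self._vault = {}                       # token -> real card number, held only here

    def tokenize(self, card_number: str) -> str:
        token = "tok_" + secrets.token_hex(8)  # random: reveals nothing about the card
        self._vault[token] = card_number
        return token                           # your application stores only this

    def detokenize(self, token: str) -> str:
        return self._vault[token]              # only the processor can resolve it
```

Your application stores `tok_...` values and passes them back at charge time; the real card number never enters your databases, backups, or logs.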

**PII Handling and Data Classification**

PII (Personally Identifiable Information) is any data that can identify an individual: names, email addresses, phone numbers, social security numbers, IP addresses, and location data.

Data classification assigns sensitivity levels to different types of data.

A common classification scheme uses four tiers.

Public data (marketing content, product descriptions) has no access restrictions.

Internal data (company policies, internal documentation) is restricted to employees.

Confidential data (customer email addresses, order history) requires access controls and encryption.

Restricted data (social security numbers, credit card numbers, health records) requires the strictest controls: encryption at rest and in transit, access logging, and minimal retention.

The classification determines the controls applied.

Public data needs no encryption.

Restricted data needs encryption, access logging, data masking in non-production environments, and deletion after the retention period expires.

Interview-Style Question

> Q: Your application stores user profiles with names, email addresses, and credit card information for recurring billing. How do you protect this data?

> A: Layer the protections by data sensitivity. Names and emails are Confidential: encrypt the database at rest (RDS encryption), enforce TLS for all connections, and restrict access to the database to only the services that need it. Credit card data is Restricted: never store actual card numbers. Use a payment tokenization service like Stripe. Your database stores only the Stripe token (like `tok_abc123`), not the card number. When you need to charge the card, send the token to Stripe, which handles the actual card data. This removes your system from PCI DSS scope for card storage. For non-production environments, mask all PII: replace real names and emails with fake data so developers and testers never see actual customer information. Add audit logging on all access to the user profiles table so you have a record of who accessed what and when.

**KEY TAKEAWAYS**

* TLS encrypts data in transit. Use TLS 1.3 for all external and internal communication. HTTPS is non-negotiable.
* AES-256 encrypts data at rest. Use managed KMS services for key storage and automatic rotation. Never store keys alongside the encrypted data.
* Hash passwords with bcrypt, scrypt, or Argon2 with unique salts. Never use MD5, SHA-1, or reversible encryption for passwords.
* Tokenization replaces sensitive data with non-sensitive tokens, reducing your compliance scope. Use it for credit card numbers and other restricted data.
* Classify data by sensitivity (public, internal, confidential, restricted) and apply controls proportional to the classification.

## Application & Infrastructure Security

Authentication protects the front door. Data security protects the vault.

Application and infrastructure security protects everything in between: the walls, the windows, the ventilation, and the foundation.

**Input Validation: Preventing XSS and SQL Injection**

The majority of application vulnerabilities come from trusting user input.

Every field, parameter, header, and cookie that comes from outside your system is a potential attack vector.

SQL injection happens when user input is inserted directly into a SQL query.

If a login form passes the username directly into `SELECT * FROM users WHERE username = '${input}'`, an attacker can enter `' OR 1=1 --` as the username, which transforms the query into one that returns all users.

The fix is simple and absolute: use parameterized queries (prepared statements) for every database query. Never concatenate user input into SQL strings.
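With Python's sqlite3 standing in for any database driver, the parameterized form looks like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password_hash TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'h1'), ('bob', 'h2')")

def find_user(username):
    # The driver sends the value separately from the SQL text, so input
    # like "' OR 1=1 --" is matched as a literal string, never parsed as SQL.
    cur = conn.execute("SELECT * FROM users WHERE username = ?", (username,))
    return cur.fetchall()
```

The injection payload simply finds no user named `' OR 1=1 --`, which is the whole point: the query's structure is fixed before any user input arrives.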

XSS (Cross-Site Scripting) happens when user input is rendered in a web page without sanitization.

If a comment field allows `<script>alert('hacked')</script>` and the page renders it as HTML, the script executes in every visitor's browser.

The attacker can steal session cookies, redirect users, or modify page content.

The fix is to sanitize all output by escaping HTML entities. Modern frontend frameworks (React, Vue, Angular) escape output by default, but raw HTML rendering (`dangerouslySetInnerHTML` in React) bypasses this protection.
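Escaping is a single call in most languages; in Python's standard library:

```python
import html

comment = "<script>alert('hacked')</script>"
safe = html.escape(comment)   # <, >, &, and quotes become HTML entities
```

The escaped string renders as visible text in the browser instead of executing; `html.unescape` recovers the original, showing that escaping changes presentation, not content.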

Input validation should happen at multiple layers: the client (for user experience), the API gateway or middleware (rejecting malformed requests early), and the application logic (validating business rules).

Defense in depth means not relying on any single layer to catch all malicious input.

**CSRF Protection**

CSRF (Cross-Site Request Forgery) tricks a user's browser into making a request to your application while the user is authenticated.

If a user is logged into their banking app and visits a malicious website, the malicious site can trigger a fund transfer request to the bank using the user's existing session cookie.

The standard defense is a CSRF token: a random value generated by the server and embedded in every form or request.

The server validates that the token in the request matches the one it issued.

Since the attacker's malicious site cannot access the CSRF token (due to the same-origin policy), it cannot forge a valid request.

For API-only applications that use JWTs in the Authorization header (instead of cookies), CSRF is not a concern because the attacker's site cannot read or send the JWT.

CSRF protection is primarily relevant for cookie-based session authentication.
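The token mechanics are small; `session` below is a stand-in for whatever server-side session store you use:

```python
import hmac
import secrets

def issue_csrf_token(session):
    """Generate a random token, store it with the session, embed it in the form."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token

def check_csrf(session, submitted):
    """Reject the request unless the submitted token matches the session's."""
    expected = session.get("csrf_token", "")
    return hmac.compare_digest(expected, submitted)
```

The attacker's page can make the browser send the session cookie, but it cannot read the token out of your form, so `check_csrf` fails for forged requests.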

**DDoS Mitigation Strategies**

A DDoS (Distributed Denial of Service) attack floods your system with traffic from many sources simultaneously, overwhelming your servers, bandwidth, or application logic.

Mitigation operates at multiple layers.

Network-level mitigation uses services like AWS Shield, Cloudflare, or Akamai Prolexic to absorb and filter volumetric attacks (terabits of junk traffic) at the edge, long before it reaches your infrastructure.

CDN absorption distributes traffic across hundreds of edge nodes, making it difficult for an attacker to overwhelm a single origin.

Rate limiting (covered in Chapter IV) limits per-client request rates, slowing down application-layer attacks.

Auto-scaling absorbs legitimate traffic surges but should not be relied on alone (an attacker can drive your cloud bill up by triggering continuous scaling).

Geo-blocking restricts traffic from regions where you have no users, reducing the attack surface.

No single technique stops all DDoS attacks.

Production systems layer multiple defenses: edge-level scrubbing (Cloudflare), CDN distribution, rate limiting at the gateway, and application-level validation of request patterns.

**WAF (Web Application Firewall)**

A WAF inspects HTTP traffic and blocks requests that match known attack patterns. It sits in front of your application (typically at the CDN or load balancer level) and filters requests based on rules.

WAF rules can detect and block SQL injection attempts, XSS payloads, known exploit patterns (like Log4Shell), malformed requests, and requests from known malicious IP addresses.

AWS WAF, Cloudflare WAF, and Akamai Kona are widely used managed WAF services.

A WAF is not a substitute for secure coding. It is an additional layer that catches attacks that might slip through application-level defenses.

Think of it as a safety net beneath the tightrope, not a replacement for balance.

**Principle of Least Privilege**

Every user, service, and process should have only the minimum permissions required to perform its function. Nothing more.

A microservice that reads from a database should have read-only access, not read-write.

An engineer who manages deployments should have deployment permissions, not database administrator access.

A Lambda function that writes to S3 should have permission to write to its specific bucket, not to all S3 buckets in the account.

Least privilege limits the blast radius of a compromise.

If an attacker gains control of a service that only has read access to one database, the damage is contained.

If that same service had full admin access to every resource in your cloud account, the entire system is at risk.

Implement least privilege through IAM roles (not shared credentials), service-specific database users (not a universal admin account), and regular permission audits (remove access that is no longer needed).

**Network Security: VPC, Firewalls, Security Groups**

A VPC (Virtual Private Cloud) is your isolated network within a cloud provider. Resources inside the VPC can communicate with each other but are not accessible from the public internet unless you explicitly allow it.

Subnets divide your VPC into public (internet-facing) and private (internal only) segments. Load balancers and API gateways sit in public subnets. Application servers, databases, and caches sit in private subnets with no direct internet access.

Security groups are virtual firewalls attached to individual resources. They define which inbound and outbound traffic is allowed. A database security group might allow inbound connections only from application server security groups on port 5432 (PostgreSQL). All other traffic is denied by default.

Network ACLs provide an additional layer of firewall rules at the subnet level, controlling traffic in and out of entire subnets.

The principle: application servers should not be reachable from the internet. Databases should not be reachable from application servers in other VPCs. Internal services should communicate only with the specific services they depend on. Every network path should be explicitly allowed, not implicitly open.

**Secrets Management: HashiCorp Vault, AWS Secrets Manager**

Secrets include database passwords, API keys, encryption keys, TLS certificates, and third-party service credentials.

They should never appear in source code, configuration files checked into version control, environment variables visible in process listings, or application logs.

AWS Secrets Manager stores secrets encrypted and provides API access with IAM-controlled permissions. It supports automatic rotation of database credentials and integration with RDS, Redshift, and DocumentDB.

HashiCorp Vault is a more feature-rich, cloud-agnostic secrets management system. It provides dynamic secrets (generating short-lived database credentials on demand), secret leasing (automatically revoking secrets after a TTL), encryption as a service (applications can encrypt and decrypt data through Vault without managing keys), and detailed audit logging of all secret access.

The pattern: applications request secrets from the secrets manager at startup or on demand.

Secrets are never baked into container images, stored in environment variables, or committed to repositories.

Rotation happens automatically without application restarts (the application fetches the current secret from the manager on each use or caches it with a short TTL).
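The fetch-and-cache pattern can be sketched generically. `fetch_fn` stands in for a real client call (Secrets Manager, Vault, etc.), and the TTL value is illustrative:

```python
import time

class SecretCache:
    """Fetch secrets on demand with a short TTL, so rotation needs no restart."""
    def __init__(self, fetch_fn, ttl=300):
        self._fetch, self._ttl = fetch_fn, ttl
        self._cache = {}                       # name -> (value, fetched_at)

    def get(self, name):
        value, fetched_at = self._cache.get(name, (None, 0.0))
        if time.time() - fetched_at >= self._ttl:
            value = self._fetch(name)          # re-read: picks up rotated secrets
            self._cache[name] = (value, time.time())
        return value
```

Within the TTL the application serves the cached value; after rotation, the next expired read returns the new secret without redeploying or restarting anything.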

**Zero-Trust Architecture**

Traditional network security follows a "castle and moat" model: everything outside the network is untrusted, but once you are inside the network perimeter, you are trusted.

Zero-trust architecture rejects this model.

Nothing is trusted by default, regardless of network location.

In a zero-trust architecture, every request is authenticated and authorized, whether it comes from outside the network or from another service inside the same data center.

Network location does not grant trust. Service-to-service communication uses mutual TLS (both sides verify each other's identity). Access decisions are based on identity, device health, and request context, not network membership.

Zero-trust is the direction that modern security architecture is heading.

Service meshes (Chapter II) implement zero-trust principles by default through sidecar proxies that enforce mutual TLS and authorization policies on every inter-service call.

Interview-Style Question

> Q: Your team is building a new microservice that needs to access a PostgreSQL database and a third-party payment API. How do you handle the credentials securely?

> A: Store both credentials in a secrets manager (AWS Secrets Manager or HashiCorp Vault). The microservice retrieves the database password and the API key from the secrets manager at startup. The database credential is configured for automatic rotation in Secrets Manager, which updates both the secret and the database password simultaneously. The payment API key is stored as a versioned secret with access restricted to only this specific microservice's IAM role (principle of least privilege). No credentials appear in the codebase, container image, or environment variables. The database sits in a private subnet, accessible only from the application's security group on port 5432. The third-party API call goes through HTTPS with TLS 1.3. All secret access is audit-logged. If the microservice is compromised, the attacker gets a database credential that is valid only for a short window (Vault dynamic secrets) and an API key that can be revoked instantly in the secrets manager.

**KEY TAKEAWAYS**

* Use parameterized queries to prevent SQL injection and escape all output to prevent XSS. Never trust user input at any layer.
* DDoS mitigation requires multiple layers: edge scrubbing (Cloudflare/Shield), CDN distribution, rate limiting, and application-level validation.
* A WAF blocks known attack patterns at the HTTP level but does not replace secure coding practices.
* Apply the principle of least privilege everywhere: IAM roles, database users, network access, and API permissions.
* Isolate resources with VPCs, private subnets, and security groups. Databases and internal services should never be publicly accessible.
* Store all secrets in a managed secrets manager. Never in code, config files, or environment variables. Rotate secrets automatically.
* Zero-trust architecture authenticates and authorizes every request regardless of network location. Mutual TLS and identity-based access replace perimeter-based trust.

## Compliance & Privacy

Security protects your system from threats.

Compliance ensures your system meets legal and regulatory requirements for how data is collected, stored, processed, and shared.

In a system design interview and in production systems, compliance requirements often drive architectural decisions that affect database choice, deployment region, data retention, and logging infrastructure.

**GDPR, CCPA, HIPAA Compliance Considerations**

#### GDPR (General Data Protection Regulation)

GDPR is the European Union's data protection law. It applies to any system that processes personal data of EU residents, regardless of where the system is hosted.

Key requirements include explicit consent for data collection, the right for users to access their data, the right to be forgotten (delete all personal data on request), data breach notification within 72 hours, and data processing agreements with all third parties that handle personal data.

Architecturally, GDPR requires that you can identify and delete all data belonging to a specific user across every database, cache, backup, log, and analytics system.

If your data is scattered across 15 services with independent databases, satisfying a deletion request is an engineering challenge that should be designed for upfront, not retrofitted.

#### CCPA (California Consumer Privacy Act)

CCPA gives California residents similar rights: knowing what data is collected, requesting deletion, and opting out of data sales.

CCPA is less prescriptive than GDPR about technical implementation but carries significant financial penalties for violations.

#### HIPAA (Health Insurance Portability and Accountability Act)

HIPAA governs protected health information (PHI) in the United States. It requires encryption of PHI at rest and in transit, access controls and audit logging for all PHI access, business associate agreements with any third party handling PHI, and physical and administrative safeguards.

HIPAA compliance affects cloud infrastructure choices.

Not all cloud services are HIPAA-eligible.

AWS, Azure, and GCP each provide a subset of services that meet HIPAA requirements, and you must sign a Business Associate Agreement (BAA) with the provider.

**Data Residency and Sovereignty**

Data residency laws require that certain data be stored within specific geographic boundaries.

GDPR restricts transferring EU residents' data outside the EU unless the destination country has adequate data protection laws or specific safeguards are in place.

For system design, data residency means your database for EU users must be in an EU region.

You cannot simply replicate all data to a US data center for disaster recovery without addressing the legal requirements for cross-border data transfer.

Architecturally, this often leads to geo-partitioned databases (EU users' data in eu-west-1, US users' data in us-east-1), region-specific backup and replication policies, and careful configuration of CDNs and analytics pipelines to ensure data does not flow outside permitted regions.
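A minimal sketch of that routing decision, assuming a simple user-to-region mapping (the region names and connection strings are illustrative):

```python
# Route each user's reads and writes to the database in their home
# jurisdiction. Failing loudly on an unmapped user is safer than
# defaulting to a cross-border write.

REGION_DB = {
    "EU": "postgres://db.eu-west-1.internal/users",
    "US": "postgres://db.us-east-1.internal/users",
}

USER_HOME_REGION = {
    "anna@example.eu": "EU",
    "bob@example.com": "US",
}

def db_for_user(user_id: str) -> str:
    """Return the connection string for the user's home region."""
    region = USER_HOME_REGION.get(user_id)
    if region not in REGION_DB:
        raise ValueError(f"no residency mapping for {user_id}")
    return REGION_DB[region]

print(db_for_user("anna@example.eu"))  # postgres://db.eu-west-1.internal/users
```

In production the mapping would live in a user directory rather than a dict, but the invariant is the same: the residency decision is made once, at routing time, not scattered through application code.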

**Audit Logging and Compliance Reporting**

Audit logs record who did what, when, and to which resource. They are distinct from application logs (which record system behavior) and access logs (which record HTTP requests).

Audit logs capture security-relevant events: user login and logout, permission changes, data access (especially sensitive data), data modifications and deletions, administrative actions, and failed authentication attempts.
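A structured audit record makes "who did what, when, to which resource" machine-queryable. The field names below are an illustrative assumption, not a standard schema:

```python
import json
from datetime import datetime, timezone

def audit_event(actor: str, action: str, resource: str, outcome: str) -> str:
    """Serialize one audit record as JSON; in practice, emit it to an
    append-only sink rather than stdout."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,          # who
        "action": action,        # what
        "resource": resource,    # which resource
        "outcome": outcome,      # success / denied / failed
    }
    return json.dumps(record, sort_keys=True)

event = audit_event("dr.smith", "read", "patient/123/record", "success")
print(event)
```

Note that denied and failed attempts are first-class outcomes: a burst of "denied" events is often the earliest signal of an attack.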

Audit logs must be tamper-proof.

If an attacker compromises your system and can delete audit logs, the forensic trail disappears.

Store audit logs in a write-only, append-only system (like an immutable S3 bucket with Object Lock, or a separate logging account that application credentials cannot modify).
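Alongside immutable storage, one common software-level technique for making tampering *detectable* is hash-chaining: each entry stores a hash over the previous entry, so modifying or deleting any record breaks every hash after it. This is a sketch of the idea, not a product recommendation:

```python
import hashlib

class HashChainedLog:
    """Append-only log where each entry commits to its predecessor."""

    def __init__(self):
        self.entries = []  # list of (payload, chain_hash) tuples

    def append(self, payload: str) -> str:
        prev = self.entries[-1][1] if self.entries else "genesis"
        chain_hash = hashlib.sha256((prev + payload).encode()).hexdigest()
        self.entries.append((payload, chain_hash))
        return chain_hash

    def verify(self) -> bool:
        """Recompute the whole chain; any altered entry is detected."""
        prev = "genesis"
        for payload, stored in self.entries:
            if hashlib.sha256((prev + payload).encode()).hexdigest() != stored:
                return False
            prev = stored
        return True

log = HashChainedLog()
log.append("admin disabled MFA for user-7")
log.append("user-7 logged in")
assert log.verify()                                       # chain is intact
log.entries[0] = ("admin did nothing", log.entries[0][1])  # tamper with entry
assert not log.verify()                                    # tampering detected
```

Hash-chaining does not prevent deletion of the whole log, which is why it complements, rather than replaces, storing the log outside the application's blast radius.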

Forward logs to a centralized SIEM (Security Information and Event Management) system for analysis and alerting.

Compliance reporting uses audit logs to generate evidence for regulators and auditors.

"Show us all access to customer PII in the last 90 days."

"Prove that only authorized personnel accessed financial records."

If your audit logging is comprehensive and tamper-proof, generating these reports is straightforward.

If logging was an afterthought, compliance audits become expensive, manual exercises.

**Secure Software Development Lifecycle (SSDLC)**

The SSDLC integrates security into every phase of software development, rather than treating it as a final checkbox before release.

  1. Design phase: Threat modeling identifies potential attack vectors and drives security requirements. "What happens if an attacker intercepts the token?" and "What if a compromised service accesses another service's database?" are questions answered during design, not after deployment.
  2. Development phase: Secure coding standards prevent common vulnerabilities. Code reviews include security checks. Static analysis tools (like SonarQube, Semgrep, or Snyk) scan code for known vulnerability patterns during the build pipeline.
  3. Testing phase: Dynamic application security testing (DAST) tests the running application for vulnerabilities. Penetration testing simulates real-world attacks. Dependency scanning identifies known vulnerabilities in third-party libraries (a critical concern given the frequency of supply-chain attacks).
  4. Deployment phase: Container image scanning checks for vulnerabilities in base images. Infrastructure as Code is reviewed for misconfigurations (public S3 buckets, overly permissive security groups). Secrets are verified to not be present in the deployment artifact.
  5. Operations phase: Runtime security monitoring detects anomalous behavior (unusual API call patterns, unexpected data access). Vulnerability patches are applied promptly. Incident response plans are tested regularly.
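The deployment-phase check "secrets are verified to not be present in the deployment artifact" can be sketched as a simple pattern scan. The patterns below are deliberately simplified examples (real scanners such as those named above use much larger rulesets):

```python
import re

# Simplified secret-like patterns; illustrative only.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
    re.compile(r"(?i)(password|api_key)\s*=\s*['\"][^'\"]+['\"]"),
]

def find_secret_leaks(text: str) -> list:
    """Return (line number, line) for every line matching a secret pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), 1):
        if any(p.search(line) for p in SECRET_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits

artifact = 'region = "us-east-1"\npassword = "hunter2"\n'
leaks = find_secret_leaks(artifact)
print(leaks)  # [(2, 'password = "hunter2"')]
```

Running a check like this in CI, and failing the build on any hit, turns "no credentials in the artifact" from a policy statement into an enforced gate.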

The SSDLC is not a sequential process but a continuous loop.

Security reviews happen at every stage, and findings from operations (incidents, vulnerability discoveries) feed back into the design of future features.

**Beginner Mistake to Avoid**

New engineers sometimes treat security and compliance as features that can be added later, after the product ships. This approach is extraordinarily expensive.

Retrofitting GDPR-compliant data deletion across a system that was never designed for it can take months.

Adding audit logging to a system that never had it requires touching every service that accesses sensitive data.

Encrypting a database that was created without encryption requires a full data migration.

Design for security and compliance from day one. The incremental cost during initial development is a fraction of the cost of retrofitting later.

**Interview-Style Question**

> Q: You are designing a healthcare application that stores patient records. What compliance and security requirements would shape your architecture?

> A: HIPAA compliance drives every major architectural decision.
>
> * Data storage: patient records must be encrypted at rest (AES-256 via AWS KMS) and in transit (TLS 1.3). Use HIPAA-eligible AWS services and sign a BAA.
> * Database: PostgreSQL on RDS with encryption enabled, in a private subnet with security groups restricting access to only the application servers.
> * Access control: RBAC with roles for doctors, nurses, administrators, and patients. Each role has strictly defined permissions. MFA is required for all provider and admin accounts.
> * Audit logging: every access to a patient record is logged (who, when, which record, what action) to an immutable, centralized audit log. Logs are retained for the HIPAA-required minimum of 6 years.
> * Data residency: patient data stays in US regions. Backups are encrypted and stored in US-based regions only.
> * Secrets: database credentials and API keys stored in Vault or AWS Secrets Manager with automatic rotation. No credentials in code or configuration.
> * Network: VPC with private subnets for all data-handling services. No direct internet access for any component that touches patient data.
> * Incident response: a documented plan for data breach notification within 60 days (HIPAA requirement) with regular tabletop exercises.

**KEY TAKEAWAYS**

* GDPR, CCPA, and HIPAA each impose specific requirements on data collection, storage, access, and deletion. These requirements directly shape database design, deployment regions, and data pipelines.

* Data residency laws restrict where data can be stored geographically. Plan for geo-partitioned storage if your users span regulated jurisdictions.

* Audit logs must capture security-relevant events and be tamper-proof. Store them in immutable, centralized systems separate from application infrastructure.

* The SSDLC integrates security into design, development, testing, deployment, and operations. Security is a continuous practice, not a pre-launch checklist.

* Design for compliance from day one. Retrofitting security, logging, encryption, and data deletion across an existing system is orders of magnitude more expensive than building it in from the start.