System design

Backend Architecture Guide

A senior-level tour of backend system design, mapped onto NestJS and Node.js. This is the “how do you think about building a backend at scale” material — decisions before code, for the worst query, the worst network, and the worst traffic spike.

01 · Thinking in systems — the CRDDS framework

System design is decision-making before code. A structured flow keeps you calm under pressure:

Step	What you do	Time
C — Clarify	Functional vs non-functional requirements, scale (RPS, DAU), read/write ratio, consistency needs. Most candidates rush this and design the wrong system.	~15%
R — Rough HLD	Draw the high-level diagram: clients, gateway, services, datastores, queues. A city map, not a street map.	~20%
D — Deep dive	Zoom into one component (the data model, the rate limiter, the queue) — interfaces, edge cases.	~35%
D — Discuss trade-offs	Name alternatives and why you chose one. The senior signal.	~20%
S — Summarize	Recap, call out risks, handle follow-ups.	~10%

Senior tell Say the trade-off out loud: “I'll use cache-aside with a short TTL — I'm trading a little staleness for a big latency win.”

02 · Layered architecture in NestJS

Almost every Nest service shares the same skeleton — memorize it as your HLD template:

Layer	Responsibility	Nest equivalent
Presentation	HTTP/GraphQL/WS edge: parse, validate, serialize.	Controllers, resolvers, gateways, DTOs, pipes
Application / Domain	Use-cases and business rules — the most testable layer.	Services, use-case classes, domain models
Data / Repository	Persistence access: runs queries, applies caching, maps rows → domain.	Repository providers over TypeORM/Prisma/Mongoose
Integration	External APIs, message brokers, cache.	HTTP clients, ClientProxy, cache-manager

For complex, long-lived domains, push to hexagonal architecture: the domain depends on ports (interfaces + DI tokens), and adapters bind concrete implementations — so infrastructure is swappable and the domain is pure. Don't pay that indirection cost on thin CRUD.
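
A minimal sketch of the port/adapter split, assuming a made-up refund rule. Every name here (`OrderRepositoryPort`, `InMemoryOrderRepository`, `RefundPolicy`) is hypothetical; the point is only that the domain class sees an interface, never a concrete store.

```typescript
// Port: the domain's view of persistence. All names are hypothetical.
interface OrderRepositoryPort {
  findTotalCents(orderId: string): Promise<number | undefined>;
}

// Adapter: one concrete implementation. A TypeORM or Prisma adapter would
// implement the same interface and replace this one without domain changes.
class InMemoryOrderRepository implements OrderRepositoryPort {
  constructor(private readonly totals: Map<string, number>) {}

  async findTotalCents(orderId: string): Promise<number | undefined> {
    return this.totals.get(orderId);
  }
}

// Domain service: depends only on the port.
class RefundPolicy {
  constructor(private readonly orders: OrderRepositoryPort) {}

  // Business rule invented for illustration: refunds are capped at 50%.
  async maxRefundCents(orderId: string): Promise<number> {
    const total = await this.orders.findTotalCents(orderId);
    if (total === undefined) throw new Error(`Unknown order: ${orderId}`);
    return Math.floor(total / 2);
  }
}
```

In NestJS the binding would typically be a custom provider such as `{ provide: ORDER_REPOSITORY, useClass: InMemoryOrderRepository }`, injected with `@Inject(ORDER_REPOSITORY)`, where `ORDER_REPOSITORY` is a token you define yourself.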

03 · The integration layer — calling other services

A production client to another service needs more than fetch: timeouts (never wait forever), retries with exponential backoff + jitter (only for idempotent operations and transient failures: network errors, 5xx, and 429 throttling, not other 4xx), a circuit breaker (stop hammering a downed dependency), and a bulkhead (cap concurrent calls so one slow dependency can't exhaust your pool).
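
The retry rule can be sketched as follows. This is an illustration, not a library API: `retryWithBackoff`, `RetryOptions`, and the helpers are hypothetical names, the "full jitter" variant is one choice among several, and treating 429 as retryable is an assumption. The caller is responsible for only passing idempotent operations.

```typescript
// Reads an HTTP status from an error-like object, if it has one.
function httpStatusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

// Assumption: no status means a network error; 5xx and 429 are transient.
function isTransientFailure(err: unknown): boolean {
  const status = httpStatusOf(err);
  return status === undefined || status >= 500 || status === 429;
}

interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
  random?: () => number; // injectable for deterministic jitter
}

// Full jitter: wait a random time in [0, min(maxDelay, base * 2^attempt)).
async function retryWithBackoff<T>(
  call: () => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  const sleep =
    opts.sleep ??
    ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const random = opts.random ?? Math.random;
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const isLastAttempt = attempt >= opts.maxAttempts - 1;
      if (isLastAttempt || !isTransientFailure(err)) throw err;
      const ceiling = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
      await sleep(random() * ceiling);
    }
  }
}
```

Jitter matters because many clients failing at the same moment would otherwise retry at the same moment too.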

Wrap calls in a result/Observable with timeout(), propagate a correlation id (traceparent) for tracing, and map foreign DTOs to your domain at the boundary (an anti-corruption layer) so their schema changes don't ripple into your code.

Name the pattern Timeout + retry-with-backoff + circuit breaker + bulkhead = the resilience quartet.
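
To make the circuit-breaker leg of the quartet concrete, here is a deliberately small sketch. `SimpleCircuitBreaker` is a hypothetical class, not a library; it counts consecutive failures, and a production breaker would usually track a rolling error rate and admit only one trial call while half-open.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class SimpleCircuitBreaker {
  private state: BreakerState = "closed";
  private consecutiveFailures = 0;
  private openedAt = 0;

  constructor(
    private readonly failureThreshold: number,
    private readonly resetAfterMs: number,
    private readonly now: () => number = Date.now, // injectable clock
  ) {}

  currentState(): BreakerState {
    return this.state;
  }

  async run<T>(call: () => Promise<T>): Promise<T> {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.resetAfterMs) {
        // Fail fast: the dependency is not contacted at all.
        throw new Error("Circuit open: call rejected");
      }
      this.state = "half-open"; // cool-down elapsed: allow a trial call
    }
    try {
      const result = await call();
      this.state = "closed";
      this.consecutiveFailures = 0;
      return result;
    } catch (err) {
      this.consecutiveFailures++;
      const trialFailed = this.state === "half-open";
      if (trialFailed || this.consecutiveFailures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

Wrapping an outbound call as `breaker.run(() => client.fetchUser(id))` (a hypothetical client) is enough to stop hammering a dependency that is already down.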

04 · Storage & caching

Pick the store for the access pattern: relational (Postgres) for transactional integrity + joins; document (Mongo) for flexible schemas; key-value (Redis) for hot/ephemeral data; wide-column (Cassandra/Dynamo) for huge write-heavy time-series. Replication scales reads + HA (mind replica lag); sharding scales writes (pick a good shard key; cross-shard joins hurt).

Caching layers: in-memory (fast, per-instance, lost on restart) and shared Redis (survives app restarts, one view shared by all replicas). Strategies: cache-aside (default), read/write-through, write-behind. Invalidate via TTL, delete-on-write, or versioned keys; guard against stampedes with a lock/single-flight. Know stale-while-revalidate by name: serve the stale entry immediately and refresh it in the background.
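
Cache-aside with TTL, delete-on-write, and single-flight fit in a few lines. The sketch below is per-process and in-memory; `SingleFlightCache` is a hypothetical name, and a shared Redis version would need a distributed lock instead of the local promise map.

```typescript
interface CachedEntry<V> {
  value: V;
  expiresAt: number;
}

class SingleFlightCache<V> {
  private readonly entries = new Map<string, CachedEntry<V>>();
  private readonly inFlight = new Map<string, Promise<V>>();

  constructor(
    private readonly ttlMs: number,
    private readonly now: () => number = Date.now, // injectable clock
  ) {}

  // Cache-aside read: return a fresh entry, otherwise load and store it.
  async get(key: string, load: () => Promise<V>): Promise<V> {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) return hit.value;

    // Single-flight: concurrent misses for the same key share one load.
    const pending = this.inFlight.get(key);
    if (pending !== undefined) return pending;

    const promise = load()
      .then((value) => {
        this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
        return value;
      })
      .finally(() => this.inFlight.delete(key));
    this.inFlight.set(key, promise);
    return promise;
  }

  // Delete-on-write invalidation: call after the source of truth changes.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}
```

Without the `inFlight` map, a hot key expiring under load would send every concurrent request to the database at once, which is the stampede the text warns about.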

05 · Scaling Node.js

Node scales by staying non-blocking and stateless. Keep CPU work off the event loop (worker threads / queues). Scale horizontally: run one process per core (cluster) or, more commonly, N container replicas behind a load balancer. Externalize all state (sessions, cache, uploads) to Redis/DB/object storage so any replica can serve any request.

Distributed-state trap Rate limiting, caching, scheduling, and websocket rooms default to per-instance memory. At scale they must use Redis (throttler storage, KeyvRedis, distributed locks, socket.io Redis adapter) or they're wrong across replicas.

Vertical vs horizontal Vertical (bigger box) buys time; horizontal (more boxes) is the real answer — and only works if you're stateless.

06 · Async & messaging

Decouple work with asynchrony. Queues (BullMQ/SQS/RabbitMQ) distribute work to competing consumers (consume-once); streams/logs (Kafka) are durable, replayable, ordered-per-partition, multi-consumer. Use queues for jobs (email, image processing); use streams for event sourcing and fan-out to many consumers.

Delivery is at-least-once (visibility timeout → redelivery on crash), so consumers must be idempotent. Retry with backoff + jitter, and after the final attempt move the message to a dead-letter queue (DLQ) with alerting. Back-pressure: autoscale workers on queue depth.
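
A sketch of what "idempotent consumer plus DLQ" means in practice. `IdempotentConsumer` and `QueueMessage` are hypothetical names, and the in-memory `Set` stands in for a durable dedup store (a unique constraint or a Redis key) that a real service would need.

```typescript
interface QueueMessage {
  id: string; // stable id assigned by the producer
  attempts: number; // delivery attempts before this one
  body: string;
}

type DeliveryOutcome = "processed" | "duplicate" | "retry" | "dead-lettered";

class IdempotentConsumer {
  private readonly processedIds = new Set<string>();
  readonly deadLetters: QueueMessage[] = [];

  constructor(
    private readonly handler: (body: string) => void,
    private readonly maxAttempts: number,
  ) {}

  deliver(message: QueueMessage): DeliveryOutcome {
    // Redelivery of an already-handled message: skip the side effect.
    if (this.processedIds.has(message.id)) return "duplicate";
    try {
      this.handler(message.body);
      this.processedIds.add(message.id);
      return "processed";
    } catch {
      if (message.attempts + 1 >= this.maxAttempts) {
        this.deadLetters.push(message); // alert on this in production
        return "dead-lettered";
      }
      return "retry"; // the broker redelivers after backoff + jitter
    }
  }
}
```

Note the gap between running the handler and recording the id: if the process crashes in between, the message is handled twice, which is why the handler's own side effect should also be safe to repeat.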

Decision guide Two-way low-latency → gRPC/WebSocket. Work distribution → queue. Replayable event history / many consumers → Kafka. One-way push to browser → SSE.

07 · Service boundaries — monolith to microservices

Start with a modular monolith: one module per bounded context, each owning its tables, cross-module calls through a small public API + adapter. One deploy, atomic refactors, in-process transactional calls — enforce boundaries in CI with dependency-cruiser.

Extract a microservice only for a concrete need: independent scaling, deployment, fault isolation, or team ownership. If boundaries were clean, you swap the in-process public service for a transport client implementing the same interface. Cross-service consistency uses sagas (compensating actions) + the transactional outbox, not 2PC.
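
The saga idea (each step paired with a compensating action, undone in reverse on failure) can be sketched like this. `runSaga` and `SagaStep` are hypothetical names; a real orchestrator would persist saga state and retry compensations, which this in-memory version does not.

```typescript
interface SagaStep {
  name: string;
  execute: () => Promise<void>;
  compensate: () => Promise<void>; // undoes the effect of execute
}

async function runSaga(
  steps: SagaStep[],
): Promise<{ ok: boolean; failedStep?: string }> {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.execute();
      completed.push(step);
    } catch {
      // Undo already-completed steps, most recent first.
      for (const done of completed.reverse()) {
        await done.compensate();
      }
      return { ok: false, failedStep: step.name };
    }
  }
  return { ok: true };
}
```

Unlike 2PC, nothing is locked across services while the saga runs; intermediate states are visible, so each step has to be acceptable on its own until it is either followed or compensated.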

Anti-pattern Premature microservices buy distributed-systems pain (network failures, eventual consistency, ops overhead) for no benefit. Earn them.

08 · Resilience & failure design

Design for failure as the default. The toolkit: timeouts everywhere, retries (idempotent only, backoff + jitter), circuit breakers, bulkheads, graceful degradation (serve stale cache / a reduced response instead of an error), and idempotency keys so a retried side effect is applied only once.

Fail fast and visibly: validate at the edge, cap request size and concurrency, and propagate cancellation (AbortSignal). Decide fail-open vs fail-closed per feature (a downed rate-limiter Redis: fail-open for availability, fail-closed for abuse-sensitive endpoints).

09 · Multi-tenancy

Choose an isolation model early (hard to reverse): Silo (DB per tenant — strongest isolation/compliance, priciest), Pool (shared tables + tenant_id — cheapest, noisy-neighbor risk), Bridge (schema per tenant). Tiers often mix (enterprise = silo, SMB = pool).

Resolve the tenant from the subdomain or a JWT claim, set it in request context (CLS), and enforce it everywhere. With Postgres row-level security (RLS), a query that forgets its tenant filter still cannot return another tenant's rows. Scope cache keys by tenant; add per-tenant quotas/rate limits for noisy neighbors.

Cardinal sin Cross-tenant data leakage — a missing filter, a mis-resolved tenant, or a cache key without the tenant. RLS + tenant-scoped keys are the guardrails.
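
Two of the guardrails are small enough to show directly. This sketch assumes tenants are addressed as `<tenant>.<baseDomain>`; both function names and the key layout are hypothetical.

```typescript
// Assumption: tenants are addressed as <tenant>.<baseDomain>.
function resolveTenantFromHost(
  host: string,
  baseDomain: string,
): string | undefined {
  const [hostname = ""] = host.toLowerCase().split(":"); // drop any port
  const suffix = `.${baseDomain.toLowerCase()}`;
  if (!hostname.endsWith(suffix)) return undefined;
  const tenant = hostname.slice(0, -suffix.length);
  // Reject empty or nested labels so "a.b.example.com" cannot alias a tenant.
  return /^[a-z0-9-]+$/.test(tenant) ? tenant : undefined;
}

// Every cache key carries the tenant, so tenants never share an entry.
function tenantCacheKey(tenantId: string, resource: string, id: string): string {
  return `tenant:${tenantId}:${resource}:${id}`;
}
```

Returning `undefined` for anything that does not parse cleanly keeps a mis-resolved tenant from silently falling through to a default.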

10 · Observability & SLOs

You can't operate what you can't see. Instrument four signals: health (liveness/readiness probes), metrics (RED — rate/errors/duration — plus Node internals: event-loop lag, heap, GC), traces (OpenTelemetry, propagated via traceparent), and structured logs with correlation ids.

Define SLOs (e.g. p99 latency < 300ms, 99.9% availability), track an error budget, and alert on budget burn and user-facing symptoms, not on raw CPU. Treat a deploy as “done” only when crash/error rates hold across the rollout.
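
The error budget is plain arithmetic: a 99.9% availability target over 30 days leaves 0.1% of 43,200 minutes, about 43.2 minutes of allowed downtime. A small sketch, with hypothetical function names and an assumed 30-day window in the usage:

```typescript
// Allowed "bad" minutes for an availability SLO over a rolling window.
function errorBudgetMinutes(sloTarget: number, windowDays: number): number {
  if (sloTarget <= 0 || sloTarget >= 1) {
    throw new RangeError("sloTarget must be strictly between 0 and 1");
  }
  const windowMinutes = windowDays * 24 * 60;
  return windowMinutes * (1 - sloTarget);
}

// Fraction of the budget already spent; above 1 means the SLO is breached.
function budgetConsumed(
  badMinutes: number,
  sloTarget: number,
  windowDays: number,
): number {
  return badMinutes / errorBudgetMinutes(sloTarget, windowDays);
}
```

For example, 21.6 bad minutes against a 99.9% / 30-day SLO means roughly half the budget is gone, which is a more useful paging signal than a CPU graph.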

Deep dives

Six classic backend system-design prompts, each as Concept → Example → Gotcha → Senior answer. These are the ones interviewers reach for — rehearse the trade-offs out loud.

Design a distributed rate limiter

Concept Cap requests per client per window. Algorithms: token bucket (allows bursts up to the bucket size while enforcing an average rate; a common default for APIs), leaky bucket (constant drain), fixed window (cheap, but lets through up to 2× the limit across a window boundary), sliding window log (exact, memory-heavy), sliding window counter (best accuracy/memory trade-off).

Example Token bucket in Redis: store {tokens, lastRefill} per key; on each request refill by elapsed×rate (capped), allow if ≥1 then decrement. Return 429 + Retry-After + X-RateLimit-*.
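
The refill-then-check logic is easiest to see in a single process. This sketch keeps the state in a `Map` instead of Redis, so it is only correct for one instance; `InMemoryTokenBucket` is a hypothetical name and the clock is injected for testing.

```typescript
interface BucketState {
  tokens: number;
  lastRefillMs: number;
}

class InMemoryTokenBucket {
  private readonly buckets = new Map<string, BucketState>();

  constructor(
    private readonly capacity: number,
    private readonly refillPerSecond: number,
    private readonly now: () => number = Date.now,
  ) {}

  // Returns the decision plus a Retry-After hint (seconds) when rejected.
  tryConsume(key: string): { allowed: boolean; retryAfterSeconds: number } {
    const nowMs = this.now();
    const state =
      this.buckets.get(key) ?? { tokens: this.capacity, lastRefillMs: nowMs };

    // Refill by elapsed time x rate, capped at capacity.
    const elapsedSeconds = (nowMs - state.lastRefillMs) / 1000;
    state.tokens = Math.min(
      this.capacity,
      state.tokens + elapsedSeconds * this.refillPerSecond,
    );
    state.lastRefillMs = nowMs;
    this.buckets.set(key, state);

    if (state.tokens >= 1) {
      state.tokens -= 1;
      return { allowed: true, retryAfterSeconds: 0 };
    }
    const missing = 1 - state.tokens;
    return {
      allowed: false,
      retryAfterSeconds: Math.ceil(missing / this.refillPerSecond),
    };
  }
}
```

A rejected call maps to a 429 response with `Retry-After` set from `retryAfterSeconds`. Moving this to Redis means performing the same steps inside one atomic script.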

Gotcha Read-modify-write across replicas is a race — two requests both read 1 token left and both pass. Naive per-instance memory (Nest's default throttler) is wrong across replicas.

Senior answer Make refill→check→decrement atomic with a Redis Lua script (INCR+EXPIRE is the simpler alternative, but it gives you a fixed-window counter, not a token bucket). Run it at the edge/gateway for cheap global limits. Decide fail-open vs fail-closed if Redis is down; use Redis server time to avoid clock skew.

Design a URL shortener

Concept Map a short code → long URL with ~100:1 read:write. 7 base62 chars ≈ 3.5T codes. KV store, cache-heavy reads, async analytics.

Example Code generation: counter+base62 (no collisions, but sequential/guessable + counter bottleneck), hash+first-7 (needs collision retry), or a Key Generation Service pre-generating random unused keys offline.
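
The counter+base62 option looks like this in code. The alphabet order below is an arbitrary choice for illustration, and both function names are hypothetical. With 62 symbols, 7 characters cover 62^7 = 3,521,614,606,208 codes, which is where the "≈ 3.5T" figure comes from.

```typescript
const BASE62_ALPHABET =
  "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Encodes a counter value as a short code, most significant digit first.
function encodeBase62(id: number): string {
  if (!Number.isSafeInteger(id) || id < 0) {
    throw new RangeError("id must be a non-negative safe integer");
  }
  if (id === 0) return BASE62_ALPHABET.charAt(0);
  let remaining = id;
  let code = "";
  while (remaining > 0) {
    code = BASE62_ALPHABET.charAt(remaining % 62) + code;
    remaining = Math.floor(remaining / 62);
  }
  return code;
}

// Inverse of encodeBase62; rejects characters outside the alphabet.
function decodeBase62(code: string): number {
  let id = 0;
  for (let i = 0; i < code.length; i++) {
    const digit = BASE62_ALPHABET.indexOf(code.charAt(i));
    if (digit < 0) throw new RangeError(`Invalid base62 character: ${code.charAt(i)}`);
    id = id * 62 + digit;
  }
  return id;
}
```

Because consecutive counters yield consecutive codes, this scheme is exactly the guessable one the text describes; it shows the encoding, not a recommendation.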

Gotcha 301 (permanent) is browser-cached and loses click analytics; sequential codes are enumerable; the redirect must not block on analytics.

Senior answer KGS for non-guessable codes; Redis cache-aside (~95% hit) + CDN for hot redirects; 302 when you need analytics; emit click events async to Kafka→warehouse so the redirect stays fast. DynamoDB/Cassandra at scale.

Design a notification service

Concept Ingest events → template → route by user preference → per-channel queues → channel workers (email/SMS/push) → status webhooks → metrics. Multi-channel fan-out.

Example One event fans out to many recipients/devices via a queue, partitioned by user for order + parallelism; versioned templates + i18n; preference center with quiet hours and digests.

Gotcha At-least-once delivery means duplicates; SENT ≠ DELIVERED ≠ READ; a provider outage shouldn't drop everything; legal (unsubscribe/TCPA/GDPR).

Senior answer Idempotent consumers keyed on (event_id, recipient, channel); retries with backoff → DLQ; circuit breaker + channel fallback (push→SMS for critical); priority lanes so OTPs jump ahead of marketing. Exactly-once is impractical — design for at-least-once + dedup.

Design a chat backend

Concept WebSocket gateways hold live connections (stateless nodes + a Redis registry mapping user→node). Messages stored in a wide-column store keyed by conversation_id + seq.

Example A server-side sequencer partitioned by conversation_id assigns a monotonic seq (don't trust client clocks); clients reorder/dedupe by (conversation_id, seq). Presence in Redis with TTL heartbeats.
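
A sketch of both halves: the server assigns `seq`, the client dedupes and reorders by it. `ConversationSequencer` and `mergeIntoTimeline` are hypothetical names, the counter lives in process memory purely for illustration, and the merge assumes all messages belong to one conversation.

```typescript
interface ChatMessage {
  conversationId: string;
  seq: number;
  text: string;
}

// Server side: one monotonic counter per conversation. In production this
// state belongs to the partition owner or a store with atomic increments.
class ConversationSequencer {
  private readonly lastSeq = new Map<string, number>();

  assign(conversationId: string, text: string): ChatMessage {
    const seq = (this.lastSeq.get(conversationId) ?? 0) + 1;
    this.lastSeq.set(conversationId, seq);
    return { conversationId, seq, text };
  }
}

// Client side: drop duplicates and restore order for one conversation.
function mergeIntoTimeline(
  timeline: ChatMessage[],
  incoming: ChatMessage[],
): ChatMessage[] {
  const bySeq = new Map<number, ChatMessage>();
  for (const message of timeline.concat(incoming)) {
    if (!bySeq.has(message.seq)) bySeq.set(message.seq, message);
  }
  return Array.from(bySeq.values()).sort((a, b) => a.seq - b.seq);
}
```

The highest `seq` in the merged timeline is also what a client would send on reconnect to resume from where it left off.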

Gotcha Ordering across partitions (a sequencer only orders within its own partition), exactly-once over flaky mobile, fan-out to huge groups (write amplification), reconnect/resume, multi-device sync.

Senior answer At-least-once + dedup by stable message_id; fan-out on write for small groups, on read / Kafka for large (hybrid by size — the celebrity problem); resume from last seq on reconnect; socket.io Redis adapter to fan out across nodes.

Design a job/task queue

Concept Producer → broker → competing consumers pull. A visibility timeout hides a job while a worker holds it; success deletes it, a crash makes it reappear (redelivery) — the at-least-once engine.
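
The visibility-timeout mechanism can be simulated in memory to see why it yields at-least-once delivery. `VisibilityTimeoutQueue` is a hypothetical class with an injected clock; it models the behaviour described above, not any specific broker's API.

```typescript
interface QueuedJob {
  id: string;
  payload: string;
  visibleAtMs: number; // hidden from consumers until this time
}

class VisibilityTimeoutQueue {
  private readonly jobs = new Map<string, QueuedJob>();

  constructor(
    private readonly visibilityTimeoutMs: number,
    private readonly now: () => number = Date.now,
  ) {}

  enqueue(id: string, payload: string): void {
    this.jobs.set(id, { id, payload, visibleAtMs: this.now() });
  }

  // Hand out the first visible job and hide it for the visibility timeout.
  receive(): QueuedJob | undefined {
    const nowMs = this.now();
    for (const job of Array.from(this.jobs.values())) {
      if (job.visibleAtMs <= nowMs) {
        job.visibleAtMs = nowMs + this.visibilityTimeoutMs;
        return job;
      }
    }
    return undefined;
  }

  // Acknowledge success: only now is the job actually removed.
  ack(id: string): void {
    this.jobs.delete(id);
  }
}
```

A worker that crashes never calls `ack`, so the job simply becomes visible again once the timeout passes. A worker that is merely slow has the same effect, which is the double-processing risk.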

Example BullMQ (Redis) for delayed/repeatable/priority jobs; SQS (managed, FIFO for order+dedup); Kafka for a replayable log. Priorities via separate queues; delayed jobs via a sorted set.

Gotcha Visibility timeout shorter than p99 duration → premature redelivery + double processing; standard queues don't guarantee order; a DLQ without alerting silently loses work.

Senior answer Idempotent consumers (idempotency key + dedup), visibility timeout > p99 (or heartbeat-extend), retries with backoff → DLQ with alerts, autoscale workers on queue depth, FIFO/partitioning when order matters.

Design large file upload & streaming

Concept Don't route big files through the app's memory. Stream them, and prefer direct-to-object-storage with the app issuing credentials.

Example Issue a pre-signed S3 URL so the client uploads straight to object storage; for downloads/transforms, pipeline(readStream → transform → res); multipart upload for very large files.

Gotcha Buffering a whole file blows up memory and blocks the loop; ignoring backpressure lets buffered chunks pile up in memory; unvalidated type/size is a DoS + security hole.

Senior answer Pre-signed URLs to offload bandwidth; stream with pipeline (auto backpressure + cleanup); validate size + magic-number type with ParseFilePipe; process derivatives (thumbnails, virus scan) async via a queue.
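
As an illustration of streaming with `pipeline` while enforcing a size cap, here is a sketch using Node's built-in stream modules. `maxBytes` and `streamWithLimit` are hypothetical helpers; in an HTTP handler the destination would be the response or an upload stream to object storage.

```typescript
import { Readable, Transform, Writable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Passes chunks through unchanged, failing once the byte limit is exceeded.
function maxBytes(limit: number): Transform {
  let seen = 0;
  return new Transform({
    transform(chunk: Buffer, _encoding, callback) {
      seen += chunk.length;
      if (seen > limit) {
        callback(new Error(`Stream exceeded ${limit} bytes`));
      } else {
        callback(null, chunk);
      }
    },
  });
}

// Streams source -> size guard -> destination chunk by chunk. pipeline()
// applies backpressure and destroys every stream if any of them fails.
async function streamWithLimit(
  source: Readable,
  destination: Writable,
  limitBytes: number,
): Promise<void> {
  await pipeline(source, maxBytes(limitBytes), destination);
}
```

Memory use stays bounded by the stream buffers regardless of file size, and an oversized upload is cut off mid-stream instead of after it has been fully received.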