System design vs Nuxt fullstack developer
20 patterns from distributed systems - caching, circuit breakers, CAP theorem, sharding, event-driven architecture - mapped to real implementations in Nuxt and Nitro.
System design interviews talk about millions of users and terabytes of data. Most of us are building something used by thousands. But the vocabulary is still useful, and some of the patterns apply much earlier than you'd expect.
This covers 20 concepts from distributed systems through the lens of someone building Nuxt apps in production. Four categories. Where each one shows up and when it actually matters.
Performance and scaling
Caching
The most impactful optimization most apps implement too late. Every read that could return the same data twice is a candidate - not because databases are slow, but because they have a ceiling, and your app's performance is bounded by that ceiling.
Nitro has three caching layers you can reach for in sequence:
In-process memory - fastest, lost on restart, not shared between instances:
export default defineCachedEventHandler(async (event) => {
  return db.select().from(products).where(eq(products.active, true))
}, {
  maxAge: 60 * 5, // seconds
  getKey: (event) => `products:${getQuery(event).category ?? 'all'}`
})
Redis - shared across instances, survives restarts, manually invalidatable. One config change:
export default defineNuxtConfig({
  nitro: {
    storage: {
      cache: { driver: 'redis', url: process.env.REDIS_URL }
    }
  }
})
CDN - the layer most teams miss. Public endpoints with infrequently-changing data should set Cache-Control headers and let Cloudflare or Vercel serve them globally:
setResponseHeader(event, 'Cache-Control', 'public, max-age=300, s-maxage=3600')
max-age=300 lets browsers reuse the response for five minutes; s-maxage=3600 applies to shared caches and tells the CDN to cache for an hour. Your origin handles a fraction of the traffic it otherwise would. The full caching strategy and Nitro patterns are covered in the system patterns article.
Load balancing
Load balancing distributes incoming requests across multiple server instances. You don't implement this - your infrastructure does (Nginx, Cloudflare, Kubernetes). What you do implement is the precondition: your app must be stateless.
If your Nitro server stores anything in process memory that needs to survive across requests or be visible to other instances - session data, counters, locks - you have a problem the moment you scale to two servers. Server A handles the login, Server B handles the next request, Server B doesn't know who you are.
The fix is always the same: move shared state into a shared store.
// Don't store sessions in Nitro's in-memory storage for multi-instance deploys
// Redis is the same regardless of which instance handles the request
const storage = useStorage('cache') // backed by Redis in production
await storage.setItem(`session:${sessionId}`, userData, { ttl: 3600 })
If your app is a single instance, load balancing is someone else's problem. When you add a second instance, memory state breaks silently.
Horizontal vs vertical scaling
Two ways to handle more traffic:
| | Vertical | Horizontal |
|---|---|---|
| What | Bigger server (more CPU/RAM) | More servers |
| Ceiling | Hardware limit | Theoretically unlimited |
| Complexity | None | Requires stateless app + shared state |
| Downtime | Restart required | Rolling deploys, zero downtime |
| Right choice when | First scaling decision | Vertical ceiling hit, or zero-downtime required |
Vertical scaling is the right first answer. Double the RAM before you double the servers. It is simpler, cheaper at small scale, and defers the complexity of distributed state. A $400/month server handles significant load.
Horizontal scaling becomes necessary when you've hit the vertical ceiling or need zero-downtime deploys. Nuxt is designed for it - Nitro on Vercel, Netlify, or Cloudflare Workers is horizontal scaling by default. Each request can land on a different instance. This is why the Redis-for-state pattern matters even before you have multiple servers.
Database indexing
An unindexed query on a large table is a full table scan - every row read, every time. Indexes are the single highest-leverage database optimization available to most applications and the most commonly skipped.
With Drizzle ORM, define indexes alongside your schema:
export const orders = pgTable('orders', {
  id: uuid('id').primaryKey().default(sql`gen_random_uuid()`),
  userId: uuid('user_id').notNull(),
  status: text('status').notNull(),
  createdAt: timestamp('created_at').defaultNow(),
}, (t) => ({
  // Single column - fast lookups by user
  userIdx: index('orders_user_id_idx').on(t.userId),
  // Composite - covers queries filtering by status AND sorting by date
  statusCreatedIdx: index('orders_status_created_at_idx').on(t.status, t.createdAt),
}))
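Column order in composite indexes matters: (status, created_at) supports queries filtering by status, and queries filtering by both - but it does not efficiently serve queries filtering only by created_at. Put columns used in equality filters first and the range or sort column last; among equality columns, the more selective one usually goes first.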
Verify indexes are being used with EXPLAIN ANALYZE:
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = '...' AND status = 'pending'
ORDER BY created_at DESC;
If you see Seq Scan where you expect Index Scan, either the index doesn't cover the query or the planner determined the table is small enough that a scan is faster (correct behavior - don't fight it).
Rate limiting
Covered in depth in the system patterns article. The short version: Nitro middleware plus useStorage gives you per-IP rate limiting with minimal code, backed by Redis in production. The important extension: different limits for different endpoints - auth endpoints need stricter limits (brute force protection) than read endpoints.
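As a rough illustration of the per-endpoint idea, here is a minimal fixed-window limiter. It is a sketch, not the implementation from the system patterns article: the `Map` stands in for `useStorage` so the logic runs on its own, and `LIMITS`, `bucketFor`, and `checkRateLimit` are hypothetical names with illustrative numbers.

```typescript
// Fixed-window rate limiter sketch. The Map stands in for a shared store
// (Redis via useStorage in production); all names and limits are hypothetical.
const LIMITS = {
  auth: { max: 5, windowMs: 60_000 },      // strict: brute-force protection
  default: { max: 100, windowMs: 60_000 }, // looser: ordinary read endpoints
}
type Bucket = keyof typeof LIMITS

// Pick the limit bucket from the request path.
export function bucketFor(path: string): Bucket {
  return path.startsWith('/api/auth') ? 'auth' : 'default'
}

const counters = new Map<string, { count: number; windowStart: number }>()

// Returns true if the request is allowed, false if it should receive a 429.
export function checkRateLimit(ip: string, path: string, now: number = Date.now()): boolean {
  const bucket = bucketFor(path)
  const limit = LIMITS[bucket]
  const key = `ratelimit:${bucket}:${ip}`
  const entry = counters.get(key)

  // Start a fresh window if none exists or the previous one has expired.
  if (!entry || now - entry.windowStart >= limit.windowMs) {
    counters.set(key, { count: 1, windowStart: now })
    return true
  }
  if (entry.count >= limit.max) return false
  entry.count++
  return true
}
```

In a Nitro middleware, a `false` result would translate to a 429 response. With a shared store the read-then-write above is not atomic, so an atomic counter (such as Redis INCR) is the safer choice under concurrency.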
Data and storage
SQL vs NoSQL
The default answer is PostgreSQL. Most applications - CRUD, transactional, relational data - are better served by a well-designed relational schema than a flexible document store. PostgreSQL's JSONB column type eliminates one of the few legitimate NoSQL advantages (schema flexibility).
| | SQL (PostgreSQL) | NoSQL (document/key-value/wide-column) |
|---|---|---|
| Data shape | Structured, defined schema | Flexible, schemaless |
| Relationships | Native (JOINs, foreign keys) | Application-level |
| Transactions | ACID, multi-table | Varies by engine |
| Query power | Full SQL | Limited or engine-specific |
| Scale pattern | Vertical, then read replicas | Designed for horizontal |
| Best for | Most applications | Specific patterns listed below |
NoSQL makes sense for specific roles:
- Redis: caching, sessions, queues, pub/sub - it's a data structure server, use it for these
- Cassandra / DynamoDB: write-heavy time-series data at very large scale
- MongoDB: truly schemaless documents where record shape varies significantly
- Elasticsearch: full-text search at scale
Most applications that reach for MongoDB early would be better off with PostgreSQL plus JSONB for the flexible parts. The query power of SQL is hard to replace once you need joins, aggregations, or complex filters.
Replication
Replication copies data from a primary to one or more replicas, usually asynchronously. The primary handles all writes; replicas can handle reads.
Two practical benefits: redundancy (if the primary fails, promote a replica) and read scaling (distribute read traffic).
In a Nuxt app, you'd use two database connections:
// Database clients (shared module)
import { drizzle } from 'drizzle-orm/node-postgres'
import { Pool } from 'pg'
// All writes go here
export const writeDb = drizzle(new Pool({ connectionString: process.env.DATABASE_URL }))
// Reads can go here - potentially stale by replication lag
export const readDb = drizzle(new Pool({ connectionString: process.env.DATABASE_READ_URL }))

// Write route (separate file)
export default defineEventHandler(async (event) => {
  const body = await readBody(event)
  // Writes always go to the primary
  const [order] = await writeDb.insert(orders).values(body).returning()
  return order
})

// Read route (separate file)
export default defineCachedEventHandler(async () => {
  // Slightly stale data is acceptable for a product catalog
  return readDb.select().from(products).where(eq(products.active, true))
}, { maxAge: 30 })
The catch is replication lag. Replicas trail the primary by milliseconds to seconds. After a write, reading immediately from a replica might return stale data. For user-facing write-then-read flows ("you just placed an order, here are your orders"), read from the primary. For background loads and public data, replicas are fine.
Sharding
Sharding splits a single database into multiple shards, each holding a subset of the data. A users table sharded by region means European users live in the EU shard, American users in the US shard.
This solves a problem you almost certainly don't have yet. Sharding makes sense when a single PostgreSQL instance can no longer handle your write volume - typically above 10,000 sustained writes per second - or when your dataset is large enough that even indexed queries are slow.
The operational cost is significant: cross-shard queries become application-level joins, transactions spanning shards require distributed transaction protocols, and resharding when your partition key was wrong is painful.
If you do need it: Citus extends PostgreSQL with transparent sharding. PlanetScale and CockroachDB handle it as managed services.
The practical Nuxt callout: if you're sharding by tenant in a multi-tenant SaaS app, you can do it at the connection level - route requests to a different database URL based on tenant ID. No special library needed, just a map of tenant IDs to connection strings.
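A minimal sketch of that tenant-to-connection map, under stated assumptions: the tenant IDs and URLs are made up, and `databaseUrlForTenant` and `poolForTenant` are hypothetical helpers. The pool factory is passed in so the sketch does not depend on a particular driver.

```typescript
// Hypothetical tenant-to-database map. In a real app this would likely come
// from configuration or a lookup table rather than a literal.
const TENANT_DATABASES: Record<string, string> = {
  acme: 'postgres://db-eu.internal/acme',
  globex: 'postgres://db-us.internal/globex',
}

export function databaseUrlForTenant(tenantId: string): string {
  const url = TENANT_DATABASES[tenantId]
  if (!url) {
    // Fail loudly: silently falling back to a default shard risks mixing tenant data.
    throw new Error(`No database configured for tenant: ${tenantId}`)
  }
  return url
}

// Cache one pool per shard URL so requests reuse connections.
const pools = new Map<string, unknown>()

export function poolForTenant<T>(tenantId: string, createPool: (url: string) => T): T {
  const url = databaseUrlForTenant(tenantId)
  if (!pools.has(url)) pools.set(url, createPool(url))
  return pools.get(url) as T
}
```

In a request handler, the tenant ID would typically come from the session or subdomain, and `createPool` would wrap whatever driver the app already uses.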
CAP theorem
Every distributed system makes a tradeoff between three properties:
- Consistency: every read receives the most recent write, or an error
- Availability: every request receives a response (not necessarily the latest data)
- Partition tolerance: the system keeps operating when network communication between nodes fails
The theorem states that you can only guarantee two of the three simultaneously during a network partition. Partition tolerance is not optional in distributed systems - networks do fail - so the real choice is between consistency and availability.
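CP (consistency + partition tolerance): during a split, nodes that cannot confirm they hold the latest data reject requests rather than risk returning stale results. Consensus-based stores such as etcd and ZooKeeper take this side.
AP (availability + partition tolerance): during a split, nodes keep responding but might return stale data. DynamoDB and Cassandra default here. Responses keep arriving; recency is not guaranteed.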
For a Nuxt app with a single PostgreSQL database, CAP doesn't apply - it describes distributed systems with multiple nodes. Once you add read replicas or multiple database nodes, the tradeoffs become real.
In practice: most applications benefit from an AP posture for reads (slightly stale data is acceptable) and a CP posture for writes (must be consistent, better to fail than corrupt). That's roughly what "read from replica, write to primary" gives you.
Consistency models
Consistency models define what guarantees you have about the order and visibility of writes across distributed nodes. The four you'll encounter in practice:
Strong consistency: every read sees the most recent write. Guaranteed on a single node, expensive on distributed systems. A single PostgreSQL instance gives you this.
Eventual consistency: given no new writes, all nodes will eventually converge to the same value. Redis replication is eventually consistent - a write to the primary propagates to replicas with a small delay.
Read-your-writes consistency: you always see your own writes, even if other users might temporarily see stale data. This is the minimum a user expects after submitting a form.
Monotonic reads: you never see older data than data you've already seen. If you read version 5 of something, you won't subsequently read version 3.
In Nuxt, the relevant design question is: when using read replicas, which requests need to hit the primary? A reliable rule: always read from the primary immediately after a write within the same user session. Background loads, public pages, and analytics can use replicas.
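One way to express that rule in code is to pin a session's reads to the primary for a short window after it writes. This is a sketch under assumptions: `PIN_MS` is a guessed upper bound on replication lag, not a measured one, and `recordWrite` and `chooseDb` are hypothetical helpers.

```typescript
// After a write, route that session's reads to the primary for PIN_MS.
// PIN_MS is an assumed upper bound on replication lag; tune it from real metrics.
const PIN_MS = 5_000

// sessionId -> timestamp of the session's most recent write
const lastWriteAt = new Map<string, number>()

export function recordWrite(sessionId: string, now: number = Date.now()): void {
  lastWriteAt.set(sessionId, now)
}

export function chooseDb(sessionId: string, now: number = Date.now()): 'primary' | 'replica' {
  const wroteAt = lastWriteAt.get(sessionId)
  // Recent writer: read from the primary so the user sees their own write.
  return wroteAt !== undefined && now - wroteAt < PIN_MS ? 'primary' : 'replica'
}
```

With more than one app instance, the last-write timestamp has to live somewhere shared, such as the session store or a cookie, for the same reason sessions do.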
Reliability
Fault tolerance
Fault tolerance is designing with the assumption that things will fail - not "if the email service is down" but "when the email service is down." The question is whether that failure propagates to the user.
Graceful degradation: the system keeps running with reduced functionality. If the recommendation engine is down, show a generic product list instead of returning a 500.
export default defineEventHandler(async () => {
  const products = await db.select().from(featured_products)
  let recommendations: Product[] = []
  try {
    // External ML service - might be slow or down
    recommendations = await $fetch('https://recommendations.internal/api/suggest', {
      timeout: 500, // fail fast, don't let this block the page
    })
  } catch (err) {
    // Degrade gracefully - homepage loads without recommendations
    // Log the failure, but don't surface it to the user
    console.warn('Recommendations unavailable', err)
  }
  return { products, recommendations }
})
Isolation: failures in one component don't cascade. If image processing crashes, the API should not crash. This is achievable within a monolith by treating every external call as unreliable and wrapping it defensively.
Ask "when this goes wrong, what should happen instead?"
Circuit breaker
Retries are good. Retrying a service that is fully down, 200 times per second, is not. It hammers a struggling service and prevents recovery. A circuit breaker wraps external calls and tracks failure rate. When failures cross a threshold, it opens the circuit - subsequent calls fail immediately without attempting the real call, giving the downstream service time to recover.
type State = 'CLOSED' | 'OPEN' | 'HALF_OPEN'

export class CircuitBreaker {
  private state: State = 'CLOSED'
  private failures = 0
  private lastFailureTime = 0

  constructor(
    private readonly threshold = 5, // failures before opening
    private readonly resetTimeout = 30_000 // ms before trying again
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'OPEN') {
      if (Date.now() - this.lastFailureTime < this.resetTimeout) {
        throw new Error('Circuit open - request rejected')
      }
      this.state = 'HALF_OPEN' // allow a probe request through
    }
    try {
      const result = await fn()
      this.failures = 0
      this.state = 'CLOSED'
      return result
    } catch (err) {
      this.failures++
      this.lastFailureTime = Date.now()
      if (this.failures >= this.threshold || this.state === 'HALF_OPEN') {
        this.state = 'OPEN'
      }
      throw err
    }
  }
}
const paymentBreaker = new CircuitBreaker(5, 30_000)

// Stripe requires a currency on PaymentIntents; 'usd' is an assumed example value
export const chargeCard = (token: string, amount: number) =>
  paymentBreaker.call(() =>
    stripe.paymentIntents.create({ amount, currency: 'usd', payment_method: token, confirm: true })
  )
Three states: CLOSED (normal, calls go through), OPEN (failing fast, no calls attempted), HALF_OPEN (one probe call - success closes the circuit, failure keeps it open).
Retries with backoff
Covered in the system patterns article. $fetch has retry and retryDelay built in. For server-to-server calls, use exponential backoff. Do not retry most 4xx errors - they indicate your request is wrong, not that the server is struggling. The exception is 429 (Too Many Requests): retry it, but respect the Retry-After header.
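For calls that do not go through $fetch, the rules above could be sketched roughly as follows. This is an illustrative helper, not the article's implementation: `withRetry`, `isRetryable`, and `backoffMs` are hypothetical names, and the sketch assumes HTTP errors carry a numeric `status` and, optionally, a parsed `retryAfterSeconds`.

```typescript
// Exponential backoff with full jitter. Assumes thrown HTTP errors expose a
// numeric `status` (and optionally `retryAfterSeconds`); all names are hypothetical.
interface HttpFailure { status: number; retryAfterSeconds?: number }

function asHttpFailure(err: unknown): HttpFailure | undefined {
  if (typeof err === 'object' && err !== null && typeof (err as HttpFailure).status === 'number') {
    return err as HttpFailure
  }
  return undefined
}

// Retry network-level failures, 5xx, and 429. Other 4xx will not succeed on retry.
export function isRetryable(err: unknown): boolean {
  const failure = asHttpFailure(err)
  if (!failure) return true // no status: assume a network-level failure
  return failure.status === 429 || failure.status >= 500
}

// Delay before retry number `attempt` (0-based): random within [0, base * 2^attempt), capped.
export function backoffMs(attempt: number, baseMs = 200, capMs = 10_000, random: () => number = Math.random): number {
  return Math.floor(random() * Math.min(capMs, baseMs * 2 ** attempt))
}

export async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn()
    } catch (err) {
      if (attempt >= maxRetries || !isRetryable(err)) throw err
      // A server-provided Retry-After takes precedence over our own schedule.
      const retryAfter = asHttpFailure(err)?.retryAfterSeconds
      await sleep(retryAfter !== undefined ? retryAfter * 1000 : backoffMs(attempt))
    }
  }
}
```

The jitter spreads retries out so that many clients failing at the same moment do not all retry at the same moment.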
Idempotency
Covered in the ticket booking article. Idempotency keys ensure that retrying a failed request doesn't process it twice. Required for any operation with side effects a user might retry: payments, email sends, form submissions over unreliable connections.
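The core mechanism fits in a few lines. The sketch below is an in-memory illustration only, not the ticket booking article's implementation: the `Map` stands in for a shared store with a TTL, and `runOnce` is a hypothetical helper name.

```typescript
// Idempotency-key sketch. The Map stands in for a shared store (with a TTL)
// and `runOnce` is a hypothetical name.
const results = new Map<string, Promise<unknown>>()

export function runOnce<T>(idempotencyKey: string, operation: () => Promise<T>): Promise<T> {
  const existing = results.get(idempotencyKey)
  // Same key seen before (or still in flight): return that result, do not re-run.
  if (existing) return existing as Promise<T>

  const pending = operation()
  results.set(idempotencyKey, pending)
  // Forget failed attempts so the client can retry with the same key.
  pending.catch(() => results.delete(idempotencyKey))
  return pending
}
```

Storing the promise rather than the resolved value means two concurrent requests with the same key share one execution, which is the double-click case.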
Health checks
A health check endpoint lets your load balancer, uptime monitor, and deployment system verify the app is ready to serve traffic. Without it, a failed deploy is silent until users report it.
export default defineEventHandler(async (event) => {
  const checks = await Promise.allSettled([
    db.execute(sql`SELECT 1`), // database reachable
    useStorage('cache').setItem('_ping', '1'), // redis writable
  ])
  const [dbCheck, redisCheck] = checks.map(r => r.status === 'fulfilled' ? 'ok' : 'error')
  const healthy = checks.every(r => r.status === 'fulfilled')
  setResponseStatus(event, healthy ? 200 : 503)
  return {
    status: healthy ? 'healthy' : 'degraded',
    checks: { db: dbCheck, redis: redisCheck },
    uptime: process.uptime(),
    timestamp: new Date().toISOString(),
  }
})
Two variants worth knowing:
Liveness: is the process alive? A simple 200. If this fails, restart the container.
Readiness: is the process ready to handle traffic? Checks dependencies. If this fails, stop sending traffic but don't restart - the process is alive but dependencies aren't ready.
In Kubernetes, these are separate endpoints (/health/live and /health/ready). In simpler deployments, one endpoint that returns 503 when dependencies are unavailable is enough.
Architecture patterns
Microservices vs monolith
For most teams building new products: start with a monolith.
Microservices solve problems of scale and organizational independence that most applications don't have yet. They introduce distributed systems complexity - network calls, distributed transactions, independent deployments, service discovery, distributed tracing - that is expensive to manage and adds no user value at small scale.
The case for starting with a monolith:
- Nuxt is a monolith by default and it's excellent at it
- You can modularize internally without paying the distributed tax
- You can extract services later with evidence, not speculation
The case for microservices:
- Different services need different runtime environments
- Independent deployment: billing team deploys without affecting checkout
- Genuinely different scaling requirements (image processing vs API)
- Organizational scale: 50+ engineers on one codebase becomes a coordination problem
A modular monolith is the pragmatic middle ground: a single deployable with clean internal boundaries. When you eventually need to extract a service, the boundary is already there.
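Nuxt's server/ directory can support this. Group by feature domain, not by technical layer. By default Nitro only auto-registers routes from server/api/ and server/routes/, so a layout like this assumes each domain folder is registered explicitly, for example through the nitro.scanDirs option or Nuxt layers: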
server/
  auth/
    api/login.post.ts
    api/register.post.ts
    lib/session.ts
  products/
    api/[id].get.ts
    api/index.get.ts
    lib/catalog.ts
  orders/
    api/create.post.ts
    tasks/process.ts
    lib/payment.ts
Event-driven architecture
Request-driven: service A calls service B and waits for a response. Simple, traceable, the default.
Event-driven: service A emits an event ("order placed"). Any number of subscribers handle it asynchronously. Service A doesn't know or care who handles the event.
When you place an order, several things need to happen: confirmation email, inventory update, fulfillment trigger, warehouse notification. In a request-driven system, the POST /orders handler calls all of these and waits. In an event-driven system, it emits order.created and returns immediately - each subscriber handles it independently.
// Queue setup (shared module)
import { Queue, Worker } from 'bullmq'
import { redis } from './redis'
export const events = new Queue('events', { connection: redis })
// Enqueue the event - this waits for Redis to accept the job, not for handlers to run
export const emit = (name: string, data: unknown) => events.add(name, data)

// Route handler (separate file)
export default defineEventHandler(async (event) => {
  const body = await readBody(event)
  const [order] = await db.insert(orders).values(body).returning()
  // Downstream processing happens in the worker, outside the request cycle
  await emit('order.created', { orderId: order.id, userId: order.userId })
  return order
})

// Worker (separate process or file)
new Worker('events', async (job) => {
  if (job.name === 'order.created') {
    // Note: if one of these fails, the whole job is retried and all three run again
    await Promise.all([
      sendOrderConfirmation(job.data),
      updateInventory(job.data),
      notifyFulfillment(job.data),
    ])
  }
}, { connection: redis })
The downside: event-driven systems are harder to trace ("why didn't the email send?") and harder to reason about causally. The tradeoffs make sense when you have multiple downstream consumers for an event, or when handlers need to be independently scalable.
Message queues
A message queue buffers between a producer and a consumer. The producer writes a message; the consumer reads it at its own pace. If the consumer is slow or temporarily down, messages accumulate in the queue rather than being lost.
Redis via BullMQ handles this well at typical Nuxt application scale. For higher throughput or more complex routing: RabbitMQ (topic exchanges, complex routing) or Kafka (millions of events/second, event log replay).
The key properties to understand:
- Durability: messages survive consumer restarts, unlike in-memory event emitters
- Delivery guarantee: at-least-once (message might be processed twice) vs exactly-once (harder, expensive - most systems settle for at-least-once with idempotent consumers)
- Dead letter queue: messages that fail after N retries land here for investigation rather than being silently dropped
export const emailQueue = new Queue('emails', {
  connection: redis,
  defaultJobOptions: {
    attempts: 3,
    backoff: { type: 'exponential', delay: 1000 },
    removeOnComplete: 100,
    removeOnFail: false, // keep failed jobs in BullMQ's failed set, which serves as the dead letter queue here
  },
})
Full implementation is in the system patterns article.
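To make the retry and dead-letter flow concrete, here is a toy in-memory queue. It is deliberately not durable and is not how BullMQ works internally; `TinyQueue` is a hypothetical class that only shows the control flow of at-least-once delivery.

```typescript
// Toy queue illustrating retries and a dead letter list. In-memory only, so it
// has none of the durability a real queue provides. All names are hypothetical.
interface Job<T> { id: string; data: T; attempts: number }

export class TinyQueue<T> {
  private pending: Job<T>[] = []
  readonly deadLetters: Job<T>[] = []
  private readonly maxAttempts: number

  constructor(maxAttempts = 3) {
    this.maxAttempts = maxAttempts
  }

  add(id: string, data: T): void {
    this.pending.push({ id, data, attempts: 0 })
  }

  // Drain the queue. A job whose handler throws is re-queued until its
  // attempts run out, then parked in deadLetters for inspection.
  async process(handler: (job: Job<T>) => Promise<void>): Promise<void> {
    while (this.pending.length > 0) {
      const job = this.pending.shift()!
      job.attempts++
      try {
        await handler(job)
      } catch {
        if (job.attempts < this.maxAttempts) this.pending.push(job)
        else this.deadLetters.push(job)
      }
    }
  }
}
```

Because a handler can fail after doing part of its work, the same job may run more than once. That is the at-least-once guarantee, and it is why consumers should be idempotent.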
API gateway
An API gateway sits in front of backend services and handles cross-cutting concerns: routing, authentication, rate limiting, request transformation, SSL termination.
For a Nuxt monolith, Nitro is the API gateway. Authentication middleware, rate limiting, logging - these run centrally in server/middleware/.
For microservices, the gateway routes requests to the right service. Nitro can act as one using proxyRequest:
const SERVICE_MAP: Record<string, string> = {
  auth: process.env.AUTH_SERVICE_URL!,
  catalog: process.env.CATALOG_SERVICE_URL!,
  orders: process.env.ORDERS_SERVICE_URL!,
}

export default defineEventHandler(async (event) => {
  const service = event.context.params!.service
  const target = SERVICE_MAP[service]
  if (!target) {
    throw createError({ statusCode: 404, message: `Unknown service: ${service}` })
  }
  // proxyRequest handles headers, body, method, and response streaming
  return proxyRequest(event, `${target}/${event.context.params!.path}`)
})
In practice, most teams use a dedicated gateway - Kong, AWS API Gateway, Cloudflare - rather than building one. Nuxt's role is usually as one service behind the gateway, not the gateway itself. The exception: Nuxt as gateway makes sense for proxying internal APIs during development or in single-region setups where the added network hop isn't justified.
Distributed tracing
When a request touches multiple services, a stack trace is no longer sufficient. You need to know which service, which call, how long each step took, and which upstream request triggered which downstream calls.
Distributed tracing gives each request a trace ID that flows through every service. OpenTelemetry standardizes the instrumentation; Jaeger, Zipkin, or Datadog collect and visualize the traces.
In Nitro, you can instrument via a plugin:
import { trace, context, propagation } from '@opentelemetry/api'

export default defineNitroPlugin((nitroApp) => {
  nitroApp.hooks.hook('request', (event) => {
    // Extract trace context from incoming headers (set by upstream service or load balancer)
    const parentCtx = propagation.extract(context.active(), getHeaders(event))
    const tracer = trace.getTracer('nitro')
    const span = tracer.startSpan(`${event.method} ${event.path}`, undefined, parentCtx)
    event.context.span = span
    event.context.traceId = span.spanContext().traceId
  })
  nitroApp.hooks.hook('afterResponse', (event) => {
    event.context.span?.end()
  })
})
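What actually flows between services is usually the W3C Trace Context `traceparent` header. OpenTelemetry's propagator reads and writes it for you; the sketch below only shows what is on the wire, and `parseTraceparent` and `formatTraceparent` are hypothetical helper names.

```typescript
// W3C Trace Context `traceparent` header: version-traceId-parentId-flags,
// e.g. 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
interface TraceParent { traceId: string; parentId: string; sampled: boolean }

// Version 00 only: 32 hex chars of trace ID, 16 of parent span ID, 2 of flags.
const TRACEPARENT = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/

export function parseTraceparent(header: string | undefined): TraceParent | undefined {
  const match = header ? TRACEPARENT.exec(header.trim()) : null
  if (!match) return undefined
  const traceId = match[1]!
  const parentId = match[2]!
  const flags = match[3]!
  // All-zero IDs are invalid.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return undefined
  // The lowest flag bit marks the trace as sampled.
  return { traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 }
}

// Build the header for an outgoing call: same trace ID, this service's span ID.
export function formatTraceparent(traceId: string, spanId: string, sampled: boolean): string {
  return `00-${traceId}-${spanId}-${sampled ? '01' : '00'}`
}
```

The trace ID stays constant across every hop, while each service substitutes its own span ID as the parent for the calls it makes. That is how the collector reassembles the call tree.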
Where to start
Not all of this applies at day one. A rough priority order as a Nuxt app moves from "it works" to "it works in production":
Before launch
Database indexes on columns you query. Rate limiting on public endpoints. A /api/health endpoint. Basic caching with defineCachedEventHandler on expensive reads.
At early traffic
Redis for caching and sessions - this enables horizontal scaling. Retries with backoff on external API calls. Idempotency keys for any payment or write operation users might retry.
When dependencies become flaky
Circuit breakers on external services. Graceful degradation for non-critical features. Read replicas with write/read DB split for high-read workloads.
At organizational scale
Message queues for background work that should decouple from the request cycle. Microservice extraction - only when teams genuinely need independent deployment. Distributed tracing - only when services multiply and logs become insufficient.
The order matters. Indexes and rate limiting are cheap to add and high value. Sharding and microservices are expensive to operate and only justified at scale. Get the sequencing wrong and you're dealing with distributed systems complexity before you have distributed systems traffic.
Continue Reading
System patterns in Nuxt
Queues, caching, retries, rate limiting, feature flags - five infrastructure patterns every production Nuxt app eventually needs, and why Nitro makes them less painful.
Building a ticket booking system in Nuxt that doesn't double-book under load
The classic interview question with real code. SELECT FOR UPDATE vs atomic UPDATE, reservation TTL with Nitro tasks, idempotency keys