Data Generation & Realism Strategies
When mock payloads do not mirror production schemas, integration tests pass locally and break in staging — costing the engineers who discover the gap hours of archaeology. Frontend developers, QA engineers, and platform teams all share this pain: the further mock data drifts from the real API contract, the less value local development delivers.
Where data generation fits in the local-dev stack
Mock data generation sits between your API specification and every consumer that runs locally: developer workstations, ephemeral CI runners, end-to-end test suites, and Storybook component sandboxes. The diagram below shows the four layers and how data flows through them.
Understanding this stack matters because a gap at any layer cascades downstream. A type mismatch in the spec produces invalid fixtures; an unseeded generator produces irreproducible fixtures; a mock server without validation silently serves stale payloads to every consumer.
Core concept 1 — Schema-driven payload synthesis
Establishing type-safe, contract-compliant payloads begins with a formalised specification. Schema-driven data generation ensures every mock response respects required fields, data types, enum boundaries, and nested object relationships — eliminating drift during early development cycles before a backend exists.
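To make the idea concrete, the following is a minimal TypeScript sketch of schema-driven synthesis. It is not how json-schema-faker works internally; the `MiniSchema` type, the `synthesize` function, and the sample `userSchema` are hypothetical and cover only a small subset of JSON Schema (objects, strings, integers, booleans, arrays, `enum`, and `required`).

```typescript
// Hypothetical, simplified schema shape: a small subset of JSON Schema.
type MiniSchema =
  | { type: 'string'; enum?: readonly string[] }
  | { type: 'integer'; minimum?: number; maximum?: number }
  | { type: 'boolean' }
  | { type: 'array'; items: MiniSchema; minItems?: number }
  | { type: 'object'; properties: Record<string, MiniSchema>; required?: readonly string[] };

// Walks the schema and returns a value that satisfies it.
// `rand` must return a number in [0, 1); inject a seeded source for reproducible output.
function synthesize(schema: MiniSchema, rand: () => number, requiredOnly = false): unknown {
  switch (schema.type) {
    case 'string':
      // Enum boundaries: only ever pick a listed value.
      return schema.enum
        ? schema.enum[Math.floor(rand() * schema.enum.length)]
        : `str-${Math.floor(rand() * 1e6)}`;
    case 'integer': {
      const min = schema.minimum ?? 0;
      const max = schema.maximum ?? 1000;
      return min + Math.floor(rand() * (max - min + 1));
    }
    case 'boolean':
      return rand() < 0.5;
    case 'array': {
      const length = schema.minItems ?? 1;
      return Array.from({ length }, () => synthesize(schema.items, rand, requiredOnly));
    }
    case 'object': {
      const required = new Set(schema.required ?? []);
      const out: Record<string, unknown> = {};
      for (const [key, child] of Object.entries(schema.properties)) {
        // Required fields are always emitted; optional ones only when requiredOnly is false.
        if (required.has(key) || !requiredOnly) {
          out[key] = synthesize(child, rand, requiredOnly);
        }
      }
      return out;
    }
  }
}

// Hypothetical schema for illustration.
const userSchema: MiniSchema = {
  type: 'object',
  required: ['id', 'role'],
  properties: {
    id: { type: 'integer', minimum: 1, maximum: 500 },
    role: { type: 'string', enum: ['admin', 'editor', 'viewer'] },
    nickname: { type: 'string' }
  }
};

const sampleUser = synthesize(userSchema, Math.random, true) as Record<string, unknown>;
```

Because every value is derived from the schema, renaming a property or tightening an enum changes the generated payload on the next run without anyone editing a fixture by hand.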
The json-schema-faker CLI reads your OpenAPI file and writes fixtures to an output directory. A thin config file makes it environment-aware:
// schema-faker.config.js
module.exports = {
  schemaPath: process.env.OPENAPI_SPEC || './specs/api-v2.yaml',
  outputDir: './mocks/generated',
  strictMode: process.env.NODE_ENV !== 'development',
  options: {
    useDefaultValue: true,
    requiredOnly: false,
    ignoreMissingRefs: false
  }
};
Wire up npm scripts so developers and CI share the same command surface:
{
  "scripts": {
    "generate:mocks": "json-schema-faker-cli --config schema-faker.config.js",
    "generate:mocks:ci": "CI=true npm run generate:mocks"
  }
}
Why this matters for the request interception pattern: an interceptor that returns a hand-crafted payload is a liability; one that returns a spec-generated fixture is a contract enforcer. Running generate:mocks as a pre-commit hook or in CI on spec changes catches mismatches before they reach test suites.
Trade-offs for schema-driven generation vs. hand-authored fixtures:
| Dimension | Schema-driven generation | Hand-authored fixtures |
|---|---|---|
| Schema alignment | Always current (re-run on spec change) | Manual — drifts silently |
| Edge-case control | Requires explicit x-faker annotations | Full control per file |
| Setup cost | One-time tool integration | Zero tooling |
| Maintenance burden | Low (automated) | High (grows with API surface) |
| CI integration | Script-friendly | Commit-required |
Use schema-driven generation as the default for all response shapes, and supplement with hand-authored fixtures only for specific edge cases (empty arrays, maximum-length strings, error payloads) that require deliberate authorship.
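One way to combine the two approaches is to treat the generated fixture as a base and apply small, hand-authored overrides for each edge case. The sketch below assumes a hypothetical `UserFixture` shape and a hypothetical 280-character limit on `bio`; neither comes from a real spec.

```typescript
// Hypothetical fixture shape, used for illustration only.
interface UserFixture {
  id: string;
  username: string;
  tags: string[];
  bio: string;
}

// Stand-in for a fixture produced by the generator.
const generatedUser: UserFixture = {
  id: 'u-001',
  username: 'sample.user',
  tags: ['beta', 'staff'],
  bio: 'Generated bio text'
};

// Returns a copy of `base` with the hand-authored overrides applied.
// The generated fixture itself is never mutated.
function withOverrides<T extends object>(base: T, overrides: Partial<T>): T {
  return { ...base, ...overrides };
}

// Each edge case states only what makes it special.
const edgeCases = {
  emptyTags: withOverrides(generatedUser, { tags: [] }),
  // 280 is an assumed maximum length, not a value taken from a real contract.
  maxLengthBio: withOverrides(generatedUser, { bio: 'x'.repeat(280) })
};
```

The overrides stay small enough to review, while every other field keeps tracking the spec.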
Core concept 2 — Deterministic seed management
Flaky tests and unpredictable UI states often trace back to unseeded randomness in mock data. Deterministic seed management anchors every faker call to a reproducible sequence, producing identical datasets across developer laptops, ephemeral CI runners, and staging preview environments while preserving statistically realistic value distributions.
// utils/mock-seed.ts
import { faker } from '@faker-js/faker';

// Seed once at module load so every later faker call follows one reproducible sequence.
const SEED = parseInt(process.env.MOCK_SEED ?? '42', 10);
faker.seed(SEED);

export interface MockUser {
  id: string;
  username: string;
  email: string;
  role: 'admin' | 'editor' | 'viewer';
  createdAt: string;
}

/**
 * Produces a deterministic user payload.
 * Output is identical on every run when MOCK_SEED is fixed and the
 * generators are called in the same order. `index` only labels the
 * call site; it does not influence the generated values.
 */
export function generateUser(index: number): MockUser {
  return {
    id: faker.string.uuid(),
    username: faker.internet.username(),
    email: faker.internet.email(),
    role: faker.helpers.arrayElement(['admin', 'editor', 'viewer'] as const),
    createdAt: faker.date.past({ years: 2 }).toISOString()
  };
}
Configure the seed at the environment boundary — never inside application code:
# .env.local (developer workstation — fixed seed for instant repro)
MOCK_SEED=123456789
# .github/workflows/integration.yml
env:
  MOCK_SEED: ${{ github.run_id }}  # Unique per run, fully traceable in logs
The two-seed strategy — fixed locally, per-run-ID in CI — gives you the best of both worlds: a developer can paste a run ID from a failed CI build into MOCK_SEED locally and reproduce the exact failure in seconds.
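The strategy only works if a malformed seed is caught early: `parseInt('abc', 10)` yields `NaN`, which would quietly undermine reproducibility. A minimal sketch of a stricter parser follows; `resolveSeed` and `DEFAULT_SEED` are hypothetical names, and the default of 42 simply mirrors the fallback used in the seed module above.

```typescript
// Hypothetical helper: turns the raw MOCK_SEED string into a usable numeric seed.
const DEFAULT_SEED = 42;

function resolveSeed(raw: string | undefined): number {
  // Missing or blank value: fall back to the fixed local default.
  if (raw === undefined || raw.trim() === '') return DEFAULT_SEED;

  const parsed = Number(raw);
  // Reject anything that is not a non-negative safe integer (e.g. "abc", "1.5", "-3").
  if (!Number.isSafeInteger(parsed) || parsed < 0) {
    throw new Error(`MOCK_SEED must be a non-negative integer, got "${raw}"`);
  }
  return parsed;
}

// A run ID copied from a CI log, and the unset local case.
const seedFromCi = resolveSeed('9876543210');
const seedFallback = resolveSeed(undefined);
```

In a real setup the argument would be `process.env.MOCK_SEED`, and the resolved value is what the pipeline should print to its log.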
Seed propagation checklist:
- faker.seed() is called exactly once, at module initialisation, not inside individual generator functions
- MOCK_SEED is documented in .env.example with a comment explaining the two-seed strategy
- CI pipeline logs the active MOCK_SEED value so failures can be reproduced
- Snapshot tests pin the seed in beforeAll and reset it in afterAll
Core concept 3 — Conditional logic and CI/CD integration
Production APIs rarely return identical payloads for identical requests. Realistic simulation requires context-aware routing — parsing request bodies, validating auth headers, and evaluating query parameters before returning a response. This logic must survive the same CI gates that run your unit and integration tests.
The following TypeScript MSW handler evaluates RBAC, field projection, and pagination in a single endpoint:
// mocks/handlers/users.ts
import { http, HttpResponse } from 'msw';
import { generateUser } from '../utils/mock-seed';

const PAGE_SIZE = 20;

export const userHandlers = [
  http.get('/api/v1/users', ({ request }) => {
    const auth = request.headers.get('Authorization');
    if (!auth?.startsWith('Bearer ')) {
      return HttpResponse.json({ error: 'Unauthorized' }, { status: 401 });
    }

    const url = new URL(request.url);
    const page = parseInt(url.searchParams.get('page') ?? '1', 10);
    const fields = url.searchParams.get('fields')?.split(',');

    const users = Array.from({ length: PAGE_SIZE }, (_, i) =>
      generateUser((page - 1) * PAGE_SIZE + i)
    );

    const projected = fields
      ? users.map(u =>
          Object.fromEntries(
            fields.filter(f => f in u).map(f => [f, u[f as keyof typeof u]])
          )
        )
      : users;

    return HttpResponse.json({
      data: projected,
      meta: { page, pageSize: PAGE_SIZE, total: 500 }
    });
  }),

  http.get('/api/v1/users/:id', ({ params, request }) => {
    const auth = request.headers.get('Authorization');
    if (!auth?.startsWith('Bearer ')) {
      return HttpResponse.json({ error: 'Unauthorized' }, { status: 401 });
    }

    const id = Array.isArray(params.id) ? params.id[0] : params.id;
    const numericId = parseInt(id, 10);
    if (isNaN(numericId) || numericId < 1 || numericId > 500) {
      return HttpResponse.json({ error: 'Not found' }, { status: 404 });
    }

    return HttpResponse.json(generateUser(numericId));
  })
];
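The pagination and projection branches are easier to test when they are pulled out of the handler into pure functions. The following sketch shows one way to do that; `parsePage` and `projectFields` are hypothetical helpers, and clamping out-of-range pages to the valid range is an assumption added here, not behaviour of the handler above.

```typescript
const PAGE_SIZE = 20;
const TOTAL_USERS = 500;

// Clamps the raw `page` query parameter to a valid 1-based page number.
// Non-numeric or missing input falls back to page 1 (an assumption of this sketch).
function parsePage(raw: string | null): number {
  const lastPage = Math.ceil(TOTAL_USERS / PAGE_SIZE);
  const parsed = Number.parseInt(raw ?? '1', 10);
  if (Number.isNaN(parsed) || parsed < 1) return 1;
  return Math.min(parsed, lastPage);
}

// Keeps only the requested keys that exist as own properties of the record.
// With no `fields` parameter, the full record is returned as a copy.
function projectFields<T extends object>(
  record: T,
  fields: readonly string[] | undefined
): Partial<T> {
  if (!fields) return { ...record };
  const out: Partial<T> = {};
  for (const field of fields) {
    if (Object.prototype.hasOwnProperty.call(record, field)) {
      const key = field as keyof T;
      out[key] = record[key];
    }
  }
  return out;
}

const exampleUser = { id: 'u1', email: 'a@example.com', role: 'viewer' };
const projectedUser = projectFields(exampleUser, ['id', 'role', 'unknown']);
```

Using an own-property check instead of `in` also prevents inherited names such as `toString` from leaking into the projected payload.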
CI integration surface: register the handlers in your test setup file (vitest.setup.ts or jest.setup.ts) and start the mock server before the suite runs. In Node-based test runners this means MSW's setupServer from msw/node rather than the browser service worker. In dockerized mock environments, the same handlers can be served from a Node process inside a sidecar container, making them available to every service in docker-compose.yml, not just the single frontend under test.
# docker-compose.mock.yml
services:
  mock-api:
    build:
      context: .
      dockerfile: Dockerfile.mock
    environment:
      - MOCK_SEED=${MOCK_SEED:-42}
      - NETWORK_PROFILE=${NETWORK_PROFILE:-local}
    ports:
      - "3001:3001"
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:3001/health"]
      interval: 10s
      timeout: 5s
      retries: 3
This aligns with mock lifecycle management principles: the mock service has a well-defined start, health-check, and teardown surface that CI can orchestrate reliably.
Core concept 4 — Network realism and operational concerns
Data realism extends beyond payload structure to delivery characteristics. Applications must handle latency spikes, partial failures, and bandwidth constraints. Integrating network condition simulation into local mock layers ensures frontend and mobile clients are tested against realistic transport-layer behaviour before those conditions appear in production.
// mocks/middleware/throttle.ts
import { delay } from 'msw';

type NetworkProfile = 'local' | 'staging' | 'poor_network' | 'offline';

const LATENCY_PROFILES: Record<NetworkProfile, { min: number; max: number }> = {
  local: { min: 50, max: 200 },
  staging: { min: 300, max: 800 },
  poor_network: { min: 1000, max: 3000 },
  offline: { min: 5000, max: 5000 }
};

/**
 * Resolves the profile on every call, not at module load, so per-test
 * NETWORK_PROFILE overrides take effect. Unknown values fall back to 'local'.
 */
function currentProfile(): { min: number; max: number } {
  const name = process.env.NETWORK_PROFILE as NetworkProfile;
  return LATENCY_PROFILES[name] ?? LATENCY_PROFILES.local;
}

/** Call at the top of an MSW handler to apply profile-matched network delay. */
export async function applyNetworkDelay(): Promise<void> {
  const { min, max } = currentProfile();
  const jitter = Math.floor(Math.random() * (max - min)) + min;
  await delay(jitter);
}

/** 5% random failure gate, toggled by SIMULATE_FAILURES=true */
export function shouldSimulateFailure(): boolean {
  return process.env.SIMULATE_FAILURES === 'true' && Math.random() < 0.05;
}
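Note that `Math.random()` cannot be seeded, so the jitter and the failure gate are not governed by MOCK_SEED. If those values need to be unit-tested or made reproducible, one option is to pass the random source in as a parameter. The sketch below is a hypothetical variant of the two helpers with that change; `pickLatency` and `shouldFail` are illustrative names.

```typescript
type LatencyRange = { min: number; max: number };

// Maps a random value in [0, 1) onto the profile's latency range (milliseconds).
// The upper bound is exclusive, matching Math.floor(random * (max - min)) + min.
function pickLatency(range: LatencyRange, rand: () => number): number {
  return Math.floor(rand() * (range.max - range.min)) + range.min;
}

// Fails a request when simulation is enabled and the draw falls below `rate`
// (0.05 corresponds to the 5% failure gate).
function shouldFail(enabled: boolean, rate: number, rand: () => number): boolean {
  return enabled && rand() < rate;
}

// In tests, a fixed source makes both helpers fully predictable.
const poorNetwork: LatencyRange = { min: 1000, max: 3000 };
const midpointLatency = pickLatency(poorNetwork, () => 0.5);
const forcedFailure = shouldFail(true, 0.05, () => 0.01);
```

In production-like runs the caller can still pass `Math.random`; in tests, a constant function or a seeded generator removes the last source of non-determinism from the mock layer.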
Apply both utilities inside any handler that needs transport realism:
// mocks/handlers/products.ts
import { http, HttpResponse } from 'msw';
import { applyNetworkDelay, shouldSimulateFailure } from '../middleware/throttle';

export const productHandlers = [
  http.get('/api/v1/products', async () => {
    await applyNetworkDelay();

    if (shouldSimulateFailure()) {
      return HttpResponse.json(
        { error: 'Service temporarily unavailable' },
        { status: 503 }
      );
    }

    return HttpResponse.json([
      { id: '1', name: 'Widget Pro', price: 49.99, stock: 142 },
      { id: '2', name: 'Gadget Plus', price: 89.99, stock: 0 }
    ]);
  })
];
Configure the network profile at the environment boundary:
# .env.local
NETWORK_PROFILE=local
SIMULATE_FAILURES=false
# GitHub Actions — degraded-network integration job
env:
  NETWORK_PROFILE: poor_network
  SIMULATE_FAILURES: "true"
Operational health check: expose a GET /health endpoint in your mock server that returns { status: "ok", seed: MOCK_SEED, profile: NETWORK_PROFILE }. Docker Compose healthcheck and CI wait-for scripts can gate dependent services on this endpoint, preventing race conditions at startup.
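A minimal sketch of the payload side of that endpoint is shown below. `buildHealthPayload` and `isHealthy` are hypothetical helpers; the defaults (seed 42, profile `local`) mirror the fallbacks in the Compose file above, and the env-like record stands in for `process.env`.

```typescript
interface HealthPayload {
  status: 'ok';
  seed: number;
  profile: string;
}

// Builds the /health response body from an env-like record (for example process.env).
function buildHealthPayload(env: Record<string, string | undefined>): HealthPayload {
  return {
    status: 'ok',
    seed: Number.parseInt(env.MOCK_SEED ?? '42', 10),
    profile: env.NETWORK_PROFILE ?? 'local'
  };
}

// What a wait-for script might check before letting dependent services start.
function isHealthy(body: unknown): boolean {
  return (
    typeof body === 'object' &&
    body !== null &&
    (body as { status?: unknown }).status === 'ok'
  );
}

const healthBody = buildHealthPayload({ MOCK_SEED: '123456789', NETWORK_PROFILE: 'staging' });
```

Echoing the seed and profile in the health response gives a failing CI job one place to confirm which dataset and transport profile the mock server was actually running with.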
Decision guide — Choosing a data generation approach
Use this matrix when deciding how to generate mock data for a given scenario:
| Scenario | Recommended approach | Why |
|---|---|---|
| New API surface, no backend yet | Schema-driven generation from OpenAPI draft | Stays aligned as the spec evolves |
| Regression test suite requiring snapshot stability | Deterministic seed + checked-in generated fixtures | Tests are reproducible without a running generator |
| Auth, pagination, or field projection logic | Dynamic MSW handler with conditional branching | Fixtures cannot express request-context dependencies |
| Mobile app under poor connectivity | Network delay middleware (NETWORK_PROFILE=poor_network) | Exercises timeout handling, retry logic, and skeleton states |
| Multi-service integration test in CI | Docker Compose mock sidecar with health checks | Shared across all consumers; no per-service MSW setup |
| Contract drift detection | AJV CI gate against spec-exported JSON Schema | Catches mismatches before they reach staging |
When a scenario spans multiple rows — for example, a mobile integration test with auth and poor connectivity — layer the approaches: start with schema-driven fixtures, serve them through a dynamic handler that validates auth headers, and wrap the handler with network delay middleware.
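The layering described above can be sketched as a chain of small wrappers. The request and response shapes below are hypothetical and deliberately independent of MSW or any other mocking library; in a real handler the fixture would come from the generator and the delay from the active network profile.

```typescript
// Hypothetical request/response shapes, not tied to any mocking library.
interface MockRequest {
  headers: Record<string, string | undefined>;
}
interface MockResponse {
  status: number;
  body: unknown;
}
type Handler = (req: MockRequest) => Promise<MockResponse>;

// Layer 1: serve a fixture (stand-in for a spec-generated payload).
const serveFixture = (fixture: unknown): Handler => async () => ({
  status: 200,
  body: fixture
});

// Layer 2: reject requests that carry no bearer token.
const withAuth = (next: Handler): Handler => async (req) => {
  const auth = req.headers['authorization'];
  return auth?.startsWith('Bearer ')
    ? next(req)
    : { status: 401, body: { error: 'Unauthorized' } };
};

// Layer 3: wait for the given number of milliseconds before answering.
const withDelay = (ms: number, next: Handler): Handler => async (req) => {
  await new Promise<void>((resolve) => setTimeout(resolve, ms));
  return next(req);
};

// Composition reads outside-in: delay first, then auth, then the fixture.
const usersEndpoint: Handler = withDelay(0, withAuth(serveFixture([{ id: 'u1' }])));
```

Because each layer has the same signature, a scenario that needs only two of the three concerns drops a wrapper without touching the others.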
Common failure modes and mitigations
1. Fixtures pass CI but fail staging because the spec was not regenerated after a field was renamed.
Mitigation: run generate:mocks as a step in the CI job that also runs tsc or OpenAPI linting. If the spec hash changes but fixtures are not regenerated, fail the build.
2. Seed is set in faker.seed() inside a factory function, resetting the sequence on every call.
Mitigation: call faker.seed() exactly once at module load time. Audit with a lint rule that disallows faker.seed inside non-setup files.
3. The MSW worker fails silently in test environments, so all requests fall through to the real network.
Mitigation: pass onUnhandledRequest: 'error' to server.listen() (Node) or worker.start() (browser) so unmatched requests fail tests loudly rather than making real HTTP calls. This is covered in depth under advanced MSW handler patterns.
4. Network delay middleware is not reset between test cases, causing timeout flakiness.
Mitigation: read process.env.NETWORK_PROFILE at handler invocation time, not at module load time. This allows per-test environment overrides without module cache invalidation.
5. Generated fixtures for large list endpoints grow big enough to slow test runs and inflate memory use in the mock layer.
Mitigation: cap generated list sizes at 50 items in development (use a MAX_MOCK_ITEMS env var) and use cursor-based pagination responses so tests do not load the full dataset. The response shaping techniques section covers how to structure paginated responses correctly.
6. Mock server port conflicts between local dev and CI parallel jobs.
Mitigation: randomise the port via PORT=$(shuf -i 3000-4000 -n1), or publish only the container port in Docker Compose (for example ports: ["3001"]) so that Docker assigns a free host port. Pass the resolved URL to consumers via an env var.
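For failure mode 5, the cap and the cursor-based page can be sketched in a few lines. `resolveListSize` and `cursorPage` are hypothetical helpers; the cursor here is simply the next offset encoded as a string, which is an assumption of this sketch rather than a recommendation for a real API.

```typescript
const DEFAULT_MAX_ITEMS = 50;

// Caps a requested list size using the raw MAX_MOCK_ITEMS value, defaulting to 50.
function resolveListSize(requested: number, rawMax: string | undefined): number {
  const parsedMax = Number.parseInt(rawMax ?? '', 10);
  const max = Number.isNaN(parsedMax) || parsedMax < 1 ? DEFAULT_MAX_ITEMS : parsedMax;
  return Math.min(Math.max(requested, 0), max);
}

interface CursorPage<T> {
  data: T[];
  nextCursor: string | null;
}

// Returns one page of `items` starting at the offset encoded in `cursor`.
// `nextCursor` is null once the final page has been served.
function cursorPage<T>(items: readonly T[], cursor: string | null, limit: number): CursorPage<T> {
  const start = cursor === null ? 0 : Math.max(0, Number.parseInt(cursor, 10) || 0);
  const next = start + limit;
  return {
    data: items.slice(start, next),
    nextCursor: next < items.length ? String(next) : null
  };
}

const firstPage = cursorPage(['a', 'b', 'c', 'd', 'e'], null, 2);
```

In a handler, `resolveListSize(requestedLimit, process.env.MAX_MOCK_ITEMS)` would supply the `limit`, so no test can accidentally request the full dataset.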
FAQ
What is the difference between static fixtures and schema-driven mock data?
Static fixtures are hand-authored JSON files committed to the repository. Schema-driven generation synthesises payloads from an OpenAPI or JSON Schema definition at build time, so every generated fixture reflects the current contract. When a field is renamed or a new required property is added, re-running the generator surfaces the gap immediately rather than waiting for a test to fail in staging.
Why do flaky tests often trace back to mock data randomness?
When @faker-js/faker is not seeded, each test run generates different values. A field that is sometimes null, sometimes a 300-character string, or sometimes a boundary integer causes assertions to pass intermittently. The failure is non-deterministic — it cannot be reproduced from the CI log alone. Seeding eliminates the variable.
How should the CI seed differ from the local developer seed?
Locally, a fixed seed (e.g. MOCK_SEED=42) is fastest to debug because the data is always the same. In CI, set the seed to the pipeline run ID (${{ github.run_id }} in GitHub Actions). This makes each run unique but fully reproducible: a developer can copy the run ID from a failed build log, set it as MOCK_SEED locally, and reproduce the exact dataset that broke the test.
When does network simulation belong in handlers versus a separate proxy layer?
Inline delay() middleware is appropriate when a single frontend project needs to test its own loading states and error boundaries. When multiple services share a mock stack — as in a dockerized mock environment — centralise throttle configuration in a proxy (WireMock, a local API gateway) so every consumer inherits the same transport profile without duplicating middleware. The proxy vs inline mocking strategies comparison covers this decision in detail.
How do I prevent mock data from drifting out of sync with the real API?
Run an AJV validation script against all generated fixture files on every CI build. Export your OpenAPI spec as JSON Schema, hash the spec in the CI job, and fail the build when fixtures were not regenerated after a spec change. Additionally, integrate mock lifecycle management practices so that generated fixtures are treated as derived artefacts — never edited by hand — and regenerated from the canonical spec on every spec bump.
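The hash comparison mentioned above can be sketched as follows. The `FixtureManifest` shape is hypothetical (a small file the generator would write next to the fixtures), and FNV-1a is used only to keep the example dependency-free; a real pipeline would more likely use SHA-256 from a crypto library.

```typescript
// FNV-1a (32-bit) over UTF-16 code units: a tiny non-cryptographic hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, '0');
}

// Hypothetical manifest recorded by the generator at fixture-generation time.
interface FixtureManifest {
  specHash: string;
}

// Throws when the spec has changed since the fixtures were generated.
function assertFixturesFresh(specText: string, manifest: FixtureManifest): void {
  const current = fnv1a(specText);
  if (current !== manifest.specHash) {
    throw new Error(
      `Fixtures are stale: spec hash ${current} does not match recorded ${manifest.specHash}. ` +
        'Re-run generate:mocks.'
    );
  }
}

// Example: the manifest was written for the original spec text.
const originalSpec = 'openapi: 3.0.0\npaths: {}';
const manifest: FixtureManifest = { specHash: fnv1a(originalSpec) };
assertFixturesFresh(originalSpec, manifest);
```

Run as a CI step, a thrown error fails the build whenever the spec is edited without regenerating the fixtures.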
Related
- Schema-Driven Data Generation — generating type-safe fixtures from OpenAPI and JSON Schema definitions
- Deterministic Seed Management — locking randomisation sequences for reproducible CI and local debugging
- Advanced MSW Handler Patterns — dynamic handlers, conditional branching, and request-body parsing
- Response Shaping Techniques — structuring paginated, filtered, and error responses
- API Mocking Fundamentals & Architecture — the broader interception, lifecycle, and proxy architecture this data layer feeds into