Schema-Driven Data Generation
This page covers generating structurally valid mock payloads directly from OpenAPI 3.x and JSON Schema contract files — it does not cover manually authored fixture files or stateful scenario sequencing.
Prerequisites
- Node.js 18+ or Python 3.10+ installed
- An OpenAPI 3.x YAML/JSON spec (or JSON Schema Draft 2020-12 document) committed to your repository
- A mock server capable of serving generated responses (WireMock, MSW, or a lightweight Express stub)
- npm or pip available in your CI environment
- Familiarity with the request interception patterns that sit between generated payloads and your application code
Phase 1 — Core Setup
Install a schema-aware generator. openapi-sampler is the lowest-friction choice when you already have an OpenAPI YAML file:
npm install --save-dev openapi-sampler js-yaml @stoplight/spectral-cli openapi-mock-validator
Create scripts/generate-mocks.mjs:
// scripts/generate-mocks.mjs
import { readFileSync, writeFileSync, mkdirSync } from "node:fs";
import { resolve, dirname } from "node:path";
import { fileURLToPath } from "node:url";
import { load as yamlLoad } from "js-yaml";
import { sample } from "openapi-sampler";

const __dir = dirname(fileURLToPath(import.meta.url));
const specPath = resolve(__dir, "../contracts/openapi.yaml");
const outDir = resolve(__dir, "../dist/mocks");

const spec = yamlLoad(readFileSync(specPath, "utf8"));
mkdirSync(outDir, { recursive: true });

// Walk every response schema and write a fixture file per operation
for (const [path, methods] of Object.entries(spec.paths ?? {})) {
  for (const [method, operation] of Object.entries(methods)) {
    const successResponse = operation.responses?.["200"] ?? operation.responses?.["201"];
    const schema = successResponse?.content?.["application/json"]?.schema;
    if (!schema) continue;

    const operationId = operation.operationId ?? `${method}_${path.replace(/\//g, "_")}`;
    const payload = sample(schema, { skipNonRequired: false, quiet: true }, spec);
    const dest = resolve(outDir, `${operationId}.json`);
    writeFileSync(dest, JSON.stringify(payload, null, 2), "utf8");
    console.log(`Generated: ${dest}`);
  }
}
Run it:
node scripts/generate-mocks.mjs
Expected output — one .json file per operation:
Generated: /dist/mocks/getUser.json
Generated: /dist/mocks/listOrders.json
Generated: /dist/mocks/createPayment.json
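To make concrete what a schema-aware sampler does with each response schema, here is a minimal sketch in plain JavaScript. It is not openapi-sampler's implementation: `sketchSample` and the placeholder values are hypothetical, and only a few keywords (`example`, `enum`, `type`, `properties`, `items`, `format`, `minimum`) are handled.

```javascript
// Hypothetical, simplified sampler: illustrates the idea only.
// It is NOT how openapi-sampler is implemented and skips most keywords.
const FORMAT_PLACEHOLDERS = {
  uuid: "00000000-0000-4000-8000-000000000000",
  email: "user@example.com",
  "date-time": "2020-01-01T00:00:00Z",
};

function sketchSample(schema) {
  if (schema.example !== undefined) return schema.example; // explicit example wins
  if (Array.isArray(schema.enum)) return schema.enum[0]; // first enum member
  switch (schema.type) {
    case "object": {
      const out = {};
      for (const [key, child] of Object.entries(schema.properties ?? {})) {
        out[key] = sketchSample(child);
      }
      return out;
    }
    case "array":
      return schema.items ? [sketchSample(schema.items)] : [];
    case "string":
      return FORMAT_PLACEHOLDERS[schema.format] ?? "string";
    case "integer":
    case "number":
      return schema.minimum ?? 0;
    case "boolean":
      return true;
    default:
      return null;
  }
}

// Illustrative schema, loosely modelled on a user resource.
const userSchema = {
  type: "object",
  properties: {
    id: { type: "string", format: "uuid" },
    role: { type: "string", enum: ["admin", "editor", "viewer"] },
    tags: { type: "array", items: { type: "string" } },
  },
};

console.log(JSON.stringify(sketchSample(userSchema), null, 2));
```

Because every branch returns a fixed value for a given schema, the same schema always yields the same payload, which is the property the fixture files rely on.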
If you prefer Zod-based type definitions over a standalone YAML spec, replace openapi-sampler with @anatine/zod-mock:
// src/mocks/generateFromSchema.ts
import { generateMock } from "@anatine/zod-mock";
import { z } from "zod";
const UserSchema = z.object({
  id: z.string().uuid(),
  email: z.string().email(),
  role: z.enum(["admin", "editor", "viewer"]),
  createdAt: z.string().datetime(),
  profile: z.object({
    displayName: z.string().min(1).max(80),
    avatarUrl: z.string().url().optional(),
  }),
});

export type User = z.infer<typeof UserSchema>;

export function generateUser(): User {
  return generateMock(UserSchema);
}
The Zod approach keeps your mock schema in sync with your TypeScript types automatically, eliminating drift between the type layer and the mock layer.
Phase 2 — Configuration and Wiring
Generator config file
Create mock-generator.config.yaml at the project root to centralise generator behaviour:
# mock-generator.config.yaml
pipeline:
  schema_source: "./contracts/openapi.yaml"
  seed_strategy: "version_hash" # pins the RNG to a hash of the spec content
  constraint_resolution:
    one_of_policy: "first_match" # deterministic; avoids exhaustive enumeration
    circular_ref_max_depth: 3 # terminates recursion before stack overflow
    regex_fallback: "uuid_v4" # used when a pattern is too complex to solve
  custom_overrides:
    x-mock-rules: true # honour OpenAPI extension fields for custom values
output:
  format: "json"
  directory: "./dist/mocks"
  minify: false
validation:
  strict: true
  fail_on_drift: true
Seeding for reproducibility
To guarantee reproducible test environments, integrate deterministic seed management before any value synthesiser runs:
// src/mocks/seedRng.ts
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";
import seedrandom from "seedrandom";
/**
 * Derive a stable seed from the content hash of the OpenAPI spec.
 * Identical spec content → identical mock output across all environments.
 */
export function seedFromSpec(specPath: string): () => number {
  const content = readFileSync(specPath, "utf8");
  const hash = createHash("sha256").update(content).digest("hex").slice(0, 12);
  const rng = seedrandom(hash);
  return rng;
}
Pass the seeded RNG into your Faker instance (if you use Faker for value synthesis on top of schema constraints):
// src/mocks/fakerSeed.ts
import { faker } from "@faker-js/faker";
import { seedFromSpec } from "./seedRng.js";
const rng = seedFromSpec("./contracts/openapi.yaml");
// Faker accepts a numeric seed; extract one from the hashed RNG
faker.seed(Math.floor(rng() * 2 ** 31));
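The property that matters in this step is that identical spec content yields an identical value stream. The sketch below demonstrates that property with dependency-free stand-ins: an FNV-1a string hash and the mulberry32 generator replace SHA-256 and seedrandom purely so the example runs without installing packages. The names `hashString`, `mulberry32`, and `rngFromContent` are illustrative and not part of any library.

```javascript
// FNV-1a 32-bit hash: a stand-in for the SHA-256 digest used in seedRng.ts.
function hashString(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// mulberry32: a small seeded PRNG returning floats in [0, 1).
// A stand-in for seedrandom, chosen only because it fits in a few lines.
function mulberry32(seed) {
  let a = seed | 0;
  return function next() {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Same content in, same generator state out.
function rngFromContent(specContent) {
  return mulberry32(hashString(specContent));
}

const specContent = "openapi: 3.0.3\ninfo:\n  title: Demo\n";
const first = rngFromContent(specContent);
const second = rngFromContent(specContent);

const runA = [first(), first(), first()];
const runB = [second(), second(), second()];

// Two generators seeded from identical content produce identical sequences.
console.log(JSON.stringify(runA) === JSON.stringify(runB));
```

Editing the spec content changes the hash and therefore the seed, so the value stream is expected to shift only when the contract itself changes.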
MSW handler wiring
If you use MSW handler registration as your interception layer, wire the generated payloads into handlers rather than hard-coding values:
// src/mocks/handlers.ts
import { http, HttpResponse } from "msw";
import { generateUser } from "./generateFromSchema.js";

export const handlers = [
  http.get("/api/users/:id", ({ params }) => {
    const user = generateUser();
    // Respect the route param so fixtures feel realistic
    return HttpResponse.json({ ...user, id: params.id as string });
  }),
  http.get("/api/users", () => {
    const users = Array.from({ length: 10 }, generateUser);
    return HttpResponse.json({ items: users, total: users.length });
  }),
];
Docker service definition
For teams that centralise mock serving in a dockerized mock environment, add a generation step as a pre-start command:
# docker-compose.yml (mock service excerpt)
services:
  mock-api:
    image: node:20-alpine
    working_dir: /app
    volumes:
      - .:/app
    command: >
      sh -c "node scripts/generate-mocks.mjs
      && npx serve dist/mocks -p 3001 --single"
    ports:
      - "3001:3001"
    environment:
      NODE_ENV: test
      SCHEMA_REF: "${GITHUB_SHA:-local}"
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:3001/getUser.json"]
      interval: 5s
      timeout: 3s
      retries: 5
Phase 3 — CI Pipeline Integration
Pin schema generation as a CI gate so payload drift is caught before any integration test runs. The response shaping techniques your frontend depends on must remain aligned with the published spec — this step enforces that boundary.
# .github/workflows/e2e-mocks.yml
name: E2E with schema-generated mocks
on:
  push:
    paths:
      - "contracts/**"
      - "src/mocks/**"
      - "scripts/generate-mocks.mjs"
jobs:
  generate-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: "20"
          cache: "npm"
      - name: Install dependencies
        run: npm ci
      - name: Lint OpenAPI spec
        run: npx @stoplight/spectral-cli lint ./contracts/openapi.yaml --ruleset .spectral.yaml
      - name: Generate mock fixtures
        run: node scripts/generate-mocks.mjs
      - name: Validate fixtures against contract
        run: |
          npx openapi-mock-validator \
            --schema ./contracts/openapi.yaml \
            --mock-output ./dist/mocks/ \
            --strict \
            --fail-on-drift
      - name: Cache generated fixtures
        uses: actions/cache@v4
        with:
          path: dist/mocks
          key: mocks-${{ hashFiles('contracts/openapi.yaml') }}
      - name: Start mock server and run E2E tests
        run: |
          npx serve dist/mocks -p 3001 &
          sleep 2
          npm run test:e2e -- --base-url http://localhost:3001
Cache the generated fixture map. This prevents non-deterministic payload generation from causing flaky integration tests. The cache key (hashFiles('contracts/openapi.yaml')) means the cache is invalidated exactly when the spec changes — not on every commit.
To align with mock lifecycle management best practices, tear down the mock server process explicitly after the test step rather than letting the job runner kill it implicitly.
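The fixed `sleep 2` in the final workflow step assumes the mock server is listening within two seconds. One possible alternative is a readiness poll, sketched below under the assumption of Node.js 18+ (which provides a global `fetch`). `waitForReady` and its options are hypothetical names, and `fetchImpl` is injectable only so the function can be exercised without a running server.

```javascript
// Poll a URL until it answers with a 2xx status or the attempts run out.
// Resolves to true when the server is ready, false on timeout.
async function waitForReady(url, { attempts = 20, delayMs = 250, fetchImpl = fetch } = {}) {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetchImpl(url);
      if (res.ok) return true;
    } catch {
      // Connection refused: the server is not listening yet.
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false;
}

// Demonstration with a fake fetch that fails twice before succeeding.
let demoCalls = 0;
const fakeFetch = async () => {
  demoCalls += 1;
  if (demoCalls < 3) throw new Error("ECONNREFUSED");
  return { ok: true };
};

waitForReady("http://localhost:3001/getUser.json", { delayMs: 10, fetchImpl: fakeFetch })
  .then((ready) => console.log(`ready=${ready} after ${demoCalls} attempts`));
```

In a CI script, the test command would run only after the promise resolves to `true`, and the server process would be stopped explicitly afterwards.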
Verification Steps
- Run node scripts/generate-mocks.mjs and confirm one .json file appears in dist/mocks/ for every operationId in your spec.
- Run npx openapi-mock-validator --schema ./contracts/openapi.yaml --mock-output ./dist/mocks/ --strict and verify exit code 0 with no drift errors.
- Fetch a generated file directly and confirm all required fields are present and correctly typed: node -e "const f = require('./dist/mocks/getUser.json'); console.log(JSON.stringify(f, null, 2))".
- In a browser dev session with MSW active, open the Network panel and confirm the handler returns generated, not hard-coded, data for GET /api/users/123.
- Run your full E2E suite once with SCHEMA_REF=$(git rev-parse HEAD) set, then again with the same value. Confirm the generated payloads are byte-for-byte identical between both runs.
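The check that all required fields are present and correctly typed can be scripted rather than eyeballed. The sketch below is a deliberately small stand-in for a real JSON Schema validator: `checkRequired` and `jsonType` are hypothetical helpers that cover only `required` and primitive `type`, and the sample schema is illustrative.

```javascript
// Map a JavaScript value to the JSON Schema type name it would satisfy.
function jsonType(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  if (Number.isInteger(value)) return "integer";
  return typeof value; // "string", "number", "boolean", "object"
}

// Report missing required fields and primitive type mismatches.
// A full validator handles far more keywords than this sketch.
function checkRequired(schema, payload, path = "$") {
  const problems = [];
  for (const key of schema.required ?? []) {
    if (!(key in payload)) problems.push(`${path}.${key}: missing required field`);
  }
  for (const [key, child] of Object.entries(schema.properties ?? {})) {
    if (!(key in payload) || !child.type) continue;
    const actual = jsonType(payload[key]);
    // An integer value also satisfies `type: number`.
    const matches = actual === child.type || (child.type === "number" && actual === "integer");
    if (!matches) {
      problems.push(`${path}.${key}: expected ${child.type}, got ${actual}`);
    } else if (child.type === "object") {
      problems.push(...checkRequired(child, payload[key], `${path}.${key}`));
    }
  }
  return problems;
}

const demoSchema = {
  type: "object",
  required: ["id", "email"],
  properties: {
    id: { type: "string" },
    email: { type: "string" },
    age: { type: "integer" },
  },
};

// Reports the missing `email` field and the string-typed `age`.
console.log(checkRequired(demoSchema, { id: "u1", age: "42" }));
```

An empty array means the fixture passed these two checks; it does not replace the validator step in CI.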
Troubleshooting
Error: Cannot resolve $ref: '#/components/schemas/UnknownModel'
The generator cannot find a schema component the spec references. This is usually a broken $ref introduced during a manual spec edit.
Fix: Run npx @stoplight/spectral-cli lint ./contracts/openapi.yaml before generation. Look for unresolved-ref errors. Either add the missing component under components.schemas or correct the reference path. The linter must pass before the generator runs.
TypeError: schema.type is undefined when using openapi-sampler
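openapi-sampler resolves local $ref pointers only when the full spec is passed as its third argument, and it does not follow external references. A schema that still contains unresolved $ref pointers gives it no type information to work from.
Fix: Dereference the spec before sampling:
import { dereference } from "@apidevtools/json-schema-ref-parser";
import { sample } from "openapi-sampler";
// Inline every $ref so the sampler receives plain schema objects
const derefed = await dereference("./contracts/openapi.yaml");
const payload = sample(derefed.paths["/users"].get.responses["200"].content["application/json"].schema);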
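For intuition about what dereferencing involves, the sketch below resolves a single local reference by walking its JSON Pointer through the spec object. It is a simplified illustration, not the ref-parser library's implementation: `resolveLocalRef` is a hypothetical helper, external files and URLs are out of scope, and the error text simply mirrors the message shown earlier in this section.

```javascript
// Resolve a same-document reference such as "#/components/schemas/User".
function resolveLocalRef(spec, ref) {
  if (!ref.startsWith("#/")) {
    throw new Error(`Only local refs are supported in this sketch: ${ref}`);
  }
  return ref
    .slice(2)
    .split("/")
    // JSON Pointer unescaping: "~1" encodes "/", "~0" encodes "~".
    .map((token) => token.replace(/~1/g, "/").replace(/~0/g, "~"))
    .reduce((node, token) => {
      if (node === null || typeof node !== "object" || !(token in node)) {
        throw new Error(`Cannot resolve $ref: '${ref}'`);
      }
      return node[token];
    }, spec);
}

// Illustrative spec fragment.
const demoSpec = {
  components: {
    schemas: {
      User: { type: "object", properties: { id: { type: "string" } } },
    },
  },
};

console.log(resolveLocalRef(demoSpec, "#/components/schemas/User").type);

try {
  resolveLocalRef(demoSpec, "#/components/schemas/UnknownModel");
} catch (err) {
  console.log(err.message);
}
```

A full dereference step repeats this lookup for every `$ref` in the document and must also guard against cycles.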
Generated values fail pattern constraints (regex mismatch)
When a schema field uses a complex regex (e.g., a proprietary ID format), the generator may fall back to a generic string that does not match the pattern.
Fix: Add an x-mock-example extension field on the problematic property in your OpenAPI spec:
components:
  schemas:
    Order:
      properties:
        referenceCode:
          type: string
          pattern: "^ORD-[0-9]{6}-[A-Z]{3}$"
          x-mock-example: "ORD-001234-ABC" # generator honours this before trying to satisfy the regex
Alternatively, set regex_fallback: "x-mock-example" in mock-generator.config.yaml to enforce this policy globally.
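One way a generation script could apply this policy is sketched below: prefer `x-mock-example` when it is present, otherwise keep the generated candidate, and in both cases check the result against `pattern` so a mismatch fails at generation time instead of in a downstream test. `resolvePatternValue` is a hypothetical helper, not part of openapi-sampler or any other tool named here.

```javascript
// Pick a value for a string property and verify it against the declared pattern.
function resolvePatternValue(propertySchema, generatedCandidate) {
  const value = propertySchema["x-mock-example"] ?? generatedCandidate;
  if (propertySchema.pattern && !new RegExp(propertySchema.pattern).test(value)) {
    throw new Error(`Value "${value}" does not match pattern ${propertySchema.pattern}`);
  }
  return value;
}

const referenceCode = {
  type: "string",
  pattern: "^ORD-[0-9]{6}-[A-Z]{3}$",
  "x-mock-example": "ORD-001234-ABC",
};

// The extension field wins over the generic candidate.
console.log(resolvePatternValue(referenceCode, "string"));

// Without the extension field, a generic candidate is rejected.
try {
  resolvePatternValue({ type: "string", pattern: referenceCode.pattern }, "string");
} catch (err) {
  console.log(err.message);
}
```

Failing loudly here keeps a stale or mistyped `x-mock-example` from silently producing invalid fixtures.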
oneOf produces inconsistent shapes across runs
Without a deterministic one_of_policy, the generator may pick different branches on each run, causing snapshots to differ.
Fix: Set one_of_policy: "first_match" in mock-generator.config.yaml. If you need coverage of all branches, generate one fixture per branch explicitly by iterating the oneOf array in your generation script.
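Iterating the oneOf array might look like the sketch below, which produces one schema variant per branch so each can be sampled and written separately. `expandOneOf`, `branchFixtures`, and the `.branchN.json` naming scheme are assumptions for illustration; only a top-level `oneOf` is expanded, and sibling keywords are merged shallowly into each branch.

```javascript
// Return one schema per oneOf branch, or the schema itself when there is no oneOf.
function expandOneOf(schema) {
  if (!Array.isArray(schema.oneOf)) return [schema];
  const { oneOf, ...shared } = schema;
  // Shallow merge: keywords declared next to oneOf are copied into each branch.
  return oneOf.map((branch) => ({ ...shared, ...branch }));
}

// Pair each variant with a stable, index-based fixture file name.
function branchFixtures(operationId, schema) {
  return expandOneOf(schema).map((variant, index) => ({
    fileName: `${operationId}.branch${index}.json`,
    schema: variant,
  }));
}

// Illustrative schema with two alternative shapes.
const paymentMethod = {
  description: "How the order was paid",
  oneOf: [
    { type: "object", properties: { cardLast4: { type: "string" } } },
    { type: "object", properties: { iban: { type: "string" } } },
  ],
};

for (const fixture of branchFixtures("createPayment", paymentMethod)) {
  console.log(fixture.fileName, Object.keys(fixture.schema.properties));
}
```

Index-based names stay stable only while the branch order in the spec is unchanged, so reordering branches will rename fixtures.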
Circular reference causes RangeError: Maximum call stack size exceeded
A deeply nested schema (e.g., a TreeNode that references itself) causes infinite recursion in the generator.
Fix: Set circular_ref_max_depth: 2 in your config. For schemas that genuinely need deep nesting in production, override the circular path with a hand-authored fixture file that the generator does not overwrite:
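// In generate-mocks.mjs, skip operations with known circular schemas.
// Declare the set once, above the loops:
const CIRCULAR_SKIP = new Set(["getTreeNode", "getNestedCategory"]);
// Place this check inside the inner loop, right after operationId is computed:
if (CIRCULAR_SKIP.has(operationId)) {
  console.log(`Skipping circular schema: ${operationId}`);
  continue;
}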
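To see what a depth limit does to a self-referencing schema, consider the sketch below. `sampleWithDepthLimit` is a hypothetical function in which `maxDepth` plays the role of circular_ref_max_depth; it follows only direct `$ref` properties and arrays of `$ref` items, and it cuts the recursion off with an empty array or `null` once the limit is reached.

```javascript
// Expand a named schema, refusing to recurse past maxDepth levels.
function sampleWithDepthLimit(schemas, name, maxDepth, depth = 0) {
  if (depth >= maxDepth) return null; // limit reached: stop instead of recursing
  const schema = schemas[name];
  const out = {};
  for (const [key, prop] of Object.entries(schema.properties ?? {})) {
    if (prop.$ref) {
      const target = prop.$ref.split("/").pop();
      out[key] = sampleWithDepthLimit(schemas, target, maxDepth, depth + 1);
    } else if (prop.type === "array" && prop.items?.$ref) {
      const target = prop.items.$ref.split("/").pop();
      const child = sampleWithDepthLimit(schemas, target, maxDepth, depth + 1);
      out[key] = child === null ? [] : [child];
    } else {
      out[key] = prop.type === "string" ? "string" : null;
    }
  }
  return out;
}

// Illustrative self-referencing schema.
const treeSchemas = {
  TreeNode: {
    type: "object",
    properties: {
      label: { type: "string" },
      children: { type: "array", items: { $ref: "#/components/schemas/TreeNode" } },
    },
  },
};

// With maxDepth 2: a root node, one child, and an empty children array below it.
console.log(JSON.stringify(sampleWithDepthLimit(treeSchemas, "TreeNode", 2), null, 2));
```

The truncated leaf is still structurally valid here because an empty array satisfies the `children` property; schemas that mark the recursive field as required and non-nullable need a hand-authored fixture instead.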
When to Advance
You are ready to move on from this setup when:
- Every operation in your OpenAPI spec produces a valid, schema-conformant fixture without manual intervention.
- The openapi-mock-validator --strict step passes in CI with zero warnings.
- Re-running generation twice from the same SCHEMA_REF produces identical output (confirming seed stability via deterministic seed management).
- A frontend developer can start the mock server and get realistic, type-correct data for every endpoint without touching fixture files.
- Any spec change automatically invalidates the fixture cache and triggers regeneration in CI.
FAQ
Can I use schema-driven generation with GraphQL instead of OpenAPI?
Yes. Tools like graphql-faker and @graphql-tools/mock parse SDL type definitions and generate field values that satisfy type constraints. The generator pipeline is identical — swap the schema source from an OpenAPI YAML file to a .graphql SDL file and configure a GraphQL-aware resolver. The seeding and validation steps remain the same.
How do I handle circular references in my schema?
Set circular_ref_max_depth in your generator config (typically 2-3 levels) so the engine terminates recursion rather than looping. For deeply nested models, prefer an explicit $ref to a simplified version of the object rather than relying on the depth limiter alone. Add a hand-authored fixture file for any operation that genuinely requires deep nesting.
Do generated payloads survive schema upgrades automatically?
Only if you re-run generation after every spec change. Pin schema generation to a CI step triggered by changes to your contract file, and use a version-derived seed so that payload contents shift deterministically when the spec changes. Pair this with snapshot tests to distinguish intentional payload evolution from accidental regression.
What is the difference between openapi-sampler and @anatine/zod-mock?
openapi-sampler samples from OpenAPI and JSON Schema objects directly and uses default, const, enum, and example values from the schema where they are available. @anatine/zod-mock generates from Zod schemas at runtime, which is preferable when your codebase already defines types in Zod and you want the mock to stay in sync with the TypeScript type layer without a separate contract file. Use openapi-sampler when the spec is the source of truth; use @anatine/zod-mock when TypeScript types are the source of truth.
Related
- Deterministic Seed Management — pin RNG seeds so generated fixtures are reproducible across CI runs and developer machines
- Response Shaping Techniques — control how the mock server transforms generated payloads before returning them to the client
- Advanced MSW Handler Patterns — wire schema-generated fixtures into complex multi-handler MSW setups
- Dockerized Mock Environments — containerise the generation pipeline so every developer gets identical fixtures without local setup
- Request Interception Patterns — understand where in the stack generated payloads are injected and intercepted