Schema-Driven Data Generation

This page covers generating structurally valid mock payloads directly from OpenAPI 3.x and JSON Schema contract files — it does not cover manually authored fixture files or stateful scenario sequencing.

Prerequisites

  • Node.js 18+ or Python 3.10+ installed
  • An OpenAPI 3.x YAML/JSON spec (or JSON Schema Draft 2020-12 document) committed to your repository
  • A mock server capable of serving generated responses (WireMock, MSW, or a lightweight Express stub)
  • npm or pip available in your CI environment
  • Familiarity with the request interception patterns that sit between generated payloads and your application code

Figure: Schema-Driven Generation Pipeline. A three-stage pipeline: a schema source (OpenAPI 3.x, JSON Schema, or GraphQL SDL) is parsed by a constraint resolver (type/format, enum/pattern, allOf/oneOf), then a value synthesiser (seeded RNG, faker providers, custom overrides) produces a validated JSON or YAML payload ready for mock server consumption. The deterministic seed is pinned per schema version hash.

Phase 1 — Core Setup

Install a schema-aware generator. openapi-sampler is the lowest-friction choice when you already have an OpenAPI YAML file:

npm install --save-dev openapi-sampler js-yaml @stoplight/spectral-cli openapi-mock-validator

Create scripts/generate-mocks.mjs:

// scripts/generate-mocks.mjs
import { readFileSync, writeFileSync, mkdirSync } from "node:fs";
import { resolve, dirname } from "node:path";
import { fileURLToPath } from "node:url";
import { load as yamlLoad } from "js-yaml";
import { sample } from "openapi-sampler";

const __dir = dirname(fileURLToPath(import.meta.url));
const specPath = resolve(__dir, "../contracts/openapi.yaml");
const outDir = resolve(__dir, "../dist/mocks");

const spec = yamlLoad(readFileSync(specPath, "utf8"));

mkdirSync(outDir, { recursive: true });

// Walk every response schema and write a fixture file per operation
for (const [path, methods] of Object.entries(spec.paths ?? {})) {
  for (const [method, operation] of Object.entries(methods)) {
    const successResponse = operation.responses?.["200"] ?? operation.responses?.["201"];
    const schema = successResponse?.content?.["application/json"]?.schema;
    if (!schema) continue;

    const operationId = operation.operationId ?? `${method}_${path.replace(/\//g, "_")}`;
    const payload = sample(schema, { skipNonRequired: false, quiet: true }, spec);
    const dest = resolve(outDir, `${operationId}.json`);
    writeFileSync(dest, JSON.stringify(payload, null, 2), "utf8");
    console.log(`Generated: ${dest}`);
  }
}

Run it:

node scripts/generate-mocks.mjs

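Expected output, with one .json file per operation that defines a 200 or 201 application/json response (the script prints absolute paths; the project prefix is omitted here):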

Generated: /dist/mocks/getUser.json
Generated: /dist/mocks/listOrders.json
Generated: /dist/mocks/createPayment.json
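
One detail of the script above is worth noting: when an operation has no operationId, the fallback name is built from the method and the raw path, so a templated path such as /users/{id} produces a file name containing braces and a doubled underscore. A minimal sketch of a stricter fallback follows. The helper name fixtureName is hypothetical and is not part of openapi-sampler.

```javascript
// Hypothetical helper: derive a file-system-friendly fixture name.
// Uses operationId when present; otherwise builds one from method + path.
function fixtureName(method, path, operationId) {
  if (operationId) return operationId;
  const slug = path
    .replace(/[{}]/g, "")            // drop path-template braces: {id} -> id
    .replace(/[^A-Za-z0-9]+/g, "_")  // collapse every other separator run into "_"
    .replace(/^_+|_+$/g, "");        // trim leading and trailing underscores
  return slug ? `${method}_${slug}` : `${method}_root`;
}

console.log(fixtureName("get", "/users/{id}"));                 // get_users_id
console.log(fixtureName("post", "/payments", "createPayment")); // createPayment
```

In generate-mocks.mjs this would replace the template literal on the operationId line, if you choose to adopt it.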

If you prefer Zod-based type definitions over a standalone YAML spec, replace openapi-sampler with @anatine/zod-mock (install it together with zod and @faker-js/faker, which it relies on):

// src/mocks/generateFromSchema.ts
import { generateMock } from "@anatine/zod-mock";
import { z } from "zod";

const UserSchema = z.object({
  id: z.string().uuid(),
  email: z.string().email(),
  role: z.enum(["admin", "editor", "viewer"]),
  createdAt: z.string().datetime(),
  profile: z.object({
    displayName: z.string().min(1).max(80),
    avatarUrl: z.string().url().optional(),
  }),
});

export type User = z.infer<typeof UserSchema>;

export function generateUser(): User {
  return generateMock(UserSchema);
}

The Zod approach keeps your mock schema in sync with your TypeScript types automatically, eliminating drift between the type layer and the mock layer.

Phase 2 — Configuration and Wiring

Generator config file

Create mock-generator.config.yaml at the project root to centralise generator behaviour:

# mock-generator.config.yaml
pipeline:
  schema_source: "./contracts/openapi.yaml"
  seed_strategy: "version_hash"     # pins the RNG to a hash of the spec content
  constraint_resolution:
    one_of_policy: "first_match"    # deterministic; avoids exhaustive enumeration
    circular_ref_max_depth: 3       # terminates recursion before stack overflow
    regex_fallback: "uuid_v4"       # used when a pattern is too complex to solve
  custom_overrides:
    x-mock-rules: true              # honour OpenAPI extension fields for custom values
output:
  format: "json"
  directory: "./dist/mocks"
  minify: false
validation:
  strict: true
  fail_on_drift: true
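
This page does not say which tool reads this file, so assume your own generation script interprets it. As a rough illustration of what one_of_policy: "first_match" and circular_ref_max_depth could mean in practice, here is a toy sampler. It is a sketch under those assumptions, not openapi-sampler's implementation, and every function name in it is hypothetical.

```javascript
// Resolve a local reference such as "#/components/schemas/TreeNode".
// Simplification: JSON Pointer escapes (~0, ~1) are not handled.
function resolveLocalRef(root, ref) {
  return ref
    .replace(/^#\//, "")
    .split("/")
    .reduce((node, key) => (node == null ? undefined : node[key]), root);
}

// Toy sampler: returns a placeholder value for each schema node.
function sampleSchema(schema, root, config, refDepth = 0) {
  if (schema.$ref) {
    // Stop following references once the configured depth is reached.
    if (refDepth >= config.circular_ref_max_depth) return null;
    const target = resolveLocalRef(root, schema.$ref);
    if (!target) throw new Error(`Cannot resolve $ref: '${schema.$ref}'`);
    return sampleSchema(target, root, config, refDepth + 1);
  }
  if (schema.oneOf) {
    // "first_match": always take the first branch so output is stable.
    return sampleSchema(schema.oneOf[0], root, config, refDepth);
  }
  if (schema.enum) return schema.enum[0];
  switch (schema.type) {
    case "object": {
      const out = {};
      for (const [key, prop] of Object.entries(schema.properties ?? {})) {
        out[key] = sampleSchema(prop, root, config, refDepth);
      }
      return out;
    }
    case "array":
      return [sampleSchema(schema.items ?? {}, root, config, refDepth)];
    case "integer":
    case "number":
      return 0;
    case "boolean":
      return true;
    default:
      return "string";
  }
}

const spec = {
  components: {
    schemas: {
      TreeNode: {
        type: "object",
        properties: {
          label: { type: "string" },
          child: { $ref: "#/components/schemas/TreeNode" },
        },
      },
    },
  },
};

const tree = sampleSchema(
  { $ref: "#/components/schemas/TreeNode" },
  spec,
  { circular_ref_max_depth: 2 }
);
console.log(JSON.stringify(tree));
// {"label":"string","child":{"label":"string","child":null}}
```

With a depth of 2 the self-reference is expanded twice and then cut off with null, which is one plausible reading of "terminates recursion before stack overflow".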

Seeding for reproducibility

To guarantee reproducible test environments, integrate deterministic seed management before any value synthesiser runs:

// src/mocks/seedRng.ts
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";
import seedrandom from "seedrandom";

/**
 * Derive a stable seed from the content hash of the OpenAPI spec.
 * Identical spec content → identical mock output across all environments.
 */
export function seedFromSpec(specPath: string): () => number {
  const content = readFileSync(specPath, "utf8");
  const hash = createHash("sha256").update(content).digest("hex").slice(0, 12);
  const rng = seedrandom(hash);
  return rng;
}
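
To see why a content-derived seed gives identical output in every environment, here is a dependency-free sketch of the same idea. It substitutes a 32-bit FNV-1a hash for SHA-256 and the mulberry32 generator for seedrandom, purely so the example runs without packages. Neither substitution is what the module above uses, and the function names are hypothetical.

```javascript
// Stand-in for the SHA-256 step: 32-bit FNV-1a over the text's UTF-16 code units.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Stand-in for seedrandom: mulberry32, a small seeded PRNG returning values in [0, 1).
function mulberry32(seed) {
  let state = seed >>> 0;
  return function next() {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Same spec text in, same random sequence out.
function rngFromContent(specText) {
  return mulberry32(fnv1a(specText));
}

const specText = "openapi: 3.0.3\ninfo:\n  title: Demo\n";
const first = rngFromContent(specText);
const second = rngFromContent(specText);

const drawA = [first(), first(), first()];
const drawB = [second(), second(), second()];
console.log(JSON.stringify(drawA) === JSON.stringify(drawB)); // true
```

Any edit to the spec text changes the hash and, with it, almost certainly the whole sequence, which is the behaviour seed_strategy: "version_hash" is meant to provide.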

Pass the seeded RNG into your Faker instance (if you use Faker for value synthesis on top of schema constraints):

// src/mocks/fakerSeed.ts
import { faker } from "@faker-js/faker";
import { seedFromSpec } from "./seedRng.js";

const rng = seedFromSpec("./contracts/openapi.yaml");

// Faker accepts a numeric seed; extract one from the hashed RNG
faker.seed(Math.floor(rng() * 2 ** 31));

MSW handler wiring

If you use MSW handler registration as your interception layer, wire the generated payloads into handlers rather than hard-coding values:

// src/mocks/handlers.ts
import { http, HttpResponse } from "msw";
import { generateUser } from "./generateFromSchema.js";

export const handlers = [
  http.get("/api/users/:id", ({ params }) => {
    const user = generateUser();
    // Respect the route param so fixtures feel realistic
    return HttpResponse.json({ ...user, id: params.id as string });
  }),

  http.get("/api/users", () => {
    const users = Array.from({ length: 10 }, generateUser);
    return HttpResponse.json({ items: users, total: users.length });
  }),
];

Docker service definition

For teams that centralise mock serving in a dockerized mock environment, add a generation step as a pre-start command:

# docker-compose.yml (mock service excerpt)
services:
  mock-api:
    image: node:20-alpine
    working_dir: /app
    volumes:
      - .:/app
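    # Keep both command lines at the same indent: a more-indented line in a
    # folded scalar (>) keeps its newline, which would leave "&&" starting a line.
    command: >
      sh -c "node scripts/generate-mocks.mjs &&
      npx serve dist/mocks -p 3001 --single"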
    ports:
      - "3001:3001"
    environment:
      NODE_ENV: test
      SCHEMA_REF: "${GITHUB_SHA:-local}"
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:3001/getUser.json"]
      interval: 5s
      timeout: 3s
      retries: 5

Phase 3 — CI Pipeline Integration

Pin schema generation as a CI gate so payload drift is caught before any integration test runs. The response shaping techniques your frontend depends on must remain aligned with the published spec — this step enforces that boundary.

# .github/workflows/e2e-mocks.yml
name: E2E with schema-generated mocks

on:
  push:
    paths:
      - "contracts/**"
      - "src/mocks/**"
      - "scripts/generate-mocks.mjs"

jobs:
  generate-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: "20"
          cache: "npm"

      - name: Install dependencies
        run: npm ci

      - name: Lint OpenAPI spec
        run: npx @stoplight/spectral-cli lint ./contracts/openapi.yaml --ruleset .spectral.yaml

      - name: Generate mock fixtures
        run: node scripts/generate-mocks.mjs

      - name: Validate fixtures against contract
        run: |
          npx openapi-mock-validator \
            --schema ./contracts/openapi.yaml \
            --mock-output ./dist/mocks/ \
            --strict \
            --fail-on-drift

      - name: Cache generated fixtures
        uses: actions/cache@v4
        with:
          path: dist/mocks
          key: mocks-${{ hashFiles('contracts/openapi.yaml') }}

      - name: Start mock server and run E2E tests
        run: |
          npx serve dist/mocks -p 3001 &
          sleep 2
          npm run test:e2e -- --base-url http://localhost:3001

Cache the generated fixture map. This prevents non-deterministic payload generation from causing flaky integration tests. The cache key (hashFiles('contracts/openapi.yaml')) means the cache is invalidated exactly when the spec changes — not on every commit.

To align with mock lifecycle management best practices, tear down the mock server process explicitly after the test step rather than letting the job runner kill it implicitly.

Verification Steps

  • Run node scripts/generate-mocks.mjs and confirm one .json file appears in dist/mocks/ for every operation that defines a 200 or 201 application/json response (the script skips all other operations).
  • Run npx openapi-mock-validator --schema ./contracts/openapi.yaml --mock-output ./dist/mocks/ --strict and verify exit code 0 with no drift errors.
  • Fetch a generated file directly and confirm all required fields are present and correctly typed: node -e "const f = require('./dist/mocks/getUser.json'); console.log(JSON.stringify(f, null, 2))".
  • In a browser dev session with MSW active, open the Network panel and confirm the handler returns generated — not hard-coded — data for GET /api/users/123.
  • Run your full E2E suite once with SCHEMA_REF=$(git rev-parse HEAD) set, then again with the same value. Confirm the generated payloads are byte-for-byte identical between both runs.
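
The third check in the list above (required fields present and correctly typed) can be partly automated. The following is a minimal sketch, assuming the response schema is already dereferenced and relies only on type, properties, required, and items. It does not replace the validator step, and the function name checkPayload is hypothetical.

```javascript
// Hypothetical checker: report missing required fields and wrong primitive types.
function checkPayload(schema, payload, path = "$") {
  const problems = [];
  const actual = Array.isArray(payload)
    ? "array"
    : payload === null
      ? "null"
      : typeof payload;
  const expected = schema.type;

  if (expected) {
    const ok =
      expected === actual ||
      (expected === "integer" && Number.isInteger(payload));
    if (!ok) {
      problems.push(`${path}: expected ${expected}, got ${actual}`);
      return problems; // no point descending into a value of the wrong type
    }
  }

  if (actual === "object") {
    for (const key of schema.required ?? []) {
      if (!(key in payload)) problems.push(`${path}.${key}: missing required field`);
    }
    for (const [key, propSchema] of Object.entries(schema.properties ?? {})) {
      if (key in payload) {
        problems.push(...checkPayload(propSchema, payload[key], `${path}.${key}`));
      }
    }
  }

  if (actual === "array" && schema.items) {
    payload.forEach((item, i) => {
      problems.push(...checkPayload(schema.items, item, `${path}[${i}]`));
    });
  }
  return problems;
}

const userSchema = {
  type: "object",
  required: ["id", "email"],
  properties: {
    id: { type: "string" },
    email: { type: "string" },
    age: { type: "integer" },
  },
};

console.log(checkPayload(userSchema, { id: "u1", age: "42" }));
// [ '$.email: missing required field', '$.age: expected integer, got string' ]
```

An empty array means the payload passed these basic checks; format, pattern, and composition keywords are deliberately out of scope here.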

Troubleshooting

Error: Cannot resolve $ref: '#/components/schemas/UnknownModel'

The generator cannot find a schema component the spec references. This is usually a broken $ref introduced during a manual spec edit.

Fix: Run npx @stoplight/spectral-cli lint ./contracts/openapi.yaml before generation. Look for invalid-ref errors, which is how Spectral reports unresolved references. Either add the missing component under components.schemas or correct the reference path. The linter must pass before the generator runs.
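
For a quick local check alongside the linter, the sketch below walks a parsed spec and lists every local $ref that does not resolve. It assumes the spec is already parsed into a plain object and that references are local (starting with #/); references to external files are ignored. The function names are hypothetical.

```javascript
// Resolve a local JSON Pointer such as "#/components/schemas/User".
function resolvePointer(root, ref) {
  let node = root;
  for (const rawToken of ref.slice(2).split("/")) {
    // Undo JSON Pointer escaping: "~1" means "/", "~0" means "~".
    const token = rawToken.replace(/~1/g, "/").replace(/~0/g, "~");
    if (node === null || typeof node !== "object" || !(token in node)) {
      return undefined;
    }
    node = node[token];
  }
  return node;
}

// Collect every local $ref whose target is missing.
function findBrokenRefs(root, node = root, found = []) {
  if (node === null || typeof node !== "object") return found;
  if (typeof node.$ref === "string" && node.$ref.startsWith("#/")) {
    if (resolvePointer(root, node.$ref) === undefined) found.push(node.$ref);
  }
  for (const value of Object.values(node)) findBrokenRefs(root, value, found);
  return found;
}

const spec = {
  paths: {
    "/users": {
      get: {
        responses: {
          "200": {
            content: {
              "application/json": {
                schema: { $ref: "#/components/schemas/UnknownModel" },
              },
            },
          },
        },
      },
    },
  },
  components: {
    schemas: {
      User: {
        type: "object",
        properties: { manager: { $ref: "#/components/schemas/User" } },
      },
    },
  },
};

console.log(findBrokenRefs(spec)); // [ '#/components/schemas/UnknownModel' ]
```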

TypeError: schema.type is undefined when using openapi-sampler

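openapi-sampler resolves local $ref pointers only when you pass the full spec as its third argument, and it does not follow references to external files. If you sample a schema fragment on its own, or the spec is split across several files, it cannot resolve types.

Fix: Pass the spec as the third argument (as scripts/generate-mocks.mjs does), or dereference the spec before sampling: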

import { dereference } from "@apidevtools/json-schema-ref-parser";
import { sample } from "openapi-sampler";

// Inline every $ref so the sampler receives plain schema objects
const derefed = await dereference("./contracts/openapi.yaml");
const payload = sample(derefed.paths["/users"].get.responses["200"].content["application/json"].schema);

Generated values fail pattern constraints (regex mismatch)

When a schema field uses a complex regex (e.g., a proprietary ID format), the generator may fall back to a generic string that does not match the pattern.

Fix: Add an x-mock-example extension field on the problematic property in your OpenAPI spec:

components:
  schemas:
    Order:
      properties:
        referenceCode:
          type: string
          pattern: "^ORD-[0-9]{6}-[A-Z]{3}$"
          x-mock-example: "ORD-001234-ABC"   # generator honours this before trying to satisfy the regex

Alternatively, set regex_fallback: "x-mock-example" in mock-generator.config.yaml to enforce this policy globally.
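
The precedence described above (explicit example first, pattern check second) can be sketched as follows. x-mock-example is the extension name used on this page; the helper name and the choice to throw on a mismatch are assumptions made for illustration.

```javascript
// Hypothetical helper: choose a value for a string property that declares a pattern.
function valueForPatternedString(propSchema, generated) {
  const pattern = propSchema.pattern ? new RegExp(propSchema.pattern) : null;
  const example = propSchema["x-mock-example"];

  // 1. An explicit example wins, provided it really satisfies the pattern.
  if (example !== undefined) {
    if (pattern && !pattern.test(example)) {
      throw new Error(`x-mock-example "${example}" does not match ${propSchema.pattern}`);
    }
    return example;
  }

  // 2. Otherwise accept the generated value only if it matches.
  if (!pattern || pattern.test(generated)) return generated;
  throw new Error(`Generated value "${generated}" does not match ${propSchema.pattern}`);
}

const referenceCode = {
  type: "string",
  pattern: "^ORD-[0-9]{6}-[A-Z]{3}$",
  "x-mock-example": "ORD-001234-ABC",
};

console.log(valueForPatternedString(referenceCode, "string")); // ORD-001234-ABC
```

Failing loudly when neither the example nor the generated value matches surfaces the problem at generation time instead of in a later validation step.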

oneOf produces inconsistent shapes across runs

Without a deterministic one_of_policy, the generator may pick different branches on each run, causing snapshots to differ.

Fix: Set one_of_policy: "first_match" in mock-generator.config.yaml. If you need coverage of all branches, generate one fixture per branch explicitly by iterating the oneOf array in your generation script.
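
Generating one fixture per branch might look like the sketch below. The helper name fixturesPerBranch is hypothetical, and sampleFn stands for whatever sampler you already use (for example a wrapper around openapi-sampler's sample); a trivial stub is used here so the example runs on its own.

```javascript
// Hypothetical helper: produce one named fixture per oneOf branch.
function fixturesPerBranch(operationId, schema, sampleFn) {
  if (!Array.isArray(schema.oneOf)) {
    return [{ name: operationId, payload: sampleFn(schema) }];
  }
  return schema.oneOf.map((branch, index) => ({
    // Prefer a human-readable title when the branch declares one.
    name: `${operationId}.${branch.title ?? `branch${index}`}`,
    payload: sampleFn(branch),
  }));
}

// Stub sampler for the demo: maps every declared property to null.
const stubSampler = (schema) =>
  Object.fromEntries(Object.keys(schema.properties ?? {}).map((key) => [key, null]));

const paymentSchema = {
  oneOf: [
    { title: "Card", type: "object", properties: { last4: { type: "string" } } },
    { type: "object", properties: { iban: { type: "string" } } },
  ],
};

for (const fixture of fixturesPerBranch("getPayment", paymentSchema, stubSampler)) {
  console.log(fixture.name, JSON.stringify(fixture.payload));
}
// getPayment.Card {"last4":null}
// getPayment.branch1 {"iban":null}
```

Each returned name can be used as the file name in the generation script, so every branch gets its own stable fixture.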

Circular reference causes RangeError: Maximum call stack size exceeded

A self-referencing schema (e.g., a TreeNode whose children are also TreeNode objects) causes infinite recursion in the generator.

Fix: Lower circular_ref_max_depth in your config (for example, to 2). For schemas that genuinely need deep nesting in production, override the circular path with a hand-authored fixture file that the generator does not overwrite:

// In generate-mocks.mjs: declare this once, above the loops
const CIRCULAR_SKIP = new Set(["getTreeNode", "getNestedCategory"]);

// Inside the inner loop, right after operationId is computed
if (CIRCULAR_SKIP.has(operationId)) {
  console.log(`Skipping circular schema: ${operationId}`);
  continue;
}
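
Instead of maintaining the skip list by hand, the set of circular component schemas could be derived from the spec and then mapped to the operations that return them. The sketch below covers the first half only. It assumes every reference is a local #/components/schemas/Name pointer, and its function names are hypothetical.

```javascript
// Collect the names of component schemas referenced anywhere inside `node`.
function referencedSchemas(node, names = new Set()) {
  if (node === null || typeof node !== "object") return names;
  const prefix = "#/components/schemas/";
  if (typeof node.$ref === "string" && node.$ref.startsWith(prefix)) {
    names.add(node.$ref.slice(prefix.length));
  }
  for (const value of Object.values(node)) referencedSchemas(value, names);
  return names;
}

// A schema is circular if following references from it leads back to itself.
function findCircularSchemas(spec) {
  const schemas = spec.components?.schemas ?? {};
  const circular = [];
  for (const start of Object.keys(schemas)) {
    const seen = new Set();
    const pending = [...referencedSchemas(schemas[start])];
    while (pending.length > 0) {
      const name = pending.pop();
      if (name === start) {
        circular.push(start);
        break;
      }
      if (seen.has(name) || !(name in schemas)) continue;
      seen.add(name);
      pending.push(...referencedSchemas(schemas[name]));
    }
  }
  return circular;
}

const spec = {
  components: {
    schemas: {
      TreeNode: { properties: { child: { $ref: "#/components/schemas/TreeNode" } } },
      Category: { properties: { sub: { $ref: "#/components/schemas/SubCategory" } } },
      SubCategory: { properties: { parent: { $ref: "#/components/schemas/Category" } } },
      User: { properties: { address: { $ref: "#/components/schemas/Address" } } },
      Address: { properties: { street: { type: "string" } } },
    },
  },
};

console.log(findCircularSchemas(spec)); // [ 'TreeNode', 'Category', 'SubCategory' ]
```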

When to Advance

You are ready to move on from this setup when:

  • Every operation in your OpenAPI spec produces a valid, schema-conformant fixture without manual intervention.
  • The openapi-mock-validator --strict step passes in CI with zero warnings.
  • Re-running generation twice from the same SCHEMA_REF produces identical output (confirming seed stability via deterministic seed management).
  • A frontend developer can start the mock server and get realistic, type-correct data for every endpoint without touching fixture files.
  • Any spec change automatically invalidates the fixture cache and triggers regeneration in CI.

FAQ

Can I use schema-driven generation with GraphQL instead of OpenAPI?

Yes. Tools like graphql-faker and @graphql-tools/mock parse SDL type definitions and generate field values that satisfy the declared types. The pipeline keeps the same stages: swap the schema source from an OpenAPI YAML file to a .graphql SDL file and configure a GraphQL-aware resolver. The seeding and validation steps apply in the same way.

How do I handle circular references in my schema?

Set circular_ref_max_depth in your generator config (typically 2-3 levels) so the engine terminates recursion rather than looping. For deeply nested models, prefer an explicit $ref to a simplified version of the object rather than relying on the depth limiter alone. Add a hand-authored fixture file for any operation that genuinely requires deep nesting.

Do generated payloads survive schema upgrades automatically?

Only if you re-run generation after every spec change. Pin schema generation to a CI step triggered by changes to your contract file, and use a version-derived seed so that payload contents shift deterministically when the spec changes. Pair this with snapshot tests to distinguish intentional payload evolution from accidental regression.

What is the difference between openapi-sampler and @anatine/zod-mock?

openapi-sampler reads OpenAPI schemas directly and prefers example, default, and enum values declared in the spec over synthesised ones. @anatine/zod-mock generates from Zod schemas at runtime, which is preferable when your codebase already defines types in Zod and you want the mock to stay in sync with the TypeScript type layer without a separate contract file. Use openapi-sampler when the spec is the source of truth; use @anatine/zod-mock when TypeScript types are the source of truth.

