Testing Strategy

Principles

TDD (Red-Green-Refactor): Write a failing test first, implement the minimal code that makes it pass, then refactor with the test as a safety net.
Property-based testing: Use fast-check to verify invariants over generated inputs. Tests define equivalence classes — partitions of the input space where the system must behave uniformly — rather than enumerating individual examples.
Unknown = error: The compiler never silently succeeds when it cannot verify. See ADR-010.

Test Categories

Parser Tests (`test/parsing/`)

Verify that .ddd source parses into the expected AST. Use the parseValid helper and ddd tagged template from test/parsing/helpers.ts.
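The real parseValid and ddd helpers live in test/parsing/helpers.ts and are not reproduced here. As a hedged illustration of what a ddd-style tagged template typically does, the sketch below is a hypothetical dedent helper. It interpolates values, trims the blank lines introduced by the backtick layout, and strips the indentation shared by all non-blank lines, so a .ddd snippet can be indented inside a test body. The DSL lines in the usage are placeholders and are not guaranteed to be valid .ddd grammar.

```typescript
// Hypothetical sketch of a `ddd` tagged template (not the project's helper).
function ddd(strings: TemplateStringsArray, ...values: unknown[]): string {
  // Interleave the literal parts with the interpolated values.
  let raw = strings[0];
  values.forEach((value, i) => {
    raw += String(value) + strings[i + 1];
  });

  const lines = raw.split("\n");
  // Drop leading and trailing blank lines created by the template layout.
  while (lines.length > 0 && lines[0].trim() === "") lines.shift();
  while (lines.length > 0 && lines[lines.length - 1].trim() === "") lines.pop();

  // Smallest indentation among the non-blank lines.
  const indent = Math.min(
    ...lines
      .filter((line) => line.trim() !== "")
      .map((line) => line.length - line.trimStart().length),
  );
  return lines.map((line) => line.slice(indent)).join("\n");
}

const name = "Registration";
const source = ddd`
    decider ${name} {
      command Register
    }
`;
console.log(source);
```

Dedenting in the helper keeps fixtures readable without affecting the text the parser sees.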

Validation Tests (`test/validation/`)

Two kinds per validation rule:

Integration tests (.test.ts) — Parse .ddd source via Langium’s validationHelper, check diagnostics for expected errors/warnings.
Property-based tests (.property.test.ts) — Build mock AST nodes directly, verify validation invariants with fast-check arbitraries.

Shared helpers: validate, ddd, expectError, expectWarning, expectNoIssues, expectErrorCount from test/validation/helpers.ts.
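A minimal sketch of how assertion helpers like these can sit on top of a list of diagnostics. It assumes LSP-style diagnostics (severity 1 = Error, 2 = Warning), which is the shape Langium's validationHelper results carry. The function names mirror those in test/validation/helpers.ts, but these bodies are hypothetical, and the real helpers may take different arguments (for example, a validation result object rather than a plain array).

```typescript
// Minimal LSP-style diagnostic: severity 1 = Error, 2 = Warning, 3 = Information, 4 = Hint.
interface Diagnostic {
  severity?: number;
  message: string;
}

const ERROR = 1;
const WARNING = 2;

function messagesWith(diagnostics: Diagnostic[], severity: number): string[] {
  return diagnostics.filter((d) => d.severity === severity).map((d) => d.message);
}

// Hypothetical: passes if some error message contains `fragment`.
function expectError(diagnostics: Diagnostic[], fragment: string): void {
  const errors = messagesWith(diagnostics, ERROR);
  if (!errors.some((m) => m.includes(fragment))) {
    throw new Error(`Expected an error containing "${fragment}", got ${JSON.stringify(errors)}`);
  }
}

// Hypothetical: passes if some warning message contains `fragment`.
function expectWarning(diagnostics: Diagnostic[], fragment: string): void {
  const warnings = messagesWith(diagnostics, WARNING);
  if (!warnings.some((m) => m.includes(fragment))) {
    throw new Error(`Expected a warning containing "${fragment}", got ${JSON.stringify(warnings)}`);
  }
}

function expectNoIssues(diagnostics: Diagnostic[]): void {
  if (diagnostics.length > 0) {
    throw new Error(`Expected no diagnostics, got ${JSON.stringify(diagnostics.map((d) => d.message))}`);
  }
}

function expectErrorCount(diagnostics: Diagnostic[], count: number): void {
  const actual = messagesWith(diagnostics, ERROR).length;
  if (actual !== count) throw new Error(`Expected ${count} error(s), got ${actual}`);
}

// Made-up diagnostics for illustration only.
const diagnostics: Diagnostic[] = [
  { severity: ERROR, message: "Missing decide clause for (Register, Active)" },
  { severity: WARNING, message: "Event 'Archived' is never emitted" },
];
expectError(diagnostics, "Missing decide clause");
expectWarning(diagnostics, "never emitted");
expectErrorCount(diagnostics, 1);
expectNoIssues([]);
```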

Code Generation Tests (`packages/generator-emmett/test/`)

Tests for the Emmett code generator verify that .ddd AST nodes produce correct TypeScript output.

Mock AST pattern: Test fixtures build plain JavaScript objects with $type discriminators and { ref: ... } cross-references, typed as any. This avoids coupling tests to the Langium runtime while preserving the AST shape that generators consume.

Fixture builder pattern: Complex decider fixtures (e.g., buildRegistrationDecider()) construct a full decider AST with commands, events, states, decisions, and evolutions. Each fixture exercises a specific code generation path.
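A hedged sketch of both patterns together. The node shapes ($type values and property names such as commands, decisions, emits, and target) are assumptions that imitate Langium's conventions, not the project's grammar types. buildRegistrationDecider here is a simplified stand-in for the real fixture, and generateEventUnion is a hypothetical generator fragment used only to show how a generator reads the mock.

```typescript
// Mock AST nodes are plain objects typed as `any`, so no Langium runtime is needed.
type MockNode = any;

function buildRegistrationDecider(): MockNode {
  const register = { $type: "Command", name: "Register" };
  const registered = { $type: "Event", name: "Registered" };
  const unregistered = { $type: "State", name: "Unregistered" };
  const active = { $type: "State", name: "Active" };

  return {
    $type: "Decider",
    name: "Registration",
    commands: [register],
    events: [registered],
    states: [unregistered, active],
    // Cross-references are wrapped in { ref: ... }, like Langium's Reference objects.
    decisions: [
      {
        $type: "Decision",
        command: { ref: register },
        state: { ref: unregistered },
        emits: [{ ref: registered }],
      },
    ],
    evolutions: [
      {
        $type: "Evolution",
        state: { ref: unregistered },
        event: { ref: registered },
        target: { ref: active },
      },
    ],
  };
}

// Hypothetical generator fragment: a union type of event type names.
function generateEventUnion(decider: MockNode): string {
  const names = decider.events.map((e: MockNode) => `"${e.name}"`);
  return `export type ${decider.name}Event = ${names.join(" | ")};`;
}

const decider = buildRegistrationDecider();
console.log(generateEventUnion(decider));
// Generators follow references through `.ref`:
console.log(decider.decisions[0].command.ref.name);
```

Because the fixture is plain data, each test can copy and tweak it to target one generation path.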

Property tests: test/generators/codegen.property.test.ts uses fast-check arbitraries from test/arbitraries/codegen.ts to generate random CodegenDeciderSpec values and verify structural invariants (e.g., generated output contains the correct event type names, state type names, and function signatures).
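The shape of such a property, sketched without dependencies. To stay runnable without fast-check, it replaces fast-check's arbitraries with a small seeded PRNG (mulberry32) and runs the property 100 times. Unlike fast-check, it does not shrink failing inputs. The CodegenDeciderSpec fields and generateDecider below are hypothetical stand-ins for the real spec type and the Emmett generator. The invariant, that every event name, every state name, and a function signature appear in the output, follows the description above.

```typescript
// Hypothetical stand-in for the real CodegenDeciderSpec.
interface CodegenDeciderSpec {
  name: string;
  events: string[];
  states: string[];
}

// Small seeded PRNG (mulberry32); a fixed seed makes every run reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function pick<T>(rng: () => number, items: readonly T[]): T {
  return items[Math.floor(rng() * items.length)];
}

const NOUNS = ["Order", "Account", "Seat", "Invoice"];
const VERBS = ["Created", "Placed", "Closed", "Renamed"];
const STATUSES = ["Draft", "Active", "Archived"];

// Random spec: 1-4 distinct event names and 1-3 distinct state names.
function genSpec(rng: () => number): CodegenDeciderSpec {
  const events = new Set<string>();
  const eventCount = 1 + Math.floor(rng() * 4);
  while (events.size < eventCount) events.add(pick(rng, NOUNS) + pick(rng, VERBS));
  const states = new Set<string>();
  const stateCount = 1 + Math.floor(rng() * 3);
  while (states.size < stateCount) states.add(pick(rng, NOUNS) + pick(rng, STATUSES));
  return { name: pick(rng, NOUNS), events: Array.from(events), states: Array.from(states) };
}

// Hypothetical generator under test.
function generateDecider(spec: CodegenDeciderSpec): string {
  const events = spec.events.map((e) => `{ type: "${e}" }`).join(" | ");
  const states = spec.states.map((s) => `{ status: "${s}" }`).join(" | ");
  return [
    `export type ${spec.name}Event = ${events};`,
    `export type ${spec.name}State = ${states};`,
    `export function evolve(state: ${spec.name}State, event: ${spec.name}Event): ${spec.name}State {`,
    `  return state;`,
    `}`,
  ].join("\n");
}

// Property: each event name, each state name, and the evolve signature appear in the output.
function checkCodegenProperty(numRuns: number, seed: number): void {
  const rng = mulberry32(seed);
  for (let run = 0; run < numRuns; run++) {
    const spec = genSpec(rng);
    const output = generateDecider(spec);
    const required = [
      ...spec.events.map((e) => `"${e}"`),
      ...spec.states.map((s) => `"${s}"`),
      `export function evolve(state: ${spec.name}State, event: ${spec.name}Event)`,
    ];
    for (const token of required) {
      if (!output.includes(token)) {
        throw new Error(`run ${run}: missing ${token} for spec ${JSON.stringify(spec)}`);
      }
    }
  }
}

checkCodegenProperty(100, 42);
```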

Assertion layering: Tests combine behavioral assertions (toContain, toEqual) with snapshot assertions (toMatchSnapshot) for defense in depth. See Assertion Strength Guidance below.

Snapshot Testing

Snapshot tests capture the exact output of generators as regression guards. When a code change modifies generated output — even whitespace or punctuation — the snapshot comparison fails, forcing explicit review.
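The mechanism is simple enough to model in a few lines. This is a toy model, not Bun's implementation. Values are stored under a key built from the test name plus a per-test counter. The first run (or an update run) records the value. Every later run requires an exact string match. SnapshotStore and matchSnapshot are hypothetical names.

```typescript
// Toy model of snapshot matching (not Bun's implementation).
class SnapshotStore {
  private readonly saved = new Map<string, string>();
  private readonly counters = new Map<string, number>();
  private readonly update: boolean;

  constructor(update: boolean) {
    this.update = update;
  }

  // Call at the start of each run so per-test counters restart at 1.
  startRun(): void {
    this.counters.clear();
  }

  // Keys look like "<test name> <n>", where n counts snapshots within one test.
  matchSnapshot(testName: string, value: string): void {
    const n = (this.counters.get(testName) ?? 0) + 1;
    this.counters.set(testName, n);
    const key = `${testName} ${n}`;
    const stored = this.saved.get(key);
    if (stored === undefined || this.update) {
      this.saved.set(key, value); // first run or update mode: record the value
      return;
    }
    if (stored !== value) {
      // Any difference, even one character of punctuation, fails the test.
      throw new Error(`Snapshot mismatch for "${key}"`);
    }
  }
}

const store = new SnapshotStore(false);
store.startRun();
store.matchSnapshot("emits event union", 'type E = "A";'); // first run: recorded
store.startRun();
store.matchSnapshot("emits event union", 'type E = "A";'); // unchanged: passes

let mismatch = false;
store.startRun();
try {
  store.matchSnapshot("emits event union", 'type E = "A"'); // the ";" was dropped
} catch {
  mismatch = true;
}
console.log(mismatch); // true
```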

Tool: Bun’s built-in snapshot support, exposed through the expect matchers of bun:test.

APIs:

| API | Storage | Use case |
|---|---|---|
| `toMatchSnapshot()` | `__snapshots__/<test-file>.snap` (colocated) | Default: serializes to an external file |
| `toMatchInlineSnapshot()` | Inline in the test source (auto-populated on first run) | Small values where inline is clearer |
| `toThrowErrorMatchingSnapshot()` | External `.snap` file | Error message regression |

Snapshot file format: Files begin with a `// Bun Snapshot v1` header, followed by entries of the form ``exports[`test name 1`] = `"value"`;``. Test names, suffixed with a per-test counter, become snapshot keys.

Workflow:

Add expect(result).toMatchSnapshot() — first run auto-creates the .snap file.
Review git diff __snapshots__/ — every snapshot change must be intentional.
After intentional output changes: bun test --update-snapshots to regenerate.

Git: __snapshots__/ directories are committed to version control. Reviewers see snapshot diffs in PRs.

When to use: After behavioral tests verify correctness, as a safety net for remaining string-level mutations.

When NOT to use: As the sole assertion. Snapshots encode current behavior (including bugs). Pair with behavioral assertions that verify correctness.

Stryker interaction: Snapshot mismatches cause non-zero exit from bun test, so Stryker’s command-runner registers the mutant as killed.
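In command-runner mode, Stryker has nothing to judge but the outcome of the test command. The rule can be modeled as follows. The type and function names are hypothetical. Stryker detects timeouts itself, and the model represents them as a flag.

```typescript
type MutantStatus = "Killed" | "Survived" | "Timeout";

interface CommandResult {
  exitCode: number;
  timedOut: boolean;
}

// The command runner only sees how the `bun test` process ended.
function classify(result: CommandResult): MutantStatus {
  if (result.timedOut) return "Timeout";
  // Any failing test, including a snapshot mismatch, makes `bun test` exit non-zero.
  return result.exitCode === 0 ? "Survived" : "Killed";
}

console.log(classify({ exitCode: 1, timedOut: false })); // "Killed"
```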

Arbitraries (`test/arbitraries/`)

Custom fast-check arbitraries derived from the Langium grammar via LLM-assisted generation. The grammar defines the shape of valid AST nodes; the arbitraries mirror this structure to produce random but grammatically valid inputs.

Grammar → Arbitrary mapping: Each grammar rule (e.g., Decider, Decision, Evolution) maps to an arbitrary that generates structurally valid mock AST nodes. The LLM reads the grammar and produces arbitraries that respect the rule’s cardinalities, references, and type constraints.

Equivalence classes: Arbitraries partition the input space into classes that exercise distinct code paths:

| Class | What it covers | Validation behavior exercised |
|---|---|---|
| Complete deciders | All (Command, State) pairs have decide clauses | No exhaustiveness errors |
| Incomplete deciders | Missing decide clauses for some pairs | Exhaustiveness errors expected |
| Guarded decisions | Decisions with `require` guards | Guard consistency checks |
| Unguarded decisions | Short-form decisions without guards | No guard checks needed |
| Terminal states | States marked as terminal | Terminal state enforcement |
| Dead declarations | Commands/events declared but unused | Dead code warnings |
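A dependency-free sketch of constraining one generator to an equivalence class, in the spirit of arbDeciderSpec({ complete }). It uses a seeded PRNG in place of fast-check. The spec shape (command names, state names, and the list of (command, state) pairs that have a decide clause) is an assumption. Incomplete specs always drop at least one pair, which guarantees they land in the "exhaustiveness errors expected" class.

```typescript
// Hypothetical spec shape: which (command, state) pairs have a decide clause.
interface DeciderSpec {
  commands: string[];
  states: string[];
  decides: Array<[string, string]>;
}

// Small seeded PRNG (mulberry32) standing in for fast-check's random source.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Sizes vary randomly; membership in the requested class is fixed.
function genDeciderSpec(rng: () => number, complete: boolean): DeciderSpec {
  const commands = Array.from({ length: 1 + Math.floor(rng() * 3) }, (_, i) => `Cmd${i}`);
  const states = Array.from({ length: 1 + Math.floor(rng() * 3) }, (_, i) => `State${i}`);
  const allPairs = commands.flatMap((c) => states.map((s): [string, string] => [c, s]));
  if (complete) return { commands, states, decides: allPairs };
  // Incomplete class: one pair is always dropped, the others at random.
  const dropped = Math.floor(rng() * allPairs.length);
  const decides = allPairs.filter((_, i) => i !== dropped && rng() < 0.7);
  return { commands, states, decides };
}

// The exhaustiveness condition: pairs with no decide clause.
function missingPairs(spec: DeciderSpec): Array<[string, string]> {
  const covered = new Set(spec.decides.map(([c, s]) => `${c}|${s}`));
  return spec.commands.flatMap((c) =>
    spec.states
      .filter((s) => !covered.has(`${c}|${s}`))
      .map((s): [string, string] => [c, s]),
  );
}

const rng = mulberry32(2024);
for (let run = 0; run < 100; run++) {
  if (missingPairs(genDeciderSpec(rng, true)).length !== 0) throw new Error("complete spec has gaps");
  if (missingPairs(genDeciderSpec(rng, false)).length === 0) throw new Error("incomplete spec has no gaps");
}
```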

Key exports:

arbDeciderSpec — generates decider configurations across equivalence classes
buildMockDecider(spec) — converts a spec to a mock Langium AST Decider node
collectErrors() — creates a mock ValidationAcceptor that captures emitted diagnostics
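collectErrors is built around Langium's ValidationAcceptor, which is called as (severity, message, info), with severity one of "error", "warning", "info", or "hint". Below is a hedged, self-contained sketch of a capturing acceptor. The name makeCapturingAcceptor, its { accept, diagnostics, errors } return shape, and the checkHasCommands rule are hypothetical. The Langium types are re-declared locally in simplified form so the block runs without Langium installed.

```typescript
type Severity = "error" | "warning" | "info" | "hint";
// Simplified stand-in for Langium's DiagnosticInfo (the real one has more optional fields).
interface DiagnosticInfo {
  node: unknown;
  property?: string;
}
type ValidationAcceptor = (severity: Severity, message: string, info: DiagnosticInfo) => void;

interface Captured {
  severity: Severity;
  message: string;
  info: DiagnosticInfo;
}

function makeCapturingAcceptor(): {
  accept: ValidationAcceptor;
  diagnostics: () => Captured[];
  errors: () => Captured[];
} {
  const captured: Captured[] = [];
  const accept: ValidationAcceptor = (severity, message, info) => {
    captured.push({ severity, message, info });
  };
  return {
    accept,
    diagnostics: () => [...captured],
    errors: () => captured.filter((d) => d.severity === "error"),
  };
}

// Hypothetical validation rule, used only to exercise the acceptor.
function checkHasCommands(node: { name: string; commands: string[] }, accept: ValidationAcceptor): void {
  if (node.commands.length === 0) {
    accept("error", `Decider '${node.name}' declares no commands`, { node, property: "commands" });
  }
}

const { accept, errors } = makeCapturingAcceptor();
checkHasCommands({ name: "Empty", commands: [] }, accept);
console.log(errors().length); // 1
```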

Property-Based Test Design

Each validation rule has a corresponding .property.test.ts file that defines properties over equivalence classes rather than individual test cases.

Structure of a property test:

Define the equivalence class via a constrained arbitrary (e.g., “deciders where all Command × State pairs are covered”)
State the invariant that must hold for all members of the class (e.g., “no exhaustiveness errors are emitted”)
Let fast-check explore the space with random inputs (numRuns: 100)

Example pattern:

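```typescript
import { expect, test } from 'bun:test';
import fc from 'fast-check';
// arbDeciderSpec, buildMockDecider and collectErrors come from test/arbitraries/
// (see Key exports above); checkExhaustiveness is the validation rule under test.

test('complete deciders produce no exhaustiveness errors', () => {
  fc.assert(
    // Equivalence class: every (Command, State) pair has a decide clause.
    fc.property(arbDeciderSpec({ complete: true }), (spec) => {
      const decider = buildMockDecider(spec);
      const { errors } = collectErrors();
      checkExhaustiveness(decider, errors);
      // Invariant: no member of the class triggers an exhaustiveness error.
      expect(errors()).toHaveLength(0);
    }),
    { numRuns: 100 },
  );
});
```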

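This single property stands in for dozens of example-based tests. Instead of hand-picked inputs, it checks the invariant against 100 randomly generated members of the equivalence class, and fast-check shrinks any failing input to a minimal counterexample.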

Running Tests

```sh
# All packages
bun test

# Single package
cd packages/language && bun test

# Single test file
cd packages/language && bun test test/validation/guard-consistency.test.ts

# Filter by test name
bun test --test-name-pattern "exhaustiveness"
```


Mutation Testing

Stryker verifies that the test suite detects code changes — a test suite that passes when code is mutated is not providing real coverage.

Configuration

All packages with tests have Stryker configs (stryker.config.json). Stryker runs in command-runner mode because Bun has no native Stryker plugin — each mutant spawns a full bun test invocation.
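For orientation, here is a hedged sketch of the options such a config sets. It is written as a TypeScript object literal using the same keys a stryker.config.json would contain. The testRunner "command" setting, commandRunner.command, reporters, and the jsonReporter/htmlReporter fileName options are Stryker options. The file names shown are Stryker's defaults, which match the report paths referenced below. The mutate glob is an assumption about package layout.

```typescript
// Same keys as stryker.config.json, written as a TS object for illustration.
const strykerConfig = {
  // No Bun plugin exists, so every mutant runs this plain command.
  testRunner: "command",
  commandRunner: { command: "bun test" },
  // Assumption: sources live under src/.
  mutate: ["src/**/*.ts"],
  reporters: ["json", "html", "clear-text"],
  jsonReporter: { fileName: "reports/mutation/mutation.json" },
  htmlReporter: { fileName: "reports/mutation/mutation.html" },
};

// Serializing the object yields the JSON file contents.
console.log(JSON.stringify(strykerConfig, null, 2));
```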

Reporters configured: JSON (machine-readable), HTML (visual), clear-text (terminal).

Running Mutation Tests

# Single package
cd packages/generator-emmett && bun run stryker

# Language package
cd packages/language && bun run stryker

CI Integration

The .forgejo/workflows/mutation-testing.yml workflow runs Stryker on push and uploads the JSON and HTML reports as artifacts with 30-day retention.

Reading Mutation Reports

The JSON report at reports/mutation/mutation.json contains per-file mutant data. To extract survivor counts:

```sh
jq '.files | to_entries[] | {file: .key, survived: [.value.mutants[] | select(.status == "Survived")] | length}' reports/mutation/mutation.json
```
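When jq is not available, the same per-file survivor count can be computed in TypeScript. The interface covers only the fields the jq filter reads (files, then mutants, then status) from Stryker's JSON report. In a real script the object would come from JSON.parse of reports/mutation/mutation.json. Here a small inline report with invented file names and mutants keeps the sketch self-contained.

```typescript
// Only the fields the jq filter touches.
interface MutationReport {
  files: Record<string, { mutants: Array<{ id: string; status: string }> }>;
}

function survivorCounts(report: MutationReport): Array<{ file: string; survived: number }> {
  return Object.entries(report.files).map(([file, data]) => ({
    file,
    survived: data.mutants.filter((m) => m.status === "Survived").length,
  }));
}

// Invented data for illustration.
const report: MutationReport = {
  files: {
    "src/generate-decider.ts": {
      mutants: [
        { id: "1", status: "Killed" },
        { id: "2", status: "Survived" },
        { id: "3", status: "Survived" },
      ],
    },
    "src/generate-wiring.ts": { mutants: [{ id: "4", status: "Killed" }] },
  },
};
console.log(survivorCounts(report));
```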

The HTML report provides a navigable view — open reports/mutation/mutation.html in a browser to see highlighted source with mutant status overlays.

Assertion Strength Guidance

Mutation testing reveals that assertion choice directly affects mutant kill rate. The following table orders assertions by their effectiveness against Stryker’s mutators:

| Assertion | Approx. mutant kill rate | Maintenance cost | Use for |
|---|---|---|---|
| `toContain(token)` | ~50% | Low | Property tests, behavioral checks where exact output varies |
| Line-split + `toContain` | ~70% | Medium | Isolating assertions to specific output regions |
| `toEqual(exactString)` | ~100% | High (brittle to formatting) | Small static generators with fixed output |
| `toMatchSnapshot()` | ~100% | Medium (update on intentional change) | Regression guard on all generators |

Recommended layering:

Behavioral toContain — verify presence of critical tokens (event names, type tags, keywords).
toEqual for static generators — small generators with fixed templates (e.g., Emmett wiring, type aliases).
toMatchSnapshot — catch remaining string-level mutations (indentation, empty lines, decorative punctuation).
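Here is a hedged, dependency-free sketch of the first two layers applied to one output. Plain checks stand in for the bun:test matchers: includes plays the role of toContain, and strict equality plays the role of toEqual. The snapshot layer is omitted because it needs the test runner. generateTypeAlias is a hypothetical static generator.

```typescript
// Hypothetical static generator with a fixed template.
function generateTypeAlias(name: string, events: string[]): string {
  return [
    `// Generated. Do not edit.`,
    `export type ${name}Event =`,
    ...events.map((e) => `  | { type: "${e}" }`),
    `;`,
  ].join("\n");
}

function check(condition: boolean, label: string): void {
  if (!condition) throw new Error(`Assertion failed: ${label}`);
}

const output = generateTypeAlias("Order", ["OrderPlaced", "OrderShipped"]);

// Layer 1: behavioral token checks (toContain-style).
check(output.includes("OrderPlaced"), "mentions OrderPlaced");
check(output.includes("OrderShipped"), "mentions OrderShipped");

// Line-split variant: pin the check to the line where the token must appear.
const header = output.split("\n").find((line) => line.startsWith("export type"));
check(header === "export type OrderEvent =", "header line is exact");

// Layer 2: exact match for a small static generator (toEqual-style).
check(
  output ===
    '// Generated. Do not edit.\nexport type OrderEvent =\n  | { type: "OrderPlaced" }\n  | { type: "OrderShipped" }\n;',
  "whole output is exact",
);
```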

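The StringLiteral mutator dominates code generator mutation testing. It replaces string and template literals with empty ones. A toContain assertion only checks for a substring, so it lets through any such mutant whose emptied literal is not one of the asserted tokens (indentation, punctuation, an unasserted keyword). Worse, if the expected token is itself computed by the mutated code, it becomes "", which every string contains. toEqual and toMatchSnapshot compare the whole output and catch this mutator class.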
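The difference is easy to reproduce by hand. Below, mutatedGenerate plays the role of a StringLiteral mutant in which the ";" literal has become an empty string. It is written out manually here, whereas Stryker generates such variants automatically. The token check passes for both versions, so this mutant would survive. The exact comparison fails on the mutant, so it would be killed. Both generator functions are hypothetical.

```typescript
function generate(name: string): string {
  return `export type ${name}Id = string` + ";";
}

// Hand-written StringLiteral mutant: the ";" literal became "".
function mutatedGenerate(name: string): string {
  return `export type ${name}Id = string` + "";
}

const expected = "export type OrderId = string;";

// toContain-style assertion: passes for both, so the mutant survives.
const tokenCheck = (out: string): boolean => out.includes("OrderId");
// toEqual-style assertion: passes only for the original, so the mutant is killed.
const exactCheck = (out: string): boolean => out === expected;

console.log(tokenCheck(generate("Order")), tokenCheck(mutatedGenerate("Order"))); // true true
console.log(exactCheck(generate("Order")), exactCheck(mutatedGenerate("Order"))); // true false
console.log("anything".includes("")); // true: an emptied expected token matches any output
```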

Test Conventions

Test files use bun:test (describe, test, expect)
Property tests use fc.assert(fc.property(...), { numRuns: 100 })
Mock AST nodes use as unknown as Type cast pattern (avoids Langium runtime dependency in unit tests)
Integration tests parse real .ddd source for end-to-end validation
