Testing Strategy

Principles

  • TDD (Red-Green-Refactor): Write the failing test first, implement minimal code, refactor.
  • Property-based testing: Use fast-check to verify invariants over generated inputs. Tests define equivalence classes — partitions of the input space where the system must behave uniformly — rather than enumerating individual examples.
  • Unknown = error: The compiler never silently succeeds when it cannot verify. See ADR-010.

Test Categories

Parser Tests (test/parsing/)

Verify that .ddd source parses into the expected AST. Use the parseValid helper and ddd tagged template from test/parsing/helpers.ts.

Validation Tests (test/validation/)

Two kinds per validation rule:

  1. Integration tests (.test.ts): parse .ddd source via Langium’s validationHelper and check the diagnostics for the expected errors and warnings.
  2. Property-based tests (.property.test.ts): build mock AST nodes directly and verify validation invariants with fast-check arbitraries.

Shared helpers: validate, ddd, expectError, expectWarning, expectNoIssues, expectErrorCount from test/validation/helpers.ts.

Code Generation Tests (packages/generator-emmett/test/)

Tests for the Emmett code generator verify that .ddd AST nodes produce correct TypeScript output.

Mock AST pattern: Test fixtures build plain JavaScript objects with $type discriminators and { ref: ... } cross-references, typed as any. This avoids coupling tests to the Langium runtime while preserving the AST shape that generators consume.

Fixture builder pattern: Complex decider fixtures (e.g., buildRegistrationDecider()) construct a full decider AST with commands, events, states, decisions, and evolutions. Each fixture exercises a specific code generation path.
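
The two patterns above can be sketched as follows. This is a minimal illustration, not the project's fixture code: buildTinyDecider, emittedEventNames, and every property name other than $type and ref (commands, emits, from, to, and so on) are assumptions made for the example.

```typescript
// Hypothetical fixture builder: plain objects only, no Langium runtime.
// All field names except $type and ref are assumptions for this sketch.
function buildTinyDecider(): any {
  const register = { $type: 'Command', name: 'Register' };
  const registered = { $type: 'Event', name: 'Registered' };
  const initial = { $type: 'State', name: 'Initial' };
  const active = { $type: 'State', name: 'Active' };

  return {
    $type: 'Decider',
    name: 'Registration',
    commands: [register],
    events: [registered],
    states: [initial, active],
    // Cross-references use the { ref: node } shape described above.
    decisions: [
      {
        $type: 'Decision',
        command: { ref: register },
        state: { ref: initial },
        emits: [{ ref: registered }],
      },
    ],
    evolutions: [
      {
        $type: 'Evolution',
        event: { ref: registered },
        from: { ref: initial },
        to: { ref: active },
      },
    ],
  };
}

// Stand-in for generator code: it only reads names through resolved refs.
function emittedEventNames(decider: any): string[] {
  const names: string[] = [];
  for (const decision of decider.decisions) {
    for (const event of decision.emits) {
      names.push(event.ref.name);
    }
  }
  return names;
}
```

Because the fixture shares object identity between the declaration lists and the references, a consumer that follows ref sees the same node it would find in events or states, which is the property a generator typically relies on.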

Property tests: test/generators/codegen.property.test.ts uses fast-check arbitraries from test/arbitraries/codegen.ts to generate random CodegenDeciderSpec values and verify structural invariants (e.g., generated output contains the correct event type names, state type names, and function signatures).

Assertion layering: Tests combine behavioral assertions (toContain, toEqual) with snapshot assertions (toMatchSnapshot) for defense in depth. See Assertion Strength Guidance below.

Snapshot Testing

Snapshot tests capture the exact output of generators as regression guards. When a code change modifies generated output — even whitespace or punctuation — the snapshot comparison fails, forcing explicit review.

Tool: Bun’s built-in snapshot API.

APIs:

| API | Storage | Use Case |
| --- | --- | --- |
| toMatchSnapshot() | __snapshots__/<test-file>.snap (colocated) | Default; serializes to an external file |
| toMatchInlineSnapshot() | Inline in test source (auto-populated on first run) | Small values where inline is clearer |
| toThrowErrorMatchingSnapshot() | External .snap file | Error message regression |

Snapshot file format: Files start with a // Bun Snapshot v1 header, followed by one exports[`test name 1`] = `"value"`; entry per snapshot. Test names become snapshot keys.

Workflow:

  1. Add expect(result).toMatchSnapshot() — first run auto-creates the .snap file.
  2. Review git diff __snapshots__/ — every snapshot change must be intentional.
  3. After intentional output changes: bun test --update-snapshots to regenerate.

Git: __snapshots__/ directories are committed to version control. Reviewers see snapshot diffs in PRs.

When to use: After behavioral tests verify correctness, as a safety net for remaining string-level mutations.

When NOT to use: As the sole assertion. Snapshots encode current behavior (including bugs). Pair with behavioral assertions that verify correctness.

Stryker interaction: Snapshot mismatches cause non-zero exit from bun test, so Stryker’s command-runner registers the mutant as killed.

Arbitraries (test/arbitraries/)

Custom fast-check arbitraries derived from the Langium grammar via LLM-assisted generation. The grammar defines the shape of valid AST nodes; the arbitraries mirror this structure to produce random but grammatically valid inputs.

Grammar → Arbitrary mapping: Each grammar rule (e.g., Decider, Decision, Evolution) maps to an arbitrary that generates structurally valid mock AST nodes. The LLM reads the grammar and produces arbitraries that respect the rule’s cardinalities, references, and type constraints.

Equivalence classes: Arbitraries partition the input space into classes that exercise distinct code paths:

| Class | What It Covers | Example |
| --- | --- | --- |
| Complete deciders | All (Command, State) pairs have decide clauses | No exhaustiveness errors |
| Incomplete deciders | Missing decide clauses for some pairs | Exhaustiveness errors expected |
| Guarded decisions | Decisions with require guards | Guard consistency checks |
| Unguarded decisions | Short-form decisions without guards | No guard checks needed |
| Terminal states | States marked as terminal | Terminal state enforcement |
| Dead declarations | Commands/events declared but unused | Dead code warnings |
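
To make the first two classes concrete, the sketch below decides which class a spec falls into by listing its uncovered (Command, State) pairs. DeciderSpecSketch, uncoveredPairs, and classify are hypothetical names for this illustration; the project's actual spec type and arbitraries may be shaped differently.

```typescript
// Hypothetical spec shape for this sketch; the project's real spec type may differ.
interface DeciderSpecSketch {
  commands: string[];
  states: string[];
  decisions: Array<{ command: string; state: string }>;
}

// List every (command, state) pair that has no decide clause.
function uncoveredPairs(spec: DeciderSpecSketch): Array<[string, string]> {
  const covered = new Set<string>();
  for (const d of spec.decisions) {
    covered.add(JSON.stringify([d.command, d.state]));
  }
  const missing: Array<[string, string]> = [];
  for (const command of spec.commands) {
    for (const state of spec.states) {
      if (!covered.has(JSON.stringify([command, state]))) {
        missing.push([command, state]);
      }
    }
  }
  return missing;
}

// Complete deciders have no uncovered pair; everything else is incomplete.
function classify(spec: DeciderSpecSketch): 'complete' | 'incomplete' {
  return uncoveredPairs(spec).length === 0 ? 'complete' : 'incomplete';
}

const completeSpec: DeciderSpecSketch = {
  commands: ['Register'],
  states: ['Initial', 'Active'],
  decisions: [
    { command: 'Register', state: 'Initial' },
    { command: 'Register', state: 'Active' },
  ],
};

// Dropping one decision moves the spec into the "incomplete" class.
const incompleteSpec: DeciderSpecSketch = {
  ...completeSpec,
  decisions: completeSpec.decisions.slice(0, 1),
};
```

A constrained arbitrary for a class would then only emit specs for which classify returns that class, so every generated input is expected to trigger the same validator behavior.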

Key exports:

  • arbDeciderSpec — generates decider configurations across equivalence classes
  • buildMockDecider(spec) — converts a spec to a mock Langium AST Decider node
  • collectErrors() — creates a mock ValidationAcceptor that captures emitted diagnostics
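
For readers unfamiliar with the capturing-acceptor idea, here is a rough sketch of how such a helper could work. It is not the project's collectErrors(): the name collectDiagnosticsSketch, its return shape, and the simplified (severity, message, info) signature are assumptions that only loosely follow the call shape of Langium's ValidationAcceptor.

```typescript
// Simplified stand-ins for the diagnostic types; assumptions for this sketch.
type SeveritySketch = 'error' | 'warning' | 'info' | 'hint';

interface CapturedDiagnostic {
  severity: SeveritySketch;
  message: string;
}

// Returns an acceptor function plus accessors for what it has captured.
function collectDiagnosticsSketch() {
  const captured: CapturedDiagnostic[] = [];

  // The third argument (node/property info) is accepted but ignored here.
  const accept = (severity: SeveritySketch, message: string, _info?: unknown): void => {
    captured.push({ severity, message });
  };

  return {
    accept,
    errors: (): CapturedDiagnostic[] => captured.filter((d) => d.severity === 'error'),
    warnings: (): CapturedDiagnostic[] => captured.filter((d) => d.severity === 'warning'),
  };
}
```

A validation check under test would receive accept in place of the real acceptor, and the test would then assert on errors() or warnings() instead of parsing any source text.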

Property-Based Test Design

Each validation rule has a corresponding .property.test.ts file that defines properties over equivalence classes rather than individual test cases.

Structure of a property test:

  1. Define the equivalence class via a constrained arbitrary (e.g., “deciders where all Command × State pairs are covered”)
  2. State the invariant that must hold for all members of the class (e.g., “no exhaustiveness errors are emitted”)
  3. Let fast-check explore the space with random inputs (numRuns: 100)

Example pattern:

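import { expect, test } from 'bun:test';
import fc from 'fast-check';
// arbDeciderSpec, buildMockDecider, collectErrors: exported from test/arbitraries/ (see Key exports).
// checkExhaustiveness: the validation rule under test.

test('complete deciders produce no exhaustiveness errors', () => {
  fc.assert(
    fc.property(arbDeciderSpec({ complete: true }), (spec) => {
      const decider = buildMockDecider(spec);
      const { errors } = collectErrors();
      checkExhaustiveness(decider, errors);
      expect(errors()).toHaveLength(0);
    }),
    { numRuns: 100 },
  );
});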

This single property replaces dozens of example-based tests by verifying the invariant holds across the entire equivalence class.

Running Tests

# All packages
bun test
# Single package
cd packages/language && bun test
# Single test file
cd packages/language && bun test test/validation/guard-consistency.test.ts
# Filter by test name
bun test --test-name-pattern "exhaustiveness"

Mutation Testing

Stryker verifies that the test suite detects code changes — a test suite that passes when code is mutated is not providing real coverage.

Configuration

All packages with tests have Stryker configs (stryker.config.json). Stryker runs in command-runner mode because Bun has no native Stryker plugin — each mutant spawns a full bun test invocation.

Reporters configured: JSON (machine-readable), HTML (visual), clear-text (terminal).

Running Mutation Tests

# Single package
cd packages/generator-emmett && bun run stryker
# Language package
cd packages/language && bun run stryker

CI Integration

The .forgejo/workflows/mutation-testing.yml workflow runs Stryker on push and uploads the JSON and HTML reports as artifacts with 30-day retention.

Reading Mutation Reports

The JSON report at reports/mutation/mutation.json contains per-file mutant data. To extract survivor counts:

jq '.files | to_entries[] | {file: .key, survived: [.value.mutants[] | select(.status == "Survived")] | length}' reports/mutation/mutation.json
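
The same extraction can be done in TypeScript when the counts feed a script rather than a terminal. The sketch below mirrors the jq query and adds a breakdown by mutator; the interfaces cover only the fields used here, mutatorName is assumed to be present on each mutant entry, and the sample file names are invented.

```typescript
// Minimal slice of the report shape the jq query relies on: a "files" map
// whose values hold a "mutants" array with a "status" field.
interface MutantSketch {
  mutatorName: string; // assumed field, used only for the breakdown below
  status: string;
}

interface ReportSketch {
  files: Record<string, { mutants: MutantSketch[] }>;
}

// Equivalent of the jq query: one { file, survived } entry per file.
function survivorsPerFile(report: ReportSketch): Array<{ file: string; survived: number }> {
  return Object.entries(report.files).map(([file, result]) => ({
    file,
    survived: result.mutants.filter((m) => m.status === 'Survived').length,
  }));
}

// Extra view: how many survivors each mutator contributes.
function survivorsByMutator(report: ReportSketch): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const result of Object.values(report.files)) {
    for (const m of result.mutants) {
      if (m.status === 'Survived') {
        counts[m.mutatorName] = (counts[m.mutatorName] ?? 0) + 1;
      }
    }
  }
  return counts;
}

// Invented sample data, for illustration only.
const sampleReport: ReportSketch = {
  files: {
    'src/a.ts': {
      mutants: [
        { mutatorName: 'StringLiteral', status: 'Survived' },
        { mutatorName: 'StringLiteral', status: 'Killed' },
        { mutatorName: 'BooleanLiteral', status: 'Survived' },
      ],
    },
    'src/b.ts': { mutants: [{ mutatorName: 'StringLiteral', status: 'Killed' }] },
  },
};
```

In a real script, the report would come from JSON.parse on the contents of reports/mutation/mutation.json instead of an inline object.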

The HTML report provides a navigable view — open reports/mutation/mutation.html in a browser to see highlighted source with mutant status overlays.

Assertion Strength Guidance

Mutation testing reveals that assertion choice directly affects mutant kill rate. The following table orders assertions by their effectiveness against Stryker’s mutators:

| Assertion | Mutant Kill Rate | Maintenance Cost | Use For |
| --- | --- | --- | --- |
| toContain(token) | ~50% | Low | Property tests, behavioral checks where exact output varies |
| Line-split + toContain | ~70% | Medium | Isolating assertions to specific output regions |
| toEqual(exactString) | ~100% | High (brittle to formatting) | Small static generators with fixed output |
| toMatchSnapshot() | ~100% | Medium (update on intentional change) | Regression guard on all generators |

Recommended layering:

  1. Behavioral toContain — verify presence of critical tokens (event names, type tags, keywords).
  2. toEqual for static generators — small generators with fixed templates (e.g., Emmett wiring, type aliases).
  3. toMatchSnapshot — catch remaining string-level mutations (indentation, empty lines, decorative punctuation).

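The StringLiteral mutator dominates code generator mutation testing. It replaces string literals, including template strings, with "". A toContain assertion still passes whenever the emptied literal does not carry the asserted token (indentation, separators, decorative punctuation), so those mutants survive. toEqual and toMatchSnapshot compare the full output and catch this mutator class.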
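
The effect can be reproduced without Stryker. The sketch below simulates a StringLiteral mutant by emptying an indentation literal in a hypothetical generator (emitEventType is not from the project), and uses String.prototype.includes and === as stand-ins for toContain and toEqual.

```typescript
// Hypothetical generator: the indent literal is decorative,
// while the event name is the critical token.
function emitEventType(name: string, indent: string = '  '): string {
  return `export type ${name} = {\n${indent}type: '${name}';\n};\n`;
}

const originalOutput = emitEventType('UserRegistered');
// Simulated StringLiteral mutant: the '  ' literal has been replaced with ''.
const mutantOutput = emitEventType('UserRegistered', '');

// Stand-in for expect(output).toContain(token).
function tokenCheck(output: string): boolean {
  return output.includes("type: 'UserRegistered'");
}

// Stand-in for expect(output).toEqual(expected) or a snapshot comparison.
function exactCheck(output: string): boolean {
  return output === originalOutput;
}

// The token check cannot tell the mutant from the original, so it survives.
const survivesTokenCheck = tokenCheck(originalOutput) && tokenCheck(mutantOutput);
// The exact comparison fails on the mutant, so the mutant is killed.
const killedByExactCheck = exactCheck(originalOutput) && !exactCheck(mutantOutput);
```

Here both flags evaluate to true: the token-level check accepts the mutated output, while the exact comparison rejects it, which is the gap the layering above is meant to close.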

Test Conventions

  • Test files use bun:test (describe, test, expect)
  • Property tests use fc.assert(fc.property(...), { numRuns: 100 })
  • Mock AST nodes use the “as unknown as Type” cast pattern (avoids Langium runtime dependency in unit tests)
  • Integration tests parse real .ddd source for end-to-end validation