Chapter 9

Generating Documented Dwelling Variants

9.1 Introduction

Chapter 9: Generating Documented Dwelling Variants

9.1 Introduction (this section)
9.2 The Documentation Bottleneck
9.6 The Procedural-Generation Pipeline: Four-Stage Architecture
9.9 Pipeline Design Decisions and Independent Testability
9.12 The Pre-Vetted Library: Definition and Distinction
9.15 Theoretical Warrant: Three Converging Literatures
9.16 Population, Curation, and Future Work
9.19 Prototype: Runtime and the Documentation-Packet Contract
9.22 Predicate Parsing, the Unparsed-Line Discipline, and the End-to-End Run
9.25 Minimum DSR Depth, Reproducibility, and the Procedural Round-Trip
9.27 Conclusion and Handoff to Chapter 10

A stratified specification for representing adaptation-heavy housing under regulatory constraint stands complete at the close of Chapter 8, and it is, in one precise respect, inert. The standardisation schema delivers a serialised, queryable representation of the SDA Design Standard, a governed triplet substrate of clause and figure records with explicit ambiguity profiles.¹ The Governed Kernel Architecture transforms that substrate into a compositional baseline-library of typed modules, each bounded by declared interface contracts of three kinds: semantic, transformation, and verification.² The notation specifies the two-layer grammar in which baseline-library entries are expressed: the RecPol formal core and the PlaniSyn applied grammar through which their transformations are governed.³ The empirical substrate grounds the suite in dimensional, occurrence, topological, and configurational evidence drawn from a screened 745-plan corpus.⁴ What the four artefacts do not yet supply is an executable instrument that consumes those baseline-library entries, expresses them in PlaniSyn, and emits the documentation packets through which the substrate’s regulatory commitments are made visible, queryable, and traceable to a downstream stakeholder. Until such an instrument exists, the suite is a specification: internally consistent, formally proven, empirically calibrated, and operationally hollow.

The generator supplies the instrument. It is the procedural engine that operationalises the architectural commitments of the standardisation schema, the Governed Kernel Architecture, and the notation into a governed authoring-and-documentation pipeline, and with it in place the suite ceases to be a specification of how an adaptation-heavy housing representation could be governed and becomes an instantiated system in which that governance is enacted. We state the relationship deliberately as operationalisation rather than addition: the generator introduces no new architectural claim of its own, but renders inspectable, at the level of workflow, the single claim the suite already carries.

A procedural artefact that ingests the baseline-library entries from Chapter 6 and emits documentation packets through the PlaniSyn grammar of Chapter 7 discharges three obligations that no other artefact in the suite can discharge alone. The first is to close the documentation bottleneck diagnosed in Section 9.2: the structural friction by which compliance-bearing documentation in adaptation-heavy housing is re-authored by hand for each new design instance, version-controlled poorly, and synchronised across stakeholders only by formal handover. The second is to operationalise, at the procedural-instantiation level, the round-trip fidelity evidence under Proposition 3 (manipulability) that Chapter 7, Section 7.20 establishes statically. The third is to instantiate the four critical properties whose absence motivates the thesis (interoperability, transportability, manipulability, and transformability) at the level of workflow, the level at which a stakeholder actually encounters them. The standardisation schema, the Governed Kernel Architecture, and the notation establish these properties at the level of substrate, composition, and notation; the generator establishes them where a practitioner meets them.⁵

The relationship to Proposition 3 warrants one note at the outset, because it is the relationship most easily overstated. The static round-trip-fidelity evidence proper rests on Chapter 7, which exhibits that PlaniSyn-encoded packets admit lossless replay against the RecPol formal core. What Chapter 9 adds is the procedural-instantiation substrate against which that evidence applies in operation: the prototype’s deterministic execution (pure-Python, no network access, no large-language-model calls, no implementation-defined ordering, a full per-run log) supplies the enabling condition under which a competent non-specialist could manipulate a documentation packet and re-emit it through the pipeline. The prototype does not itself demonstrate non-specialist manipulability; full third-party usability evaluation against a non-specialist user population is declared future work in Section 9.27. The claim we make here is the warrant, not the demonstration.

The method is a deliberate adaptation of procedural content generation, the family of techniques developed primarily in computer-game research for the algorithmic creation of artefacts under declared constraints, to a setting outside that literature’s home domain. Two commitments inherited from that tradition are load-bearing: the discipline of separating the generator from the content space it explores, so that what the generator produces is bounded by an explicit declarative specification rather than implicit programmer intent; and the discipline of treating the constraint set as a first-class artefact whose composition determines what the generator can and cannot emit.⁶ The adaptation is substantive rather than cosmetic. Procedural generation in the games domain optimises for variety, novelty, or aesthetic richness within a generator’s expressive range; the present application optimises instead for traceability, consistency, and compositional accountability against a declared regulatory substrate. Here the generator’s expressive range is a property to be bounded rather than maximised, so that no emitted documentation packet can violate the interface obligations of the baseline-library entry from which it originates. This aligns the work with a smaller body of research extending generative methods into design and engineering domains where compliance-bearing accountability is non-negotiable.⁷ Where the cost of an incorrect emission in a game is bounded by player experience, here it is borne by the regulatory commitments of an inhabited dwelling.

Five further sections develop the argument. Section 9.2 diagnoses the documentation bottleneck and surveys the partial responses developed in adjacent literatures. Section 9.6 specifies the procedural-generation pipeline as a sequence of governed transformations from baseline-library entry to emitted packet, declaring the controls, the trace artefacts, and the failure modes at each stage. Section 9.12 develops the pre-vetted-library concept that supplies the pipeline’s substrate, distinguishing it from a conventional template library by the governance conditions under which entries are admitted, retained, and revised. Section 9.19 reports the prototype through which the generator’s auditable behaviour is exhibited and its limitations declared. Section 9.27 consolidates the contributions, restates the propositions to which the generator supplies evidence, and defines the handoff to Chapter 10, in which the integrated suite is subjected to its end-to-end demonstration. The contribution is complete when the procedural artefact is shown to ingest baseline-library entries that respect Chapter 6’s interface contracts, to emit packets that respect Chapter 5’s schema commitments and Chapter 7’s notation grammar, and to do so under traceable, auditable, and reproducible conditions. On that basis, Section 9.2 now states the bottleneck whose discharge is the chapter’s organising obligation.

9.2 The Documentation Bottleneck

In adaptation-heavy housing under regulatory constraint, the dominant friction in the workflow is the production of complete, consistent, and queryable documentation. The compliance-bearing packet (through which a designed dwelling is shown to satisfy its statutory obligations to occupants, regulators, certifiers, and providers) is authored manually by domain experts, version-controlled poorly, synchronised across stakeholders only at formal handovers, and re-authored in full whenever a clause, a stakeholder commitment, or a built component shifts. Each new design instance triggers a fresh re-authoring cycle whose cost is borne in expert hours, whose consistency is sustained by individual diligence rather than by architecture, and whose traceability to the originating regulatory text is preserved by convention rather than by contract. The cost is therefore not amortised across instances; it scales linearly with the rate of regulatory change and the rate of design variation, which are precisely the rates the adaptation-heavy setting is built to make high.

This friction is not a tooling deficit to be remedied by better authoring software, nor a process deficit to be remedied by better project management. It is a representational deficit: the absence of a governed pipeline through which a pre-vetted compositional substrate can be transformed, under declared rules, into the documentation packet that downstream stakeholders consume. The diagnosis is the chapter’s organising obligation, and the artefact developed across Section 9.6 to Section 9.19 is its discharge.

9.3 An empirical anchor: documentation resists naive automation

The claim that documentation production is a knowledge-intensive translation task rather than a trivial encode-then-emit task rests on the empirical work reported in Chapter 8. The 745-plan corpus that supplies the empirical substrate was extracted under a deliberately controlled protocol designed to test what plan-level information could be reliably encoded into the representation that the standardisation schema specifies.⁸ Two features of that protocol inform the present argument. The extraction was performed agent-manually rather than by end-to-end programmatic automation, a methodological commitment recorded in the corpus’s governance documentation, and the resulting validation pipeline achieved zero parse failures across the 745 plans, a complete schema-validity outcome for the corpus’s structural fields, at a cost of roughly fourteen days of governed agent-manual effort.⁹

Two implications follow. First, the per-plan effort budget was non-trivial even under a workflow whose schema, primitive vocabulary, taxonomy, and verification protocol had already been specified by Chapter 5 and Chapter 6; the complete validity rate indexes specification completeness, not task ease. Second, the activity that consumed the budget was not data entry but knowledge-bearing translation: the resolution of regulatory referents, the disambiguation of the polysemous terms that the standardisation schema records as carrying confidence-graded interpretive uncertainty, and the placement of plan-level evidence into the categorial slots the RecPol and PlaniSyn notation makes available.¹⁰ The anchor therefore licenses the section’s central claim: documentation production in this setting is an interpretation-and-composition problem, not a transcription one. Any pipeline that purports to discharge the bottleneck must support the interpretive activity rather than presuppose its absence. A pipeline that emits by mechanical concatenation of pre-authored fragments would re-encode the bottleneck’s costs into a static template library; a pipeline that emits by free-form generative synthesis would forfeit the traceability that makes the documentation compliance-bearing. The task is to specify the architectural middle position.

9.4 Four families of existing approach

Four families of existing approach address the bottleneck. Each establishes something real, and each addresses a partial slice.

Building-information-modelling-based documentation co-locates design and compliance information within a structured environment that supports model-derived schedule and report generation, and it succeeds at consolidating heterogeneous design information into a single authoritative model and at deriving secondary documents from it.¹¹ It does not, however, supply a governance contract over the regulatory text whose obligations the documentation must discharge. Information-management standards such as ISO 19650 require requirements traceability through delivery as a non-negotiable property of built-environment information,¹² yet neither the standard nor the surveyed toolchains specify how a clause’s interpretive uncertainty is tracked, how compositional rules over plan modules are enforced at the emission boundary, or how a packet is re-emitted, with traceable changes, when an upstream commitment shifts.

Automated rule-checking has developed over fifteen years as a programme to convert regulatory natural language into computable form against which design models can be checked, and it succeeds in framing that conversion as a tractable engineering problem, producing deployed systems, formal taxonomies of rule complexity, and clause-role mark-up methodologies.¹³ It consumes a model and emits a verdict (pass, fail, or referral) over a checked clause, but it does not emit the documentation packet through which the clause’s discharge is communicated to a downstream stakeholder. Treating the verdict as the output equates compliance with verdict-issuance and elides the documentation work that follows it, which is the work that consumes the bulk of the practitioner budget.

Schema-driven document generators, including LegalRuleML and adjacent regulatory-encoding formalisms, provide structured representations of regulatory rules with explicit deontic modalities, and they succeed at producing structured outputs from structured inputs while preserving formal logical relationships across the transformation.¹⁴ The approach assumes that regulatory content can be fully formalised in logic: an assumption the SDA corpus contradicts, given that roughly three in five eligible terms carry measurable polysemy that the schema records as a tracked design object rather than a defect to be eliminated. A pipeline presupposing complete logical formalisation cannot accommodate the schema’s principled transparency mode; it forces premature resolution of the interpretive uncertainty the substrate is designed to carry.

Large-language-model-assisted authoring applies contemporary generative-language models to documentation drafting, regulatory summarisation, and clause-interpretation tasks, and it succeeds in reducing the marginal cost of producing initial drafts and in surfacing patterns across regulatory texts that manual review cannot tractably traverse.¹⁵ It supplies no governance contract on what is emitted. The output of an unconstrained generative model is, by construction, drawn from a distribution whose support extends well beyond the baseline-library entries the pipeline is required to respect, and its traceability to the originating regulatory text is at best a property of the prompt-engineering protocol rather than of the architecture. The cost structure shifts the bottleneck from initial drafting to verification, the labour of confirming that a generated draft is consistent with the commitments it purports to discharge, and verification of generative output is itself a knowledge-intensive activity for which the substrate must already supply the comparator.

9.5 The gap the generator fills

The four families establish, respectively, that design information can be co-located, that regulatory text can be made checkable, that structured transformations can preserve logical relationships, and that marginal drafting cost can be reduced. None of the four, judged against the criterion the chapter has stated, supplies what the bottleneck requires: a procedural pipeline that treats the governed baseline-library entries from Chapter 6 as the substrate rather than presupposing a free-form input distribution; that applies the PlaniSyn grammar from Chapter 7 as the transformation rule-set rather than a hand-crafted template language; and that emits documentation packets which are simultaneously human-readable, machine-queryable, and traceable to the schema commitments of Chapter 5 rather than emitting any one of these at the expense of the others. The judgement is bounded to these four surveyed families and to that three-part criterion; it is not a claim about the literature at large.

The gap is not arbitrary. Each of the three properties follows from a distinct propositional commitment in the thesis. The substrate property follows from Proposition 1 (interoperability): a pipeline that does not consume a governed substrate cannot inherit the substrate’s interface contracts and therefore cannot emit documentation the next stratum can consume without reinterpretation. The transformation-rule property follows from Proposition 4 (transformability): a pipeline that does not apply a declared grammar cannot guarantee that the differences between two emitted packets are themselves interpretable, which is the operational form in which transformability under regulatory and occupant change is tested. The triple-readability property follows jointly from Proposition 2 (transportability) and Proposition 3 (manipulability): a packet lacking any one of the three readabilities cannot survive the phase transitions across which transportability is tested, nor can it be manipulated by a competent non-specialist with assurance that the manipulation has not silently violated an upstream commitment. The bottleneck argument grounds the pipeline that Section 9.6 now specifies, and it anchors the pre-vetted-library concept of Section 9.12: the substrate over which the pipeline operates must itself be governed, or the pipeline’s traceability claim collapses into the unconstrained-generation failure mode.

9.6 The Procedural-Generation Pipeline: Four-Stage Architecture

Under repeated SDA-aligned design instances, the cost of authoring per-dwelling compliance documentation scales linearly with instance count even though the underlying schema, the regulatory commitments, and the module specifications do not (Section 9.2). The procedural-generation pipeline operationalises a different cost curve. Authoring effort is invested once, in the substrate artefacts (the standardisation schema, the Governed Kernel Architecture, and the notation) and the curated baseline-library entries those artefacts produce; thereafter each design instance is generated by a deterministic transformation over substrate elements, and per-instance authoring effort approaches zero. The pipeline is the executable mechanism that converts that one-time substrate investment into a per-instance documentation packet. It is, by design, a thin executable layer over a thick specification stack: every domain commitment it relies on has already been declared, validated, and published in the chapters that precede it. The pipeline introduces no new domain knowledge; it operationalises knowledge formalised elsewhere.

9.7 Four-stage transformation: ingest, validate, transform, emit

The pipeline is specified as a four-stage transformation over a single baseline-library entry. Each stage declares an input schema, an output schema, and an invariant it must preserve, and the four are composable in the strict sense that the output schema of stage n is the input schema of stage n + 1.
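The composability condition can be illustrated with a minimal sketch, assuming each stage is modelled as a function tagged with the names of its input and output schemas. The `Stage` class, the `compose` helper, the schema labels, and the placeholder stage bodies are all hypothetical and are not taken from the prototype.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Stage:
    """A pipeline stage with declared input and output schema names."""
    name: str
    input_schema: str
    output_schema: str
    run: Callable[[Any], Any]


def compose(stages: list[Stage]) -> Callable[[Any], Any]:
    """Chain stages, refusing any adjacent pair whose schemas do not line up."""
    for left, right in zip(stages, stages[1:]):
        if left.output_schema != right.input_schema:
            raise ValueError(
                f"{left.name} emits {left.output_schema!r} but "
                f"{right.name} expects {right.input_schema!r}"
            )

    def pipeline(value: Any) -> Any:
        for stage in stages:
            value = stage.run(value)
        return value

    return pipeline


# Hypothetical schema labels; the lambdas are placeholders for real stages.
STAGES = [
    Stage("ingest", "entry_markdown", "internal_repr", lambda x: {"raw": x}),
    Stage("validate", "internal_repr", "validated_repr", lambda x: {**x, "valid": True}),
    Stage("transform", "validated_repr", "lifted_packet", lambda x: {**x, "lifted": True}),
    Stage("emit", "lifted_packet", "packet_markdown", lambda x: sorted(x)),
]

run_pipeline = compose(STAGES)
```

Checking the schema match once, at composition time, means a mis-wired pipeline fails before any entry is processed, which is one way to make the inter-stage contract testable in isolation.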

The Ingest stage reads a baseline-library entry: one of the schema-compliant Markdown documents authored against the Chapter 6, Section 6.4 entry schema and bundled in the Chapter 6 baseline-library appendix.¹⁶ It opens the source file, parses the YAML frontmatter and Markdown body, and produces an internal representation comprising the eight mandatory frontmatter fields (entry_id, module_type, design_category, required_elements, optional_elements, quality_assertions, constraint_status, sda_provenance) and the three optional fields (configuration_notes, superseded_by, variant_of). Its invariant is round-trip fidelity at the lexical level: the parsed representation, re-serialised, is byte-identical to the source modulo declared canonicalisation of YAML key order. Any field the parser does not recognise is preserved verbatim rather than dropped: the pipeline’s first defence against silent data loss.
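The lexical round-trip invariant can be sketched as follows. The sketch assumes a flat `key: value` frontmatter and uses a hand-written line splitter as a stand-in for the YAML parser so that it stays within the standard library. The function names, the sample entry, and the `reviewer_note` field are hypothetical.

```python
MANDATORY = (
    "entry_id", "module_type", "design_category", "required_elements",
    "optional_elements", "quality_assertions", "constraint_status",
    "sda_provenance",
)
OPTIONAL = ("configuration_notes", "superseded_by", "variant_of")


def split_entry(text: str) -> tuple[str, str]:
    """Split a Markdown entry into raw frontmatter and body."""
    if not text.startswith("---\n"):
        raise ValueError("entry has no frontmatter block")
    frontmatter, sep, body = text[4:].partition("\n---\n")
    if not sep:
        raise ValueError("frontmatter block is not closed")
    return frontmatter, body


def ingest(text: str) -> dict:
    """Parse flat frontmatter; keep unrecognised keys verbatim, never drop them."""
    frontmatter, body = split_entry(text)
    fields, unrecognised = {}, {}
    for line in frontmatter.splitlines():
        key, _, value = line.partition(": ")
        target = fields if key in MANDATORY + OPTIONAL else unrecognised
        target[key] = value
    return {"fields": fields, "unrecognised": unrecognised, "body": body}


def serialise(rep: dict) -> str:
    """Re-serialise in canonical key order: schema fields first, then the rest."""
    ordered = [k for k in MANDATORY + OPTIONAL if k in rep["fields"]]
    lines = [f"{k}: {rep['fields'][k]}" for k in ordered]
    lines += [f"{k}: {v}" for k, v in rep["unrecognised"].items()]
    return "---\n" + "\n".join(lines) + "\n---\n" + rep["body"]


# Hypothetical entry fragment; a real entry carries all eight mandatory fields.
SOURCE = (
    "---\n"
    "entry_id: ENT-001\n"
    "module_type: ENT\n"
    "reviewer_note: kept as-is\n"
    "---\n"
    "# Entry body\n"
)
rep = ingest(SOURCE)
```

For a source already in canonical key order, `serialise(ingest(SOURCE))` reproduces the input exactly; for any other key order the output is the canonicalised form, and re-ingesting it changes nothing further. Presence of the mandatory fields is deliberately not checked here, since that check belongs to the Validate stage.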

The Validate stage checks the internal representation against the PlaniSyn schema in two phases.¹⁷ The first phase verifies syntactic well-formedness against the eight-mandatory-plus-three-optional schema: every required field is present, every typed field carries a value of the declared type, and every cardinality bound is satisfied. For the ENT module type, for example, required_elements must carry the five-element profile of space, opening, path, quality, and context. The second phase verifies constraint compliance: each module type’s hard composition constraints from Chapter 6, Section 6.4 are evaluated against the entry’s declared element instances and quality assertions, and the boolean constraint_status field is recomputed from the entry’s content rather than read from the frontmatter. A discrepancy between the declared constraint_status and the recomputed value halts the pipeline and emits a diagnostic; in particular, a declared constraint_status: true is never silently overwritten with a recomputed false. The invariant is that no entry passes unless every applicable Section 6.4 hard constraint is satisfied, the Chapter 7 well-formedness conditions being part of the constraint set rather than a downstream check.
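The two-phase check and the halt-on-discrepancy rule can be sketched as below. Only the ENT profile is stated in the text, so it is the only profile encoded. The constraint `has_quality_assertion` is a placeholder rather than an actual Section 6.4 constraint, and the sample entry values are hypothetical.

```python
class ValidationError(Exception):
    """Raised when an entry fails a phase or its declared status disagrees."""


MANDATORY = (
    "entry_id", "module_type", "design_category", "required_elements",
    "optional_elements", "quality_assertions", "constraint_status",
    "sda_provenance",
)
# Only the ENT profile is given in the text; other module types are omitted.
REQUIRED_PROFILE = {"ENT": {"space", "opening", "path", "quality", "context"}}


def check_syntax(entry: dict) -> None:
    """Phase 1: presence, type, and cardinality checks."""
    missing = [f for f in MANDATORY if f not in entry]
    if missing:
        raise ValidationError(f"missing mandatory fields: {missing}")
    if not isinstance(entry["constraint_status"], bool):
        raise ValidationError("constraint_status must be boolean")
    profile = REQUIRED_PROFILE.get(entry["module_type"])
    if profile is not None and set(entry["required_elements"]) != profile:
        raise ValidationError("required_elements does not match the profile")


def validate(entry: dict, constraints) -> dict:
    """Run both phases; halt if the declared and recomputed status disagree."""
    check_syntax(entry)
    # Phase 2: recompute the status from content, ignoring the declared value.
    recomputed = all(check(entry) for check in constraints)
    if recomputed != entry["constraint_status"]:
        raise ValidationError(
            f"declared constraint_status={entry['constraint_status']} "
            f"but recomputed {recomputed}"
        )
    if not recomputed:
        raise ValidationError("entry violates a hard constraint")
    return entry


def has_quality_assertion(entry: dict) -> bool:
    """Placeholder hard constraint, for illustration only."""
    return len(entry["quality_assertions"]) > 0


CONSTRAINTS = [has_quality_assertion]
ENTRY = {  # hypothetical values
    "entry_id": "ENT-001", "module_type": "ENT", "design_category": "example",
    "required_elements": ["space", "opening", "path", "quality", "context"],
    "optional_elements": [], "quality_assertions": ["placeholder"],
    "constraint_status": True, "sda_provenance": "placeholder",
}
```

The sketch raises rather than repairs: an entry whose declared status is true but whose content fails a constraint is returned to its author with a diagnostic, and the declared value is left untouched.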

The Transform stage lifts the validated entry into a documentation-packet structure by applying the PlaniSyn transformation grammar over the entry’s semantic content, mirroring the two-layer architecture of the notation: the RecPol formal core supplies the abstract grammar over rectangular polyominoes and the deterministic parser the transformation respects, while PlaniSyn supplies the applied grammar that binds RecPol’s constructs to the nine module types, seven primitives and seven composites, and ten relation operators of the governed-kernel module architecture. The transformation lifts each has_quality triple into a Capital Parameter Tag binding, each shared-entity reference into a bilateral interaction declaration of the appropriate type (engaging, coupling, entangling, or encasing), and the entry’s context element into the Composition-Layer category-conditional binding that downstream rule selection consumes. Its invariant is vocabulary-closure preservation: every entity of the Chapter 6 entity vocabulary and every relation of the Chapter 6 relation vocabulary referenced in the source entry receives a syntactic realisation in the transformed packet, with nothing collapsed, approximated, or carried in an extra-vocabulary construct. This is the pipeline’s structural guarantee that transformation is lossless at the schema level.
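The vocabulary-closure invariant lends itself to a mechanical check after transformation. The sketch below assumes the source entry's referenced terms and the packet's realisations are available as plain collections. The vocabulary contents and the realisation strings are hypothetical placeholders and do not reproduce the Chapter 6 vocabularies.

```python
def closure_report(
    referenced: set[str],
    realised: dict[str, str],
    vocabulary: set[str],
) -> dict[str, list[str]]:
    """Report terms that were dropped and constructs outside the vocabulary.

    referenced: vocabulary terms used in the source entry
    realised:   term -> syntactic realisation in the transformed packet
    vocabulary: the closed set of admissible entity and relation terms
    """
    unrealised = sorted(referenced - set(realised))
    extra = sorted(set(realised) - vocabulary)
    return {"unrealised": unrealised, "extra_vocabulary": extra}


def preserves_closure(report: dict[str, list[str]]) -> bool:
    """The invariant holds only when both lists are empty."""
    return not report["unrealised"] and not report["extra_vocabulary"]


# Placeholder vocabulary and realisations, for illustration only.
VOCABULARY = {"space", "opening", "path", "has_quality"}
REFERENCED = {"space", "opening", "has_quality"}
REALISED = {
    "space": "<realisation of space>",
    "opening": "<realisation of opening>",
    "has_quality": "<Capital Parameter Tag binding>",
}

report = closure_report(REFERENCED, REALISED, VOCABULARY)
```

Reporting the two failure kinds separately distinguishes a term that was collapsed or lost from a construct that was carried outside the vocabulary, which are the two ways the invariant as stated can be broken.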

The Emit stage serialises the transformed entry as a documentation packet: a self-contained Markdown document carrying the semantic content, the lifted PlaniSyn expressions, the cross-references back to the originating SDA schema clauses (or, where numbered clauses are not surfaced, to the FA typology graph edges that supply the corpus-derived equivalent), and a per-stage provenance ledger recording which substrate artefact contributed each lifted construct.¹⁸ The emitted packet is human-readable (a designer or examiner can read it without tooling), machine-queryable (its lifted expressions are parseable by the minimum-viable RecPol parser of Chapter 7, the full EBNF parser being deferred to future work as noted at Section 9.20), and traceable to the Standardisation Schema through the provenance ledger. Its invariant is that every claim in the packet is traceable to either the source entry, a Chapter 7 grammar production rule, or a standardisation-schema clause; no claim originates inside the pipeline itself. The four stages and their inter-stage contracts are summarised in the figure below.
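The traceability invariant of the Emit stage can be sketched as a guard that refuses to serialise any claim lacking a ledger row. The three origin labels mirror the three admissible sources named above. The `LedgerRow` structure, the packet layout, and the sample claims and references are hypothetical.

```python
from dataclasses import dataclass

ALLOWED_ORIGINS = {"source_entry", "grammar_production", "schema_clause"}


@dataclass(frozen=True)
class LedgerRow:
    claim: str      # the statement made in the packet
    origin: str     # one of ALLOWED_ORIGINS
    reference: str  # field name, production name, or clause identifier


def untraceable(claims: list[str], ledger: list[LedgerRow]) -> list[str]:
    """Return claims with no ledger row pointing at an admissible origin."""
    traced = {
        row.claim
        for row in ledger
        if row.origin in ALLOWED_ORIGINS and row.reference
    }
    return [claim for claim in claims if claim not in traced]


def emit(entry_id: str, claims: list[str], ledger: list[LedgerRow]) -> str:
    """Serialise a packet, refusing to emit if any claim is untraceable."""
    missing = untraceable(claims, ledger)
    if missing:
        raise ValueError(f"untraceable claims: {missing}")
    lines = [f"# Documentation packet: {entry_id}", "", "## Claims"]
    lines += [f"- {claim}" for claim in claims]
    lines += ["", "## Provenance ledger"]
    lines += [f"- {r.claim} <- {r.origin}: {r.reference}" for r in ledger]
    return "\n".join(lines) + "\n"


# Hypothetical claims and references, for illustration only.
CLAIMS = ["module type is ENT", "quality binding is lifted"]
LEDGER = [
    LedgerRow("module type is ENT", "source_entry", "module_type"),
    LedgerRow("quality binding is lifted", "grammar_production", "<production>"),
]
packet = emit("ENT-001", CLAIMS, LEDGER)
```

Because the guard runs before serialisation, a claim that originates inside the pipeline surfaces as an emission failure rather than as an unattributed sentence in the packet.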

Procedural Generation Pipeline Stages The pipeline as a four-stage transformation over a single baseline-library entry: Ingest (parse the Chapter-6-Section-6.4-conformant entry into an internal representation) → Validate (check against the PlaniSyn schema and the Chapter 6, Section 6.4 hard constraints) → Transform (lift the semantic content via the RecPol and PlaniSyn grammars) → Emit (serialise as a compliance-bearing documentation packet with a provenance ledger). Each arrow carries the stage’s declared output schema, which is the next stage’s declared input schema; the schemas are recorded jointly in Chapter 7, Section 7.3 and Chapter 7, Section 7.13.

artefact_e_pipeline: **Procedural Generation Pipeline Stages** The generator’s procedural-generation pipeline as a four-stage transformation over a single baseline-library entry. **Stage (a) Ingest** parses a Ch6-§6.6.3-conformant Markdown entry (eight mandatory + three optional frontmatter fields plus body) into an internal representation; invariant: round-trip lexical fidelity (unrecognised fields preserved verbatim). **Stage (b) Validate** checks the representation against the PlaniSyn schema in two phases: syntactic well-formedness against the eight-mandatory-plus-three-optional schema, then constraint compliance against Ch6 §6.4 hard composition constraints; invariant: no entry passes unless every applicable §6.4 hard constraint is satisfied, with `constraint_status` recomputed from content rather than read from frontmatter. **Stage (c) Transform** lifts the validated entry into a PlaniSyn-lifted packet by applying the RecPol formal core plus PlaniSyn applied grammar over the entry’s semantic content (Capital Parameter Tag bindings, bilateral interaction declarations, Composition-Layer category-conditional bindings); invariant: vocabulary-closure preservation (every Ch6 entity and relation receives a syntactic realisation). **Stage (d) Emit** serialises the lifted packet as a self-contained Markdown documentation packet at `output/<ENTRY_ID>.packet.md` with a per-stage provenance ledger; invariant: every claim in the packet is traceable to either the source entry, a Ch7 grammar production, or a standardisation schema clause. The four stages are composable in the strict sense that stage n’s declared output schema is stage n+1’s declared input schema, and each stage admits unit testing in isolation against its declared contract.

Render specification (Mermaid flowchart fallback)

The figure renders as a left-to-right Mermaid flowchart LR with four boxed stages connected by labelled arrows that carry the stage-output schema name. A pre-stage input ellipse (Baseline-library entry: Ch6-§6.6.3-conformant Markdown) feeds Ingest; a post-stage output ellipse (Documentation packet: output/<ENTRY_ID>.packet.md + run_logs/*.json) is fed by Emit. Each stage box carries a sub-label naming its preserved invariant. Reference Mermaid source for the renderer:

flowchart LR
    IN[("Baseline-library entry<br/>Ch6 §6.6.3 conformant<br/>Markdown + YAML frontmatter")]
    A["**Stage (a) Ingest**<br/>parse YAML + Markdown body<br/>build internal representation<br/><br/>_invariant_: round-trip<br/>lexical fidelity"]
    B["**Stage (b) Validate**<br/>phase 1: syntactic schema check<br/>phase 2: Ch6 §6.4 hard constraints<br/><br/>_invariant_: recomputed<br/>constraint_status from content"]
    C["**Stage (c) Transform**<br/>lift via RecPol + PlaniSyn<br/>CPT bindings, interaction decls<br/><br/>_invariant_: vocabulary-closure<br/>preservation"]
    D["**Stage (d) Emit**<br/>serialise packet + provenance<br/>write output/&lt;ENTRY_ID&gt;.packet.md<br/><br/>_invariant_: every claim<br/>traceable to source/grammar/schema"]
    OUT[("Documentation packet<br/>+ per-run JSON log")]
    IN --> A
    A -->|"internal repr.<br/>(8+3 schema)"| B
    B -->|"validated repr.<br/>+ recomputed status"| C
    C -->|"PlaniSyn-lifted packet<br/>+ provenance"| D
    D --> OUT
    classDef stage fill:#eef,stroke:#225,stroke-width:1.5px,rx:6,ry:6;
    classDef io fill:#fff5e6,stroke:#a60,stroke-width:1px;
    class A,B,C,D stage;
    class IN,OUT io;

Render pipeline target: publish-figures/artefact_e_pipeline.svg (rendered via the thesis Mermaid/SVG pipeline). The Mermaid source above is the authoritative render specification; the SVG embed line above auto-resolves once rendered.

9.8 Architectural position: a thin executable layer over a thick specification stack

The pipeline is not the locus of the contribution; the substrate artefacts are. The standardisation schema specifies the conformance target the emitted packets must satisfy; the Governed Kernel Architecture specifies the module taxonomy, primitive vocabulary, and hard-constraint set the validator consults; the notation specifies the formal grammar the transformer applies. The pipeline orchestrates these three substrates into a deterministic execution sequence and extends none of them. The position is deliberate and follows the platform-architecture commitment that governs the suite’s design decisions: the artefacts that change slowly (the schema, the vocabulary, the grammar) are the system’s governed kernel, and the executable layer that changes more rapidly is treated as governed instance library code that consumes the governed kernel without modifying it.¹⁹ The pipeline’s architectural commitment is therefore not innovation in code but compliance with the rule-set above it: any code that violates a Chapter 6 hard constraint or a Chapter 7 grammar production is, by definition, defective pipeline code.

A second consequence is that the pipeline is reproducible at the governed-kernel level even when the implementation is replaced. Re-implementing it in a different programming language, on a different operating system, or with a different parser library would yield byte-identical emitted packets for the same input entry, provided the four stage invariants are preserved. Reproducibility is a property of the substrate, not of the implementation; the implementation merely instantiates it. This is the operational meaning of the stable referential identity commitment carried forward from the theoretical framework: each baseline-library entry has a persistent identifier, each generated packet carries that identifier in its provenance ledger, and the identifier is invariant under implementation change. The pipeline is the substrate’s verb; the substrate is what the pipeline obeys.

9.9 Pipeline Design Decisions and Independent Testability

The pipeline’s externally visible design decisions are recorded here so that examiners can interrogate them against the master-principles commitments the suite carries. Three decisions are load-bearing.

9.9 Three design decisions

The first decision exposes the pipeline as a deterministic command-line interface rather than a graphical user interface. The contribution under examination is procedural correctness, the claim that a deterministic transformation over a curated substrate produces a compliance-bearing documentation packet at low marginal cost, and that claim is testable at the command-line level: a single invocation takes a single entry as input, produces a single packet as output, and the same invocation on the same input must produce a byte-identical output across runs. A graphical interface would introduce user-experience design decisions (window layout, interaction affordances, error-display mode) whose evaluation belongs to a different research programme and whose presence would obscure rather than substantiate the procedural-correctness claim. The interaction surface is out of scope, and we record the decision as such.
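The byte-identity criterion lends itself to a mechanical check. The following is a minimal sketch, not the prototype’s own test: `emit_packet` is a hypothetical stand-in for the emitter, and the entry field names other than the identifier are assumed for illustration. It shows only the shape of the claim, that repeated runs over one input must hash to a single digest.

```python
import hashlib
import json


def emit_packet(entry: dict) -> bytes:
    """Hypothetical stand-in for the emitter: a deterministic serialisation."""
    # sort_keys fixes the key order, so equal inputs serialise identically.
    return json.dumps(entry, sort_keys=True, ensure_ascii=False).encode("utf-8")


def digest(data: bytes) -> str:
    """SHA-256 digest used as a compact byte-identity witness."""
    return hashlib.sha256(data).hexdigest()


def is_byte_identical(entry: dict, runs: int = 3) -> bool:
    """Emit the same entry several times and require one distinct digest."""
    return len({digest(emit_packet(entry)) for _ in range(runs)}) == 1


# Field names below are assumptions for illustration, not the Chapter 6 schema.
entry = {"id": "SAN-FA-01", "module_type": "SAN", "design_category": "FA"}
print(is_byte_identical(entry))
```

A check of this kind is only as strong as the emitter’s avoidance of hidden inputs such as timestamps, locale, or dictionary ordering, which is why the sketch fixes the key order explicitly.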

The second decision fixes the runtime at a minimum-dependency posture. Reproducibility under a minimum-dependency assumption is the operative criterion: a future reader who attempts to re-execute the pipeline on a freshly installed Python 3.11 distribution should not have to resolve a tree of versioned third-party packages, each carrying its own supply-chain or deprecation risk. We implement the pipeline against the Python 3.11 standard library together with a single third-party dependency, the YAML parser pyyaml, which the prototype reported in Section 9.19 uses to lift entry frontmatter; the standard library supplies the Markdown body reading, the Unicode handling, the file input and output, and the test harness. The posture is minimum-dependency rather than zero-dependency, and we state the one dependency plainly: any further third-party dependency that a future revision acquires must be justified against the reproducibility cost it imposes on the reader. This is a structural commitment, not a stylistic preference; it follows from the position that the pipeline is a contribution at the level of design, not of engineering virtuosity.

The third decision fixes emission granularity at per-entry packet emission rather than corpus-wide bulk emission. Each entry’s provenance must be traceable, and a bulk export would either compromise that traceability, by aggregating multiple entries’ provenance into a shared ledger from which individual contributions are not reconstructible, or replicate it redundantly, by writing the same provenance into every record. Per-entry emission keeps the provenance ledger one-to-one with the emitted artefact, which is the form an examiner can audit without re-running the pipeline, and it supports incremental revision: when a single entry is revised, only that entry’s packet is regenerated, and the difference between the old and new packets is local to one file rather than dispersed across a bulk export.

9.10 Inter-stage contracts and independent testability

The four stages are composable because each declares an explicit contract over its input and output. Ingest takes a file path resolving to a Chapter-6-schema-conformant Markdown document and produces an internal representation conforming to the eight-mandatory-plus-three-optional schema. Validate takes that representation and returns it annotated with a recomputed constraint_status and a list of any violated hard constraints, empty when validation passes. Transform takes the validated representation and returns a PlaniSyn-lifted packet structure carrying the original semantic content plus the lifted Capital Parameter Tag bindings, interaction declarations, and Composition-Layer nesting. Emit takes the lifted structure and returns a Markdown serialisation augmented with a provenance ledger.

Because the contracts are explicit, each stage admits unit testing in isolation. The validator is tested by feeding it deliberately malformed entries and checking that it identifies the specific hard-constraint violations; the transformer by feeding it validated entries and checking that the lifted PlaniSyn expressions parse against the Chapter 7 grammar through the minimum-viable RecPol parser; the emitter by feeding it lifted packets and checking that the round-trip serialise-then-parse cycle preserves all packet content. Composability follows from the contract discipline: a developer modifying any one stage knows exactly which schema must continue to be honoured at the boundary, and any modification that violates the schema is detectable at the boundary rather than only at the end of the pipeline. This is the operational consequence of the interface-governed coupling commitment: the stages are bounded by finite, expressed, declared interfaces, so post-modification verification is scoped to the modified stage and whole-pipeline regression is unnecessary.
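The contract discipline can be sketched as four pure functions whose input and output types are the stage boundaries. This is an illustrative sketch under stated assumptions: the type names, the field names, and the single presence check standing in for the Chapter 6 hard constraints are all hypothetical, and the lifted form is a placeholder rather than a PlaniSyn expression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:                      # output contract of Ingest
    entry_id: str
    fields: dict


@dataclass(frozen=True)
class Validated:                  # output contract of Validate
    entry: Entry
    constraint_status: bool
    violations: tuple[str, ...]   # empty when validation passes


@dataclass(frozen=True)
class Lifted:                     # output contract of Transform
    entry_id: str
    predicates: tuple[str, ...]


def ingest(raw: dict) -> Entry:
    return Entry(entry_id=raw["id"], fields=dict(raw))


def validate(entry: Entry) -> Validated:
    # Hypothetical hard constraint: two fields must be present.
    violations = tuple(k for k in ("id", "module_type") if k not in entry.fields)
    return Validated(entry, not violations, violations)


def transform(validated: Validated) -> Lifted:
    if not validated.constraint_status:
        raise ValueError(f"refusing to lift: {validated.violations}")
    fields = validated.entry.fields
    return Lifted(validated.entry.entry_id,
                  tuple(f"{k}={fields[k]}" for k in sorted(fields)))


def emit(lifted: Lifted) -> str:
    # The final line stands in for the provenance ledger.
    lines = [f"# {lifted.entry_id}", *lifted.predicates,
             f"provenance: {lifted.entry_id}"]
    return "\n".join(lines) + "\n"


packet = emit(transform(validate(ingest({"id": "SAN-FA-01", "module_type": "SAN"}))))
print(packet)
```

Because each function depends only on the type at its boundary, the validator can be exercised with a malformed entry without invoking the transformer or the emitter at all.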

9.11 The specification and its existence proof

What Section 9.6 specifies is independent of any particular implementation. The four-stage architecture, the inter-stage contracts, the design decisions, and the pipeline’s position over the substrate stack are propositions about the documentation-generation problem and its disciplined solution; the prototype reported in Section 9.19 is the existence proof that the propositions are realisable under the stated constraints. The specification stands or falls on its substrate dependence: if the standardisation schema, the Governed Kernel Architecture, and the notation are sound, the pipeline is sound; if any substrate artefact admits a defect, the pipeline inherits it at the stage that consumes that substrate. The prototype’s first integration test follows directly: to run each of the nine FA-category entries through the full Ingest → Validate → Transform → Emit cycle and verify that the emitted packet is parseable, schema-conformant, and traceable. Documentation generation, on this account, is properly understood as the executable consequence of compliance-aware design specification, not as an autonomous engineering activity operating over loose semantic content.

9.12 The Pre-Vetted Library: Definition and Distinction

The pipeline of Section 9.6 transforms baseline-library entries into compliance-bearing documentation packets, but it presupposes a baseline-library fit to be transformed. The pipeline cannot manufacture compliance content; it can only operationalise content already authored, validated, and sealed against the schema. That precondition is the substantive concept we name and defend: the pre-vetted library. A pre-vetted library is a curated set of documented, schema-compliant, regulator-aligned spatial modules whose authoring cost is incurred once and whose generative value is capitalised across an indefinite population of downstream design instances.

9.13 What the pre-vetted library is

A pre-vetted library, in the sense we give the term, is a closed set of entries indexed by module type and design category, in which each entry instantiates one of the nine Governed Kernel Architecture module types (ENT, CIR, SAN, BED, LIV, KIT, SVC, EXT, DWL) at a declared design category from the SDA-aligned set FA, RB, LV, UN. Each entry is a self-contained Markdown document carrying eight mandatory and three optional frontmatter fields per the Chapter 6, Section 6.4 schema, plus a structured body that itemises required and optional element instances, registers consolidated quality assertions, verifies hard composition constraints, and records SDA provenance against either numbered Design Standard clause identifiers or, where the corpus extraction has not yet surfaced numbered identifiers, equivalent FA typology graph edges. The library’s content is therefore not a catalogue of geometry; it is a catalogue of constraint-bearing module specifications whose generative role is to supply the substrate that the procedural-generation pipeline of Section 9.6 consumes.

The pre-vetting commitment carries four operational implications. First, an entry is admitted to the library only when its declared constraint_status evaluates to true against the Chapter 6, Section 6.4 hard composition constraints applicable to its module type; a defective entry is not silently downgraded to a warning but excluded. Second, an entry’s SDA provenance must be explicit at the granularity the corpus permits: where the corpus surfaces numbered clause identifiers the entry cites them, and where it surfaces only FA typology edges the entry cites edges and surfaces the asymmetry transparently rather than fabricating clause numbering.²⁰ Third, an entry’s interface obligations, how it composes with neighbouring modules at shared boundaries, are declared explicitly through the bilateral interaction declarations specified by the notation’s four interaction types: engaging, coupling, entangling, encasing. Fourth, an entry carries provenance metadata recording its authoring date, the agent that authored it, the substrate documents the agent consulted, and any honest provenance gaps the authoring surfaced; the entry is not anonymous, and its production trail is auditable.

Pre-vetting is, accordingly, a one-time investment that pays back across N design instances. The cost of authoring an entry is bounded by the size of the schema and the depth of the Chapter 6 hard constraints, both fixed at substrate-design time; the cost of consuming an entry downstream is the negligible cost of a deterministic transformation through the Section 9.6 pipeline. The economic structure is that of an amortised fixed cost: the per-entry authoring cost divided across the design instances the pipeline subsequently generates from it. As the design-instance count grows, the per-instance cost of compliance documentation approaches the marginal cost of the pipeline’s transformation, which is bounded by parser and emitter complexity rather than by domain-knowledge re-derivation.

The library’s complement on the geometric side is supplied by the Chapter 8 census rather than by the pipeline. Where the pre-vetted library carries the constraint-bearing module specifications, the geometry read of the census carries the dimensional ones. The data-derived primitive kit and the context-conditioned size bands collected in Appendix H, Section H.8 give the generator a measured sizing vocabulary: observed median-and-interquartile bands for a given room context, which the generator reads and redraws from. The bands are never a fitted model, and a hard out-of-corpus guard flags, rather than synthesises, any context the census did not observe. Both substrates are descriptive over the complete census and are consumed together; the library says what a module must satisfy, the geometry read says what size it is observed to take.

The geometry read also fixes the grain at which the generator is committed to dimensioning those modules. Following the three-tier grid established at the integration of Chapter 8, Section 8.50, the generator quantises its room-module dimensions to the 150 mm meso grid, the grain calibrated so that the modules it emits compose with least combinatorial misfit while each room stays within tolerance of its observed size, and reserves the finer 50 mm micro grain for the component dimensions within a module. Quantising the sampled size bands to the meso grid is what keeps the generator’s output within a small, shared module vocabulary rather than an unbounded set of one-off dimensions, so that two independently generated dwellings draw on interchangeable modules; it is the operational point at which the bounded-optionality argument of Chapter 2, that a coarser grid trades nominal degrees of freedom for real composability, becomes a property of the generated artefact rather than a claim about it.
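The quantisation step and the out-of-corpus guard can be sketched as follows. The band values are invented for illustration and are not census figures; the function names, the choice of the median as the drawn value, and the rule that the snapped value must stay inside the interquartile band are assumptions standing in for the calibrated tolerance of Chapter 8.

```python
import math

MESO_GRID_MM = 150  # meso grain named in the text

# Hypothetical (q1, median, q3) bands in millimetres; NOT census values.
SIZE_BANDS_MM = {
    "BED": (3000, 3400, 3900),
    "SAN": (2100, 2400, 2700),
}


class OutOfCorpusError(LookupError):
    """Raised when a room context was never observed in the census."""


def quantise(value_mm: float, grid_mm: int = MESO_GRID_MM) -> int:
    """Snap a dimension to the nearest grid multiple (ties round up)."""
    return math.floor(value_mm / grid_mm + 0.5) * grid_mm


def meso_dimension(context: str) -> int:
    """Return the observed median for a context, snapped to the meso grid."""
    if context not in SIZE_BANDS_MM:
        # Flag rather than synthesise a size for an unobserved context.
        raise OutOfCorpusError(context)
    q1, median, q3 = SIZE_BANDS_MM[context]
    snapped = quantise(median)
    # Assumed tolerance rule: the snapped value must remain inside the band.
    if not q1 <= snapped <= q3:
        raise ValueError(f"{context}: {snapped} mm falls outside the observed band")
    return snapped


print(meso_dimension("BED"), meso_dimension("SAN"))
```

Every value the sketch returns is a multiple of 150 mm, which is the property that keeps independently generated modules within one shared dimensional vocabulary.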

9.14 Distinction from naive component libraries

The phrase “component library” is occupied in architectural practice by two well-known artefact classes from which the pre-vetted library must be sharply distinguished. The first is the CAD-block library: a collection of two- or three-dimensional geometric primitives (door symbols, fixture plans, parametric blocks) whose role is to accelerate drawing production. A CAD block carries geometry; it does not carry interface obligations, regulatory commitments, or formal-grammar specifications, and a designer may insert a CAD block into a drawing without thereby committing to any compliance claim about the dwelling the drawing represents. The second is the parametric-component library familiar from BIM authoring environments, in which a component declares parameters (width, height, material, placement constraints) whose values can be bound at instance time.²¹ A parametric component carries parameter ranges; it does not necessarily carry the regulatory commitments those parameters must satisfy nor the inter-component composition contracts its interfaces must honour.

The pre-vetted library is neither. Each entry carries four properties the naive libraries do not: interface obligations declared bilaterally for every shared-entity boundary the module participates in, including cardinality bounds, primitive-consistency requirements, and quality-assertion compatibility checks at the boundary; regulatory commitments in the form of explicit references to the Standardisation Schema clauses (or FA typology edges) that the entry’s quality assertions satisfy, traceable from the entry back to the SDA Design Standard’s provisions through the corpus extraction recorded in the standardisation schema; provenance recording how the entry was authored, against which substrate documents, and with what honest gaps; and a PlaniSyn-encoded specification that the procedural pipeline can transform without re-deriving the entry’s semantic content. The CAD block carries the first property only by accident if at all, and none of the others; the parametric component carries fragmentary versions of the first and possibly the second, but typically not the third or fourth at the granularity required here. The pre-vetted-library entry is, in short, a governed component specification where the naive libraries supply only geometric or parametric ones.

The distinction matters because the pipeline’s correctness rests on the fourth property. If the entries the pipeline ingests are not formally encoded in a grammar the pipeline can transform, the pipeline cannot operate; it would have to recover the formal encoding at run time, which would re-introduce the per-instance authoring cost the concept is designed to amortise. The pre-vetted library is therefore not an optional convenience the pipeline could do without; it is the substrate condition that makes the pipeline executable.

9.15 Theoretical Warrant: Three Converging Literatures

The pre-vetted-library concept is justified by a convergence of three literatures, each of which supplies a property that our instantiation of the concept retains.

The first is the Open Building tradition’s notion of capacity-providing infrastructure.²²,²³ In the Habrakenian framing, a Support is a long-lived shared capacity whose stability supports a population of short-lived user-specific Infill configurations, and the boundary between Support and Infill is governed by an explicit interface that prevents cross-layer cascade. The pre-vetted library plays the analogous role at the documentation layer: each entry is a specification-level capacity element whose stability supports a population of design-instance documentation packets, and the boundary between the library and the packets is governed by the formal-grammar interface specified by the notation. We invoke Open Building not for material-substrate analogy alone but for its structural commitment that capacity layers earn their value by disciplining the interface downstream consumers must respect: a commitment we instantiate representationally rather than physically.

The second is the platform-architecture literature.²⁴ In Baldwin and Clark’s framing, a platform is a visible-design-rule layer that coordinates the work of independent complement producers, and complements are the configurations the platform admits provided they conform to the visible rules. On our reading, the pre-vetted library is the platform’s parts inventory: the entries are the platform-conformant complements, the Chapter 6, Section 6.4 hard constraints and the Chapter 7 grammar productions are the visible design rules, and the procedural pipeline is the operator that composes complements into design-instance configurations. The reading licenses two predictions we accept as evaluation criteria: that the visible rules remain stable across complement revisions, a stability the Chapter 6 governed kernel supplies, and that complement production scales through replication of the visible-rule conformance procedure, a scaling the pipeline’s deterministic transformation supplies. The engine that those visible rules govern is developed in Chapter 6; here the library is positioned as its parts inventory, not re-derived.

The third is the standards-formalisation literature.²⁵,²⁶ The standards-formalisation programmes that have populated the BIM ecosystem since the early 2000s consistently report that automated compliance checking presupposes a curated input substrate: the rules can be formalised, the parser built, and the transformation executed only when the inputs are pre-vetted entries conforming to the schema the rules expect. The pre-vetted library is our instantiation of that precondition for the SDA-aligned dwelling-design domain, and the per-entry authoring discipline that produces the library is the operational form the precondition takes.

The three literatures converge on a single proposition: schema-driven generation (whether for material configuration in Open Building, product-architecture composition in platform architecture, or compliance-aware documentation in standards-formalisation) requires a pre-vetted entry pool to operate. The contribution is not the proposition itself but the disciplined instantiation of that pool for the SDA dwelling-design domain, at a granularity sufficient to support the Section 9.6 pipeline and the Chapter 10 demonstration cases.

9.16 Population, Curation, and Future Work

9.16 Current population state

Our instantiation of the library is described in Chapter 6, Section 6.4 and bundled in publish-thesis/publish-data/appendix-data-ch6-baseline-library/. The current population reflects a tiered-population, FA-first scope decision: authoring effort is focused on the SDA-aligned Fully Accessible category at full schema depth, with the remaining design categories (RB, LV, and UN) deferred to on-demand authoring as downstream demand requires.

The FA category is now populated at full schema depth across all nine module types: eight standalone entries (ENT-FA-01, CIR-FA-01, BED-FA-01, LIV-FA-01, KIT-FA-01, SVC-FA-01, EXT-FA-01, DWL-FA-01) bundled in the appendix-data directory, plus SAN-FA-01 retained inline at Chapter 6, Section 6.4 as the schema demonstrator. Each entry instantiates the eight mandatory and three optional fields of the schema, satisfies its module type’s Chapter 6, Section 6.4 hard constraints, and records SDA provenance against the FA typology graph or, for SAN-FA-01, the numbered SDA Design Standard clauses. The eight bundled entries total 1,890 lines of authored specification content and carry roughly nine FA typology graph triples each; the largest, DWL-FA-01, the dwelling-aggregate module, carries eighteen edges to anchor its enumeration of the eight sibling FA entries (the eight non-dwelling module types, SAN among them).²⁷ Nine of the twenty-five minimum-library entries are thus populated.

The remaining sixteen entries, the RB and LV/UN categories across the nine module types, are a declared scope decision rather than an incompleteness. Under the master-principles ground that work whose downstream consumption is unclear is not justified by its cost, those entries will be authored on demand if and when the prototype demonstration or the downstream demonstration cases require them. The SDA-Compliant Dwelling case (Demonstration Case 2 in Chapter 10) directly exercises the FA category; the Typical Detached Dwelling case (Demonstration Case 1) may require entries not yet populated, and if so the demonstration work will trigger on-demand population. The current state is therefore scope-limited relative to a broader population that downstream demand has not yet justified, not incomplete relative to the substantiation requirement.

9.17 Curation discipline

Each FA-category entry was authored against the SDA schema, the Chapter 6, Section 6.2 constituent specifications, and the Chapter 6, Section 6.3 interaction rules.²⁸ The curation discipline has four components. The first is schema conformance: every entry’s frontmatter populates the eight mandatory fields with declared-type values and the three optional fields either with values or with explicit absence-noted declarations, the schema permitting level: absent as a substantive declaration distinct from level: null. The second is constraint compliance: every entry’s constraint_status is independently recomputable from its content against the Chapter 6, Section 6.4 hard constraints, and a discrepancy between the declared and the recomputed status is treated as a defect that excludes the entry. The third is cross-reference to Chapter 5 schema clauses: every quality assertion cites either a numbered SDA Design Standard clause, where the corpus extraction has surfaced one, or an FA typology graph edge, where it has not, with the asymmetry surfaced transparently. The fourth is bundle sealing: the eight entries are sealed in an appendix-data bundle indexed by idx_appendix.md, entries are not modified silently after sealing, and any post-sealing revision (for example, the back-propagation of numbered SDA clause identifiers into the entries’ sda_provenance fields) is recorded as a tracked debt against the bundle.
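The first two components of the discipline can be sketched as a single defect report over an entry. The sketch makes several assumptions that we label as such: apart from `constraint_status`, `sda_provenance`, and `level`, the field names are hypothetical; only a subset of the mandatory fields is modelled; a null optional field is treated as a defect; and one presence check stands in for the Chapter 6, Section 6.4 hard constraints.

```python
# Hypothetical subset of the mandatory fields; the full schema has eight.
MANDATORY = ("entry_id", "module_type", "design_category", "constraint_status")
OPTIONAL = ("level",)

# A single illustrative check standing in for the hard composition constraints.
CONSTRAINTS = {
    "has_sda_provenance": lambda entry: bool(entry.get("sda_provenance")),
}


def recompute_status(entry: dict, constraints: dict) -> tuple[bool, list[str]]:
    """Recompute constraint_status from content, ignoring the declared value."""
    violated = [name for name, check in constraints.items() if not check(entry)]
    return (not violated), violated


def curation_defects(entry: dict, constraints: dict = CONSTRAINTS) -> list[str]:
    """Return every defect found; an empty list means the entry is admissible."""
    defects = [f"missing mandatory field: {f}" for f in MANDATORY if f not in entry]
    for f in OPTIONAL:
        # "absent" is a substantive declaration; None (YAML null) is not.
        if entry.get(f) is None:
            defects.append(f"optional field neither valued nor declared absent: {f}")
    recomputed, violated = recompute_status(entry, constraints)
    defects.extend(f"hard constraint violated: {name}" for name in violated)
    if entry.get("constraint_status") != recomputed:
        defects.append("declared constraint_status differs from recomputed status")
    return defects


good = {
    "entry_id": "SAN-FA-01", "module_type": "SAN", "design_category": "FA",
    "constraint_status": True, "level": "absent",
    "sda_provenance": ["placeholder-reference"],
}
bad = dict(good, sda_provenance=[], level=None)
print(curation_defects(good))
print(curation_defects(bad))
```

Returning the full defect list, rather than stopping at the first failure, matches the exclusion rule: any non-empty list excludes the entry, and the list itself is the audit record.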

The curation cost is non-trivial but front-loaded. Each entry took approximately one working day to author at full schema depth, including the corpus consultation, the constraint-compliance verification, and the surfaced-gap honesty note; the eight bundled entries therefore represent approximately eight working days of authoring, supplemented by the substrate documents whose authoring precedes the library and is not amortised against it directly. Thereafter the pipeline of Section 9.6 consumes the curated entries at no marginal authoring cost per design instance, so the total cost of compliance documentation across N instances is bounded by the fixed library-authoring cost plus N times the negligible pipeline transformation cost. As N grows, the per-instance cost approaches the pipeline cost: the operational meaning of the amortised-fixed-cost structure that Section 9.13 declared.
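The amortised structure can be written compactly. The symbols are our own shorthand, introduced only for this illustration: $C_{\text{lib}}$ is the fixed library-authoring cost, $c_{\text{pipe}}$ the per-instance pipeline transformation cost, and $C(N)$ the total cost of compliance documentation across $N$ design instances.

```latex
\[
  C(N) \;\le\; C_{\text{lib}} + N\,c_{\text{pipe}},
  \qquad
  \frac{C(N)}{N} \;\le\; \frac{C_{\text{lib}}}{N} + c_{\text{pipe}}
  \;\longrightarrow\; c_{\text{pipe}}
  \quad \text{as } N \to \infty .
\]
```

The first term of the per-instance bound vanishes as $N$ grows, so the bound on the per-instance cost approaches the pipeline cost alone.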

9.18 Position against future work

The pre-vetted-library concept is not closed at the present scope; it admits three lines of future work that we declare explicitly so that an examiner can scrutinise them rather than discover them by inference. The first is cross-corpus generalisation to the remaining design categories and to the wider Australian housing context. The current FA-category population covers the SDA-aligned category that supplies the strongest single-claim share of the Chapter 5 and Chapter 6 evidence; the wider Australian housing context is out of present scope. Chapter 6 records the cross-corpus generalisation as a registered debt (Ch6-AW-04) and specifies the corpora a future agenda would consult: the Liveable Housing Australia design guidelines, the broader National Construction Code residential provisions, and state-level adaptable-housing standards.

The second is independent third-party validation of the curation. The current curation is candidate-authored, and an examiner may legitimately ask whether an independent third party would converge on the same entry content. Chapter 5 records this as a registered debt (Ch5-CW-04) and specifies the validation protocol: a second author would re-author a sample of the FA entries against the same substrate documents, and the agreement between the candidate’s entries and the second author’s entries would be reported. The protocol is specified but not executed within the present scope, and its execution depends on the availability of qualified second authors; it is therefore deferred to a post-submission revision cycle.

The third is external inter-rater reliability for the Standardisation Schema’s primitives. The pre-vetted library inherits the standardisation schema, and if the schema’s primitives (the entity vocabulary of Chapter 6, Section 6.1 and the relation vocabulary) admit ambiguity that a third-party rater would resolve differently from the candidate, that ambiguity propagates into the library through the entries’ quality-assertion triples. Ch5-CW-04 covers the schema-primitive reliability alongside the entry-curation reliability; the protocol calls for a structured coding exercise on a sample of SDA Design Standard clauses, with raters blind to the candidate’s extraction, and the per-primitive Cohen’s κ reported as the reliability indicator. The exercise is specified but not executed within the present scope and is declared as future work; we report no computed reliability statistic here.
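For reference, the indicator the protocol names can be computed as in the sketch below. The two label sequences are synthetic toy data invented to exercise the function; they are not ratings from any executed exercise, and the figure printed is not a result of this thesis. The category labels are likewise hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters assigning one nominal label per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = count_a.keys() | count_b.keys()
    p_e = sum(count_a[k] * count_b[k] for k in labels) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Synthetic toy labels for illustration only.
candidate = ["entity", "relation", "entity", "entity"]
second_rater = ["entity", "relation", "relation", "entity"]
print(cohens_kappa(candidate, second_rater))
```

Computing the statistic per primitive, as the protocol specifies, would amount to calling the function once per primitive on the binary present-or-absent codings for that primitive.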

These three lines are scope declarations, not threats to the substantiation claim. What we substantiate here is that the pre-vetted-library concept can be instantiated at the FA category, at full schema depth, against the SDA-aligned domain, with declared honesty about provenance asymmetries: an instantiation sufficient to feed the Section 9.6 pipeline and the SDA-Compliant Dwelling demonstration in Chapter 10. The wider claim that a pre-vetted library can be generalised across the full Australian housing context, validated independently, and grounded in reliability-tested schema primitives is a claim the future-work agenda will substantiate. The contribution at submission is the disciplined instantiation of the pre-vetted-library concept for the SDA dwelling-design domain at a granularity sufficient for the procedural-generation pipeline: a familiar move across the three literatures Section 9.15 surveyed, made disciplined and domain-specific here.

9.19 Prototype: Runtime and the Documentation-Packet Contract

The four-stage pipeline of Section 9.6 and the substrate library of Section 9.12 are architectural propositions that admit a single direct test: an executable instantiation that consumes the curated nine-entry FA-category corpus and emits a documentation packet for each entry without recourse to non-deterministic machinery. The prototype is that instantiation: a deterministic Python command-line tool that operates the four stages over the eight bundled FA entries plus the inline SAN-FA-01 exemplar carried at Chapter 6, Section 6.4, validates every entry against the PlaniSyn schema, and writes a self-contained Markdown packet per entry conforming to the generator’s documentation-packet contract. It is the concrete evidence that the bottleneck diagnosed in Section 9.2 is removable at the present instantiation depth: a thin executable layer over the suite’s specification stack suffices to convert the human-authored library into compliance-bearing artefacts.

9.20 Architecture and runtime stack

The prototype is a single-process Python application written against the standard library together with one third-party dependency, the YAML parser pyyaml, and targeting Python 3.11 or later. The dependency choice is structural rather than expedient: the prototype invokes no runtime large-language model, presents no graphical user interface, and requires no network access during a run. A reader with a freshly installed Python 3.11 distribution and a copy of the prototype directory can re-execute the pipeline against the bundled baseline-library and obtain the nine packets with pyyaml as the only package to install, rather than a tree of versioned dependencies to resolve. This minimum-dependency posture is the operational form of the reproducibility commitment specified in Section 9.6: the contribution under examination is procedural correctness rather than engineering virtuosity, and the single declared dependency is stated plainly so the reproducibility cost is legible. It also follows from the platform-architecture commitment that governs the suite’s design decisions: the pipeline is positioned as governed instance library code that consumes the frozen substrate stack, and the absence of a heavy runtime is the visible signal that the substrate, not the implementation, carries the intellectual weight.²⁹

The source tree decomposes into five Python modules and a four-module test suite, with module boundaries following the four pipeline stages plus the orchestration entry point. The Ingest stage is realised by parser.py, which reads a baseline-library Markdown file, splits the YAML frontmatter from the body, lifts the frontmatter to a Python dictionary via pyyaml, extracts numbered sections via a regular-expression scanner, and captures the entry’s summary callout from the first > [!info] block. The Validate and Transform stages are realised jointly by grammar.py, which enforces the eight-mandatory-plus-three-optional schema commitments of Chapter 6, Section 6.4, recognises three distinct provenance shapes corresponding to the three authoring conventions across the FA corpus, and lifts the schema-conformant content into a normalised module-specification dictionary in the spirit of the PlaniSyn predicate_list production.³⁰ The Emit stage is realised by packet.py, which renders a module-specification dictionary plus a validation report into a self-contained Markdown packet at output/<ENTRY_ID>.packet.md. The orchestration entry point is cli.py, which exposes five subcommands (ingest, validate, transform, emit, run-all) via argparse and writes a per-run JSON log to run_logs/. The four-module test suite exercises the parser, the validator, the transformer, and the emitter independently, with a final end-to-end pass that re-runs the full pipeline over the nine-entry corpus and asserts on the run-log totals. Twenty unit tests pass deterministically, so the pipeline’s external behaviour is pinned at the contract level rather than left to drift between revisions.
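The Ingest step described above can be illustrated with a minimal sketch. It is not the prototype’s parser.py: the function name split_entry, the `---` frontmatter delimiters, and the `## N. Title` heading pattern are assumptions made for illustration, and the frontmatter is returned as raw text at the point where the prototype would hand it to pyyaml.

```python
import re

# Assumed conventions (not confirmed by the prototype source):
# frontmatter fenced by '---' lines, numbered sections written as '## 1. Title'.
FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n(.*)\Z", re.DOTALL)
SECTION_RE = re.compile(r"^##\s+(\d+)\.\s+(.+)$", re.MULTILINE)
CALLOUT_RE = re.compile(r"^> \[!info\][^\n]*\n((?:^>[^\n]*\n?)*)", re.MULTILINE)


def split_entry(text: str) -> dict:
    """Split a baseline-library Markdown entry into its ingest-stage parts."""
    match = FRONTMATTER_RE.match(text)
    if match is None:
        raise ValueError("entry has no YAML frontmatter block")
    raw_frontmatter, body = match.groups()
    # Numbered sections, as (number, title) pairs in document order.
    sections = [(int(n), title.strip()) for n, title in SECTION_RE.findall(body)]
    # Summary callout: the continuation lines of the first '> [!info]' block.
    callout = CALLOUT_RE.search(body)
    summary = ""
    if callout:
        lines = [ln.lstrip(">").strip() for ln in callout.group(1).splitlines()]
        summary = " ".join(ln for ln in lines if ln)
    return {
        "frontmatter_text": raw_frontmatter,  # the prototype would parse this with pyyaml
        "sections": sections,
        "summary": summary,
    }
```

Keeping the split as a pure function of the file text is what lets the parser be tested without touching the file system.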

The module decomposition carries an architectural claim. Each of the four stage contracts declared in Section 9.6 is realised by exactly one Python module (grammar.py serves both Validate and Transform), and each module’s public surface is a small set of pure functions that consume the previous stage’s output schema and produce the next stage’s input schema. A re-implementation in a different programming language would preserve the module-to-stage correspondence even where the syntactic surface diverged, because the correspondence is a property of the substrate specification rather than of the chosen language. This is the operational meaning of the Section 9.6 claim that the pipeline is reproducible at the governed-kernel level even when the implementation is replaced.
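The stage-to-subcommand correspondence can be sketched with argparse. The sketch below is an assumption-laden illustration rather than the prototype’s cli.py: the subcommand names and the --input, --output, and --log-dir flags are taken from the text, whereas the helper stages_for and the rule that a single-stage subcommand runs every stage up to and including itself are hypothetical.

```python
import argparse

STAGES = ("ingest", "validate", "transform", "emit")


def build_parser() -> argparse.ArgumentParser:
    """Build a command-line parser exposing the five subcommands."""
    parser = argparse.ArgumentParser(prog="artefact_d.cli")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in (*STAGES, "run-all"):
        cmd = sub.add_parser(name)
        cmd.add_argument("--input", required=True)
        cmd.add_argument("--output", default="output/")
        cmd.add_argument("--log-dir", default="run_logs/")
    return parser


def stages_for(command: str) -> tuple:
    """Return the stages a subcommand runs, in pipeline order."""
    if command == "run-all":
        return STAGES
    # Hypothetical rule: a stage subcommand runs all stages up to itself,
    # because each stage consumes the previous stage's output schema.
    return STAGES[: STAGES.index(command) + 1]
```

Because the stage order lives in one tuple, a re-implementation in another language would need to reproduce only that ordering and the stage contracts, not the argparse surface.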

9.21 The documentation-packet contract

The packet contract is versioned through the packet_schema frontmatter field: the version string is carried in every emitted packet, and the field is the audit hook by which downstream consumers can detect contract evolution. A packet comprises eleven YAML frontmatter fields and seven body sections, summarised below for examiner reference.

| Slot | Field or section | Source |
| --- | --- | --- |
| Frontmatter | `packet_schema` | Constant identifier (`artefact-d-packet`) qualified by the prototype’s declared schema version. |
| Frontmatter | `packet_kind` | Constant `artefact-d-documentation-packet`. |
| Frontmatter | `entry_id` | Source frontmatter `id` (e.g., `KIT-FA-01`). |
| Frontmatter | `module_class` | Source `module_type` (one of nine module codes). |
| Frontmatter | `design_category` | Source `design_category` (one of `FA`, `RB`, `LV`, `UN`). |
| Frontmatter | `constraint_status` | Source `constraint_status` boolean. |
| Frontmatter | `binding_for` | Source `binding_for` (e.g., `P4`). |
| Frontmatter | `schema_version` | Source `schema_version`, typically Chapter 6, Section 6.4. |
| Frontmatter | `source_path` | Absolute path of the input Markdown file. |
| Frontmatter | `emitted_at` | Run timestamp in ISO 8601 UTC form. |
| Frontmatter | `validation_ok` | Validator pass or fail (mirrored from the Validation Report section). |
| Body Section 1 | Module Specification | Tabular re-statement of the seven primary frontmatter fields. |
| Body Section 2 | PlaniSyn Predicate List | Structured `(verb, subject, object, domain)` rows; a Section 2.1 subsection lists Unparsed lines. |
| Body Section 3 | Interface Obligations | Cross-module adjacency hooks detected in the entry body. |
| Body Section 4 | SDA Constraints | Verbatim provenance triples preserved as the auditable link to the SDA Design Standard. |
| Body Section 5 | Configuration Notes | Verbatim from source frontmatter. |
| Body Section 6 | Provenance Trace | Source path, section inventory, and summary callout. |
| Body Section 7 | Validation Report | Pass or fail status, plus enumerated errors and warnings. |

The contract is deliberately narrower than the full Chapter 6, Section 6.4 schema. The packet does not republish the entry’s full body content; it republishes the schema-significant slots an examiner needs to verify compliance, and it leaves the source entry as the authoritative full record by carrying the absolute source_path in its provenance trace. This back-pointer discipline preserves single-source-of-truth integrity at the substrate level: an examiner who needs the full entry follows the path, and one who needs the lifted, schema-checked, machine-queryable view reads the packet. The two artefacts compose without redundancy.
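The frontmatter half of the contract can be sketched as a single assembly function. The field names and constants follow the table; the function name build_packet_frontmatter, the placeholder version string, and the way the version qualifies the packet_schema identifier are assumptions, not the prototype’s packet.py.

```python
from datetime import datetime, timezone
from pathlib import Path

# Packet frontmatter field <- source frontmatter field, per the contract table.
SOURCE_FIELD_MAP = {
    "entry_id": "id",
    "module_class": "module_type",
    "design_category": "design_category",
    "constraint_status": "constraint_status",
    "binding_for": "binding_for",
    "schema_version": "schema_version",
}


def build_packet_frontmatter(entry: dict, source: Path, validation_ok: bool,
                             contract_version: str = "0.0-example") -> dict:
    """Assemble the eleven frontmatter fields of a documentation packet."""
    fields = {
        # Assumption: the version is appended to the constant identifier.
        "packet_schema": f"artefact-d-packet/{contract_version}",
        "packet_kind": "artefact-d-documentation-packet",
    }
    for packet_key, source_key in SOURCE_FIELD_MAP.items():
        fields[packet_key] = entry[source_key]
    # Back-pointer to the authoritative full record.
    fields["source_path"] = str(source.resolve())
    # ISO 8601 timestamp in UTC, the only field intended to vary between runs.
    fields["emitted_at"] = datetime.now(timezone.utc).isoformat(timespec="seconds")
    fields["validation_ok"] = validation_ok
    return fields
```

Copying only the mapped slots, and pointing back to the source file for everything else, is the back-pointer discipline in miniature.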

9.22 Predicate Parsing, the Unparsed-Line Discipline, and the End-to-End Run

9.22 PlaniSyn predicate parsing and the Unparsed-line discipline

The transformer recognises three provenance shapes corresponding to the three authoring conventions in the FA corpus: the em-dash form FA-TYP:Subject—PREDICATE—Object, used by ENT-FA-01, CIR-FA-01, and EXT-FA-01; the arrow form FA-typology node: Subject → PREDICATE → Object, used by DWL-FA-01; and the prose form whose triple is carried in a trailing parenthetical, used by KIT-FA-01, LIV-FA-01, SVC-FA-01, and BED-FA-01. A bare-triple residual fallback handles the inline SAN-FA-01 exemplar’s blockquote content. Each recognised line is lifted to a (verb, subject, object, domain) tuple and emitted in the packet’s predicate-list section; the verb-position regular expressions enforce upper-case predicate identifiers and reject lower-case prose, which ensures that ambiguous lines are surfaced rather than silently coerced.
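The three provenance shapes lend themselves to a regular-expression sketch. The patterns below are a hypothetical reconstruction from the shapes as quoted in the text, not the expressions in grammar.py; in particular, reading the label before the colon (or the prose before the parenthetical) as the tuple’s domain is an assumption, and the bare-triple fallback is omitted.

```python
import re
from typing import Optional

# Upper-case predicate identifiers only; lower-case prose must not match.
VERB = r"(?P<verb>[A-Z][A-Z0-9_]*)"

SHAPES = [
    # Em-dash form: FA-TYP:Subject—PREDICATE—Object
    re.compile(rf"^(?P<domain>[^:]+):\s*(?P<subject>[^—]+?)\s*—\s*{VERB}\s*—\s*(?P<object>.+)$"),
    # Arrow form: FA-typology node: Subject → PREDICATE → Object
    re.compile(rf"^(?P<domain>[^:]+):\s*(?P<subject>[^→]+?)\s*→\s*{VERB}\s*→\s*(?P<object>.+)$"),
    # Prose form: free text with the triple carried in a trailing parenthetical
    re.compile(rf"^(?P<domain>.+?)\s*\((?P<subject>\w+)\s+{VERB}\s+(?P<object>[^()]+)\)\s*$"),
]


def lift(line: str) -> Optional[tuple]:
    """Return (verb, subject, object, domain), or None to flag the line as Unparsed."""
    for shape in SHAPES:
        m = shape.match(line.strip())
        if m:
            return (m["verb"], m["subject"].strip(), m["object"].strip(), m["domain"].strip())
    # No shape matched: surface the line rather than coerce it.
    return None
```

Returning None instead of a best-effort guess is the point of the verb-position constraint: a line with a lower-case verb falls through every pattern and is left for the Unparsed subsection.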

The interface-obligations detector surfaces three families of cross-module hook from the entry body.³¹ The first family is pairwise adjacency declarations of the form XXX–YYY where both module codes are members of the closed nine-type taxonomy; non-DWL entries retain only adjacencies that include their own module code, while DWL entries retain every observed adjacency because the dwelling integration point owns the complete boundary network. The second family is the DWL child-module aggregation enumeration, in which lines naming a sibling FA-category baseline entry are lifted to obligations identifying DWL as the aggregator and the referenced child as the aggregated instance, surfacing the seven aggregation hooks that DWL-FA-01 declares. The third family is shared-entity bilateral declarations of the form shared entity declared in both XXX-FA-01 and YYY-FA-01, which surface the cross-module-constraint prose: for example, the shared path_ext_outdoor_link between EXT-FA-01 and LIV-FA-01, and the shared opening_ent_entry_doorway between ENT-FA-01 and CIR-FA-01. The detector additionally surfaces the DWL–All governance contract as a distinct obligation for DWL entries, capturing the dwelling-level commitment that category-conditional changes cascade to all sub-modules carrying context. With the detector extended to cover these families, the DWL-FA-01 packet surfaces twelve interface obligations (seven child aggregations, the DWL–All governance contract, two pairwise adjacencies, and two shared-entity bilateral declarations). The prior version emitted zero for DWL-FA-01, because its dwelling-aggregate body did not match the original narrow filter that required the entry’s own module code to appear in the pair.
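Two of the detector’s families, pairwise adjacency and shared-entity declarations, can be sketched as follows. The sketch is illustrative only: the nine module codes are assumed to be the prefixes that appear in the FA entry identifiers, the function name detect_obligations and the shared-entity output string are hypothetical, and the child-aggregation and DWL–All families are omitted.

```python
import re

# Assumption: the closed nine-type taxonomy is the set of module prefixes
# appearing in the FA entry identifiers.
MODULE_CODES = {"ENT", "CIR", "BED", "DWL", "EXT", "KIT", "LIV", "SVC", "SAN"}
PAIR_RE = re.compile(r"\b([A-Z]{3})–([A-Z]{3})\b")  # en dash between two codes
SHARED_RE = re.compile(
    r"shared entity declared in both ([A-Z]{3})-FA-01 and ([A-Z]{3})-FA-01"
)


def detect_obligations(module_code: str, body: str) -> list:
    """Collect adjacency and shared-entity obligations from an entry body."""
    found = []
    for a, b in PAIR_RE.findall(body):
        if a not in MODULE_CODES or b not in MODULE_CODES:
            continue
        # Non-DWL entries keep only adjacencies naming their own module code;
        # DWL keeps every adjacency because it owns the boundary network.
        if module_code != "DWL" and module_code not in (a, b):
            continue
        found.append(f"{a}–{b} interface adjacency declared")
    for a, b in SHARED_RE.findall(body):
        found.append(f"shared entity between {a}-FA-01 and {b}-FA-01")
    # Deduplicate while preserving first-seen order, so output is deterministic.
    return list(dict.fromkeys(found))
```

The DWL exemption in the filter is the behaviour whose absence produced the zero count described above.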

Three classes of provenance line are deliberately retained as Unparsed material in the packet’s predicate-list subsection rather than coerced into the predicate table. The first is multi-object inventory predicates such as Kitchen SHALL_HAVE Cooktop, Rangehood, Oven, Sink, Dishwasher: the prose form encodes a list rather than a single triple, and the PlaniSyn predicate_list production expects single-object triples. The second is composite assertions such as Light_Switch HAS_HEIGHT_RANGE 900–1100 mm; HAS_TYPE Rocker_Style, which encode two predicates in one line. The third is free-text honesty notes such as the inferred-tap commentary carried in KIT-FA-01, which is by-design audit material rather than a schema-compliant triple. None of the three is a PlaniSyn grammar gap: the multi-object form is an entry-author convention an upstream split would resolve, the composite form is a downstream-prose concern, and the honesty notes are explicit audit artefacts. The prototype therefore retains all three verbatim in the packet’s SDA Constraints section, the auditable link to the SDA Design Standard, and lists them as flagged audit follow-up. This discipline is the operational form of the Section 9.12 commitment that the pre-vetted library carries provenance: the library loses no information when the transformer cannot lift it, and the packet does not silently misrepresent what the library declares.
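The three Unparsed classes can be told apart mechanically, which is what makes the audit follow-up list tractable. The classifier below is a hypothetical sketch, not part of the prototype as described: the function name, the class labels, and the comma and semicolon heuristics are assumptions drawn from the two examples quoted above.

```python
import re

# A bare 'Subject VERB Object' line with an upper-case verb identifier.
TRIPLE_RE = re.compile(r"^(?P<subject>\w+)\s+(?P<verb>[A-Z][A-Z0-9_]*)\s+(?P<object>.+)$")


def classify_unparsed(line: str) -> str:
    """Name the reason a provenance line stays in the Unparsed subsection."""
    m = TRIPLE_RE.match(line.strip())
    if m is None:
        # No subject/verb skeleton at all: audit commentary, not a triple.
        return "free-text note"
    obj = m["object"]
    if ";" in obj:
        # A second predicate follows the semicolon.
        return "composite assertion"
    if "," in obj:
        # The object position holds a list rather than a single object.
        return "multi-object inventory"
    return "single triple"
```

A line classified as a single triple would be a candidate for lifting rather than retention; the other three labels correspond to the classes the prototype keeps verbatim.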

9.23 The end-to-end run

A single command runs the full pipeline over the nine-entry corpus.

python -m artefact_d.cli run-all `
    --input appendix-data-ch6-baseline-library/ `
    --output output/ `
    --log-dir run_logs/

The command iterates over the eight standalone FA entries (ENT-FA-01, CIR-FA-01, BED-FA-01, DWL-FA-01, EXT-FA-01, KIT-FA-01, LIV-FA-01, SVC-FA-01) and then synthesises the ninth entry by reading the inline SAN-FA-01 blockquote from the Chapter 6, Section 6.4 governed instance library specification and reconstructing a frontmatter dictionary from its bold-tagged fields. The synthesised entry is processed on the same code path as the eight standalone entries, which keeps SAN-FA-01 on equal evidential footing with the bundled corpus despite its inline-source character. The pipeline emits nine packets at output/<ENTRY_ID>.packet.md, writes a JSON run-log, and exits with status zero. Validation passes on all nine entries: the run-log records expected_entries: 9, processed_entries: 9, validation_passed: 9, validation_failed: 0, and an empty pipeline_errors list. The test suite reports twenty passes and zero failures across the parser, validator, emitter, and end-to-end suites in under a second of wall-clock time. The run is summarised in the figure below.
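An independent reader can confirm the reported totals directly from the run-log. The check below is a minimal sketch assuming the five keys named above sit at the top level of the JSON document; the function name run_log_discrepancies is hypothetical, and the layout of the rest of the log is not assumed.

```python
import json

# Totals reported for the nine-entry FA run.
EXPECTED_TOTALS = {
    "expected_entries": 9,
    "processed_entries": 9,
    "validation_passed": 9,
    "validation_failed": 0,
}


def run_log_discrepancies(log_text: str) -> list:
    """Return readable discrepancies between a run-log and the reported totals."""
    log = json.loads(log_text)
    problems = [
        f"{key}: expected {want}, found {log.get(key)!r}"
        for key, want in EXPECTED_TOTALS.items()
        if log.get(key) != want
    ]
    if log.get("pipeline_errors") != []:
        problems.append("pipeline_errors is not an empty list")
    return problems
```

An empty return value corresponds to the clean run described above; any other result names the total that diverged.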

Figure (artefact_e_pipeline_run): Generator Prototype: End-to-End Run Flow. The prototype’s run flow over the nine FA-category baseline-library entries. Inputs: the eight standalone Markdown files in appendix-data-ch6-baseline-library/ (ENT-FA-01, CIR-FA-01, BED-FA-01, DWL-FA-01, EXT-FA-01, KIT-FA-01, LIV-FA-01, SVC-FA-01) plus the inline SAN-FA-01 blockquote at Chapter 6, Section 6.4, which the parser synthesises into a frontmatter dictionary on the same code path. Pipeline: a single run-all invocation iterates the four stages (Ingest → Validate → Transform → Emit) over each entry. Outputs: nine documentation packets at output/<ENTRY_ID>.packet.md, plus a per-run JSON log written to run_logs/ recording expected_entries: 9, processed_entries: 9, validation_passed: 9, validation_failed: 0, an empty pipeline_errors list, and a per-entry result block carrying predicate_count, unparsed_provenance_count, interface_obligations_count, bytes_written, and packet_path.

Render specification (Mermaid flowchart fallback)

The figure renders as a Mermaid flowchart LR with a left-side input subgraph (the nine entries), a centre pipeline subgraph (the four stages), and a right-side output subgraph (the nine packets plus the JSON log). Reference Mermaid source for the renderer:

flowchart LR
    subgraph IN ["Inputs (n=9 baseline-library entries)"]
        direction TB
        I1["ENT-FA-01.md"]
        I2["CIR-FA-01.md"]
        I3["BED-FA-01.md"]
        I4["DWL-FA-01.md"]
        I5["EXT-FA-01.md"]
        I6["KIT-FA-01.md"]
        I7["LIV-FA-01.md"]
        I8["SVC-FA-01.md"]
        I9["SAN-FA-01<br/>(inline blockquote<br/>at Ch6 §6.4)"]
    end
    subgraph PIPE ["Pipeline (cli.py run-all)"]
        direction LR
        P1["Ingest<br/>parser.py"]
        P2["Validate<br/>grammar.py"]
        P3["Transform<br/>grammar.py"]
        P4["Emit<br/>packet.py"]
        P1 --> P2 --> P3 --> P4
    end
    subgraph OUT ["Outputs"]
        direction TB
        O1["DWL-FA-01.packet.md"]
        O2["ENT-FA-01.packet.md"]
        O3["CIR-FA-01.packet.md"]
        O4["BED-FA-01.packet.md"]
        O5["EXT-FA-01.packet.md"]
        O6["KIT-FA-01.packet.md"]
        O7["LIV-FA-01.packet.md"]
        O8["SVC-FA-01.packet.md"]
        O9["SAN-FA-01.packet.md"]
        LOG[/"run_logs/&lt;timestamp&gt;.json<br/>9 entries · 9 passed · 0 failed"/]
    end
    I1 --> P1
    I2 --> P1
    I3 --> P1
    I4 --> P1
    I5 --> P1
    I6 --> P1
    I7 --> P1
    I8 --> P1
    I9 -.->|synthesised<br/>frontmatter| P1
    P4 --> O1
    P4 --> O2
    P4 --> O3
    P4 --> O4
    P4 --> O5
    P4 --> O6
    P4 --> O7
    P4 --> O8
    P4 --> O9
    P4 --> LOG
    classDef inp fill:#fff5e6,stroke:#a60,stroke-width:1px;
    classDef stage fill:#eef,stroke:#225,stroke-width:1.5px,rx:6,ry:6;
    classDef out fill:#e9f6ec,stroke:#266,stroke-width:1px;
    classDef log fill:#f6efe2,stroke:#553,stroke-width:1px,stroke-dasharray:4 2;
    class I1,I2,I3,I4,I5,I6,I7,I8,I9 inp;
    class P1,P2,P3,P4 stage;
    class O1,O2,O3,O4,O5,O6,O7,O8,O9 out;
    class LOG log;

Render pipeline target: publish-figures/artefact_e_pipeline_run.svg (rendered via the thesis Mermaid/SVG pipeline). Counts and labels above are taken verbatim from the Section 9.23 prose and from the chapter’s run-log records; do not alter without re-deriving from those sources.

9.24 Spot-checks: ENT-FA-01 and KIT-FA-01

Two spot-checks exercise the predicate shapes across distinct module classes. The packet emitted for ENT-FA-01, the Fully Accessible Entry Module, carries nine structured predicates lifted from the entry’s sda_provenance field (among them Accessway IS_STEP_FREE Required, Accessway HAS_MIN_WIDTH 1200mm, Doorway HAS_CLEAR_WIDTH_MIN 900mm, and Door_Handle COMPLIES_WITH AS1428.1) with provenance domains preserved verbatim in the table’s domain column. Its interface-obligations section detects two obligations, ENT–EXT interface adjacency declared and ENT–CIR interface adjacency declared, corresponding to the entry’s external-approach boundary and its internal-circulation boundary. The validation report records a pass with no errors and no warnings: the eight mandatory frontmatter fields are present, the module_type is in the closed nine-code set, the design_category is FA, and the sda_provenance is a non-empty list of strings.

The packet for KIT-FA-01, the Fully Accessible Kitchen Module, carries ten structured predicates (including Kitchen HAS_BENCH_CLEARANCE 1550 mm, Kitchen HAS_TASK_LIGHTING 300 lux min, Cupboard_Storage HAS_ACCESSIBLE_REACH Required, and Kitchen CONTAINS Sink) and detects two interface obligations, LIV–KIT interface adjacency declared and CIR–KIT interface adjacency declared. Its Unparsed subsection lists three provenance lines as expected: the multi-object Kitchen SHALL_HAVE Cooktop, Rangehood, Oven, Sink, Dishwasher inventory, the Dwelling SHALL_HAVE Kitchen cross-aggregate predicate carried in the kitchen entry to support DWL composition, and the inferred-tap honesty note flagging the cross-room consistency inheritance from Hand_Wash_Basin and Laundry_Tap. All three are simultaneously preserved verbatim in the SDA Constraints section as the audit-of-record. Together the two spot-checks exercise the em-dash, prose, and inventory provenance shapes, and they demonstrate that the predicate parser, the interface-obligation detector, the Unparsed-line discipline, and the validation reporter all function as specified across two distinct module classes.
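The checks that the ENT-FA-01 validation report records can be sketched as a single pure function. This is an illustration under stated assumptions, not the prototype’s validator: the nine module codes are inferred from the entry identifiers, the mandatory field names are passed in by the caller because the text does not enumerate all eight, and the function name validate_entry is hypothetical.

```python
# Assumption: the closed nine-code set is the set of FA entry-identifier prefixes.
MODULE_CODES = {"ENT", "CIR", "BED", "DWL", "EXT", "KIT", "LIV", "SVC", "SAN"}
DESIGN_CATEGORIES = {"FA", "RB", "LV", "UN"}


def validate_entry(frontmatter: dict, mandatory: tuple) -> dict:
    """Check an entry's frontmatter and return a validation report."""
    errors = []
    for field in mandatory:
        if field not in frontmatter:
            errors.append(f"missing mandatory field: {field}")
    if frontmatter.get("module_type") not in MODULE_CODES:
        errors.append("module_type is not in the closed nine-code set")
    if frontmatter.get("design_category") not in DESIGN_CATEGORIES:
        errors.append("design_category is not one of FA, RB, LV, UN")
    prov = frontmatter.get("sda_provenance")
    if not (isinstance(prov, list) and prov and all(isinstance(p, str) for p in prov)):
        errors.append("sda_provenance must be a non-empty list of strings")
    # Warnings are left empty here; the sketch covers only hard failures.
    return {"ok": not errors, "errors": errors, "warnings": []}
```

Collecting every error instead of stopping at the first mirrors the enumerated errors and warnings that the Validation Report section carries.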

9.25 Minimum DSR Depth, Reproducibility, and the Procedural Round-Trip

9.25 Why this is the minimum DSR instantiation depth

The prototype is positioned as the minimum Design-Science-Research instantiation depth at which the generator’s contribution claim is testable, and three commitments fix that depth. First, the prototype is a working command-line tool, not a paper specification: the four-stage transformation runs on real input and produces real output, and the run-log totals are independently verifiable by anyone who re-executes the pipeline. Second, the input corpus is the full FA-category baseline-library (nine of the twenty-five minimum-library entries) rather than a token subset; the FA-first scoping reflects the tiered-population decision, and FA is the most regulated, most constraint-dense category of the SDA-aligned set. Third, the emitted packets are compliance-bearing in the chapter’s specific sense: each republishes the entry’s regulatory commitments through the schema validator, lifts the entry’s sda_provenance triples through the PlaniSyn-aware transformer, and back-points to the source entry through the absolute source_path field, so that every claim in the packet is traceable to the substrate the Section 9.6 contract requires. The instantiation closes the Section 9.2 documentation-bottleneck argument by demonstrating the feasibility of a thin executable layer over a thick specification stack: the bottleneck can be removed for the FA-category corpus, under the present runtime stack, with the present transformation grammar.

The prototype is, equally explicitly, not a production system. It presents no graphical user interface; it does not integrate with BIM authoring environments through open-standard exchange formats; it incorporates no large-language-model-assisted authoring above the deterministic core; it does not handle the RB, LV, or UN module categories; and it implements only the minimum-viable PlaniSyn EBNF subset sufficient to consume the FA corpus.³² These are proper objects of subsequent inquiry, and the conclusion records each as an explicit scope-limit declaration. The prototype’s value is precisely that it is the smallest executable artefact that admits the contribution claim under the substrate stack the suite supplies, and no smaller than that. Minimality is the demonstration’s discipline; what it omits, it omits by declaration.

9.26 Reproducibility and the procedural round-trip

Every emitted packet carries an emitted_at field in ISO 8601 UTC form and a source_path field giving the absolute path of the input Markdown file, both written into the packet’s YAML frontmatter at emission time. The per-run JSON log records the resolved input, output, and inline-source paths, the expected and processed entry counts, the validation-passed and validation-failed totals, the pipeline_errors list, and a per-entry result block carrying that entry’s predicate count, Unparsed-provenance count, interface-obligations count, bytes written, and packet path. A reader with the prototype directory and the baseline-library appendix can re-execute the pipeline and obtain a packet set whose content is bit-identical to the recorded run modulo the emitted_at timestamp, which by design varies between runs, and the absolute source_path, which follows the location of the prototype directory on the reader’s machine. Reproducibility is therefore not a posture but a consequence of the runtime architecture: pure-Python execution, no network access, no large-language-model calls, no random initialisation, and no implementation-defined ordering mean every transformation step is determined by the input and the source code.
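The comparison of two runs modulo emitted_at can be sketched as a fingerprint over the packet text. The sketch assumes that the timestamp occupies a single frontmatter line beginning with `emitted_at:` and that both runs were executed from the same directory, so that source_path is unchanged; the function name packet_fingerprint is hypothetical.

```python
import hashlib
import re

# Assumption: the timestamp is a single frontmatter line 'emitted_at: <value>'.
EMITTED_AT_RE = re.compile(r"^emitted_at:.*$", re.MULTILINE)


def packet_fingerprint(packet_text: str) -> str:
    """Hash a packet with its run-specific emitted_at value masked out."""
    masked = EMITTED_AT_RE.sub("emitted_at: <masked>", packet_text)
    return hashlib.sha256(masked.encode("utf-8")).hexdigest()
```

Two runs reproduce one another when every packet’s fingerprint matches its counterpart; a mismatch localises the divergence to a single entry.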

The relationship between the prototype’s reproducibility and Proposition 3 (manipulability) is twofold. The static round-trip-fidelity evidence under Proposition 3 is established at Chapter 7, Section 7.20 (EVID-P3-REPLAY and EVID-P3-INVARIANTS), where the PlaniSyn-encoded packet representation is shown to admit lossless replay against the RecPol formal core. The prototype additionally executes a procedural-level round-trip test: the test suite runs the full state-specification → packet → re-parsed-specification cycle across the nine FA-category entries and asserts specification-equivalence on the load-bearing fields: entry_id, module_type, design_category, constraint_status, predicates, interface obligations, SDA constraints, schema version, and binding_for.³³ Zero divergence is observed across the nine round-trips; predicate count, identity-bearing fields, and interface obligations are conserved across the cycle. The two evidence layers compose: Chapter 7 supplies notation-level round-trip evidence; Chapter 9 supplies prototype-level round-trip evidence; together they discharge the round-trip-invariant evaluation question, EQ-03, at the methodological level pre-registered in Chapter 4, Section 4.5.
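The specification-equivalence assertion at the centre of the procedural round-trip can be sketched as a field-wise comparison. The load-bearing fields are those listed above; their spelling as dictionary keys (for example interface_obligations and sda_constraints) and the function name round_trip_divergences are assumptions, and the re-parsing step itself is outside the sketch.

```python
# Load-bearing fields, spelled here as hypothetical dictionary keys.
LOAD_BEARING_FIELDS = (
    "entry_id", "module_type", "design_category", "constraint_status",
    "predicates", "interface_obligations", "sda_constraints",
    "schema_version", "binding_for",
)


def round_trip_divergences(original: dict, reparsed: dict) -> list:
    """List the load-bearing fields whose values differ after a round-trip."""
    return [
        name for name in LOAD_BEARING_FIELDS
        if original.get(name) != reparsed.get(name)
    ]
```

Zero divergence, in this framing, is an empty list for each of the nine entries; fields outside the load-bearing set, such as the emission timestamp, are deliberately excluded from the comparison.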

What the deterministic substrate establishes is the operational warrant for non-specialist manipulability (pure-Python execution, no network, no large-language model, deterministic ordering, a full per-run log) under which an independent party with the prototype directory and the baseline-library appendix can re-execute the round-trip and confirm the zero-divergence result. The warrant is not the same as a demonstration. Full third-party usability evaluation against a non-specialist user population, in which competent non-specialists attempt packet manipulation and re-emission against the pipeline’s deterministic substrate, is declared future work in Section 9.27 and lies outside the prototype’s evaluative scope; the prototype is operated by the candidate. The provenance discipline composes with the Section 9.12 pre-vetted-library commitment: each entry’s authoring trail is recorded in the entry, and each packet’s generation trail in the packet’s frontmatter and the run-log, so that an examiner who follows the chain from a packet to its source entry to its substrate citations encounters a continuous record from the SDA Design Standard clauses (or the FA typology graph edges), through the human-authored entry, through the schema validator and the predicate transformer, to the emitted packet. No step in the chain is anonymous, and no step is silently inferred.

9.27 Conclusion and Handoff to Chapter 10

The generator is a procedural artefact whose primary contribution is not a static representation but a deterministic transformation pipeline binding the suite’s other artefacts at runtime. Where the standardisation schema delivers a queryable representation of regulatory commitments,³⁴ the Governed Kernel Architecture a governed-kernel planimetric-module library,³⁵ and the notation a two-layer formal grammar,³⁶ the generator supplies the executable bridge that ingests the baseline-library entries and emits compliance-bearing documentation packets through the PlaniSyn transformation grammar. It is realised as a deterministic command-line Python prototype operating against nine FA-category baseline-library entries, the minimum corpus consistent with Design-Science-Research instantiation depth at this stage of the programme. At that depth the pipeline functions end-to-end on every entry in the FA scope, supplying the demonstration substrate that Chapter 10 exercises against representative dwellings.

The chapter advances four arguments that together constitute the artefact’s case. The bottleneck argument, developed in Section 9.2, diagnoses the documentation bottleneck that arises when compliance evidence must be re-authored, by hand, for every project that touches the regulatory substrate. The pipeline argument, developed in Section 9.6, specifies a four-stage ingest → validate → transform → emit pipeline whose validation gate is the RecPol formal core, whose transformation engine is the PlaniSyn applied grammar, and whose outputs are typed documentation packets ready for downstream consumption. The substrate argument, developed in Section 9.12, demands the pre-vetted library as the curated stratum on which the pipeline operates, so that the cost of rule-checking is paid once, at library curation, rather than re-paid on every project. The prototype argument, developed in Section 9.19, operationalises the transformation against that library by reporting an executable instance that runs deterministically across the nine FA entries.

The contribution is best read against the four critical modularity properties the notation chapter identifies as the criteria a representational substrate must satisfy under round-trip verification.³⁷ The generator’s contribution to interoperability (Proposition 1) is direct: documentation packets carry interface contracts, expressed in the PlaniSyn applied layer, that other artefacts in the suite (and, in principle, third-party consumers) can ingest without renegotiating semantics at the boundary. Its contribution to transportability (Proposition 2) is the serialised packet format, which survives the phase transitions between authoring, validation, review, and archival because the canonical orthography is fixed at the notation layer rather than re-invented at each stage. Its contribution to manipulability (Proposition 3) is operational rather than evidential: the deterministic pipeline establishes the operational warrant for non-specialist manipulability (pure-Python execution, no network, no large-language model, deterministic ordering, a full per-run log) under which the round-trip evidence of Chapter 7, Section 7.20 applies at the procedural-instantiation level through the prototype’s reproducibility; full third-party usability evaluation against a non-specialist user population is declared future work in Section 9.27. Its contribution to transformability (Proposition 4) is the PlaniSyn transformation grammar, which encodes the pathways by which the substrate adapts under regulatory change: when the underlying standard moves, the grammar, not the practitioner’s manuscript, absorbs the delta. The generator therefore does not introduce the four properties; it operationalises them at runtime, converting commitments held statically by the Governed Kernel Architecture and the notation into a working flow.

The contribution is bounded in three explicitly stated respects. First, the prototype operates on the nine FA-category entries; the RB, LV, and UN module categories are deferred to on-demand authoring, which is a declared scope-limit rather than a defect of the pipeline.³⁸ Second, the transformation grammar is the PlaniSyn sealed substrate at the version under which the prototype is built; any future amendment to the grammar would require a re-validation of round-trip fidelity at the notation chapter before that amended grammar could be admitted to the pipeline. Third, the prototype is a command-line research instrument, not a production-grade authoring system: integration with practitioner toolchains, multi-user concurrency, and graphical-interface authoring are all declared future work. Together these frame the artefact as a minimum-viable demonstration of the pipeline’s logic rather than a turnkey deliverable for industry adoption.

The handoff to Chapter 10 proceeds under explicit terms. The two demonstration cases (Case 1, a typical detached dwelling, and Case 2, an SDA-compliant dwelling) exercise the generator against the synthetic-dwelling specifications constructed for the ecological-grounding work.³⁹ The Chapter 10 evaluation tests the generator’s contribution along three measured axes: standards interpretability, measure EM-09-01, as the application of Proposition 1; modular fit, measure EM-09-02, as the application of Proposition 2; and workflow efficiency, measure EM-09-03, as the practitioner-facing measure of the burden side of the fifth research question. The handoff is not a re-statement of the artefact’s claims; it is a controlled transfer of the prototype’s outputs into the evaluation harness, where those outputs are compared against the empirical baselines supplied by the empirical substrate.⁴⁰

The generator’s position within the suite can therefore be stated cleanly. The standardisation schema supplies the regulatory commitments any procedural workflow must respect; the Governed Kernel Architecture supplies the substrate of typed modules whose interface contracts the workflow consumes; the notation supplies the formal language in which those commitments and contracts are expressed without information loss; and the empirical substrate supplies the empirical evidence against which the workflow’s outputs are validated. The generator is the procedural artefact that binds the four together at runtime: the suite’s executable instance, in which the otherwise-static commitments of the other artefacts are exercised, audited, and rendered into documentation packets that downstream consumers can act upon. Without the generator the suite remains a set of well-formed but unintegrated representations; with it, the suite admits a tractable end-to-end demonstration. The handoff contracts that govern the boundaries between artefacts are recorded separately so that the integration is auditable rather than implicit.⁴¹

Six directions of future work follow from the scope-limits and the architecture rather than from any deficiency in the present demonstration, and each is bounded by an explicit scope-limit declaration so that an examiner can scrutinise it rather than infer it.

Cross-corpus generalisation. The most direct extension carries the pipeline beyond the FA category and beyond the synthetic-dwelling corpus to multi-jurisdictional residential standards. The architectural prediction it would test is specific: because the pipeline separates the standardisation schema from the transformation grammar, jurisdictional variance ought to be absorbable at the schema layer, leaving the grammar and the pipeline unchanged. A second jurisdiction’s accessibility provisions, serialised into a sibling schema and authored into a sibling pre-vetted library, would either confirm that prediction (the same grammar transforms the new entries without amendment) or localise the point at which jurisdictional difference forces a grammar change, which is itself a finding about where the suite’s stratification holds.
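
The prediction can be made concrete with a minimal sketch, offered as an illustration rather than as the prototype's code. It assumes a schema layer reducible to a lookup from jurisdiction-specific clause identifiers to normalised requirement keys, and a single lifting function standing in for the grammar. The names `lift_predicates`, `Schema`, the clause identifiers `A-1.1`, `B-7`, and `B-9`, and the `requires(...)` predicate form are all hypothetical; none is PlaniSyn syntax or a citation of any real standard.

```python
from __future__ import annotations

from typing import Mapping, Sequence

# Hypothetical schema layer: jurisdiction-specific clause id -> normalised key.
Schema = Mapping[str, str]


def lift_predicates(
    citations: Sequence[str], schema: Schema
) -> tuple[list[str], list[str]]:
    """Jurisdiction-agnostic lifting step (a stand-in for the grammar).

    Returns (predicates, unresolved). The function body never changes
    between jurisdictions; only the schema argument does. Citations the
    schema cannot resolve are reported rather than silently dropped.
    """
    predicates: list[str] = []
    unresolved: list[str] = []
    for citation in citations:
        key = schema.get(citation)
        if key is None:
            unresolved.append(citation)
        else:
            predicates.append(f"requires({key})")
    return predicates, unresolved


# Two sibling schemas with invented clause ids mapping to the same key.
schema_a: Schema = {"A-1.1": "step_free_entry"}
schema_b: Schema = {"B-7": "step_free_entry"}

print(lift_predicates(["A-1.1"], schema_a))
print(lift_predicates(["B-7", "B-9"], schema_b))
```

In this toy form, a non-empty `unresolved` list marks a gap that a schema amendment can close. A jurisdictional difference that no schema entry could express would instead force a change to the lifting function itself, which corresponds to the second outcome described above.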

Productionisation. The present prototype is a research instrument; converting it into a maintainable authoring tool would require persistent storage, multi-user governance over concurrent library curation, and version-controlled management of the entry bundle and the packet schema. The work is engineering rather than design, but it is the precondition for an industry trial: a maintainability layer is what lets a practitioner population, rather than the candidate, exercise the pipeline against live projects, and only at that scale do the workflow-efficiency claims of Chapter 10 acquire an external evidence base.

Integration with BIM authoring tools. The documentation-packet schema specifies open-standard exchange formats, and a future direction would bind the pre-vetted library into established BIM design workflows through them. The question it would settle is whether the library functions as a reusable component embedded within practitioner toolchains rather than as a free-standing apparatus: whether, that is, the governed-component discipline of Section 9.12 survives contact with the parametric-component conventions of the BIM environments from which Section 9.14 distinguished it.

Automated rule-checker integration. A fourth direction would close the loop between the documentation packets the generator emits and the rule-evaluation engines that consume them, along the Hjelseth and Nisbet pattern.⁴² The emitted packet already carries the provenance ledger and the lifted PlaniSyn predicates a rule engine would query; wiring that output into an automated checker would let the pipeline’s emission and the rule-checking literature’s verdict-issuance, the two partial responses that Section 9.2 surveyed as distinct families, operate as a single governed cycle rather than as adjacent tools.

Large-language-model-assisted authoring layers. A fifth direction would add assistive authoring above the deterministic pipeline, offering practitioners a natural-language interface to library curation while preserving the deterministic guarantees of the underlying transformation grammar. The design constraint is the one Section 9.2 drew from the unconstrained-generation failure mode: any generative layer must sit above the governed core, proposing entries that the validator then admits or rejects, so that the assistance reduces authoring cost without relocating the traceability guarantee into the prompt.
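
The propose-then-validate constraint can be sketched as follows. This is a minimal illustration under stated assumptions, not the prototype's validator: `CandidateEntry`, `validate`, and `admit` are hypothetical names, the three checks are placeholders for whatever the governed core actually enforces, and only the field name `sda_provenance` is taken from the baseline-library entries. The proposals could come from any source, a language model included; the gate does not depend on how they were produced.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CandidateEntry:
    """Hypothetical stand-in for a proposed library entry."""

    entry_id: str
    module_type: str
    sda_provenance: tuple[str, ...]  # clause or typology-edge citations


def validate(entry: CandidateEntry) -> list[str]:
    """Deterministic checks; an empty list means the entry is admissible."""
    reasons: list[str] = []
    if not entry.entry_id.strip():
        reasons.append("missing entry_id")
    if not entry.module_type.strip():
        reasons.append("missing module_type")
    if not entry.sda_provenance:
        reasons.append("no provenance citation")
    return reasons


def admit(
    proposals: Iterable[CandidateEntry],
) -> tuple[list[CandidateEntry], dict[str, list[str]]]:
    """Gate proposals through the validator; nothing bypasses it."""
    admitted: list[CandidateEntry] = []
    rejected: dict[str, list[str]] = {}
    for entry in proposals:
        reasons = validate(entry)
        if reasons:
            rejected[entry.entry_id] = reasons
        else:
            admitted.append(entry)
    return admitted, rejected


# Invented draft entries: one cites a clause, one carries no provenance.
cited = CandidateEntry("DRAFT-01", "sanitary", ("D4.1",))
uncited = CandidateEntry("DRAFT-02", "sanitary", ())
admitted, rejected = admit([cited, uncited])
print([e.entry_id for e in admitted], rejected)
```

Because admission is decided by the deterministic `validate` function alone, the traceability guarantee stays in the validator: a rejected proposal carries explicit reasons, and an admitted one has passed the same checks as a hand-authored entry.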

Third-party usability evaluation. The sixth direction, indexed here as Section 9.27 for explicit cross-reference from Section 9.1 and Section 9.26, is a structured usability evaluation against a non-specialist user population. The operational form of the manipulability claim under Proposition 3 (that competent non-specialists can manipulate a documentation packet, re-emit it through the pipeline, and recover a representation whose semantic and verification commitments remain consistent) is not exercised by the present prototype, which is operated by the candidate. The deterministic substrate establishes the operational warrant for that claim; a study in which non-specialist participants attempt packet manipulation and re-emission against the pipeline is what would be required to elevate it to a demonstrated capability. It is the one future direction that bears directly on a thesis-level proposition rather than on the artefact’s reach, and for that reason the chapter declares it explicitly rather than absorbing it into the general productionisation agenda.
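
What such a study would check on each participant's attempt can be sketched in miniature. The sketch assumes a packet reducible to a JSON-serialisable dictionary whose `semantic` and `verification` fields hold the commitments and whose `parameters` field holds the values a participant may edit. Those field names, the functions `emit`, `parse`, and `round_trip_preserves_commitments`, and the door-width values are hypothetical illustrations; they are neither the documentation-packet schema nor the contents of tests/test_roundtrip.py.

```python
from __future__ import annotations

import copy
import hashlib
import json
from typing import Callable

COMMITMENT_KEYS = ("semantic", "verification")  # assumed field names


def emit(packet: dict) -> str:
    """Canonical serialisation: sorted keys, fixed separators."""
    return json.dumps(packet, sort_keys=True, separators=(",", ":"))


def parse(text: str) -> dict:
    return json.loads(text)


def commitment_digest(packet: dict) -> str:
    """Fingerprint only the fields treated as commitments."""
    committed = {key: packet[key] for key in COMMITMENT_KEYS}
    return hashlib.sha256(emit(committed).encode("utf-8")).hexdigest()


def round_trip_preserves_commitments(
    packet: dict, manipulate: Callable[[dict], dict]
) -> bool:
    """Edit a copy, re-emit, re-parse, and compare commitment digests."""
    before = commitment_digest(packet)
    edited = manipulate(copy.deepcopy(packet))  # original stays untouched
    recovered = parse(emit(edited))
    return recovered == edited and commitment_digest(recovered) == before


packet = {
    "semantic": {"module_type": "sanitary"},
    "verification": {"clauses": ["D4.1"]},
    "parameters": {"door_clear_width_mm": 950},  # invented value
}


def widen_door(p: dict) -> dict:
    p["parameters"]["door_clear_width_mm"] = 1000  # invented value
    return p


def drop_clause(p: dict) -> dict:
    p["verification"]["clauses"] = []
    return p


print(round_trip_preserves_commitments(packet, widen_door))
print(round_trip_preserves_commitments(packet, drop_clause))
```

The check separates the two outcomes a usability study would need to tell apart: an edit confined to free parameters, which survives re-emission with its commitments intact, and an edit that alters a commitment, which the digest comparison flags.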

The generator stands in place, then, as the suite’s executable bridge: narrow at its present FA-corpus depth, but wide in the architectural surface it exposes. Whether the procedural artefact is adequate as a runtime binding for the suite is the proper subject of Chapter 10, whose demonstration cases supply the evidential weight that the present chapter, by design, defers.

Notes

Chapter 5; see also Supplementary RecPol Specification and Supplementary Author Research Corpus Assertions. ↩︎
Chapter 6; governed instance library specification in Section 6.4. ↩︎
Chapter 7. ↩︎
Chapter 8; corpus profile in Section 8.10. ↩︎
The four-property framing (Property 1 interoperability, Property 2 transportability, Property 3 manipulability, Property 4 transformability) is established in Chapter 1, Section 1.4; the round-trip-fidelity evidence under Proposition 3 is established at Chapter 7, Section 7.20 (EVID-P3-REPLAY and EVID-P3-INVARIANTS); Chapter 9 contributes the procedural-instantiation operationalisation rather than re-establishing the evidence. ↩︎
J. Togelius, G. N. Yannakakis, K. O. Stanley, and C. Browne, “Search-Based Procedural Content Generation: A Taxonomy and Survey,” IEEE Transactions on Computational Intelligence and AI in Games, vol. 3, no. 3, pp. 172-186, 2011; G. Smith and J. Whitehead, “Analyzing the expressive range of a level generator,” in Proceedings of the 2010 Workshop on Procedural Content Generation in Games (PCGames ’10), ACM, 2010; K. Compton, Casual Creators: Defining a Genre of Autotelic Creativity Support Systems, Doctoral dissertation, University of California, Santa Cruz, 2019. ↩︎
G. Stiny, “Introduction to shape and shape grammars,” Environment and Planning B: Planning and Design, vol. 7, no. 3, pp. 343-351, 1980, doi: 10.1068/b070343; T. W. Knight, “Shape grammars: Six types,” Environment and Planning B: Planning and Design, vol. 26, no. 1, pp. 15-31, 1999, doi: 10.1068/b260015; J. P. Duarte, “Towards the mass customization of housing: the grammar of Siza’s houses at Malagueira,” Environment and Planning B: Planning and Design, vol. 32, no. 3, pp. 347-380, 2005, doi: 10.1068/b31124, exhibits an applied shape-grammar that emits compliance-bearing housing variants under explicit, traceable rule applications: the closest precedent in the literature to the present combination of declared-grammar generation and compliance-bearing emission. ↩︎
Corpus profile and screening protocol in Chapter 8, Section 8.10; the 745-plan figure is the validated post-screening corpus size after the four-stage extraction-validation pipeline reported there. ↩︎
The agent-manual extraction protocol, including its rationale and the prohibition on programmatic end-to-end automation for the floor-plan extraction stage, is the canonical extraction method established for the Chapter 8 empirical substrate; the extraction-effort estimate is the working figure governing that substrate’s production budget. The corpus is a complete census of the screened plans rather than a probability sample, so the figures reported here are descriptive counts over that census. ↩︎
The polysemy among SDA-domain terms, on the order of three in five eligible terms, is established empirically in Chapter 5 and grounded theoretically in J. Pustejovsky, The Generative Lexicon, MIT Press, 1995. ↩︎
R. Sacks, C. Eastman, G. Lee, and P. Teicholz, BIM Handbook: A Guide to Building Information Modeling for Owners, Designers, Engineers, Contractors, and Facility Managers, 3rd ed., Wiley, 2018. ↩︎
ISO 19650-1:2018, Organization and digitization of information about buildings and civil engineering works, including BIM — Part 1: Concepts and principles, ISO, 2018. ↩︎
C. Eastman, J. M. Lee, Y. S. Jeong, and J. K. Lee, “Automatic rule-based checking of building designs,” Automation in Construction, vol. 18, no. 8, pp. 1011-1033, 2009; W. Solihin and C. Eastman, “Classification of rules for automated BIM rule checking development,” Automation in Construction, vol. 53, pp. 69-82, 2015; E. Hjelseth, “Foundations for BIM-based model checking systems,” Doctoral dissertation, Norwegian University of Life Sciences, 2015; E. Hjelseth and N. Nisbet, “Capturing normative constraints by use of the semantic mark-up RASE methodology,” in Proceedings of CIB W78-W102 2011, 2011. ↩︎
J. Dimyadi and R. Amor, “Automated building code compliance checking. Where is it at?” in Proceedings of the 19th CIB World Building Congress, 2013. ↩︎
The foundational capability claims this family inherits are established by T. B. Brown et al., “Language models are few-shot learners,” in Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020, arXiv:2005.14165; R. Bommasani et al., “On the opportunities and risks of foundation models,” Stanford Center for Research on Foundation Models, 2021, arXiv:2108.07258, which explicitly catalogues regulatory and compliance-adjacent documentation among the deployment domains of non-trivial risk; and OpenAI, “GPT-4 technical report,” 2023, arXiv:2303.08774. The argument depends not on any single application paper but on the architectural pattern they collectively share: generative emission without an enforced governance contract over the substrate from which the emission is drawn. ↩︎
The entry corpus is described in publish-thesis/publish-data/appendix-data-ch6-baseline-library/idx_appendix.md; the schema is specified in Chapter 6, Section 6.4. ↩︎
The PlaniSyn applied grammar and the underlying RecPol formal core are specified jointly in Chapter 7; see particularly Chapter 7, Section 7.13 for the three-layer architecture, the seven-tag grammar, and the four interaction types whose syntactic forms the validator references. ↩︎
The current state of SDA provenance integration is documented in publish-thesis/publish-data/appendix-data-ch6-baseline-library/idx_appendix.md Section 5; numbered SDA clause identifiers are surfaced for SAN-FA-01 (inline at Chapter 6, Section 6.4) but not yet for the eight bundled FA entries, which cite at the FA typology edge level pending a subsequent audit pass. ↩︎
C. Y. Baldwin and K. B. Clark, Design Rules, Volume 1: The Power of Modularity (Cambridge, MA: MIT Press, 2000), Ch. 3 (“What is Modularity?”) and Ch. 5 (“Design Rules: The Six Modular Operators”). Baldwin and Clark’s distinction between visible design rules (which coordinate independent work) and hidden module parameters (which can vary locally) is the structural precedent for positioning the pipeline as a hidden-parameter implementation conforming to the visible design rules supplied by the standardisation schema, the Governed Kernel Architecture, and the notation; the engine those rules govern is developed in Chapter 6. ↩︎
The methodological asymmetry between SAN-FA-01 (which cites numbered SDA clauses D4.1, D4.2, D4.3, D4.6, D4.8) and the eight bundled FA entries (which cite at the FA typology edge level pending a subsequent audit pass) is documented in publish-thesis/publish-data/appendix-data-ch6-baseline-library/idx_appendix.md Section 5. Each entry’s provenance trail and abstract callout repeat the honesty note for standalone reading. ↩︎
Chuck Eastman, Paul Teicholz, Rafael Sacks, and Ghang Lee, BIM Handbook: A Guide to Building Information Modeling for Owners, Designers, Engineers, Contractors, and Facility Managers, 3rd ed. (Hoboken, NJ: Wiley, 2018), Ch. 1 (“BIM Handbook Introduction”) and Ch. 4 (“BIM Design Tools and Parametric Modeling”). Eastman and colleagues describe the parametric-component library as the operational vehicle for component reuse across BIM projects, and they document the recurrent pattern in which a component’s parameters are insufficient to encode regulatory commitments, so the compliance check is deferred to a downstream rule-engine pass rather than carried within the component itself. ↩︎
N. J. Habraken, Supports: An Alternative to Mass Housing. London, UK: Routledge, 2021 (reissue; originally published 1972), ISBN: 9780367857387; N. J. Habraken, The Structure of the Ordinary: Form and Control in the Built Environment (Cambridge, MA: MIT Press, 1998). Habraken’s foundational distinction between Support (long-life capacity) and Infill (short-life configuration) supplies the structural precedent for treating the library as a capacity layer on which a population of design instances is configured. ↩︎
S. H. Kendall, “Open Building: An Abbreviated History and a Look Forward,” Open House International, ahead-of-print (2025), doi: 10.1108/OHI-05-2025-0185. Kendall’s contemporary survey identifies the standardised-interface discipline as the operational mechanism through which Open Building’s capacity layer makes downstream configuration feasible. ↩︎
C. Y. Baldwin and K. B. Clark, Design Rules, Volume 1: The Power of Modularity (Cambridge, MA: MIT Press, 2000), Ch. 3 (“What is Modularity?”) and Ch. 5 (“Design Rules: The Six Modular Operators”). Baldwin and Clark’s distinction between visible design rules (which coordinate independent work across module boundaries) and hidden module parameters (which can vary locally without destabilising the system) supplies the theoretical structure within which the library’s entries function as platform-conformant complements. ↩︎
Chuck Eastman, Paul Teicholz, Rafael Sacks, and Ghang Lee, BIM Handbook, 3rd ed. (Hoboken, NJ: Wiley, 2018), Ch. 7 (“BIM and Building Codes and Regulations”) and Ch. 9 (“BIM for Sustainability and Building Performance Analysis”). The chapters survey the design and operational characteristics of the standards-formalisation programmes that have populated the BIM ecosystem since the early 2000s and identify the pre-vetted entry pool as a recurrent precondition for schema-driven generation. ↩︎
J. Dimyadi and R. Amor, “Automated Building Code Compliance Checking — Where is It At?,” in Proceedings of the 19th CIB World Building Congress, Brisbane 2013, eds. S. Kajewski, K. Manley, and K. Hampson (Brisbane: Queensland University of Technology, 2013), 172-185. Dimyadi and Amor’s survey establishes that automated compliance checking depends on a curated, schema-conformant entry pool; the pre-vetted library is our instantiation of that precondition for the SDA-aligned dwelling-design domain. ↩︎
Per-entry line counts and provenance citation tallies are recorded in publish-thesis/publish-data/appendix-data-ch6-baseline-library/idx_appendix.md Section 2 (“File Manifest”); the manifest records eight of eight entries with constraint_status: true substantiation and nine of nine module types covered at the FA category. The counts reported here are descriptive over the populated set, which is a complete census of the FA-category entries authored to date rather than a sample. ↩︎
The reproduction protocol is recorded in publish-thesis/publish-data/appendix-data-ch6-baseline-library/idx_appendix.md Section 4 (“Reproduction Notes”). The eight bundled entries were authored against a fixed substrate of six input documents: the Chapter 6 module library specification, the constituent elements specification, the module taxonomy, the interaction rules, and two FA typology graph files. Honest provenance gaps are surfaced in each entry rather than fabricated as clause identifiers not present in the source. ↩︎
C. Y. Baldwin and K. B. Clark, Design Rules, Volume 1: The Power of Modularity (Cambridge, MA: MIT Press, 2000), Ch. 3 (“What is Modularity?”) and Ch. 5 (“Design Rules: The Six Modular Operators”). The minimum-dependency runtime is the prototype’s instantiation of Baldwin and Clark’s hidden-parameter discipline: the implementation may evolve freely beneath the pipeline’s stage boundaries provided the visible design rules, the inter-stage contracts of Section 9.6 and the documentation-packet schema specified below, remain stable. ↩︎
The PlaniSyn production rules are specified jointly in Chapter 7; see in particular Chapter 7, Section 7.13 for the seven-tag grammar and the four interaction types whose syntactic forms the transformer references. The full EBNF parser is deferred to subsequent work; the prototype implements the minimum-viable subset sufficient to consume baseline-library sda_provenance fields and emit structured predicates. ↩︎
The interface-obligations detector was extended per the procedural-evidence work items AGENT-EXT-8 and R2-Ch9-SW-02. ↩︎
Each scope-limit is recorded as a declared future-work direction in Section 9.27 and traced to the requirements register at Supplementary Requirements-Design-Evaluation Traceability Matrix. The scope-limits are properties of the declared instantiation-depth scope rather than defects of the prototype’s logic. ↩︎
The procedural-level round-trip test is implemented at tests/test_roundtrip.py, added per the procedural-evidence work items AGENT-EXT-7 and R2-Ch9-SW-01. ↩︎
Chapter 5: A Queryable Schema for Accessibility Standards. ↩︎
Chapter 6: A Governed Kernel Architecture for Housing. ↩︎
Chapter 7: A Formal Notation for Floor Plans. ↩︎
Chapter 7: A Formal Notation for Floor Plans; the four properties are interoperability, transportability, manipulability, and transformability. ↩︎
Supplementary Requirements-Design-Evaluation Traceability Matrix records the FA-only coverage decision and traces it to the requirements register. ↩︎
Supplementary Environmental Grounding Dossier records the synthetic-dwelling specifications and the sealed corpus reference. ↩︎
Chapter 8: Evidence from a Census of Australian Floor Plans. ↩︎
Supplementary Artefact Suite Handoff Contracts. ↩︎
E. Hjelseth and N. Nisbet, “Capturing normative constraints by use of the semantic mark-up RASE methodology,” in Proceedings of CIB W78-W102 2011, 2011. ↩︎