How to change the schema
How-to — evolve the XSD and propagate the change through the schema-driven pipeline.
The schema is the single source of truth, so changing it is a configuration change, not a redesign: models, validation, bindings and the schema docs all regenerate from it.
Steps
- Edit the schema. Change schema/acoustic_dataset.xsd (the contract). When you add or rename a type/element, put its definition prose in xs:annotation/xs:documentation so it rides through to the generated model docstrings and the schema reference.
- Regenerate the models.

      make generate

  Runs xsdata over the schema and rewrites src/acoustic_dataset/models/. Do not hand-edit the result: it's a generated artifact (ADR 0008). Generation is pinned to the 3.9 toolchain so the output is byte-reproducible for the drift gate.
- Regenerate the schema docs + ERD.

      make gen-schema-docs

  Produces the reference pages and the Mermaid ERD from the schema (ADR 0009).
- Update the builder. src/acoustic_dataset/build.py is the one place that knows element names, so update it for any added/renamed/retyped fields. Generation code does not change.
- Update the example and golden file. Adjust examples/calculation_input.json, then refresh the golden file if the new output is intended:

      make pipeline   # writes build/acoustic_dataset.xml
      cp build/acoustic_dataset.xml tests/golden/acoustic_dataset.xml

  Review the golden diff deliberately: it's the semantic gate.
- Re-run the gates.

      make verify     # lint + type-check + tests + drift gate
      make pipeline   # end-to-end: map -> serialise -> validate -> round-trip
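Step 1 asks for definition prose in xs:annotation/xs:documentation on anything you add or rename. A quick way to spot declarations that were missed is a small check like the one below. This is a minimal sketch, not part of the project's tooling: the function name `undocumented` and the sample element names are hypothetical, and it only inspects a direct xs:annotation child of each named element or type.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema fragment: the element names are illustrative only,
# they are not taken from schema/acoustic_dataset.xsd.
SAMPLE_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Measurement" type="xs:string">
    <xs:annotation>
      <xs:documentation>One recorded sound-level measurement.</xs:documentation>
    </xs:annotation>
  </xs:element>
  <xs:element name="Receiver" type="xs:string"/>
</xs:schema>
"""


def undocumented(xsd_text):
    """Return names of named elements/types lacking non-empty xs:documentation."""
    root = ET.fromstring(xsd_text)
    missing = []
    for tag in ("element", "complexType", "simpleType"):
        for node in root.iter(f"{{{XS}}}{tag}"):
            name = node.get("name")
            if name is None:
                continue  # element refs and anonymous types have no name
            doc = node.find(f"{{{XS}}}annotation/{{{XS}}}documentation")
            if doc is None or not (doc.text or "").strip():
                missing.append(name)
    return missing


print(undocumented(SAMPLE_XSD))
```

For the sample fragment this reports only `Receiver`, the declaration without documentation. To run it against the real schema, pass the text of schema/acoustic_dataset.xsd instead of `SAMPLE_XSD`.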
Guard against schema-valid-but-different
If you have a known-good file from a prior process (e.g. one of the consumer's trial
files), drop it in examples/reference/ and compare:
make pipeline
python -m acoustic_dataset.cli compare build/acoustic_dataset.xml examples/reference/<file>.xml
A clean match exits 0; a meaningful difference prints a diff and exits non-zero — catching output that is schema-valid but differs from what a consumer depends on (ADR 0004).
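To make the exit-code contract concrete, here is a minimal sketch of a comparison with the same shape: 0 on a match, non-zero plus a diff otherwise. It is not the implementation behind `acoustic_dataset.cli compare`. The function name `compare_xml` is hypothetical, and it assumes that "meaningful difference" means a difference that survives XML canonicalisation (attribute order and whitespace-only text are ignored); the project's actual rule is defined by ADR 0004.

```python
import difflib
import xml.etree.ElementTree as ET


def _canonical_lines(xml_text):
    """Canonicalise XML (C14N 2.0) and split it into one tag boundary per line."""
    canon = ET.canonicalize(xml_text, strip_text=True)
    # In canonical form '<' is always escaped inside text and attributes,
    # so "><" can only occur between two adjacent tags.
    return canon.replace("><", ">\n<").splitlines()


def compare_xml(actual, reference):
    """Return (exit_code, diff_lines): (0, []) on a match, (1, diff) otherwise."""
    a = _canonical_lines(actual)
    b = _canonical_lines(reference)
    if a == b:
        return 0, []
    diff = list(
        difflib.unified_diff(b, a, fromfile="reference", tofile="actual", lineterm="")
    )
    return 1, diff


# Attribute order and indentation differ, content does not: treated as a match.
code, diff = compare_xml(
    '<d><r id="1" unit="dB">42</r></d>',
    '<d>\n  <r unit="dB" id="1">42</r>\n</d>',
)
print(code)
```

A caller would print the diff lines and pass the code to `sys.exit`, which gives the behaviour described above for use in CI.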
What you should not touch
- Generated models (src/acoustic_dataset/models/): regenerate instead.
- Generated schema reference/ERD under reference/schema/: regenerate instead.
- The generation code: it's schema-agnostic by design.
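Hand edits to generated files are the kind of change a drift gate exists to catch: regenerate into a scratch directory, then compare it byte for byte with what is committed. The sketch below illustrates that idea only. How the project's drift gate is actually implemented is not described here, and the names `tree_digest` and `drifted` are hypothetical.

```python
import hashlib
from pathlib import Path


def tree_digest(root):
    """Map each file's path (relative to root) to the SHA-256 of its bytes."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def drifted(committed_dir, regenerated_dir):
    """Return sorted relative paths that were added, removed, or changed."""
    old = tree_digest(committed_dir)
    new = tree_digest(regenerated_dir)
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
```

An empty list means the committed tree matches a fresh regeneration. Any path in the list is either a hand edit or output that was not regenerated after a schema change.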