How to change the schema
How-to — evolve the XSD and propagate the change through the schema-driven pipeline.
The schema is the single source of truth, so changing it is a configuration change, not a redesign: models, validation, bindings and the schema docs all regenerate from it.
There are two schemas, regenerated together by make generate: schema/acoustic_dataset.xsd
(the output dataset) and schema/calculation_input.xsd (the calculation parameters). They are
different shapes — the builder expands the input parameters into the output dataset — so an
output change usually means touching the output schema, the builder, and possibly the input
schema if a new parameter is needed.
Steps
- Edit the schema(s). Change schema/acoustic_dataset.xsd and/or schema/calculation_input.xsd (the contracts). When you add or rename a type/element, put its definition prose in xs:annotation/xs:documentation so it rides through to the generated model docstrings and the schema reference.
- Regenerate the models.
      make generate
  Runs xsdata over both schemas and rewrites src/acoustic_dataset/models/ (output) and src/acoustic_dataset/input_models/ (input). Do not hand-edit the result: it's a generated artifact (ADR 0008). Generation is pinned to the 3.9 toolchain so the output is byte-reproducible for the drift gate.
- Regenerate the schema reference.
      make gen-schema-docs
  Produces the HTML reference from the schema via xs3p (ADR 0011).
- Update the builder. src/acoustic_dataset/build.py is the one place that knows element names, so update it for any added/renamed/retyped fields. Generation code does not change.
- Update the example and golden file. Adjust examples/calculation_input.xml (it must stay valid against schema/calculation_input.xsd), then refresh the golden file if the new output is intended:
      make pipeline   # writes build/acoustic_dataset.xml
      cp build/acoustic_dataset.xml tests/golden/acoustic_dataset.xml
  Review the golden diff deliberately: it's the semantic gate.
- Re-run the gates.
      make verify     # lint + type-check + tests + drift gate
      make pipeline   # end-to-end: map -> serialise -> validate -> round-trip
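For the first step, it can help to see where the definition prose sits in the XSD and to check that nothing was left undocumented before regenerating. The sketch below is illustrative only: the schema fragment, the element names (`SoundSpeed`, `Undocumented`) and the helper functions are hypothetical and not part of this repository; it uses only the Python standard library.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema fragment: element names and prose are illustrative only.
XSD_FRAGMENT = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="SoundSpeed" type="xs:double">
    <xs:annotation>
      <xs:documentation>Speed of sound in the medium, in metres per second.</xs:documentation>
    </xs:annotation>
  </xs:element>
  <xs:element name="Undocumented" type="xs:string"/>
</xs:schema>
"""


def documentation_by_name(xsd_text):
    """Map each named top-level declaration to its xs:documentation text, or None."""
    root = ET.fromstring(xsd_text)
    docs = {}
    for node in root:
        name = node.get("name")
        if name is None:
            continue
        doc = node.find(f"{{{XS}}}annotation/{{{XS}}}documentation")
        docs[name] = doc.text.strip() if doc is not None and doc.text else None
    return docs


def undocumented(xsd_text):
    """Names of top-level declarations that carry no documentation prose."""
    return sorted(n for n, d in documentation_by_name(xsd_text).items() if d is None)


if __name__ == "__main__":
    print(undocumented(XSD_FRAGMENT))
```

A check like this only inspects top-level declarations; nested elements and attributes would need a recursive walk.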
Guard against schema-valid-but-different
If you have a known-good file from a prior process (e.g. one of the consumer's trial
files), drop it in examples/reference/ and compare:
make pipeline
python -m acoustic_dataset.cli compare build/acoustic_dataset.xml examples/reference/<file>.xml
A clean match exits 0; a meaningful difference prints a diff and exits non-zero — catching output that is schema-valid but differs from what a consumer depends on (ADR 0004).
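To make the idea of a "meaningful difference" concrete, here is a minimal sketch of a semantic XML comparison. It is not the implementation behind the `compare` subcommand. It assumes that attribute order and surrounding whitespace are not meaningful while element order, attribute values and text are; the function names are hypothetical.

```python
import difflib
import xml.etree.ElementTree as ET


def flatten(xml_text):
    """One line per element: path, sorted attributes, stripped text."""
    lines = []

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        attrs = " ".join(f'{k}="{v}"' for k, v in sorted(elem.attrib.items()))
        text = (elem.text or "").strip()
        lines.append(f"{here} [{attrs}] {text}".rstrip())
        for child in elem:
            walk(child, here)

    walk(ET.fromstring(xml_text), "")
    return lines


def compare(built, reference):
    """Return (exit_code, diff_lines): (0, []) when the documents match."""
    diff = list(
        difflib.unified_diff(
            flatten(reference),
            flatten(built),
            fromfile="reference",
            tofile="built",
            lineterm="",
        )
    )
    return (1 if diff else 0), diff


if __name__ == "__main__":
    reference = '<d><v unit="Hz" n="1">10</v></d>'
    built = '<d><v n="1" unit="Hz">11</v></d>'
    code, diff = compare(built, reference)
    print("\n".join(diff))
    raise SystemExit(code)
```

Flattening to one line per element keeps the diff readable and lets formatting-only changes pass, which is the distinction between schema validation and this kind of check: both inputs may validate while the comparison still fails.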
What you should not touch
- Generated models (src/acoustic_dataset/models/, src/acoustic_dataset/input_models/): regenerate instead.
- Generated HTML schema reference under reference/schema/: regenerate instead.
- The generation code: it's schema-agnostic by design.
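Hand-edits to generated directories are what a drift gate is meant to catch. The sketch below shows one way such a byte-level check could work, assuming the gate compares the committed tree against a freshly regenerated copy. The function names and the two-directory layout are hypothetical; the project's actual gate may be built differently.

```python
import hashlib
from pathlib import Path


def tree_digest(root):
    """SHA-256 over the relative path and bytes of every file under root, in sorted order."""
    root = Path(root)
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def has_drifted(committed_dir, regenerated_dir):
    """True when the two trees differ in any file name or any byte."""
    return tree_digest(committed_dir) != tree_digest(regenerated_dir)
```

Because the comparison is byte-for-byte, it only works if generation is reproducible, which is why the toolchain version is pinned.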