A walk through publish_html.py, the trainee DITAVAL, and the single-stylesheet theme
~15 min · every step is reproducible by hand on the air-gapped target
One DITA source under dita/ → two HTML editions under html/. What the air-gapped author actually runs vs. what the dev pipeline adds on top:
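Air-gapped side: Oxygen publish with the DITAVAL filter, plus theme.css + gramframe.bundle.js.
That's the whole air-gapped recipe. The CSS classifies pages itself via :has(); no body tagging, no landing page, no prettify pass.
Dev side (publish_html.py) adds: staging into .dita-build/, the html/index.html landing page, prettified .html for diffs, timestamp <meta> scrubbing, the GramFrame <script> injection, and the theme.css <link> injection.
One gram topic. The annotations in red are the hooks every later step keys on.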
<topic id="gram_01">
  <title>Gram 01<ph audience="-trainee" outputclass="vessel-name"> - FR Outrider, Category 4, Tantive</ph></title>
  <body>
    <section audience="-trainee" outputclass="analysis-sheet">
      <title>Analysis Sheet</title>
      <p><xref href="analysis-sheet.docx" format="docx" scope="local">Analysis Sheet</xref></p>
    </section>
    <section outputclass="lofar-stage">
      <title>Lofar 1</title>
      <table outputclass="gram-config">
        <tgroup cols="2">
          <colspec colname="c1" colnum="1"/><colspec colname="c2" colnum="2"/>
          <tbody>
            <row><entry namest="c1" nameend="c2"><image href="lofar-1-i.png" placement="break" align="center"/></entry></row>
            <row><entry>time-start</entry><entry>0</entry></row>
            <row><entry>time-end</entry><entry>300</entry></row>
            ...
          </tbody>
        </tgroup>
      </table>
    </section>
  </body>
</topic>
DITA-OT copies each outputclass through to the HTML as class="..."; that is the only hook the theme needs.
These are the only places we attach audience="-trainee". Together they carry everything the student must not see.
| # | Site | DITA shape | Where it lives |
|---|---|---|---|
| 1 | Vessel-name decoration on a gram title | <title>Gram NN<ph audience="-trainee"> — vessel</ph></title> | every gram_NN.dita |
| 2 | Analysis Sheet section | <section audience="-trainee"><title>Analysis Sheet</title>…</section> | every gram_NN.dita |
| 3 | "Instructor " prefix on a chapter navtitle | <navtitle><ph audience="-trainee">Instructor </ph>Week 1 Grams</navtitle> | main.ditamap only (fires when CSV chapter starts "Instructor ") |
| 4 | "— Instructor Version" suffix on a map title | <title>Progress Test 1<ph audience="-trainee"> — Instructor Version</ph></title> | every ditamap |
Folder and file names never contain the substring "instructor" (case-insensitive). The audience prefix is split off before the slug is computed (_normalise_chapter() in generate_dita.py) so the URL path is identical in both editions — only the visible label differs.
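A minimal sketch of the split-then-slug idea, to make the URL-parity claim concrete. This is not the real _normalise_chapter(): the function names and the slug rule (lowercase, runs of non-alphanumerics collapsed to a hyphen) are assumptions for illustration; only the "Instructor " prefix comes from the table above.

```python
import re

AUDIENCE_PREFIX = "Instructor "  # the prefix named in site 3 of the table


def split_audience_prefix(chapter: str) -> tuple[str, str]:
    """Split a CSV chapter name into (audience prefix, visible label)."""
    if chapter.startswith(AUDIENCE_PREFIX):
        return AUDIENCE_PREFIX, chapter[len(AUDIENCE_PREFIX):]
    return "", chapter


def chapter_slug(chapter: str) -> str:
    """Slug computed from the label only, so both editions share one path.

    The slug rule itself is hypothetical; the real generator may differ.
    """
    _, label = split_audience_prefix(chapter)
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")
```

With this shape, "Instructor Week 1 Grams" and "Week 1 Grams" produce the same folder name, and the prefix only ever reaches the output through the filtered <ph>.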
File: dita/trainee.ditaval. Committed alongside the DITA source. publish_html.py refuses to build without it.
<?xml version="1.0" encoding="UTF-8"?>
<val>
<prop att="audience" val="trainee" action="exclude"/>
</val>
DITA-OT tokenises the audience attribute: "-trainee" contains the token trainee, so action="exclude" matches and the element disappears from the build.
What it removes, per site:
| Site | Instructor renders | Student renders |
|---|---|---|
| Gram title | Gram 01 — FR Outrider | Gram 01 |
| Analysis Sheet | full section | (gone) |
| Chapter navtitle | Instructor Week 1 Grams | Week 1 Grams |
| Map title | Progress Test 1 — Instructor Version | Progress Test 1 |
Leakage guarantee: a recursive grep for instructor (any case) over html/student/ — content and paths — must return zero matches.
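The guarantee can be checked with stdlib Python as well as grep. The sketch below is one possible shape, not the project's actual test; find_leaks is a hypothetical name, and it treats every file as text with undecodable bytes ignored.

```python
from pathlib import Path


def find_leaks(student_root: Path, needle: str = "instructor") -> list[str]:
    """List every path or file content under student_root that contains needle.

    Matching is case-insensitive and covers both names and contents.
    """
    needle = needle.lower()
    hits = []
    for path in sorted(student_root.rglob("*")):
        rel = path.relative_to(student_root).as_posix()
        if needle in rel.lower():
            hits.append(f"path: {rel}")
        if path.is_file():
            text = path.read_bytes().decode("utf-8", errors="ignore")
            if needle in text.lower():
                hits.append(f"content: {rel}")
    return hits
```

An empty list means the student tree is clean; anything else names the offending file or folder.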
dita \
--input=.dita-build/main/main.ditamap \
--format=html5 \
--output=html/instructor/main \
--processing-mode=lax
No --filter — the full content is rendered.
dita \
--input=.dita-build/main/main.ditamap \
--format=html5 \
--output=html/student/main \
--processing-mode=lax \
--filter=.dita-build/trainee.ditaval
Only the --filter and --output change.
Both commands run once per ditamap: 7 maps (1 main + 5 progress tests + 1 final assessment) × 2 editions = 14 DITA-OT runs in total. The output trees mirror each other one-for-one (URL parity, FR-016).
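To see the full run matrix at a glance, the two commands can be generated rather than typed. This is a sketch only: dita_commands is a hypothetical helper, it builds argument lists without running anything, and it reuses exactly the flags shown in the two commands above.

```python
from pathlib import Path


def dita_commands(maps, filter_path=".dita-build/trainee.ditaval", dita="dita"):
    """Return one argv list per (ditamap, edition) pair, instructor first."""
    commands = []
    for ditamap in maps:
        ditamap = Path(ditamap)
        for edition in ("instructor", "student"):
            argv = [
                dita,
                f"--input={ditamap.as_posix()}",
                "--format=html5",
                f"--output=html/{edition}/{ditamap.stem}",
                "--processing-mode=lax",
            ]
            if edition == "student":
                # Only the student edition is filtered.
                argv.append(f"--filter={Path(filter_path).as_posix()}")
            commands.append(argv)
    return commands
```

Each list can be handed to subprocess.run(argv, check=True) if you want to execute it.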
Why the stage() step exists
DITA-OT can't read the source tree directly. stage() takes a build-only copy of dita/ into .dita-build/ and fixes two things:
- .dita / .ditamap files are committed without DOCTYPEs (so Oxygen authors don't trip on DTD lookups). DITA-OT needs them to classify elements, so stage() prepends them to every staged copy.
- dita/progress-test-5.ditamap referencing progress-test-5/gram-01/... would publish to .../progress-test-5/progress-test-5/gram-01/..., a duplicated segment. stage() moves the map into its own folder and rewrites the href values to drop the leading <stem>/, so DITA-OT writes the clean tree we want.
# Before staging
dita/
├── main.ditamap
├── main/
│ └── pub10-ed22b-updated/gram-01/gram_01.dita
├── progress-test-1.ditamap
└── trainee.ditaval
# After staging
.dita-build/
├── main/main.ditamap # DOCTYPE added, hrefs rewritten
├── main/pub10-ed22b-updated/gram-01/gram_01.dita # DOCTYPE added
├── progress-test-1/progress-test-1.ditamap
└── trainee.ditaval
Source tree is never touched. Re-staging is idempotent (the directory is wiped first).
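The two fixes can be sketched in a few lines of stdlib Python. This is an illustration, not the real stage(): stage_sketch and add_doctype are hypothetical names, the DOCTYPE strings are the standard OASIS identifiers for topic and map (the project may use others), and the href rewrite is a plain string replace.

```python
import re
import shutil
from pathlib import Path

# Assumed DOCTYPEs: standard OASIS public identifiers for <topic> and <map>.
DOCTYPES = {
    ".dita": '<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">',
    ".ditamap": '<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">',
}
XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>\s*")


def add_doctype(text: str, doctype: str) -> str:
    """Insert the DOCTYPE after the XML declaration, if there is one."""
    if "<!DOCTYPE" in text:
        return text
    match = XML_DECL.match(text)
    head = match.group(0).rstrip() + "\n" if match else ""
    body = text[match.end():] if match else text
    return f"{head}{doctype}\n{body}"


def stage_sketch(src: Path, dst: Path) -> None:
    """Copy src to dst, add DOCTYPEs, and nest each top-level ditamap."""
    if dst.exists():
        shutil.rmtree(dst)  # wipe first so re-staging is idempotent
    shutil.copytree(src, dst)

    # Fix 1: every staged .dita / .ditamap gets a DOCTYPE.
    for path in list(dst.rglob("*")):
        if path.is_file() and path.suffix in DOCTYPES:
            text = path.read_text(encoding="utf-8")
            path.write_text(add_doctype(text, DOCTYPES[path.suffix]), encoding="utf-8")

    # Fix 2: move <stem>.ditamap into <stem>/ and drop the leading <stem>/ from hrefs.
    for ditamap in list(dst.glob("*.ditamap")):
        stem = ditamap.stem
        text = ditamap.read_text(encoding="utf-8").replace(f'href="{stem}/', 'href="')
        target_dir = dst / stem
        target_dir.mkdir(exist_ok=True)
        (target_dir / ditamap.name).write_text(text, encoding="utf-8")
        ditamap.unlink()
```

The source tree is only ever read; everything is written under dst.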
Prettify (prettify_tree)
DITA-OT emits each topic page on a single long line. Unreadable in view-source, unreadable in a diff. The custom HTML tree-builder re-emits every *.html with:
- Inline elements (<a>, <span>, <strong>) left flat; splitting them would change rendered whitespace.
- <pre> / <script> / <style> preserved verbatim.
- Void elements (<meta>, <img>) emitted HTML5-style, no trailing slash.
Side benefit: the canonical layout means the regexes in steps 6 and 7 always find </head> at exactly the same indent.
Scrub (scrub_nondeterministic_metadata)
DITA-OT bakes wall-clock timestamps into every page. Two carriers:
<meta name="DC.date.created" content="2026-05-17T09:14:02Z"/>
<meta name="DC.date.modified" content="2026-05-17T09:14:02Z"/>
Both depend on the run wall-clock, not on the source. A single regex strips both from every page. Result: byte-identical output across runs (FR-008 / SC-006), so we can hash-compare two trees and trust the result.
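One way such a regex could look, as a sketch rather than the pattern publish_html.py actually uses: the expression below is an assumption that removes any <meta> whose name starts with "DC.date.", together with its indentation and trailing line break.

```python
import re

# Assumed pattern, not copied from publish_html.py.
DC_DATE_META = re.compile(
    r'[ \t]*<meta\s+name="DC\.date\.[^"]*"[^>]*>[ \t]*\r?\n?',
    re.IGNORECASE,
)


def scrub(html: str) -> str:
    """Drop every DC.date.* <meta> tag so the page no longer depends on the clock."""
    return DC_DATE_META.sub("", html)
```

Two pages that differ only in their timestamps scrub to the same string, which is what makes the later hash comparison meaningful.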
DITA-OT renders every <table outputclass="gram-config"> into HTML as <table class="gram-config">. The vendored GramFrame bundle (vendor/gramframe/gramframe.bundle.js) scans for that class on DOMContentLoaded and rewrites the table into an interactive spectrogram viewer.
All inject_gramframe_plugin has to do is:
- Copy the vendored bundle to html/gramframe.bundle.js.
- For every *.html under html/, insert one line before </head>:
<script src="../../../../gramframe.bundle.js" defer></script>
(the relative path is computed per file so deep pages still resolve it)
The script is a no-op on pages with no gram-config table, so it's safe to drop on every page in both editions.
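The per-file relative path is the only non-obvious part. A minimal sketch, assuming the bundle sits at the root of html/; script_tag and inject are hypothetical names, not the functions inside inject_gramframe_plugin.

```python
import os
from pathlib import Path


def script_tag(page: Path, html_root: Path, bundle: str = "gramframe.bundle.js") -> str:
    """Build the <script> line with a path relative to the page's own folder."""
    rel = os.path.relpath(html_root / bundle, start=page.parent)
    return f'<script src="{Path(rel).as_posix()}" defer></script>'


def inject(html: str, tag: str) -> str:
    """Insert tag before the first </head>; do nothing if it is already there."""
    if tag in html:
        return html
    return html.replace("</head>", f"{tag}\n</head>", 1)
```

For a page four folders below html/, such as html/instructor/main/pub10-ed22b-updated/gram-01/gram_01.html, this yields the four-level ../ path shown above.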
The theme step has one essential job: vendor theme.css + a <link> tag into every page. The stylesheet then classifies each page by what DITA-OT already emitted — no Python-set body attributes required.
| Variation | CSS selector | Why it matches |
|---|---|---|
| Ditamap index page | body:has(ul.map) | ul.map only exists on the per-ditamap index DITA-OT emits |
| Instructor edition | body:has(.ph) | Every <ph> in the source is audience="-trainee" (chapter prefix, map-title suffix, vessel-name). DITAVAL strips them all from student. |
| Student edition | body:not(:has(.ph)) | Inverse of the instructor detector |
| Instructor index | body:has(ul.map):has(.ph) | Compound: both detectors fire |
| Student index | body:has(ul.map):not(:has(.ph)) | Compound: index but no .ph |
A copy of theme.css is placed at the root and inside each edition folder so every page has a nearby copy to link with a short relative href. The link is only inserted if the page doesn't already have one (idempotent).
Why this matters: the air-gapped target won't run our Python. Oxygen's publish template injects the theme.css link, the audience filter does its work, and the CSS classifies and styles each page without any post-publish step. The body attributes publish_html.py still writes are belt-and-braces — useful for dev-side inspection, irrelevant to the styling.
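For dev-side inspection, the two CSS detectors can be mirrored in stdlib Python to see which row of the table a given page falls into. This is a sketch under a simplifying assumption: it scans the whole document rather than only descendants of <body>, and classify is a hypothetical helper, not part of publish_html.py.

```python
from html.parser import HTMLParser


class _Detector(HTMLParser):
    """Record whether the page has a ul.map and whether it has any .ph element."""

    def __init__(self):
        super().__init__()
        self.has_map = False
        self.has_ph = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "ul" and "map" in classes:
            self.has_map = True  # mirrors body:has(ul.map)
        if "ph" in classes:
            self.has_ph = True  # mirrors body:has(.ph)


def classify(html: str) -> str:
    """Return e.g. 'instructor index' or 'student topic' for one page."""
    detector = _Detector()
    detector.feed(html)
    edition = "instructor" if detector.has_ph else "student"
    kind = "index" if detector.has_map else "topic"
    return f"{edition} {kind}"
```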
vendor/themes/operator-console-v2/theme.css (~570 lines). Every variation is driven by combinations of:
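- page type, detected with :has(ul.map);
- edition, detected with :has(.ph);
- the HTML class copied from the DITA outputclass.
/* Variant by edition: the classification banner.
   .ph elements only exist on instructor pages (every <ph> in the source is
   audience-tagged and DITAVAL strips them all from student). */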
body::before {
content: "┄ INSTRUCTOR ┄ TRAINING USE ONLY ┄ CLASS-RESTRICTED ┄";
color: var(--instructor);
...
}
body:not(:has(.ph))::before {
content: "┄ STUDENT ┄ TRAINING ┄";
color: var(--student);
...
}
/* Variant by outputclass: LOFAR stage gets the "trace" treatment.
DITA-OT copies outputclass="lofar-stage" straight through to HTML class. */
section.lofar-stage { background: var(--panel); border: 1px solid var(--rule); ... }
section.lofar-stage > h2.sectiontitle::before { content: "● TRACE / STAGE"; ... }
section.lofar-stage .imagecenter::after { /* CRT scanlines overlay */ ... }
A short :root palette (--bg, --panel, --cyan, --amber, --instructor, --student…) is the only thing to edit if the colour scheme needs to shift. :has() is Baseline 2023 — current Chromium/Firefox/Safari support it.
DITA-OT emits an index page per ditamap with a deeply nested <ul class="map">. By default it is a long single-column list — useless for a 480-gram publication.
body:has(ul.map) reshapes it into a card-and-tile grid with CSS alone:
/* Section card — one per <li class="topichead"> (= chapter heading) */
body:has(ul.map) li.topichead {
background: var(--panel);
border: 1px solid var(--rule);
border-left: 3px solid var(--cyan);
border-radius: 4px;
padding: 16px 18px 18px;
}
/* Colour-code the five chapters by position */
body:has(ul.map) li.topichead:nth-child(1) { border-left-color: var(--amber); color: var(--amber); }
body:has(ul.map) li.topichead:nth-child(2) { border-left-color: #4dd0e1; ... }
body:has(ul.map) li.topichead:nth-child(3) { border-left-color: #66bb6a; ... }
...
/* Inner gram list — dense responsive tile grid */
body:has(ul.map) li.topichead > ul {
display: grid;
grid-template-columns: repeat(auto-fill, minmax(220px, 1fr));
gap: 6px;
}
/* Flat ditamaps (progress tests) have no chapter cards. A nested :has()
detects that shape and switches the outer list itself into a tile grid. */
body:has(ul.map) ul.map:has(> li.topicref:not(.topichead)) {
display: grid;
grid-template-columns: repeat(auto-fill, minmax(220px, 1fr));
gap: 6px;
}
/* Student edition: short "Gram NN" labels -- tighten the grid further.
Compound :has() combines page-type and edition detection in one selector. */
body:has(ul.map):not(:has(.ph)) li.topichead > ul,
body:has(ul.map):not(:has(.ph)) ul.map:has(> li.topicref:not(.topichead)) {
grid-template-columns: repeat(auto-fill, minmax(120px, 1fr));
}
Instructor: vessel-name pill present (◉ TGT FR Outrider, Category 4…); Analysis Sheet shown with the ▤ glyph.
Student: Gram 01, vessel-name pill absent (filtered out); Analysis Sheet <section> absent (filtered out).
The HTML emitted by DITA-OT is genuinely different between the two editions (the filter strips elements). The CSS branching on top of that only handles the chrome (banner colour, tile density); the structural difference is already there from the audience filter, and the CSS reads it directly via :has(.ph).
The author on the air-gapped target has Oxygen XML Author and nothing else — no Python, no custom DITA-OT plugin. publish_html.py is a dev/CI convenience. The Oxygen template only has to copy theme.css + gramframe.bundle.js into the output and link them from each page (the pub-9 / pub-10 pattern). After that:
| Capability | On the air-gapped target |
|---|---|
| Audience filter (instructor / student split) | ✓ Kept — Oxygen's DITAVAL UI |
| Per-publication standalone output | ✓ Kept — each ditamap publishes to its own folder |
| LOFAR / Analysis Sheet / vessel-name styling | ✓ Kept — DITA outputclass → HTML class |
| Edition banner (instructor amber / student cyan) | ✓ Kept — CSS :has(.ph) detects edition |
| Multi-column gram tile grid (chapter colour-coding) | ✓ Kept — CSS :has(ul.map) detects index pages |
| Per-edition tile density (wide instructor / dense student) | ✓ Kept — compound :has() selectors |
| Shared html/index.html landing page | ✗ Lost — hand-write one if needed |
| Per-edition "choose a publication" index | ✗ Lost — per-publication ditamap-index pages still exist (DITA-OT default) |
| Prettified HTML, scrubbed timestamps, byte-deterministic output | ✗ Lost — cosmetic only, no functional impact |
| Automated trainee-leakage verification | ✗ Lost — the guarantee still holds via DITAVAL; run a manual grep -ri instructor …/student/ after each publish |
Net effect: the styling and content guarantees survive the move; what we lose is the curated cross-publication chrome (one landing, two edition indexes) and the dev-side determinism passes. Everything that matters to a reader inside one publication is intact.
The three things that would otherwise break byte-determinism, and what we do about each:
| Source of drift | Mitigation |
|---|---|
| DITA-OT timestamp <meta> tags | scrub_nondeterministic_metadata() strips both carriers |
| Landing-page "Generated YYYY-MM-DD HH:MM UTC" | Honours SOURCE_DATE_EPOCH — set it in CI to pin the timestamp |
| Single-line topic HTML (cosmetic, but defeats diff) | prettify_tree() re-emits canonically |
Result: publish_html.py run twice in a row on an unchanged source produces html/ trees that are byte-identical. A hash-of-tree test in tests/test_publish_html.py asserts this.
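A hash-of-tree comparison can be as small as the sketch below. It is not the code in tests/test_publish_html.py; tree_digest is a hypothetical name, and SHA-256 over sorted relative paths plus file bytes is one reasonable choice among several.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> str:
    """Hash every file under root, in sorted path order, names included."""
    digest = hashlib.sha256()
    files = sorted(
        (p.relative_to(root).as_posix(), p) for p in root.rglob("*") if p.is_file()
    )
    for rel, path in files:
        digest.update(rel.encode("utf-8"))
        digest.update(b"\0")  # separator so name and content cannot run together
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

Two builds of an unchanged source should give the same digest; a renamed file or a single changed byte gives a different one.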
On the air-gapped target the author uses Oxygen + the DITAVAL filter — nothing below. This recipe is for the dev side: everything in publish_html.py's main() reproduced manually with stdlib Python so any one step can be re-run in isolation if a later step fails:
# 0. Prereqs (all pre-vendored in the repo; no internet needed)
ls dita/                                # source tree, including trainee.ditaval
ls vendor/gramframe/                    # gramframe.bundle.js
ls vendor/themes/operator-console-v2/   # theme.css
ls "$DITA_OT/bin/dita"                  # DITA-OT 4.x with bundled JRE

# 1. Stage (DOCTYPE injection + ditamap nesting). Use the script;
#    the hand-rolled equivalent is fiddly enough to not be worth it.
python -c "from publish_html import stage; from pathlib import Path; \
  stage(Path('dita'), Path('.dita-build'))"

# 2. Two DITA-OT runs per ditamap. Loop in your shell:
for map in .dita-build/*/*.ditamap; do
  stem=$(basename "$map" .ditamap)
  "$DITA_OT/bin/dita" --input="$map" --format=html5 \
    --output="html/instructor/$stem" --processing-mode=lax
  "$DITA_OT/bin/dita" --input="$map" --format=html5 \
    --output="html/student/$stem" --processing-mode=lax \
    --filter=.dita-build/trainee.ditaval
done

# 3. Landing + per-edition indexes, 4. prettify, 5. scrub,
# 6. inject GramFrame, 7. inject theme: one call each, all idempotent
python -c "from pathlib import Path; from publish_html import \
  write_shared_landing, write_edition_index, prettify_tree, \
  scrub_nondeterministic_metadata, inject_gramframe_plugin, \
  inject_operator_console_theme, EDITIONS, _ditamap_title, _generated_timestamp; \
  out=Path('html'); ts=_generated_timestamp(); \
  maps=sorted(Path('.dita-build').glob('*/*.ditamap')); \
  [write_edition_index(out/e.output_subdir, e, \
    [(_ditamap_title(m,e), m.stem) for m in maps], ts) for e in EDITIONS]; \
  write_shared_landing(out, EDITIONS, ts); \
  prettify_tree(out); scrub_nondeterministic_metadata(out); \
  inject_gramframe_plugin(out); inject_operator_console_theme(out)"

# 8. Verify the student edition has no instructor leakage (content, then names)
grep -ri instructor html/student/ && echo "LEAK" || echo "clean"
# grep -q . exits 0 only when find printed at least one path
find html/student -iname '*instructor*' | grep -q . && echo "LEAK" || echo "clean"

# 9. Verify URL parity (requires bash for process substitution)
diff <(cd html/instructor && find . -type f | sort) \
     <(cd html/student && find . -type f | sort)
Easier option: python publish_html.py --dita dita --out html --dita-ot $DITA_OT does all of the above in one invocation. The breakdown above exists so you can re-run any single step in isolation if a later step fails.
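Where bash process substitution is not available, the URL-parity check can be done in stdlib Python instead. A sketch, with parity_diff as a hypothetical helper name:

```python
from pathlib import Path


def _file_paths(root: Path) -> set[str]:
    """Relative POSIX-style paths of every file under root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def parity_diff(instructor: Path, student: Path) -> tuple[list[str], list[str]]:
    """Return (paths only in instructor, paths only in student)."""
    a, b = _file_paths(instructor), _file_paths(student)
    return sorted(a - b), sorted(b - a)
```

Both lists empty means the two editions expose exactly the same set of paths.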
- The output folder contains theme.css and gramframe.bundle.js.
- Every *.html has a <link rel="stylesheet" …/theme.css> in <head>.
- Every *.html has a <script src="…/gramframe.bundle.js" defer> in <head>.
- grep -ri instructor …/student/ returns nothing.
- find …/student -iname '*instructor*' returns nothing.
These are the only checks the author needs after each Oxygen publish.
- html/index.html exists, links to both editions, carries body.landing.
- html/instructor/ and html/student/ have the same set of file paths (URL parity).
- Each edition's index.html has class="edition-index"; every per-publication index.html has class="ditamap-index" (belt-and-braces, theme no longer needs these).
- Every <body> has data-edition="instructor" or "student" (same caveat).
- Two consecutive runs produce byte-identical trees (with a pinned SOURCE_DATE_EPOCH).
- No <meta name="DC.date.*"> tags anywhere under html/.
The test suite (python -m unittest discover tests/) checks every one of these; run it before each handover.
- Four audience="-trainee" sites + a four-line DITAVAL = two editions, no forking.
- DITA outputclass → HTML class verbatim: the theme's only contract for content styling.
- Two CSS detectors: :has(ul.map) (index) and :has(.ph) (instructor). No injector, no plugin, no JS.
- Air-gapped recipe: copy and link theme.css + gramframe.bundle.js; author picks DITAVAL in Oxygen's UI. Done.
- publish_html.py is a dev convenience: CI runs it for determinism + the test suite. The target never sees it.
Questions? Open theme.css alongside this deck; every :has() selector maps to one of these slides.