DITA → HTML

How we process the DITA tree, and how the Operator Console v2 theme paints it

The publish pipeline — two lanes, one source tree

One DITA source under dita/ → two HTML editions under html/. What the air-gapped author actually runs vs. what the dev pipeline adds on top:

Air-gapped target (Oxygen XML Author)

  A. DITA-OT: via Oxygen's publish dialog
  B. DITAVAL filter: picked in Oxygen's UI (student edition only)
  C. Template hooks: link theme.css + gramframe.bundle.js

That's the whole air-gapped recipe. The CSS classifies pages itself via :has(); no body tagging, no landing page, no prettify pass.

Dev side / CI (publish_html.py)

  1. stage: .dita-build/, DOCTYPEs + ditamap nesting
  2. publish: DITA-OT × 2 (both editions)
  3. landing/index: html/index.html + per-edition
  4. prettify: re-emit every .html for diffs
  5. scrub: strip DITA-OT wall-clock <meta>
  6. GramFrame: vendor bundle + inject <script>
  7. theme: vendor theme.css + inject <link>
  • Solid boxes = essential (their effect must appear in the air-gapped output, by whatever route). Faded boxes = dev convenience for CI: deterministic, diffable, indexed output. None of them changes what a reader sees inside a publication.
  • Step 2 maps onto lanes A+B; steps 6 and 7 map onto lane C (the Oxygen template's job).

What the DITA source looks like

One gram topic. The audience and outputclass attributes are the hooks every later step keys on.

<topic id="gram_01">
  <title>Gram 01<ph audience="-trainee" outputclass="vessel-name"> - FR Outrider, Category 4, Tantive</ph></title>
  <body>
    <section audience="-trainee" outputclass="analysis-sheet">
      <title>Analysis Sheet</title>
      <p><xref href="analysis-sheet.docx" format="docx" scope="local">Analysis Sheet</xref></p>
    </section>
    <section outputclass="lofar-stage">
      <title>Lofar 1</title>
      <table outputclass="gram-config">
        <tgroup cols="2">
          <colspec colname="c1" colnum="1"/><colspec colname="c2" colnum="2"/>
          <tbody>
            <row><entry namest="c1" nameend="c2"><image href="lofar-1-i.png" placement="break" align="center"/></entry></row>
            <row><entry>time-start</entry><entry>0</entry></row>
            <row><entry>time-end</entry><entry>300</entry></row>
            ...
          </tbody>
        </tgroup>
      </table>
    </section>
  </body>
</topic>
  • audience="-trainee" — consumed by the DITAVAL filter; stripped from the student edition.
  • outputclass="..." — DITA-OT copies this into HTML as class="..."; that is the only hook the theme needs.

The four audience-tagged sites

These are the only places we attach audience="-trainee". Together they carry everything the student must not see.

# | Site | DITA shape | Where it lives
1 | Vessel-name decoration on a gram title | <title>Gram NN<ph audience="-trainee"> — vessel</ph></title> | every gram_NN.dita
2 | Analysis Sheet section | <section audience="-trainee"><title>Analysis Sheet</title>…</section> | every gram_NN.dita
3 | "Instructor " prefix on a chapter navtitle | <navtitle><ph audience="-trainee">Instructor </ph>Week 1 Grams</navtitle> | main.ditamap only (fires when CSV chapter starts "Instructor ")
4 | "— Instructor Version" suffix on a map title | <title>Progress Test 1<ph audience="-trainee"> — Instructor Version</ph></title> | every ditamap

Folder and file names never contain the substring "instructor" (case-insensitive). The audience prefix is split off before the slug is computed (_normalise_chapter() in generate_dita.py) so the URL path is identical in both editions — only the visible label differs.
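A minimal sketch of the split-then-slug idea, for illustration only. The names split_audience_prefix, slugify and chapter_slug are hypothetical; the real logic lives in _normalise_chapter() in generate_dita.py and may differ in detail.

```python
import re

INSTRUCTOR_PREFIX = "Instructor "  # prefix a CSV chapter name may start with


def split_audience_prefix(chapter: str) -> tuple[bool, str]:
    """Return (is_instructor_only, label_without_prefix)."""
    if chapter.startswith(INSTRUCTOR_PREFIX):
        return True, chapter[len(INSTRUCTOR_PREFIX):]
    return False, chapter


def slugify(label: str) -> str:
    """Lower-case, then collapse runs of non-alphanumerics to one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def chapter_slug(chapter: str) -> str:
    # The prefix is removed first, so it can never reach a folder name.
    _, label = split_audience_prefix(chapter)
    return slugify(label)


print(chapter_slug("Instructor Week 1 Grams"))  # week-1-grams
print(chapter_slug("Week 1 Grams"))             # week-1-grams
```

Both spellings of the chapter produce the same slug, which is what keeps the URL path identical across editions.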

The trainee DITAVAL — four lines that build the student edition

File: dita/trainee.ditaval. Committed alongside the DITA source. publish_html.py refuses to build without it.

<?xml version="1.0" encoding="UTF-8"?>
<val>
  <prop att="audience" val="trainee" action="exclude"/>
</val>

DITA-OT tokenises the audience attribute: "-trainee" contains the token trainee, so action="exclude" matches and the element disappears from the build.

What it removes, per site:

Site | Instructor renders | Student renders
Gram title | Gram 01 — FR Outrider | Gram 01
Analysis Sheet | full section | (gone)
Chapter navtitle | Instructor Week 1 Grams | Week 1 Grams
Map title | Progress Test 1 — Instructor Version | Progress Test 1

Leakage guarantee: a recursive grep for instructor (any case) over html/student/ — content and paths — must return zero matches.
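The guarantee can be checked with a few lines of stdlib Python. This is a sketch under stated assumptions, not the check the test suite runs: find_leaks and NEEDLE are hypothetical names, and the content check is a plain case-insensitive byte search.

```python
from pathlib import Path

NEEDLE = "instructor"


def find_leaks(student_root: Path) -> list[str]:
    """Return one entry per leaking path name or file content under student_root."""
    leaks = []
    for path in sorted(student_root.rglob("*")):
        rel = path.relative_to(student_root).as_posix()
        # Path check: any folder or file name containing the needle.
        if NEEDLE in rel.lower():
            leaks.append(f"path: {rel}")
        # Content check: read bytes so binary assets cannot raise decode errors.
        if path.is_file() and NEEDLE.encode() in path.read_bytes().lower():
            leaks.append(f"content: {rel}")
    return leaks
```

An empty list from find_leaks(Path("html/student")) corresponds to the zero-match grep result.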

Step 2. Two DITA-OT invocations: same source, one CLI flag apart

Instructor edition

dita \
  --input=.dita-build/main/main.ditamap \
  --format=html5 \
  --output=html/instructor/main \
  --processing-mode=lax

No --filter — the full content is rendered.

Student edition

dita \
  --input=.dita-build/main/main.ditamap \
  --format=html5 \
  --output=html/student/main \
  --processing-mode=lax \
  --filter=.dita-build/trainee.ditaval

Only the --filter and --output change.

With 7 ditamaps (1 main + 5 progress tests + 1 final assessment) and two editions each, DITA-OT is invoked 14 times in total. The output trees mirror each other one-for-one (URL parity, FR-016).
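The invocations could be generated rather than typed. A minimal sketch, assuming only the flags shown above; EDITION_FILTERS and dita_commands are hypothetical names, not the ones publish_html.py uses.

```python
from pathlib import Path

# Edition name -> DITAVAL filter (None means "render everything").
EDITION_FILTERS = {
    "instructor": None,
    "student": Path(".dita-build/trainee.ditaval"),
}


def dita_commands(maps: list[Path], out_root: Path) -> list[list[str]]:
    """Build one argv list per (ditamap, edition) pair."""
    commands = []
    for ditamap in maps:
        for edition, ditaval in EDITION_FILTERS.items():
            argv = [
                "dita",
                f"--input={ditamap.as_posix()}",
                "--format=html5",
                f"--output={(out_root / edition / ditamap.stem).as_posix()}",
                "--processing-mode=lax",
            ]
            # Only the student edition gets the filter flag.
            if ditaval is not None:
                argv.append(f"--filter={ditaval.as_posix()}")
            commands.append(argv)
    return commands
```

Each returned list can be handed to subprocess.run(argv, check=True); seven staged maps yield fourteen lists.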

Step 1. Why the stage() step exists

DITA-OT can't read the source tree directly. stage() takes a build-only copy of dita/ into .dita-build/ and fixes two things:

  • DOCTYPE injection. Source .dita / .ditamap files are committed without DOCTYPEs (so Oxygen authors don't trip on DTD lookups). DITA-OT needs them to classify elements, so stage() prepends them to every staged copy.
  • Ditamap nesting. A map at dita/progress-test-5.ditamap referencing progress-test-5/gram-01/... would publish to .../progress-test-5/progress-test-5/gram-01/... — a duplicated segment. stage() moves the map into its own folder and rewrites the href=s to drop the leading <stem>/, so DITA-OT writes the clean tree we want.
# Before staging
dita/
├── main.ditamap
├── main/
│   └── pub10-ed22b-updated/gram-01/gram_01.dita
├── progress-test-1.ditamap
└── trainee.ditaval

# After staging
.dita-build/
├── main/main.ditamap                  # DOCTYPE added, hrefs rewritten
├── main/pub10-ed22b-updated/gram-01/gram_01.dita   # DOCTYPE added
├── progress-test-1/progress-test-1.ditamap
└── trainee.ditaval

Source tree is never touched. Re-staging is idempotent (the directory is wiped first).
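A minimal sketch of the two fixes, under stated assumptions: the helper names are hypothetical, the DOCTYPE strings are the standard OASIS identifiers for topics and maps (the ones stage() actually writes may differ), and hrefs are assumed to be double-quoted.

```python
import re

# Standard OASIS public identifiers for DITA topics and maps (assumed here).
DOCTYPES = {
    "topic": '<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">',
    "map": '<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">',
}

XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>\s*")


def add_doctype(text: str, root: str) -> str:
    """Insert the DOCTYPE for `root` after the XML declaration, if missing."""
    if "<!DOCTYPE" in text:
        return text  # already has one: leave untouched
    match = XML_DECL.match(text)
    head = match.group(0).rstrip() + "\n" if match else ""
    body = text[match.end():] if match else text
    return f"{head}{DOCTYPES[root]}\n{body}"


def drop_stem_prefix(map_text: str, stem: str) -> str:
    """Rewrite href="<stem>/..." to href="..." for a map moved into <stem>/."""
    return re.sub(rf'href="{re.escape(stem)}/', 'href="', map_text)
```

With the map relocated to progress-test-5/progress-test-5.ditamap, an href of progress-test-5/gram-01/... becomes gram-01/..., so the duplicated segment never appears in the output path.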

Steps 4 and 5. Prettify and scrub: making the HTML diffable

Prettify (prettify_tree)

DITA-OT emits each topic page on a single long line. Unreadable in view-source, unreadable in a diff. The custom HTML tree-builder re-emits every *.html with:

  • Block elements on their own indented line.
  • Inline subtrees (<a>, <span>, <strong>) left flat — splitting them would change rendered whitespace.
  • <pre> / <script> / <style> preserved verbatim.
  • Void elements (<meta>, <img>) emitted HTML5-style, no trailing slash.

Side benefit: the canonical layout means the regexes in steps 6 and 7 always find </head> at exactly the same indent.

Scrub (scrub_nondeterministic_metadata)

DITA-OT bakes wall-clock timestamps into every page. Two carriers:

<meta name="DC.date.created"  content="2026-05-17T09:14:02Z"/>
<meta name="DC.date.modified" content="2026-05-17T09:14:02Z"/>

Both depend on the run wall-clock, not on the source. A single regex strips both from every page. Result: byte-identical output across runs (FR-008 / SC-006), so we can hash-compare two trees and trust the result.
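A single regex of the kind described might look like this. It is a sketch, not the pattern used in scrub_nondeterministic_metadata; DC_DATE_META and scrub_timestamps are hypothetical names.

```python
import re

# Matches a whole <meta name="DC.date.created|modified" ...> tag, with or
# without a trailing slash, plus the line break that follows it.
DC_DATE_META = re.compile(
    r'[ \t]*<meta\s+name="DC\.date\.(?:created|modified)"[^>]*>[ \t]*\r?\n?',
    re.IGNORECASE,
)


def scrub_timestamps(html: str) -> str:
    """Remove DITA-OT's wall-clock <meta> tags from one page."""
    return DC_DATE_META.sub("", html)
```

Running it a second time on its own output changes nothing, which is the property the hash comparison depends on.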

Step 6. Inject GramFrame: spectrograms become interactive

DITA-OT renders every <table outputclass="gram-config"> into HTML as <table class="gram-config">. The vendored GramFrame bundle (vendor/gramframe/gramframe.bundle.js) scans for that class on DOMContentLoaded and rewrites the table into an interactive spectrogram viewer.

All inject_gramframe_plugin has to do is:

  1. Copy the bundle to html/gramframe.bundle.js.
  2. For every *.html under html/, insert one line before </head>:
    <script src="../../../../gramframe.bundle.js" defer></script>
    (the relative path is computed per file so deep pages still resolve it)
  3. Skip any file that already contains the marker string — idempotent.

The script is a no-op on pages with no gram-config table, so it's safe to drop on every page in both editions.
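The per-file relative path and the idempotent insert can be sketched as follows. The helper names are hypothetical, and using the bundle filename as the marker string is an assumption; inject_gramframe_plugin may test for something else.

```python
import os
from pathlib import Path

BUNDLE_NAME = "gramframe.bundle.js"


def script_tag(page: Path, html_root: Path) -> str:
    """Build the <script> tag with a path relative to the page's own folder."""
    rel = os.path.relpath(html_root / BUNDLE_NAME, start=page.parent)
    return f'<script src="{Path(rel).as_posix()}" defer></script>'


def inject(html: str, tag: str) -> str:
    """Insert `tag` before </head>; do nothing if the bundle is already linked."""
    if BUNDLE_NAME in html:  # marker check keeps the step idempotent (assumed marker)
        return html
    return html.replace("</head>", f"{tag}\n</head>", 1)
```

For a page at html/instructor/main/pub10-ed22b-updated/gram-01/gram_01.html the computed src climbs four levels, matching the example line in step 2 above.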

Step 7. Inject the theme: CSS detects everything itself

The theme step has one essential job: vendor theme.css and inject a <link> tag into every page. The stylesheet then classifies each page by what DITA-OT already emitted; no Python-set body attributes are required.

Variation | CSS selector | Why it matches
Ditamap index page | body:has(ul.map) | ul.map only exists on the per-ditamap index DITA-OT emits
Instructor edition | body:has(.ph) | Every <ph> in the source is audience="-trainee" (chapter prefix, map-title suffix, vessel-name). DITAVAL strips them all from student.
Student edition | body:not(:has(.ph)) | Inverse of the instructor detector
Instructor index | body:has(ul.map):has(.ph) | Compound: both detectors fire
Student index | body:has(ul.map):not(:has(.ph)) | Compound: index but no .ph
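For dev-side checks, the same two signals can be mirrored outside the browser. The sketch below is an illustration only: classify and _ClassCollector are hypothetical names, and it scans the whole document rather than just <body>, which is a simplification of what the selectors do.

```python
from html.parser import HTMLParser


class _ClassCollector(HTMLParser):
    """Collect (tag, class-token) pairs seen anywhere in the document."""

    def __init__(self):
        super().__init__()
        self.seen = set()

    def handle_starttag(self, tag, attrs):
        for token in (dict(attrs).get("class") or "").split():
            self.seen.add((tag, token))


def classify(html: str) -> tuple[str, str]:
    """Return (edition, page_type) using the same signals as the selectors."""
    collector = _ClassCollector()
    collector.feed(html)
    has_ph = any(token == "ph" for _, token in collector.seen)  # :has(.ph)
    has_map = ("ul", "map") in collector.seen                    # :has(ul.map)
    return ("instructor" if has_ph else "student",
            "index" if has_map else "topic")
```

A gram page whose title still carries a span with class "ph vessel-name" classifies as ("instructor", "topic"); the same page after filtering classifies as ("student", "topic").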

A copy of theme.css is placed at the root and inside each edition folder so every page has a nearby copy to link with a short relative href. The link is only inserted if the page doesn't already have one (idempotent).

Why this matters: the air-gapped target won't run our Python. Oxygen's publish template injects the theme.css link, the audience filter does its work, and the CSS classifies and styles each page without any post-publish step. The body attributes publish_html.py still writes are belt-and-braces — useful for dev-side inspection, irrelevant to the styling.

The CSS architecture — one file, switched by DOM structure

vendor/themes/operator-console-v2/theme.css (~570 lines). Every variation is driven by combinations of:

  • body:has(.ph) / body:not(:has(.ph)) — classification banner, accent colour, tile density (edition detection).
  • body:has(ul.map) — ditamap-index page (page-type detection).
  • section.lofar-stage / section.analysis-sheet / table.gram-config / .vessel-name — carried straight through from DITA's outputclass.
/* Variant by edition: the classification banner.
   .ph elements only exist on instructor pages (every <ph> in the source is
   audience-tagged and DITAVAL strips them all from student). */
body::before {
  content: "┄ INSTRUCTOR ┄ TRAINING USE ONLY ┄ CLASS-RESTRICTED ┄";
  color: var(--instructor);
  ...
}
body:not(:has(.ph))::before {
  content: "┄ STUDENT ┄ TRAINING ┄";
  color: var(--student);
  ...
}

/* Variant by outputclass: LOFAR stage gets the "trace" treatment.
   DITA-OT copies outputclass="lofar-stage" straight through to HTML class. */
section.lofar-stage { background: var(--panel); border: 1px solid var(--rule); ... }
section.lofar-stage > h2.sectiontitle::before { content: "● TRACE / STAGE"; ... }
section.lofar-stage .imagecenter::after { /* CRT scanlines overlay */ ... }

A short :root palette (--bg, --panel, --cyan, --amber, --instructor, --student…) is the only thing to edit if the colour scheme needs to shift. :has() is Baseline 2023 — current Chromium/Firefox/Safari support it.

Theming the DITA-OT index pages

DITA-OT emits an index page per ditamap with a deeply nested <ul class="map">. By default it is a long single-column list — useless for a 480-gram publication.

body:has(ul.map) reshapes it into a card-and-tile grid with CSS alone:

/* Section card — one per <li class="topichead"> (= chapter heading) */
body:has(ul.map) li.topichead {
  background: var(--panel);
  border: 1px solid var(--rule);
  border-left: 3px solid var(--cyan);
  border-radius: 4px;
  padding: 16px 18px 18px;
}
/* Colour-code the five chapters by position */
body:has(ul.map) li.topichead:nth-child(1) { border-left-color: var(--amber); color: var(--amber); }
body:has(ul.map) li.topichead:nth-child(2) { border-left-color: #4dd0e1; ... }
body:has(ul.map) li.topichead:nth-child(3) { border-left-color: #66bb6a; ... }
...

/* Inner gram list — dense responsive tile grid */
body:has(ul.map) li.topichead > ul {
  display: grid;
  grid-template-columns: repeat(auto-fill, minmax(220px, 1fr));
  gap: 6px;
}

/* Flat ditamaps (progress tests) have no chapter cards. A nested :has()
   detects that shape and switches the outer list itself into a tile grid. */
body:has(ul.map) ul.map:has(> li.topicref:not(.topichead)) {
  display: grid;
  grid-template-columns: repeat(auto-fill, minmax(220px, 1fr));
  gap: 6px;
}

/* Student edition: short "Gram NN" labels -- tighten the grid further.
   Compound :has() combines page-type and edition detection in one selector. */
body:has(ul.map):not(:has(.ph)) li.topichead > ul,
body:has(ul.map):not(:has(.ph)) ul.map:has(> li.topicref:not(.topichead)) {
  grid-template-columns: repeat(auto-fill, minmax(120px, 1fr));
}

Same HTML, two looks — how the CSS branches

Instructor

  • Amber classification banner: INSTRUCTOR · TRAINING USE ONLY
  • Gram title shows vessel-name pill: ◉ TGT  FR Outrider, Category 4…
  • Analysis Sheet section visible with glyph
  • Ditamap-index tiles: 2 lines, wider (260px min), gram identifier + descriptor
  • Chapter navtitle: Instructor Week 1 Grams

Student

  • Cyan classification banner: STUDENT · TRAINING
  • Gram title is just Gram 01 — vessel-name pill absent (filtered out)
  • Analysis Sheet absent (whole <section> filtered out)
  • Ditamap-index tiles: centred, mono, denser (120px min) — works for short labels
  • Chapter navtitle: Week 1 Grams

The HTML emitted by DITA-OT is genuinely different between the two editions (the filter strips elements). The CSS branching on top of that only handles the chrome (banner colour, tile density) — the structural difference is already there from the audience filter, and the CSS reads it directly via :has(.ph).

The air-gapped handover — what stays, what we lose

The author on the air-gapped target has Oxygen XML Author and nothing else — no Python, no custom DITA-OT plugin. publish_html.py is a dev/CI convenience. The Oxygen template only has to copy theme.css + gramframe.bundle.js into the output and link them from each page (the pub-9 / pub-10 pattern). After that:

Capability | On the air-gapped target
Audience filter (instructor / student split) | ✓ Kept: Oxygen's DITAVAL UI
Per-publication standalone output | ✓ Kept: each ditamap publishes to its own folder
LOFAR / Analysis Sheet / vessel-name styling | ✓ Kept: DITA outputclass → HTML class
Edition banner (instructor amber / student cyan) | ✓ Kept: CSS :has(.ph) detects edition
Multi-column gram tile grid (chapter colour-coding) | ✓ Kept: CSS :has(ul.map) detects index pages
Per-edition tile density (wide instructor / dense student) | ✓ Kept: compound :has() selectors
Shared html/index.html landing page | ✗ Lost: hand-write one if needed
Per-edition "choose a publication" index | ✗ Lost: per-publication ditamap-index pages still exist (DITA-OT default)
Prettified HTML, scrubbed timestamps, byte-deterministic output | ✗ Lost: cosmetic only, no functional impact
Automated trainee-leakage verification | ✗ Lost: the guarantee still holds via DITAVAL; run a manual grep -ri instructor …/student/ after each publish

Net effect: the styling and content guarantees survive the move; what we lose is the curated cross-publication chrome (one landing, two edition indexes) and the dev-side determinism passes. Everything that matters to a reader inside one publication is intact.

Determinism — why this all has to be byte-stable

  • Air-gapped delivery. The output is shipped to the analyst PC on a USB stick. The maintainer needs to be able to re-run the pipeline, hash the output, and prove nothing changed.
  • Diff review. Keeping the generated HTML under source control (we don't today, but consumers might) requires stable output; otherwise every publish run is noise.

The three things that would otherwise break byte-determinism, and what we do about each:

Source of drift | Mitigation
DITA-OT timestamp <meta> tags | scrub_nondeterministic_metadata() strips both carriers
Landing-page "Generated YYYY-MM-DD HH:MM UTC" | Honours SOURCE_DATE_EPOCH; set it in CI to pin the timestamp
Single-line topic HTML (cosmetic, but defeats diff) | prettify_tree() re-emits canonically
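The SOURCE_DATE_EPOCH mitigation can be sketched as follows. generated_timestamp is a hypothetical stand-in for the real helper, and the output format is taken from the landing-page text quoted in the table.

```python
import os
from datetime import datetime, timezone


def generated_timestamp(environ=os.environ) -> str:
    """Return 'YYYY-MM-DD HH:MM UTC', pinned by SOURCE_DATE_EPOCH when set."""
    epoch = environ.get("SOURCE_DATE_EPOCH")
    if epoch is not None:
        # Pinned: the same input always yields the same string.
        moment = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    else:
        # Unpinned: falls back to the wall clock, so output drifts per run.
        moment = datetime.now(tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M UTC")


print(generated_timestamp({"SOURCE_DATE_EPOCH": "0"}))  # 1970-01-01 00:00 UTC
```

Passing the environment as a parameter is a design choice of this sketch; it keeps the function testable without mutating os.environ.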

Result: publish_html.py run twice in a row on an unchanged source produces html/ trees that are byte-identical. A hash-of-tree test in tests/test_publish_html.py asserts this.
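A tree hash of the kind such a test relies on can be sketched as follows. tree_digest is a hypothetical name, and the actual test in tests/test_publish_html.py may hash differently.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> str:
    """SHA-256 over every file's relative path and bytes, in sorted order."""
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        # Hash the path as well as the content, so renames are detected too.
        digest.update(rel.encode("utf-8") + b"\0")
        digest.update(path.read_bytes() + b"\0")
    return digest.hexdigest()
```

Two publish runs on an unchanged source should give the same digest for html/; any difference points at a source of drift that is not yet mitigated.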

Running the dev pipeline by hand (when the orchestrator misbehaves)

On the air-gapped target the author uses Oxygen + the DITAVAL filter — nothing below. This recipe is for the dev side: everything in publish_html.py's main() reproduced manually with stdlib Python so any one step can be re-run in isolation if a later step fails:

# 0. Prereqs (all pre-vendored in the repo — no internet needed)
ls dita/                        # source tree, including trainee.ditaval
ls vendor/gramframe/            # gramframe.bundle.js
ls vendor/themes/operator-console-v2/  # theme.css
ls $DITA_OT/bin/dita          # DITA-OT 4.x with bundled JRE

# 1. Stage (DOCTYPE injection + ditamap nesting). Use the script —
#    the hand-rolled equivalent is fiddly enough to not be worth it.
python -c "from publish_html import stage; from pathlib import Path; \
           stage(Path('dita'), Path('.dita-build'))"

# 2. Two DITA-OT runs per ditamap. Loop in your shell:
for map in .dita-build/*/*.ditamap; do
  stem=$(basename "$map" .ditamap)
  $DITA_OT/bin/dita --input="$map" --format=html5 \
       --output="html/instructor/$stem" --processing-mode=lax
  $DITA_OT/bin/dita --input="$map" --format=html5 \
       --output="html/student/$stem"    --processing-mode=lax \
       --filter=.dita-build/trainee.ditaval
done

# 3. Landing + per-edition indexes, 4. prettify, 5. scrub,
#    6. inject GramFrame, 7. inject theme — one call each, all idempotent:
python -c "from pathlib import Path; from publish_html import \
  write_shared_landing, write_edition_index, prettify_tree, \
  scrub_nondeterministic_metadata, inject_gramframe_plugin, \
  inject_operator_console_theme, EDITIONS, _ditamap_title, _generated_timestamp; \
  out=Path('html'); ts=_generated_timestamp(); \
  maps=sorted(Path('.dita-build').glob('*/*.ditamap')); \
  [write_edition_index(out/e.output_subdir, e, \
       [(_ditamap_title(m,e), m.stem) for m in maps], ts) for e in EDITIONS]; \
  write_shared_landing(out, EDITIONS, ts); \
  prettify_tree(out); scrub_nondeterministic_metadata(out); \
  inject_gramframe_plugin(out); inject_operator_console_theme(out)"

# 8. Verify the student edition has no instructor leakage.
#    grep -q . tests whether find printed anything (head always exits 0).
grep -ri instructor html/student/ && echo "LEAK" || echo "clean"
find html/student -iname '*instructor*' | grep -q . && echo "LEAK" || echo "clean"

# 9. Verify URL parity
diff <(cd html/instructor && find . -type f | sort) \
     <(cd html/student    && find . -type f | sort)

Easier option: python publish_html.py --dita dita --out html --dita-ot $DITA_OT does all of the above in one invocation.
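Where process substitution is unavailable, the URL-parity comparison could be done in stdlib Python instead. A sketch with hypothetical helper names:

```python
from pathlib import Path


def relative_files(root: Path) -> set[str]:
    """Every file under root, as a POSIX path relative to root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def parity_gaps(instructor: Path, student: Path) -> tuple[list[str], list[str]]:
    """Return (only_in_instructor, only_in_student); both empty means parity."""
    a, b = relative_files(instructor), relative_files(student)
    return sorted(a - b), sorted(b - a)
```

parity_gaps(Path("html/instructor"), Path("html/student")) returning two empty lists is the equivalent of an empty diff.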

Verification — what "good output" looks like

Air-gapped author's checklist

  • Each publication folder contains theme.css and gramframe.bundle.js.
  • Every *.html has a <link rel="stylesheet" …/theme.css> in <head>.
  • Every *.html has a <script src="…/gramframe.bundle.js" defer> in <head>.
  • Opening any gram page shows the dark theme, the right edition banner, and an interactive spectrogram.
  • Opening a ditamap index shows multi-column tiles (not a single long list).
  • grep -ri instructor …/student/ returns nothing.
  • find …/student -iname '*instructor*' returns nothing.

These are the only checks the author needs after each Oxygen publish.

Dev-side checklist (CI / pre-handover)

  • html/index.html exists, links to both editions, carries body.landing.
  • html/instructor/ and html/student/ have the same set of file paths (URL parity).
  • Every per-edition index.html has class="edition-index"; every per-publication index.html has class="ditamap-index" (belt-and-braces, theme no longer needs these).
  • Every topic <body> has data-edition="instructor" or "student" (same caveat).
  • Re-running the pipeline produces a tree whose file hashes are unchanged (modulo SOURCE_DATE_EPOCH).
  • No <meta name="DC.date.*"> tags anywhere under html/.

The test suite (python -m unittest discover tests/) checks every one of these — run it before each handover.

Recap

  • One source tree + four audience="-trainee" sites + a four-line DITAVAL = two editions, no forking.
  • DITA outputclass → HTML class verbatim — the theme's only contract for content styling.
  • One CSS file classifies pages itself via :has(ul.map) (index) and :has(.ph) (instructor). No injector, no plugin, no JS.
  • Air-gapped delivery: Oxygen template links theme.css + gramframe.bundle.js; author picks DITAVAL in Oxygen's UI. Done.
  • publish_html.py is a dev convenience — CI runs it for determinism + the test suite. The target never sees it.