SME review walkthrough · pptx-legacy-transform
Linear walkthrough · ~20 min · questions welcome at any point
.glc
A small, self-contained training artifact:
.docx) of measured values the analyst fills in from their GramFrame readings (bearing, base frequency, harmonics, shaft / blade rate, classification, ...), or a PNG screenshot of that same sheet.
.glc configuration file. The .glc in turn references the spectrogram asset GAPS-Lite would render: usually a .png/.jpg pre-rendered image (~82%), occasionally a .wav raw recording rendered live by the on-PC viewer (~18%).
Audited across 1,004 Lofar text-run hyperlinks: every one targets a .glc. The .wav case is always one indirection deeper, inside the GLC's data_source/filename.
A single gram tile, as seen on a legacy slide
Reconstructed from Instructor Progress Test 1 Grams.pptx, slide 2.
15 gram tiles per slide. Each title and each Lofar label is a hyperlink to a file on disk.
Excerpt from introspect_pptx.py — structural report of the real file
=== Section 1: Summary ===
Filename: Instructor Progress Test 1 Grams.pptx
Total slides: 4
Hyperlink target extensions:
  .docx: 19 ← analysis sheets
  .glc: 64 ← LOFAR configurations
  .png: 11 ← inline analysis images
Shape-level hyperlinks: 30 ← title boxes
Text-run hyperlinks: 64 ← "Lofar 1", "Lofar 2"...
-- Slide 2 (shapes: 31) --
Rounded Rectangle 2 pos=(0.40,0.80) text='Gram 1: FR Prometheus, ...'
  shape_hyperlink=...Files/Gram 1/Analysis Sheet.docx
TextBox 3 pos=(0.40,1.22) text='Lofar 1 Lofar 2'
  run[0]: hyperlink=...Files/Gram 1/Lofar 1.glc
  run[1]: hyperlink=...Files/Gram 1/Lofar 2 I.glc
Two different hyperlink mechanisms per gram — shape-level for the title, run-level for individual labels. Both must be extracted faithfully.
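To make the two mechanisms concrete, here is a minimal standard-library sketch of how both kinds of link could be pulled out of a slide's XML. It is not introspect_pptx.py itself. The sample slide and relationship XML are trimmed stand-ins, the target paths are illustrative, and the function names are hypothetical. It also tallies target extensions the way the summary section does.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import PurePosixPath

NS = {
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
    "r": "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
}
REL_TAG = "{http://schemas.openxmlformats.org/package/2006/relationships}Relationship"
R_ID = "{%s}id" % NS["r"]

# Illustrative stand-ins for a slide part and its .rels file (not real deck content).
SLIDE_XML = (
    '<p:sld xmlns:a="%(a)s" xmlns:p="%(p)s" xmlns:r="%(r)s"><p:cSld><p:spTree>'
    '<p:sp><p:nvSpPr><p:cNvPr id="2" name="Rounded Rectangle 2">'
    '<a:hlinkClick r:id="rId1"/></p:cNvPr></p:nvSpPr>'
    '<p:txBody><a:p><a:r><a:t>Gram 1</a:t></a:r></a:p></p:txBody></p:sp>'
    '<p:sp><p:nvSpPr><p:cNvPr id="3" name="TextBox 3"/></p:nvSpPr><p:txBody><a:p>'
    '<a:r><a:rPr><a:hlinkClick r:id="rId2"/></a:rPr><a:t>Lofar 1</a:t></a:r>'
    '<a:r><a:rPr><a:hlinkClick r:id="rId3"/></a:rPr><a:t>Lofar 2</a:t></a:r>'
    '</a:p></p:txBody></p:sp></p:spTree></p:cSld></p:sld>'
) % NS
RELS_XML = (
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Target="Files/Gram 1/Analysis Sheet.docx"/>'
    '<Relationship Id="rId2" Target="Files/Gram 1/Lofar 1.glc"/>'
    '<Relationship Id="rId3" Target="Files/Gram 1/Lofar 2 I.glc"/>'
    '</Relationships>'
)


def extract_links(slide_xml, rels_xml):
    """Return (shape_links, run_links), each a list of (text, target) tuples."""
    targets = {rel.get("Id"): rel.get("Target")
               for rel in ET.fromstring(rels_xml).iter(REL_TAG)}
    shape_links, run_links = [], []
    for sp in ET.fromstring(slide_xml).iter("{%s}sp" % NS["p"]):
        # Shape-level: hlinkClick hangs off the shape's non-visual properties.
        click = sp.find("p:nvSpPr/p:cNvPr/a:hlinkClick", NS)
        if click is not None:
            text = "".join(t.text or "" for t in sp.iter("{%s}t" % NS["a"]))
            shape_links.append((text, targets.get(click.get(R_ID))))
        # Run-level: hlinkClick hangs off one text run's properties.
        for run in sp.iter("{%s}r" % NS["a"]):
            click = run.find("a:rPr/a:hlinkClick", NS)
            if click is not None:
                label = run.findtext("a:t", default="", namespaces=NS)
                run_links.append((label, targets.get(click.get(R_ID))))
    return shape_links, run_links


shape_links, run_links = extract_links(SLIDE_XML, RELS_XML)
extensions = Counter(PurePosixPath(t).suffix for _, t in shape_links + run_links)
```

The sketch only walks plain shapes; pictures and grouped shapes would need the same treatment before it could be trusted on a real deck.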
Need to find a particular gram? You'd have to know which deck it's in before you start.
Each of the ~10 publications exists twice over — instructor and student — not just the PPTX, but every shared asset it points at: spectrogram images, sound files, and .glc configs. Authors keep both copies in sync by hand; any edit risks drift across hundreds of files.
Every tile points at a file in the folder next door. Move the folder, rename it — links break and nothing tells you.
Pub-9 and pub-10 already go through Oxygen. The grams come out of PowerPoint — different format, different workflow.
I didn't survey every deck. I pulled a random sample of content from across them, enough to see the patterns repeat. Off that sample I built an unclassified stand-in: fake vessel names, real structure. Safe to take off-site, faithful enough to work against — and that's what the script was written against.
Honest about the catch: real decks will show variance the sample didn't cover. Handling that variance is the work ahead.
extract_to_csv.py
generate_dita.py
publish_html.py / Oxygen
Four production stages, plus introspect_pptx.py as a standalone diagnostic tool (it produced the report on the previous slide). Small Python scripts, one third-party dependency, designed to be debuggable on an air-gapped network.
This whole toolchain exists to produce a single, complete conversion from PowerPoint to DITA. Once that conversion is signed off, the toolchain has done its job — it isn't infrastructure to host, maintain, or learn. From there on, the grams are maintained in DITA, alongside the rest of the publication set, by the author.
One row per Lofar (plus one per analysis sheet). Every row of a single gram shares the same topic_filename — the generator merges them downstream.
| publication | gram_id | vessel_name | topic_type | seq | topic_filename | time_end | freq_end |
|---|---|---|---|---|---|---|---|
| progress-test-1 | Gram 18 | FR Razor Crest, Category 2, Gandalf | glc | 1 | gram_18.dita | 180 | 200 |
| progress-test-1 | Gram 18 | FR Razor Crest, Category 2, Gandalf | glc | 2 | gram_18.dita | 271 | 100 |
| progress-test-1 | Gram 18 | FR Razor Crest, Category 2, Gandalf | glc | 3 | gram_18.dita | 300 | 800 |
| progress-test-1 | Gram 18 | FR Razor Crest, Category 2, Gandalf | glc | 4 | gram_18.dita | 360 | 200 |
| progress-test-1 | Gram 18 | FR Razor Crest, Category 2, Gandalf | analysis | 1 | gram_18.dita | | |
Gram 18 has four Lofars and one analysis sheet — five rows, one topic_filename. Today's CSV: 1,409 rows of this shape across 7 publications. The author edits this in Excel; the warnings column is where the next stage takes over.
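The "shared topic_filename" rule can be sketched as a plain grouping step. This is an illustration of the idea rather than the generator's actual code: the column names come from the table above, the sample text is a cut-down copy of the Gram 18 rows, and group_by_topic is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict

SAMPLE = (
    "publication,gram_id,vessel_name,topic_type,seq,topic_filename,time_end,freq_end\n"
    'progress-test-1,Gram 18,"FR Razor Crest, Category 2, Gandalf",glc,1,gram_18.dita,180,200\n'
    'progress-test-1,Gram 18,"FR Razor Crest, Category 2, Gandalf",glc,2,gram_18.dita,271,100\n'
    'progress-test-1,Gram 18,"FR Razor Crest, Category 2, Gandalf",glc,3,gram_18.dita,300,800\n'
    'progress-test-1,Gram 18,"FR Razor Crest, Category 2, Gandalf",glc,4,gram_18.dita,360,200\n'
    'progress-test-1,Gram 18,"FR Razor Crest, Category 2, Gandalf",analysis,1,gram_18.dita,,\n'
)


def group_by_topic(csv_text):
    """Bucket CSV rows by topic_filename, preserving file order within a bucket."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["topic_filename"]].append(row)
    return dict(groups)


groups = group_by_topic(SAMPLE)
gram_18 = groups["gram_18.dita"]
lofar_seqs = [int(r["seq"]) for r in gram_18 if r["topic_type"] == "glc"]
```

Each bucket is then everything the next stage needs to emit one topic.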
The technical author opens the CSV in Excel and:
time_end / freq_end values for plausibility
CSV is deliberately chosen because Excel is universal, diff-able under version control, and survives review-edit-review cycles without proprietary tooling. README documents the Excel save-as risks (BOM stripped, line endings flipped, leading zeros coerced) and a CSV round-trip test now guards the byte-level invariant.
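A byte-level round-trip check might look something like the sketch below. It assumes the canonical file is UTF-8 with a BOM and CRLF line endings; the README, not this sketch, is the authority on the real convention. The function names and the sample row are hypothetical.

```python
import csv
import io

BOM = b"\xef\xbb\xbf"


def round_trip(data):
    """Parse CSV bytes and re-serialise them under the assumed conventions."""
    text = data.decode("utf-8-sig")  # accepts input with or without a BOM
    rows = list(csv.reader(io.StringIO(text, newline="")))
    out = io.StringIO(newline="")
    csv.writer(out, lineterminator="\r\n").writerows(rows)
    return BOM + out.getvalue().encode("utf-8")


def is_canonical(data):
    """True when re-serialising the file reproduces it byte for byte."""
    return round_trip(data) == data


# Hypothetical sample: BOM, CRLF endings, and a field with a leading zero.
canonical = BOM + b'gram_id,vessel_name,seq\r\nGram 18,"FR Razor Crest, Category 2, Gandalf",007\r\n'
# What a careless save-as could produce: BOM stripped, line endings flipped to LF.
damaged = canonical[len(BOM):].replace(b"\r\n", b"\n")

canonical_ok = is_canonical(canonical)
damaged_ok = is_canonical(damaged)
```

A check of this shape catches a stripped BOM or flipped line endings. A coerced leading zero changes the data itself, so spotting it needs a diff against the last committed revision.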
Generation is deterministic from the CSV. Nothing gets into the published output that a human hasn't approved.
If something looks wrong in the published HTML, the fix is to correct the CSV and re-run — not to patch the output.
The N+1 CSV rows for a gram collapse into one DITA topic per gram: an Analysis Sheet section (instructor-only) followed by one section per Lofar, in CSV sequence order.
<topic id="gram_18">
  <title>Gram 18<ph audience="-trainee"> - FR Razor Crest, Category 2, Gandalf</ph></title>
  <body>
    <section audience="-trainee" outputclass="analysis-sheet">
      <title>Analysis Sheet</title>
      <image href="analysis.png" placement="break" align="center" />
    </section>
    <section outputclass="lofar-stage">
      <title>Lofar 1</title> <!-- ← section heading lifted from the PPTX link label -->
      <table outputclass="gram-config">...time-end=180, freq-end=200, image=lofar-1.png...</table>
    </section>
    <section outputclass="lofar-stage">
      <title>Lofar 2</title>
      <table outputclass="gram-config">...time-end=271, freq-end=100, image=lofar-2.png...</table>
    </section>
    <!-- Lofar 3, Lofar 4 sections follow, one per CSV glc row -->
  </body>
</topic>
audience="-trainee" drives the dual-edition split — on the vessel-name decoration and on the whole Analysis Sheet section. Section titles ("Lofar 1", "Stage 1 — Broadband", ...) come straight from the PPTX link labels.
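As a rough sketch of that collapse, the snippet below builds the same topic skeleton with ElementTree. It is not generate_dita.py. The "label" key is a hypothetical column holding the PPTX link label, and the table body is left as the same elided placeholder the excerpt uses, because the real table markup isn't shown here.

```python
import xml.etree.ElementTree as ET


def build_topic(gram_id, vessel, rows):
    """Collapse one gram's rows into a single <topic> element."""
    topic = ET.Element("topic", id=gram_id.lower().replace(" ", "_"))
    title = ET.SubElement(topic, "title")
    title.text = gram_id
    # The vessel decoration is tagged so the student pass can filter it out.
    ET.SubElement(title, "ph", audience="-trainee").text = " - " + vessel
    body = ET.SubElement(topic, "body")

    # Analysis Sheet section first, instructor-only.
    if any(r["topic_type"] == "analysis" for r in rows):
        sheet = ET.SubElement(
            body, "section", {"audience": "-trainee", "outputclass": "analysis-sheet"}
        )
        ET.SubElement(sheet, "title").text = "Analysis Sheet"
        ET.SubElement(sheet, "image", href="analysis.png", placement="break", align="center")

    # Then one section per Lofar, in CSV sequence order.
    lofars = sorted((r for r in rows if r["topic_type"] == "glc"), key=lambda r: int(r["seq"]))
    for row in lofars:
        section = ET.SubElement(body, "section", outputclass="lofar-stage")
        ET.SubElement(section, "title").text = row["label"]  # hypothetical column
        table = ET.SubElement(section, "table", outputclass="gram-config")
        # Placeholder only: the real table structure is elided in the excerpt.
        table.text = "...time-end=%s, freq-end=%s..." % (row["time_end"], row["freq_end"])
    return topic


rows = [
    {"topic_type": "glc", "seq": "2", "label": "Lofar 2", "time_end": "271", "freq_end": "100"},
    {"topic_type": "analysis", "seq": "1", "label": "", "time_end": "", "freq_end": ""},
    {"topic_type": "glc", "seq": "1", "label": "Lofar 1", "time_end": "180", "freq_end": "200"},
]
topic = build_topic("Gram 18", "FR Razor Crest, Category 2, Gandalf", rows)
sections = topic.findall("body/section")
```

Because the output depends only on the rows passed in, re-running it on a corrected CSV reproduces the fix without touching the output by hand.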
One DITA source tree, two DITA-OT passes:
--filter=dita/trainee.ditaval, excluding every element tagged audience="-trainee".
A shared landing page (html/index.html) lets the reader pick. URL paths below the edition segment are identical: swapping instructor/ ↔ student/ reaches the same gram in the other edition.
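DITA-OT does the real filtering from the ditaval file. Purely to show what the exclude rule amounts to, here is a toy emulation in Python that drops every element whose audience attribute carries the "-trainee" token. The helper name is hypothetical and the topic is a shortened copy of the Gram 18 example.

```python
import xml.etree.ElementTree as ET

TOPIC = (
    '<topic id="gram_18">'
    '<title>Gram 18<ph audience="-trainee"> - FR Razor Crest, Category 2, Gandalf</ph></title>'
    '<body>'
    '<section audience="-trainee" outputclass="analysis-sheet"><title>Analysis Sheet</title></section>'
    '<section outputclass="lofar-stage"><title>Lofar 1</title></section>'
    '<section outputclass="lofar-stage"><title>Lofar 2</title></section>'
    '</body></topic>'
)


def exclude_audience(elem, value="-trainee"):
    """Remove, in place, every descendant tagged with the given audience token."""
    for child in list(elem):
        # audience is a space-separated list of tokens in DITA.
        if value in child.get("audience", "").split():
            # Note: any text directly following the removed element is dropped too.
            elem.remove(child)
        else:
            exclude_audience(child, value)
    return elem


student = exclude_audience(ET.fromstring(TOPIC))
student_title = "".join(student.find("title").itertext())
student_sections = [s.findtext("title") for s in student.findall("body/section")]
```

Run against the sample, the title shrinks to the bare gram number and only the Lofar sections survive, which is the student view described above.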
A Jest test sweep over html/student/ asserts zero case-insensitive occurrences of the string "instructor" in any rendered text or URL — the edition split is enforced by build, not by review.
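The real check is a Jest test. The same idea in Python, as a deliberately coarse sketch, scans raw file contents and relative paths instead of rendered text, so it would flag markup and hrefs as well. The function name and the temporary demo tree are hypothetical.

```python
import tempfile
from pathlib import Path


def find_leaks(root, needle="instructor"):
    """List (relative_path, where) pairs in which the needle appears, ignoring case."""
    root = Path(root)
    leaks = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root).as_posix()
        if needle in rel.lower():
            leaks.append((rel, "path"))
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            if needle in text.lower():
                leaks.append((rel, "content"))
    return leaks


# Demo on a throwaway tree standing in for html/student/.
with tempfile.TemporaryDirectory() as tmp:
    clean = Path(tmp, "main", "gram-18")
    clean.mkdir(parents=True)
    (clean / "index.html").write_text("<h1>Gram 18</h1>", encoding="utf-8")
    leaky = Path(tmp, "main", "gram-19")
    leaky.mkdir(parents=True)
    (leaky / "index.html").write_text('<a href="../../Instructor/">up</a>', encoding="utf-8")
    leaks = find_leaks(tmp)
```

An empty list is the pass condition; anything else names the file to investigate.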
html/
├── index.html                      ← shared landing
├── instructor/
│   ├── index.html
│   ├── main/
│   ├── progress-final-assessment/
│   └── progress-test-1/ ... progress-test-5/
└── student/
    ├── index.html
    ├── main/                       ← --filter=trainee.ditaval
    ├── progress-final-assessment/
    └── progress-test-1/ ... progress-test-5/
Instructor — every gram exposes vessel name, category, codename. The Analysis Sheet section is present on every topic.
Student — same tree, same URLs, but vessel names are gone and Analysis Sheet sections are filtered out of every topic.
The default DITA-OT output is functional but plain. The themed output reads as a piece of equipment:
data-edition attribute on <body>: one CSS file, no per-edition fork
The static gram-config tables in the DITA source are replaced at runtime by the GramFrame plugin (next slide), so what you see at right isn't just styling.
Gram 20, instructor edition — real screenshot of html/instructor/main/week-4-grams-updated/gram-20/.
Lofar → Windows launches the .glc in GAPS-Lite: separate non-intuitive app, dedicated training, trainee out of lesson context
The ~18% WAV case still drops out to GAPS-Lite via the .glc link. Closing that gap is the next phase: video under GramFrame, enabling aural analysis without leaving the browser.
Live GramFrame instruments rendered inline inside one HTML page.
Gram 18 — the five CSV rows from slide 8 merged into one topic: Analysis Sheet, then four live GramFrame instruments.
Same URL path, just student/ instead of instructor/. Vessel name and Analysis Sheet gone, banner flips to cyan — one source, filtered at publish.
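Since the two trees mirror each other, switching editions is a one-segment substitution. A small sketch, with swap_edition as a hypothetical helper and the gram-20 path from the earlier screenshot caption as the example:

```python
EDITIONS = {"instructor": "student", "student": "instructor"}


def swap_edition(url_path):
    """Replace the first edition segment in a path with its counterpart."""
    segments = url_path.split("/")
    for i, segment in enumerate(segments):
        if segment in EDITIONS:
            segments[i] = EDITIONS[segment]
            return "/".join(segments)
    raise ValueError("no edition segment in %r" % url_path)


instructor_url = "html/instructor/main/week-4-grams-updated/gram-20/"
student_url = swap_edition(instructor_url)
```

Only whole path segments are matched, so a gram or folder whose name merely contains one of the two words is left alone.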
Shared landing page → per-edition index → per-publication index → gram topic. This is the actual generator output, served straight from the repo.
One publication set, ~1,000 grams, all cross-referenced. Find a gram by vessel, category or codename without knowing which deck it lives in.
Instructor edition shows the vessel name; trainee edition doesn't. Same source, can't drift out of sync.
Every row went through the signed-off review sheet, and an automated check makes sure nothing instructor-only leaks into the trainee edition.
Operator Console v2 styling dresses the output as a piece of training kit. The banner colour flips between editions; everything else is shared.
Where each fix lands: the data itself (one-off correction), the stand-in (pattern captured for next time), or the script (so it copes from then on).