
Is Frictionless Data a fit?

Findings from building the Frictionless Data Explorer

A research artefact — what we found in 2026, against frictionless 5.19.0 on Pyodide 0.27.7

The question we set out to answer

Does the Frictionless ecosystem do what we expect — and where does it creak?

  • This is a research artefact, not a product. Its job is to evaluate Frictionless as one validation "leg" among several.
  • Stance: confirmatory with adversarial probes — we expected it to fit, and went looking for the sharp edges anyway.
  • Method: build-it-to-learn-it. Forcing a working IDE and a full curriculum into existence is the research.

What we built to find out

  • An in-browser IDE — Monaco editor, a virtual filesystem, and a mini-shell wired to Pyodide, so the real frictionless CLI runs entirely in the browser. No servers, no accounts.
  • Eight lessons covering the full arc: describe → schema → validate → package → dialect → transform → inquiry → publish.
  • Each lesson carries a Notes & Observations section, filled in at the moment of contact — what worked, what surprised, what needed a workaround.

Try it: the IDE lives at /playground/ alongside these slides.

Headline verdict

Frictionless fits — with well-articulated caveats.

  • The core describe → validate → schema-author loop is boring-good — the healthy 80% of the tool.
  • The caveats are real and worth knowing before you depend on the tool. Most concern defaults and version-specific surfaces, not fundamentals.
  • One question is deliberately deferred: does it fit our domain (maritime acoustics)? That is v1.1.

What worked well (the quiet-good)

  • Type inference is fast and reasonable; the validate report is specific — row, field, constraint, and a human-readable message.
  • Foreign-key checks name the lookup table and the unresolvable value — more specific than typical database "FK violation" messages.
  • Dialect sniffing: BOMs are handled invisibly (utf-8-sig); semicolon and tab delimiters auto-detect with no hint.
  • Remote consumption "just works": describe/validate on a package URL follow relative paths and resolve foreign keys across CSVs at the same prefix. File/URL polymorphism is invisible.

The #1 footgun (know this)

Running frictionless validate file.csv with no schema returns VALID, even when the data is semantically garbage.

  • Without --schema it runs only structural checks: column counts, "does it parse as CSV".
  • Semantic checks — types, required, unique, foreign keys — all need a schema.
  • In production: always pair validate with an explicit schema. The default mode is a parser check, not a correctness check.

Promoted to a top-level section in lesson 3.
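
The gap between the two modes can be illustrated without Frictionless at all. The sketch below is a standard-library stand-in, not the library's implementation: SAMPLE, SCHEMA and both helper functions are hypothetical, and the schema is only shaped like a Table Schema descriptor. It shows how a file can pass a parser-level check while failing every schema-level one.

```python
import csv
import io

# Deliberately bad data: a non-numeric age, a negative age, a duplicate id.
SAMPLE = "id,age\n1,forty\n1,-3\n"

# Hypothetical mini-schema, shaped like a Table Schema descriptor.
SCHEMA = {
    "fields": [
        {"name": "id", "type": "integer", "constraints": {"unique": True}},
        {"name": "age", "type": "integer", "constraints": {"minimum": 0}},
    ]
}


def read_rows(text):
    """Return (header, data rows) parsed with the stdlib csv module."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[0], rows[1:]


def structural_errors(text):
    """Parser-level check only: does each row have as many cells as the header?"""
    header, data = read_rows(text)
    return [
        f"row at position {pos}: expected {len(header)} cells, got {len(row)}"
        for pos, row in enumerate(data, start=2)  # position 1 is the header
        if len(row) != len(header)
    ]


def semantic_errors(text, schema):
    """Schema-level check: integer type, minimum, and unique constraints."""
    _header, data = read_rows(text)
    errors = []
    seen = {field["name"]: set() for field in schema["fields"]}
    for pos, row in enumerate(data, start=2):
        for field, cell in zip(schema["fields"], row):
            name, rules = field["name"], field.get("constraints", {})
            try:
                value = int(cell)
            except ValueError:
                errors.append(f"row at position {pos}, field {name}: {cell!r} is not an integer")
                continue
            if "minimum" in rules and value < rules["minimum"]:
                errors.append(f"row at position {pos}, field {name}: {value} is below the minimum")
            if rules.get("unique") and value in seen[name]:
                errors.append(f"row at position {pos}, field {name}: {value} is duplicated")
            seen[name].add(value)
    return errors


print(structural_errors(SAMPLE))  # the structural check finds nothing to report
for message in semantic_errors(SAMPLE, SCHEMA):
    print(message)
```

The same file is clean under the first check and produces three errors under the second, which is why the schema has to be supplied explicitly.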

Type inference has cultural priors


price_eur
9,50      →  inferred as  geopoint   (!)
          
  • A semicolon-delimited European CSV with prices like 9,50 infers price_eur as geopoint — the comma reads as a coordinate separator.
  • The fix is a one-line schema override: { "type": "number", "decimalChar": "," }.
  • Lesson 5 keeps this as a teaching moment: inference is good, but its priors about "what a number looks like" are culture-specific.
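
One plausible reading of why 9,50 passes for a geopoint, and what the override changes, is sketched below. This is an illustration under assumptions, not the detector's actual code: looks_like_geopoint and parse_number are hypothetical helpers, and the range test simply assumes the default "lon, lat" text form of a geopoint. Only the field descriptor at the end mirrors the fix quoted above.

```python
def looks_like_geopoint(text):
    """True if text reads as 'lon, lat' with both parts in valid ranges."""
    parts = text.split(",")
    if len(parts) != 2:
        return False
    try:
        lon, lat = float(parts[0]), float(parts[1])
    except ValueError:
        return False
    return -180 <= lon <= 180 and -90 <= lat <= 90


def parse_number(text, decimal_char="."):
    """Parse a number whose decimal separator is decimal_char."""
    return float(text.replace(decimal_char, "."))


# The schema override from the slide, with the field name added.
PRICE_FIELD = {"name": "price_eur", "type": "number", "decimalChar": ","}

print(looks_like_geopoint("9,50"))                        # a price that also fits lon 9, lat 50
print(parse_number("9,50", PRICE_FIELD["decimalChar"]))   # read as a decimal number instead
```

Under this reading the value is genuinely ambiguous, so only the schema can say which interpretation is intended.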

Spec surfaces aren't uniform (foot-stubs)

  • primaryKey string vs array. "primaryKey": "id" is accepted in a package schema but rejected inside an inquiry-embedded schema (must be ["id"]).
  • Schema-by-path. "schema": "schema.json" works in a resource descriptor but fails inside an inquiry task — embed the schema inline there.
  • Two errors, one cause. A duplicate id emits both unique-error and primary-key. One fix clears both, so a literal error count overstates the number of problems.
  • Row numbering counts the header: a problem on the third data row reports as "row at position 4".
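
These quirks are small enough to absorb in a few helper functions. The sketch below is a hypothetical wrapper layer, not part of Frictionless: the error records are assumed to be plain dicts with "type" and "rowNumber" keys, the file name books.csv is made up, and the inquiry task keys ("path", "schema") are our assumption about the descriptor shape.

```python
DUPLICATE_TYPES = {"unique-error", "primary-key"}


def normalize_primary_key(schema):
    """Return a copy of schema whose primaryKey, if present, is a list."""
    fixed = dict(schema)
    key = fixed.get("primaryKey")
    if isinstance(key, str):
        fixed["primaryKey"] = [key]
    return fixed


def data_row_index(row_position):
    """Convert a reported row position (header is 1) to a 1-based data-row index."""
    return row_position - 1


def count_root_causes(errors):
    """Count errors, folding unique-error and primary-key on the same row into one."""
    causes = set()
    for error in errors:
        kind = "duplicate" if error["type"] in DUPLICATE_TYPES else error["type"]
        causes.add((error["rowNumber"], kind))
    return len(causes)


SCHEMA = {"fields": [{"name": "id", "type": "integer"}], "primaryKey": "id"}

# Inline schema with a list-form key, as the inquiry surface requires.
INQUIRY = {"tasks": [{"path": "books.csv", "schema": normalize_primary_key(SCHEMA)}]}

# Hypothetical records for one duplicate id on the third data row.
ERRORS = [
    {"type": "unique-error", "rowNumber": 4},
    {"type": "primary-key", "rowNumber": 4},
]

print(INQUIRY["tasks"][0]["schema"]["primaryKey"])
print(data_row_index(4), count_root_causes(ERRORS))
```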

Transform is the weak spot (version-fragile)

  • No CLI transform in v5.19. The most-mentioned Frictionless verb is the only one without a CLI command, so you drop into the Python transform() function.
  • row-filter formulas operate on raw strings. published_year >= 1970 raises TypeError; the working form is int(published_year) >= 1970.
  • Inconsistent step naming. field-remove takes names (plural); siblings take name. There is no field-rename — you use field-update with descriptor.name.

Most version-fragile lesson in the curriculum — pin Frictionless and re-walk lesson 6 on any upgrade.
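
The raw-string behaviour is ordinary Python semantics and can be reproduced without Frictionless. The sketch below first shows the failing and working comparisons, then writes the three step quirks as a pipeline descriptor. The descriptor is a plain dict for illustration only: the "type" key and overall layout are our assumption about the v5 shape, and the field names title, book_title and notes are hypothetical.

```python
import json

# A cell as the formula sees it: a raw string, not an integer.
row = {"published_year": "1984"}


def raw_comparison_fails(row):
    """True if comparing the raw string cell to an int raises TypeError."""
    try:
        row["published_year"] >= 1970
    except TypeError:
        return True
    return False


def cast_comparison(row):
    """The working form: cast the cell to int before comparing."""
    return int(row["published_year"]) >= 1970


# Assumed descriptor shape; step names and options are the ones noted above.
PIPELINE = {
    "steps": [
        {"type": "row-filter", "formula": "int(published_year) >= 1970"},
        # No field-rename step: rename through field-update's descriptor.
        {"type": "field-update", "name": "title", "descriptor": {"name": "book_title"}},
        # field-remove takes "names" (plural), unlike its siblings.
        {"type": "field-remove", "names": ["notes"]},
    ]
}

print(raw_comparison_fails(row), cast_comparison(row))
print(json.dumps(PIPELINE, indent=2))
```

Keeping the pipeline as data makes it easy to diff and re-check after an upgrade.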

Platform constraints (Pyodide + Pages)

  • No SharedArrayBuffer. GitHub Pages can't serve COOP/COEP headers, so no cross-origin isolation, no threading.
  • Cold-start latency. Loading Pyodide and installing packages takes ~1–3.7 s cold; Firefox is ~3–5× slower than Chromium. Pyodide runs in a Web Worker to keep the UI responsive.
  • Absolute paths rejected. Frictionless calls /file.csv "not safe" — the worker chdirs into the workspace and passes relative paths.
  • Enterprise DLP can strip WebAssembly from Workers (managed Chrome). No code-side fix; the app detects it and recommends Edge / Firefox.

These are delivery constraints of running in-browser — not Frictionless limitations.
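
The absolute-path workaround amounts to a small path rewrite before each command runs. The sketch below is a guess at the shape of that step, not the worker's actual code: WORKSPACE and to_workspace_relative are hypothetical names, and posixpath is used because the in-browser filesystem is POSIX-style.

```python
import posixpath

WORKSPACE = "/workspace"  # hypothetical mount point of the virtual filesystem


def to_workspace_relative(path, workspace=WORKSPACE):
    """Rewrite path relative to workspace, refusing anything outside it."""
    absolute = posixpath.normpath(posixpath.join(workspace, path))
    relative = posixpath.relpath(absolute, workspace)
    if relative == ".." or relative.startswith("../"):
        raise ValueError(f"{path!r} is outside the workspace")
    return relative


# The worker would chdir into the workspace once, then pass paths like these.
print(to_workspace_relative("/workspace/data/file.csv"))
print(to_workspace_relative("books.csv"))
```

Rejecting paths that escape the workspace keeps the rewrite consistent with the library's own safety check instead of working around it.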

Per-lesson scorecard

#  Lesson     Verdict  Headline finding
1  Describe   Clean    Cleanest surface; integer not year by default
2  Schema     Clean    Hand-readable; primaryKey ≠ field unique
3  Validate   Caveat   No-schema validate returns VALID on bad data
4  Package    Strong   Foreign-key reports are excellent; paths are cwd-relative
5  Dialect    Caveat   BOM/delimiter sniffing good; 9,50 → geopoint
6  Transform  Fragile  No CLI; raw-string formulas; inconsistent steps
7  Inquiry    Caveat   Stricter than packages; embed schemas inline
8  Publish    Strong   Remote consumption "just works"; tutorial URLs often dead

The open question (v1.1)

Does Frictionless fit our domain — maritime acoustics?

v1 deliberately used stub CSVs. The fitness question owns v1.1:

  • dB reference levels and units
  • per-row spectra arrays
  • hierarchical campaign / run / pass structures
  • controlled vocabularies aligned to ISO standards

v1 built the foundation; v1.1 turns it into a domain-fitness assessment. This is the most important remaining question.

Recommendation

Adopt Frictionless for tabular description, schema and validation — with eyes open.

  • Lean on describe, schema authoring, validate-with-schema, packages, and remote consumption.
  • Handle with care the no-schema-validate default, inquiry strictness, and dialect inference priors.
  • Pin and re-test anything touching transform across versions.
  • Still to prove: domain fitness for maritime acoustic data (v1.1).

Evidence: eight lessons' Notes & Observations and docs/limitations.md.

Explore the evidence

  • The IDE — run frictionless in your browser
  • The eight lessons — each with its own Notes & Observations
  • README.md Findings & docs/limitations.md — the full catalogue

"This is what we found in 2026." — a dated, reproducible reference build.