
HTML+JSON-LD reader #312

Merged
namedgraph merged 2 commits into develop from rf-html-jsonld-reader on Jun 7, 2026

@namedgraph
Member

No description provided.

HtmlJsonLDReader replaces the old JsonLDReader: a thin Jsoup-based wrapper
that finds every <script type="application/ld+json"> in the response and
delegates each payload to Jena's built-in JSONLD11 reader (Titanium-backed),
so embedded schema.org markup parses as RDF without a heavyweight
jsonld-java dependency. SchemaOrgDocumentLoader serves the bundled
schema.org @context offline.
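
The extraction step described above can be sketched without any dependencies. This is only an illustration, not the reader's actual code: the class name `LdJsonScripts` is hypothetical, and a regular expression stands in for the Jsoup selector, so it is not a robust HTML parser. In the real reader each extracted payload would be handed to Jena's JSONLD11 parser; here the payloads are simply returned as strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects the bodies of <script type="application/ld+json"> elements.
final class LdJsonScripts {
    // Regex stand-in for an HTML parser (assumption: well-formed, non-nested script tags).
    // [^>]* cannot cross a tag boundary, so scripts of other types are never matched.
    private static final Pattern SCRIPT = Pattern.compile(
        "<script\\b[^>]*\\btype\\s*=\\s*[\"']application/ld\\+json[\"'][^>]*>(.*?)</script\\s*>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns every non-empty JSON-LD payload in document order.
    static List<String> extract(String html) {
        List<String> payloads = new ArrayList<>();
        Matcher m = SCRIPT.matcher(html);
        while (m.find()) {
            String payload = m.group(1).trim();
            if (!payload.isEmpty()) {
                payloads.add(payload); // each payload would go to the JSON-LD parser
            }
        }
        return payloads;
    }

    public static void main(String[] args) {
        String html = "<script type=\"text/javascript\">var x = 1;</script>"
            + "<script type=\"application/ld+json\">{\"@id\":\"http://example.org/a\"}</script>";
        System.out.println(extract(html));
    }
}
```

Parsing each payload separately and merging the results into one graph keeps a page with several script blocks equivalent to parsing each block on its own, which is the property the isomorphism test checks.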

ProxyRequestFilter now dispatches upstream responses on Jena's live RIOT
registry (RDFLanguages.contentTypeToLang + ResultSetReaderRegistry), not on
MediaTypes.getReadable(...). The MediaTypes lists are a static-initializer
snapshot captured the first time client.MediaTypes loads, which happens before
Application's constructor body registers the late langs (HTML, RDFPOST), so
those langs are permanently absent from that source. They are present in the
RIOT registry by request time, which is what ModelProvider.isReadable also
queries directly. Using the same predicate makes the proxy route every
upstream response that Jersey can actually deserialize through the typed
Model/ResultSet builders instead of piping bytes verbatim.
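
The snapshot problem described above is a general Java class-initialization effect and can be reproduced in isolation. The sketch below uses hypothetical stand-ins (`LiveRegistry`, `SnapshotMediaTypes`, `SnapshotDemo`) rather than the real RIOT registry or client.MediaTypes; it only shows why a list copied in a static initializer misses anything registered afterwards.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a live, mutable registry of readable content types.
final class LiveRegistry {
    private static final Map<String, String> LANGS = new ConcurrentHashMap<>();

    static {
        // Languages known from the start.
        LANGS.put("text/turtle", "Turtle");
        LANGS.put("application/ld+json", "JSON-LD");
    }

    static void register(String contentType, String lang) { LANGS.put(contentType, lang); }
    static boolean isReadable(String contentType) { return LANGS.containsKey(contentType); }
    static List<String> contentTypes() { return List.copyOf(LANGS.keySet()); }
}

// Hypothetical stand-in for a class that copies the registry when it is first loaded.
final class SnapshotMediaTypes {
    // Evaluated exactly once, at class initialization; never refreshed.
    static final List<String> READABLE = LiveRegistry.contentTypes();

    static boolean isReadable(String contentType) { return READABLE.contains(contentType); }
}

final class SnapshotDemo {
    // Returns { snapshot sees text/html, live registry sees text/html }.
    static boolean[] run() {
        // First use initializes SnapshotMediaTypes and freezes its list.
        SnapshotMediaTypes.isReadable("text/turtle");
        // Late registration, analogous to a constructor body that runs afterwards.
        LiveRegistry.register("text/html", "HTML+JSON-LD");
        return new boolean[] {
            SnapshotMediaTypes.isReadable("text/html"),
            LiveRegistry.isReadable("text/html")
        };
    }

    public static void main(String[] args) {
        boolean[] seen = run();
        System.out.println("snapshot sees text/html: " + seen[0]);      // false
        System.out.println("live registry sees text/html: " + seen[1]); // true
    }
}
```

Querying the live registry at request time, as the proxy now does, avoids depending on class-loading order at all.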

This reverts the InputStream byte-pipe from 56f7730 for RDF and SPARQL
results upstreams while keeping it for non-RDF media types. The 500 the
pipe avoided (pre-selecting a variant from a combined Model+ResultSet list
before the body was read, then writing a Model to a sparql-results variant)
is structurally unreachable now: the entity class is known before
selectVariant runs, so each typed branch picks from a single-class
writable list.
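
Why the typed branches rule out the mismatch can be sketched as follows. The names (`TypedVariants`, `EntityKind`) and the media type lists are hypothetical, and the selection is a simplified first-match over the Accept order with no q-values, so it is not the JAX-RS selectVariant algorithm. The point is only that once the entity kind is fixed, the candidate list contains media types of that kind alone.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of variant selection keyed on the already-known entity kind.
final class TypedVariants {
    enum EntityKind { MODEL, RESULT_SET }

    // Illustrative lists; the real writable media types come from the application.
    private static final List<String> MODEL_WRITABLE =
        List.of("text/turtle", "application/rdf+xml");
    private static final List<String> RESULT_SET_WRITABLE =
        List.of("application/sparql-results+json", "application/sparql-results+xml");

    static List<String> writableFor(EntityKind kind) {
        return kind == EntityKind.MODEL ? MODEL_WRITABLE : RESULT_SET_WRITABLE;
    }

    // Picks the first acceptable media type that the entity's own list can write.
    static Optional<String> selectVariant(EntityKind kind, List<String> acceptable) {
        List<String> writable = writableFor(kind);
        return acceptable.stream().filter(writable::contains).findFirst();
    }

    public static void main(String[] args) {
        // A Model can never be matched to a sparql-results type, even if the client lists it first.
        System.out.println(selectVariant(EntityKind.MODEL,
            List.of("application/sparql-results+json", "text/turtle")));
    }
}
```

With a combined Model+ResultSet list, the first Accept entry in the example would have won and a Model would have been written to a sparql-results variant; with per-kind lists that outcome cannot be selected.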

Tests: HtmlJsonLDReaderTest verifies extraction (single/multi script,
missing script, non-ld+json scripts ignored, isomorphism with direct
JSONLD11 parse). POST-html-jsonld.sh covers ingesting HTML+JSON-LD through
the document hierarchy. GET-proxied-html-jsonld.sh covers fetching
schema.org/WebSite through the proxy with Accept: text/turtle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@namedgraph namedgraph changed the base branch from master to develop June 7, 2026 21:22
Match the terse failure style of the sibling proxy tests; the assertion
logic is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@namedgraph namedgraph marked this pull request as ready for review June 7, 2026 21:24
@namedgraph namedgraph merged commit 8245442 into develop Jun 7, 2026
1 check failed
namedgraph added a commit that referenced this pull request Jun 7, 2026
Cover the HTML+JSON-LD reader (#312) shipped in 5.5.0 — entry was omitted at release time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
