HTML+JSON-LD reader #312
Merged
HtmlJsonLDReader replaces the old JsonLDReader. It is a thin Jsoup-based wrapper that finds every <script type="application/ld+json"> element in the response and delegates each payload to Jena's built-in JSONLD11 reader (Titanium-backed), so embedded schema.org markup parses as RDF without a heavyweight jsonld-java dependency. SchemaOrgDocumentLoader serves the bundled schema.org @context offline.

ProxyRequestFilter now dispatches upstream responses on Jena's live RIOT registry (RDFLanguages.contentTypeToLang + ResultSetReaderRegistry), not on MediaTypes.getReadable(...). The MediaTypes lists are a static-initializer snapshot captured the first time client.MediaTypes loads, which happens before Application's constructor body registers the late langs (HTML, RDFPOST), so those langs are permanently absent from that source. They are present in the RIOT registry by request time, and that registry is what ModelProvider.isReadable also queries directly. Using the same predicate makes the proxy route every upstream response that Jersey can actually deserialize through the typed Model/ResultSet builders instead of piping bytes verbatim.

This reverts the InputStream byte-pipe from 56f7730 for RDF and SPARQL results upstreams while keeping it for non-RDF media types. The 500 that the pipe avoided (pre-selecting a variant from a combined Model+ResultSet list before the body was read, then writing a Model to a sparql-results variant) is structurally unreachable now: the entity class is known before selectVariant runs, so each typed branch picks from a single-class writable list.

Tests: HtmlJsonLDReaderTest verifies extraction (single/multi script, missing script, non-ld+json scripts ignored, isomorphism with direct JSONLD11 parse). POST-html-jsonld.sh covers ingesting HTML+JSON-LD through the document hierarchy. GET-proxied-html-jsonld.sh covers fetching schema.org/WebSite through the proxy with Accept: text/turtle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
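The snapshot problem described above can be illustrated with a minimal, stdlib-only sketch. It does not use Jena or the project's classes: LiveRegistry and SnapshotMediaTypes are hypothetical stand-ins for the RIOT registry and client.MediaTypes respectively, and the content types are only examples. The point is that a list copied in a static initializer never sees registrations made afterwards, while a live lookup does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class RegistrySnapshotSketch {

    // Hypothetical stand-in for a live, mutable registry of readable content types.
    static final class LiveRegistry {
        private static final Set<String> CONTENT_TYPES = ConcurrentHashMap.newKeySet();

        static {
            // Content types that are known at startup.
            CONTENT_TYPES.add("text/turtle");
            CONTENT_TYPES.add("application/ld+json");
        }

        static void register(String contentType) {
            CONTENT_TYPES.add(contentType);
        }

        static boolean isRegistered(String contentType) {
            return CONTENT_TYPES.contains(contentType);
        }

        static List<String> all() {
            return new ArrayList<>(CONTENT_TYPES);
        }
    }

    // Hypothetical stand-in for a class that copies the registry once, at class-load time.
    static final class SnapshotMediaTypes {
        // Captured in the static initializer; later registrations are never reflected here.
        static final List<String> READABLE = List.copyOf(LiveRegistry.all());

        static boolean isReadable(String contentType) {
            return READABLE.contains(contentType);
        }
    }

    public static void main(String[] args) {
        // First use loads SnapshotMediaTypes and freezes its list.
        SnapshotMediaTypes.isReadable("text/turtle");

        // A late registration, analogous to langs registered in an application constructor.
        LiveRegistry.register("text/html");

        System.out.println(SnapshotMediaTypes.isReadable("text/html")); // prints false
        System.out.println(LiveRegistry.isRegistered("text/html"));     // prints true
    }
}
```

Dispatching on the live lookup rather than the frozen copy is the design choice the commit message describes; the sketch only shows why the two sources can disagree.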
Match the terse failure style of the sibling proxy tests; the assertion logic is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
namedgraph added a commit that referenced this pull request on Jun 7, 2026
Cover the HTML+JSON-LD reader (#312) shipped in 5.5.0; the entry was omitted at release time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
No description provided.