feat: generate expression reference doc from code [WIP] #4585
Draft
andygrove wants to merge 18 commits into
prettier re-aligns markdown table columns to the widest cell, so adding a single expression row rewrites every row in the table. That produces noisy diffs and frequent merge conflicts between PRs that each add new expressions. Exempt the file from prettier so future additions stay as one-line diffs.
With prettier no longer aligning the tables, collapse the existing column padding so that adding an expression row never shifts the other rows. Combined with the prettier exemption, every future addition is a true one-line diff that cannot collide on re-alignment.
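The effect described in the two commit messages above can be illustrated with a small sketch. It is not prettier itself: `render_padded` only imitates column alignment to the widest cell, `render_compact` imitates the collapsed layout, and every name and sample row here is made up for illustration.

```python
import difflib

def render_padded(rows):
    """Render a markdown table with each column padded to its widest cell."""
    widths = [max(len(r[i]) for r in rows) for i in range(len(rows[0]))]
    lines = []
    for n, row in enumerate(rows):
        lines.append("| " + " | ".join(c.ljust(w) for c, w in zip(row, widths)) + " |")
        if n == 0:  # separator row directly under the header
            lines.append("| " + " | ".join("-" * w for w in widths) + " |")
    return lines

def render_compact(rows):
    """Render the same table with no padding at all."""
    lines = []
    for n, row in enumerate(rows):
        lines.append("| " + " | ".join(row) + " |")
        if n == 0:
            lines.append("| " + " | ".join("---" for _ in row) + " |")
    return lines

def changed_lines(before, after):
    """Count lines a diff would report as added or removed."""
    return sum(1 for d in difflib.ndiff(before, after) if d.startswith(("+ ", "- ")))

# Illustrative rows only; not taken from the real expressions.md.
rows = [("Expression", "Status"), ("abs", "Supported"), ("acos", "Supported")]
wider = rows + [("a_much_longer_expression_name", "Planned")]

print(changed_lines(render_padded(rows), render_padded(wider)))    # 9: every existing line is rewritten
print(changed_lines(render_compact(rows), render_compact(wider)))  # 1: only the new row
```

In the padded layout, one new wide cell changes the width of the whole column, so the header, separator, and both existing rows are all rewritten. In the compact layout the diff is the single added row.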
The per-group tables are generated by GenerateDocs at site-publish time and frozen into release branches, matching how configs.md and the compatibility guide are handled. The main branch keeps only the markers and prose so the generated content never goes stale in source. [skip ci]
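A minimal sketch of marker-based generation, to make the idea concrete. The BEGIN marker format comes from this PR's description; the END marker, the function name `fill_markers`, and the sample group name are assumptions for illustration and may not match what GenerateDocs actually does.

```python
import re

# BEGIN format as quoted in the PR description; the group name sits in brackets.
BEGIN = re.compile(r"<!--BEGIN:EXPR_TABLE\[([^\]]+)\]-->")
# Assumed closing marker; the PR text does not show it.
END = "<!--END:EXPR_TABLE-->"

def fill_markers(lines, tables):
    """Replace whatever sits between each BEGIN/END pair with that group's table.

    `lines` is the document as a list of lines; `tables` maps a group name
    to the list of table lines generated for it.
    """
    out = []
    skipping = False  # True while inside a marker pair, dropping old content
    for line in lines:
        match = BEGIN.fullmatch(line.strip())
        if match:
            out.append(line)
            out.extend(tables.get(match.group(1), []))
            skipping = True
        elif line.strip() == END:
            out.append(line)
            skipping = False
        elif not skipping:
            out.append(line)
    return out

# Source on main: markers and prose only, no generated rows.
source = [
    "Math functions",
    "<!--BEGIN:EXPR_TABLE[math_funcs]-->",
    "<!--END:EXPR_TABLE-->",
    "Trailing prose.",
]
published = fill_markers(source, {"math_funcs": ["| abs | Supported |"]})
print("\n".join(published))
```

Because old content between the markers is discarded on every run, the operation is idempotent: running it again on already-published output yields the same file.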
Which issue does this PR close?
N/A. Follow-on to #4583. This reduces drift and maintenance friction in the expression reference doc by generating it from code.
Rationale for this change
docs/source/user-guide/latest/expressions.md was hand-maintained: every PR that added or changed an expression edited the tables by hand. That let the doc drift from reality (a function supported in code but still listed as planned, or a new Spark built-in never added) and made large aligned tables conflict-prone.
The Compatibility Guide is already generated by GenerateDocs from each serde's getCompatibleNotes / getIncompatibleReasons / getUnsupportedReasons. This PR extends the same generator to also produce the expression reference, so the overview is derived from the code that actually decides support, and stays complete and current.
What changes are included in this PR?
org.apache.comet.ExpressionReference: status model, row resolution, table rendering, and Spark FunctionRegistry enumeration (unit-tested in isolation).
GenerateDocs extended to: enumerate every Spark built-in (with its group), derive Supported status and a Compatibility Guide link from the serde maps, and fall back to a curated status list for planned / not-planned functions. The curated list lives in GenerateDocs.scala on purpose: that file is excluded from the heavy CI path filters in dev/ci/compute-changes.py, so editing the list (for example when an issue is filed) does not trigger the Spark SQL and Iceberg jobs.
expressions.md per-group tables are now generated between <!--BEGIN:EXPR_TABLE[group]--> markers; the prose was updated to drop the "Incorrect by default" status.
[...] FunctionRegistry) in dev/generate-release-docs.sh and docs/build.sh.
Known follow-ups (not in this PR): populate per-expression summary notes via a new getExpressionSummary (currently None, so serde-backed rows have sparse notes); add a CI check that fails when the generated doc is stale; rename the curated PlannedExpr type now that it also holds Supported entries.
How are these changes tested?
ExpressionReferenceSuite covers the status model, every branch of row resolution (serde + link, serde without page, planned + issue, not-planned, unclassified), and rendering. FunctionRegistryEnumerationSuite verifies enumeration against real Spark built-ins.
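The five row-resolution branches named above can be sketched as follows. This is an illustrative model in Python, not the Scala code in this PR: `Row`, `resolve_row`, the "Unclassified" label, the link text, and the shape of the curated map are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Row:
    name: str
    status: str
    notes: str = ""

def resolve_row(name, serde_backed, guide_pages, curated):
    """Resolve one built-in function name to a table row.

    serde_backed: set of names that have a serde (hypothetical shape)
    guide_pages:  name -> Compatibility Guide page, when one exists
    curated:      name -> (status, issue number or None) for everything else
    """
    if name in serde_backed:
        page = guide_pages.get(name)
        # Branches 1 and 2: serde with a guide link, serde without a page.
        notes = f"[Compatibility Guide]({page})" if page else ""
        return Row(name, "Supported", notes)
    entry = curated.get(name)
    if entry is None:
        # Branch 5: neither serde-backed nor curated.
        return Row(name, "Unclassified")
    status, issue = entry
    # Branches 3 and 4: planned (optionally with an issue) or not planned.
    return Row(name, status, f"#{issue}" if issue else "")

def render(rows):
    """Render rows without column padding, one line per expression."""
    return [f"| {r.name} | {r.status} | {r.notes} |" for r in rows]
```

Checking the serde maps first mirrors the stated intent that status is derived from the code that decides support, with the curated list used only as a fallback.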