docs: Improve Spark version compatibility & ANSI mode documentation [WIP] #4079
Draft
andygrove wants to merge 7 commits into apache:main
Conversation
…ording Add a dedicated Spark Version Compatibility page that documents the known Spark 4.0 gaps (VariantType, Parquet type widening) and notes that ANSI mode has good coverage with fallback for unsupported cases. Wire the page into the compatibility toctree. Move Spark 4.0.1 into the main supported-versions table in the installation guide and replace the "experimental, not for production" paragraph with a positive note that links to the new compatibility page. Drop "(Experimental)" from the Spark 4.0 JAR link.
Move the Spark 3.4/3.5 section before Spark 4.0 in the new compatibility page. Correct the description of Spark 4.0 gaps: variant columns fall back to Spark, but unsupported Parquet type widening is not detected as a fallback condition and may return incorrect results.
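A minimal sketch of the Parquet type-widening scenario described above, assuming a running `spark` session (the path is a placeholder): the file is written with an `INT` column and read back with a widened `LONG` schema, which Spark 4.0 supports but which the Comet scan may not detect as a fallback condition.

```scala
// Hypothetical repro sketch; the path is an example, not from the PR.
val path = "/tmp/widening_example"

// Write a Parquet file whose column is physically INT.
spark.range(10)
  .selectExpr("CAST(id AS INT) AS id")
  .write.mode("overwrite").parquet(path)

// Read it back with a widened LONG schema. Spark 4.0 handles this
// widening, but per the note above, the unsupported widening is not
// detected as a fallback condition and may return incorrect results.
spark.read.schema("id LONG").parquet(path).show()
```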
Issue apache#313 is closed, but Sum/Average aggregates and some Cast expressions still fall back to Spark in ANSI mode. Update the Spark Version Compatibility page to name those cases and drop the epic link from cast.md.
Sum and Average pass eval_mode through to the native accumulators (sum_int, sum_decimal, avg_decimal), which handle ANSI overflow themselves. The remaining ANSI-mode fallbacks are in Cast.
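As an illustration of the ANSI overflow behavior the native accumulators must reproduce, here is a sketch (assuming a `spark` session with implicits imported) where a `SUM` over `LONG` values overflows:

```scala
import spark.implicits._

// With ANSI mode enabled, an overflowing SUM must raise an
// ArithmeticException instead of silently wrapping to a negative value.
spark.conf.set("spark.sql.ansi.enabled", "true")

val df = Seq(Long.MaxValue, 1L).toDF("v")

// Expected to fail with an overflow error under ANSI mode; with ANSI
// mode off, it would wrap. The native accumulators handle this check
// themselves, so Sum/Average no longer need to fall back.
df.selectExpr("SUM(v)").show()
```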
Add notes for two additional Spark 4.0 fallback paths: non-default string collations (group-by, distinct, sort, join, shuffle) and DataSource V2 bucketing with partially clustered distribution.
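For the collation fallback, a sketch of the kind of query involved (assuming a `spark` session; Spark 4.0 provides the `collate` function and named collations such as `UTF8_LCASE`): grouping on a column with a non-default collation is one of the cases that falls back to Spark.

```scala
import spark.implicits._

// Group by a case-insensitive collated column -- a non-default
// collation, so per the note above this plan falls back to Spark
// rather than running natively.
val df = Seq("a", "A", "b").toDF("s")
df.createOrReplaceTempView("t")

spark.sql(
  "SELECT COLLATE(s, 'UTF8_LCASE') AS s_ci, COUNT(*) AS cnt " +
  "FROM t GROUP BY COLLATE(s, 'UTF8_LCASE')"
).show()
```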
The native_datafusion scan silently accepts schema mismatches (issue apache#3720) on all supported Spark versions, not just Spark 4.0. Document the behavior under native_datafusion limitations in scans.md and cross-reference from the Spark 4.0 type widening section.
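A hypothetical repro sketch for the silent schema-mismatch acceptance (apache#3720), assuming a `spark` session; the `spark.comet.scan.impl` key is Comet's scan-implementation selector, but check the Comet configuration reference for the exact name and values in your version:

```scala
// Assumption: this config key selects the native_datafusion scan.
spark.conf.set("spark.comet.scan.impl", "native_datafusion")

// Write with a STRING column, then read with an incompatible INT
// schema. Per the note above, the native_datafusion scan may accept
// the mismatch silently instead of failing, on all supported Spark
// versions, not just Spark 4.0.
val path = "/tmp/mismatch_example"
spark.range(5)
  .selectExpr("CAST(id AS STRING) AS id")
  .write.mode("overwrite").parquet(path)

spark.read.schema("id INT").parquet(path).show()
```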
Which issue does this PR close?
Part of #1637
Rationale for this change
There has been a lot of progress recently with Spark 4 and ANSI support. The docs now need some updates.
What changes are included in this PR?
How are these changes tested?
N/A