docs: Improve Spark version compatibility & ANSI mode documentation [WIP]#4079

Draft
andygrove wants to merge 7 commits into apache:main from andygrove:docs/spark4-compatibility

Conversation

@andygrove
Member

@andygrove commented on Apr 25, 2026

Which issue does this PR close?

Part of #1637

Rationale for this change

There has been a lot of recent progress on Spark 4 and ANSI mode support, and the documentation needs to be updated to reflect it.

What changes are included in this PR?

  • Add new Spark Versions page to compatibility guide
  • Various minor updates

How are these changes tested?

N/A

docs: add Spark version compatibility guide and soften experimental wording

Add a dedicated Spark Version Compatibility page that documents the known
Spark 4.0 gaps (VariantType, Parquet type widening) and notes that ANSI mode
has good coverage with fallback for unsupported cases. Wire the page into
the compatibility toctree.

Move Spark 4.0.1 into the main supported-versions table in the installation
guide and replace the "experimental, not for production" paragraph with a
positive note that links to the new compatibility page. Drop "(Experimental)"
from the Spark 4.0 JAR link.
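For context on what the updated installation guide describes, running Comet on a supported Spark version with ANSI mode enabled comes down to a handful of Spark conf entries. A minimal sketch (the key names follow the Comet installation guide, but they should be verified against the docs for your Comet/Spark version):

```python
# Illustrative Spark conf for running Comet with ANSI mode enabled.
# Key names follow the Comet installation guide; verify them against
# the documentation for your Comet/Spark version.
comet_ansi_conf = {
    "spark.plugins": "org.apache.spark.CometPlugin",
    "spark.comet.enabled": "true",
    "spark.comet.exec.enabled": "true",
    # With ANSI mode on, Comet falls back to Spark for the cases it
    # does not yet support natively (e.g. some Cast expressions).
    "spark.sql.ansi.enabled": "true",
}

def to_submit_args(conf: dict) -> list:
    """Render a conf map as spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args
```

The helper just renders the map into `--conf key=value` pairs for a `spark-submit` invocation; it is a convenience for the sketch, not part of Comet.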
@andygrove changed the title from "docs: add Spark version compatibility guide and soften experimental wording" to "docs: add Spark version compatibility guide and soften experimental wording [WIP]" on Apr 25, 2026
Move the Spark 3.4/3.5 section before Spark 4.0 in the new compatibility
page. Correct the description of Spark 4.0 gaps: variant columns fall
back to Spark, but unsupported Parquet type widening is not detected as
a fallback condition and may return incorrect results.

Issue apache#313 is closed, but Sum/Average aggregates and some Cast expressions
still fall back to Spark in ANSI mode. Update the Spark Version
Compatibility page to name those cases and drop the epic link from cast.md.
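To make the remaining Cast fallback concrete: ANSI mode changes cast-failure behavior from returning NULL to raising an error, and Comet must reproduce that behavior exactly or fall back. A pure-Python sketch of the two semantics (illustrative only, not Comet code):

```python
def cast_string_to_int(value: str, ansi: bool):
    """Mimic Spark's string-to-int cast under legacy vs ANSI semantics.

    Legacy mode returns None (NULL) for invalid input; ANSI mode raises
    an error, matching the SQL standard. A native engine that cannot
    reproduce the ANSI behavior must fall back to Spark for that cast.
    """
    try:
        return int(value.strip())
    except ValueError:
        if ansi:
            raise ValueError(
                f"[CAST_INVALID_INPUT] '{value}' cannot be cast to INT"
            )
        return None
```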
@andygrove changed the title from "docs: add Spark version compatibility guide and soften experimental wording [WIP]" to "docs: Improve Spark 4 & ANSI mode documentation [WIP]" on Apr 25, 2026
Sum and Average pass eval_mode through to the native accumulators
(sum_int, sum_decimal, avg_decimal), which handle ANSI overflow
themselves. The remaining ANSI-mode fallbacks are in Cast.

Add notes for two additional Spark 4.0 fallback paths: non-default
string collations (group-by, distinct, sort, join, shuffle) and
DataSource V2 bucketing with partially clustered distribution.
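The Sum/Average point above is about overflow semantics: in ANSI mode a long SUM that overflows must raise rather than wrap, which is why eval_mode has to reach the native accumulators. A rough sketch of the checked accumulation (illustrative; the real accumulators such as sum_int operate on Arrow arrays in Rust):

```python
INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)

def ansi_sum_long(values):
    """Sum 64-bit longs, raising on overflow as ANSI mode requires.

    Legacy Spark mode silently wraps on overflow; an ANSI-aware
    accumulator instead performs a checked add and raises. Python ints
    are arbitrary precision, so the 64-bit bound is checked explicitly.
    """
    total = 0
    for v in values:
        total += v
        if not (INT64_MIN <= total <= INT64_MAX):
            raise OverflowError("[ARITHMETIC_OVERFLOW] long overflow in SUM")
    return total
```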
@andygrove marked this pull request as ready for review on April 25, 2026 at 14:03
@andygrove marked this pull request as draft on April 25, 2026 at 14:13
@andygrove changed the title from "docs: Improve Spark 4 & ANSI mode documentation [WIP]" to "docs: Improve Spark version compatibility & ANSI mode documentation [WIP]" on Apr 25, 2026
The native_datafusion scan silently accepts schema mismatches (issue
apache#3720) on all supported Spark versions, not just Spark 4.0. Document
the behavior under native_datafusion limitations in scans.md and
cross-reference from the Spark 4.0 type widening section.
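To illustrate what "silently accepts schema mismatches" means: the table schema requested by the scan and the Parquet file's physical schema can disagree (e.g. a column written as INT32 later read as BIGINT after type widening), and the scan neither rejects the mismatch nor falls back. A toy checker sketching the kind of detection issue apache#3720 asks for (hypothetical model, not Comet's actual code):

```python
def find_schema_mismatches(file_schema: dict, table_schema: dict) -> list:
    """Compare a file's physical schema against the requested table schema.

    Returns (column, file_type, table_type) tuples for every column whose
    types differ. A scan should either support such a mismatch correctly
    or report/fall back on it, rather than accept it silently and risk
    returning incorrect results. Schemas are modeled as name -> type maps.
    """
    mismatches = []
    for col, table_type in table_schema.items():
        file_type = file_schema.get(col)
        if file_type is not None and file_type != table_type:
            mismatches.append((col, file_type, table_type))
    return mismatches
```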