docs: Improve Spark version compatibility & ANSI mode documentation [WIP]#4079

Draft
andygrove wants to merge 7 commits into apache:main from andygrove:docs/spark4-compatibility

Conversation

@andygrove
Member

@andygrove commented on Apr 25, 2026

Which issue does this PR close?

Part of #1637

Rationale for this change

There has been a lot of recent progress on Spark 4 and ANSI mode support, and the documentation needs to be updated to reflect it.

What changes are included in this PR?

  • Add new Spark Versions page to compatibility guide
  • Various minor updates

How are these changes tested?

N/A

docs: add Spark version compatibility guide and soften experimental wording

Add a dedicated Spark Version Compatibility page that documents the known
Spark 4.0 gaps (VariantType, Parquet type widening) and notes that ANSI mode
has good coverage with fallback for unsupported cases. Wire the page into
the compatibility toctree.

Move Spark 4.0.1 into the main supported-versions table in the installation
guide and replace the "experimental, not for production" paragraph with a
positive note that links to the new compatibility page. Drop "(Experimental)"
from the Spark 4.0 JAR link.
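For context on what the updated installation guide describes, running Comet on a supported Spark version with ANSI mode enabled comes down to a handful of Spark conf entries. A minimal sketch (the key names follow the Comet installation guide, but they should be verified against the docs for your Comet/Spark version):

```python
# Illustrative Spark conf for running Comet with ANSI mode enabled.
# Key names follow the Comet installation guide; verify them against
# the documentation for your Comet/Spark version.
comet_ansi_conf = {
    "spark.plugins": "org.apache.spark.CometPlugin",
    "spark.comet.enabled": "true",
    "spark.comet.exec.enabled": "true",
    # With ANSI mode on, Comet falls back to Spark for the cases it
    # does not yet support natively (e.g. some Cast expressions).
    "spark.sql.ansi.enabled": "true",
}

def to_submit_args(conf: dict) -> list:
    """Render a conf map as spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args
```

The helper just renders the map into `--conf key=value` pairs for a `spark-submit` invocation; it is a convenience for the sketch, not part of Comet.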
@andygrove changed the title from "docs: add Spark version compatibility guide and soften experimental wording" to "docs: add Spark version compatibility guide and soften experimental wording [WIP]" on Apr 25, 2026
Move the Spark 3.4/3.5 section before Spark 4.0 in the new compatibility
page. Correct the description of Spark 4.0 gaps: variant columns fall
back to Spark, but unsupported Parquet type widening is not detected as
a fallback condition and may return incorrect results.

Issue apache#313 is closed, but Sum/Average aggregates and some Cast expressions
still fall back to Spark in ANSI mode. Update the Spark Version
Compatibility page to name those cases and drop the epic link from cast.md.
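To make the remaining Cast fallback concrete: ANSI mode changes cast-failure behavior from returning NULL to raising an error, and Comet must reproduce that behavior exactly or fall back. A pure-Python sketch of the two semantics (illustrative only, not Comet code):

```python
def cast_string_to_int(value: str, ansi: bool):
    """Mimic Spark's string-to-int cast under legacy vs ANSI semantics.

    Legacy mode returns None (NULL) for invalid input; ANSI mode raises
    an error, matching the SQL standard. A native engine that cannot
    reproduce the ANSI behavior must fall back to Spark for that cast.
    """
    try:
        return int(value.strip())
    except ValueError:
        if ansi:
            raise ValueError(
                f"[CAST_INVALID_INPUT] '{value}' cannot be cast to INT"
            )
        return None
```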
@andygrove changed the title from "docs: add Spark version compatibility guide and soften experimental wording [WIP]" to "docs: Improve Spark 4 & ANSI mode documentation [WIP]" on Apr 25, 2026
Sum and Average pass eval_mode through to the native accumulators
(sum_int, sum_decimal, avg_decimal), which handle ANSI overflow
themselves. The remaining ANSI-mode fallbacks are in Cast.

Add notes for two additional Spark 4.0 fallback paths: non-default
string collations (group-by, distinct, sort, join, shuffle) and
DataSource V2 bucketing with partially clustered distribution.
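The Sum/Average point above is about overflow semantics: in ANSI mode a long SUM that overflows must raise rather than wrap, which is why eval_mode has to reach the native accumulators. A rough sketch of the checked accumulation (illustrative; the real accumulators such as sum_int operate on Arrow arrays in Rust):

```python
INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)

def ansi_sum_long(values):
    """Sum 64-bit longs, raising on overflow as ANSI mode requires.

    Legacy Spark mode silently wraps on overflow; an ANSI-aware
    accumulator instead performs a checked add and raises. Python ints
    are arbitrary precision, so the 64-bit bound is checked explicitly.
    """
    total = 0
    for v in values:
        total += v
        if not (INT64_MIN <= total <= INT64_MAX):
            raise OverflowError("[ARITHMETIC_OVERFLOW] long overflow in SUM")
    return total
```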
@andygrove marked this pull request as ready for review on April 25, 2026 at 14:03
@andygrove marked this pull request as draft on April 25, 2026 at 14:13
@andygrove changed the title from "docs: Improve Spark 4 & ANSI mode documentation [WIP]" to "docs: Improve Spark version compatibility & ANSI mode documentation [WIP]" on Apr 25, 2026
The native_datafusion scan silently accepts schema mismatches (issue
apache#3720) on all supported Spark versions, not just Spark 4.0. Document
the behavior under native_datafusion limitations in scans.md and
cross-reference from the Spark 4.0 type widening section.
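To illustrate what "silently accepts schema mismatches" means: the table schema requested by the scan and the Parquet file's physical schema can disagree (e.g. a column written as INT32 later read as BIGINT after type widening), and the scan neither rejects the mismatch nor falls back. A toy checker sketching the kind of detection issue apache#3720 asks for (hypothetical model, not Comet's actual code):

```python
def find_schema_mismatches(file_schema: dict, table_schema: dict) -> list:
    """Compare a file's physical schema against the requested table schema.

    Returns (column, file_type, table_type) tuples for every column whose
    types differ. A scan should either support such a mismatch correctly
    or report/fall back on it, rather than accept it silently and risk
    returning incorrect results. Schemas are modeled as name -> type maps.
    """
    mismatches = []
    for col, table_type in table_schema.items():
        file_type = file_schema.get(col)
        if file_type is not None and file_type != table_type:
            mismatches.append((col, file_type, table_type))
    return mismatches
```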