feat: support StringSplitSQL for split_part by michaelmitchell-bit · Pull Request #4592 · apache/datafusion-comet

michaelmitchell-bit · 2026-06-04T15:09:54Z

Which issue does this PR close?

Closes #4561.

Rationale for this change

split_part lowers to element_at(StringSplitSQL(str, delimiter), partNum). Comet already supports element_at, but the inner StringSplitSQL expression was unsupported, so split_part fell back to Spark.

What changes are included in this PR?

Adds native support for Spark 4.x StringSplitSQL via a literal-delimiter split_sql function. Unlike split / StringSplit, this splits on the delimiter literally rather than as a regex and keeps trailing empty parts.

This also fixes ListExtract default-value typing so defaults are cast to the array element type rather than the list type. That is needed for split_part, which uses an empty-string default through element_at.

Updates expression support docs and adds SQL-file coverage for split_part, including literal delimiters, dynamic delimiters, nulls, out-of-range defaults, negative positions, and invalid zero index behavior.

How are these changes tested?

Ran:

cargo fmt -p datafusion-comet-spark-expr
cargo test -p datafusion-comet-spark-expr string_funcs::split
cargo test -p datafusion-comet-spark-expr list_extract
cargo build
./mvnw test -Dtest=none -Dsuites="org.apache.comet.CometSqlFileTestSuite split_part" -Dscalastyle.skip=true
./mvnw compile -Pspark-4.0 -DskipTests -Dscalastyle.skip=true
./mvnw compile -Pspark-4.2 -DskipTests -Dscalastyle.skip=true
git diff --check

andygrove · 2026-06-04T21:26:10Z

Thanks @michaelmitchell-bit. This looks good overall.

StringSplitSQL is collation-aware in Spark and Comet does not support different collations yet, so it would be good to add a fallback for that case. See concat and reverse for examples.

michaelmitchell-bit · 2026-06-04T23:13:55Z

Thanks @michaelmitchell-bit. This looks good overall.

StringSplitSQL is collation-aware in Spark and Comet does not support different collations yet, so it would be good to add a fallback for that case. See concat and reverse for examples.

awesome, ty. added a fallback for non-utf8_binary collations and covered it in the split_part SQL file test.

feat: support StringSplitSQL for split_part

1b86767

fix: fall back for StringSplitSQL collations

a66ad75

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: support StringSplitSQL for split_part#4592

feat: support StringSplitSQL for split_part#4592
michaelmitchell-bit wants to merge 2 commits into
apache:mainfrom
michaelmitchell-bit:support-stringsplitsql-split-part-4561

michaelmitchell-bit commented Jun 4, 2026

Uh oh!

andygrove commented Jun 4, 2026

Uh oh!

michaelmitchell-bit commented Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

michaelmitchell-bit commented Jun 4, 2026

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

How are these changes tested?

Uh oh!

andygrove commented Jun 4, 2026

Uh oh!

michaelmitchell-bit commented Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants