branch-4.1: [fix](serde) Support large string arrow builder for variant serde #64048

Merged: yiguolei merged 1 commit into apache:branch-4.1 from eldenmoon:codex/pick-63718-branch-4.1 on Jun 3, 2026

@eldenmoon
Member

cherry-pick #63718

### What problem does this PR solve?

Issue Number: None

Related PR: apache#63718

Problem Summary: Pick apache#63718 to branch-4.1. `DataTypeVariantSerDe::write_column_to_arrow` previously assumed an Arrow string builder. Parquet OUTFILE can use `large_utf8` batches and pass an Arrow large string builder, which could crash BE on a bad builder cast. Support both string and large string builders for Variant Arrow serialization, and report unsupported builders as invalid arguments.

### Release note

Fix BE crash when exporting VARIANT columns to Parquet OUTFILE with large Arrow string batches.

### Check List (For Author)

- Test: Unit Test
    - Unit Test: ./run-be-ut.sh --run --filter='DataTypeSerDeTest.VariantWriteColumnToArrowSupportsLargeString'
    - Unit Test: ./run-be-ut.sh --run --filter='DataTypeSerDeTest.*'
    - Format: PATH=/mnt/disk1/claude-max/ldb_toolchain16/bin:$PATH ./build-support/check-format.sh
    - Static check: git diff --check origin/branch-4.1...HEAD
- Behavior changed: Yes. VARIANT Arrow serialization now supports large_utf8 builders instead of aborting on a bad builder cast.
- Does this need documentation: No

(cherry picked from commit 290c014)
Copilot AI review requested due to automatic review settings June 3, 2026 03:44
@eldenmoon eldenmoon requested a review from yiguolei as a code owner June 3, 2026 03:44
@eldenmoon
Member Author

run buildall

Contributor

Copilot AI left a comment

Pull request overview

This PR cherry-picks #63718 onto branch-4.1 to prevent BE crashes when serializing VARIANT columns to Arrow for Parquet OUTFILE exports. It extends DataTypeVariantSerDe::write_column_to_arrow to support both utf8 and large_utf8 Arrow builders, and adds a BE unit test covering the LargeStringBuilder path.

Changes:

  • Handle both arrow::StringBuilder and arrow::LargeStringBuilder in VARIANT Arrow serialization.
  • Refactor shared VARIANT→Arrow append logic into a templated helper.
  • Add a BE unit test verifying serialization into a LargeStringArray.
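The shape of these changes can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `TypeId`, `ArrayBuilder`, `BasicStringBuilder`, `Status`, and `append_variants` are hypothetical stand-ins written for this example (they are not the Arrow or Doris APIs), and each VARIANT row is assumed to already be rendered as a string, with a null pointer standing for a null row. The sketch shows a runtime dispatch on the builder's type, a templated helper shared by both string builder widths, and an invalid-argument result instead of an unchecked cast for anything else.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for the Arrow builder hierarchy (not the real API).
enum class TypeId { STRING, LARGE_STRING, INT64 };

struct ArrayBuilder {
    virtual ~ArrayBuilder() = default;
    virtual TypeId type_id() const = 0;
};

// utf8 uses 32-bit offsets and large_utf8 uses 64-bit offsets; the append
// interface is otherwise identical, which is what makes a template fit.
template <typename Offset, TypeId Id>
struct BasicStringBuilder : ArrayBuilder {
    std::vector<Offset> offsets{0};
    std::string data;
    std::vector<bool> validity;

    TypeId type_id() const override { return Id; }
    void Append(const std::string& value) {
        data += value;
        offsets.push_back(static_cast<Offset>(data.size()));
        validity.push_back(true);
    }
    void AppendNull() {
        offsets.push_back(offsets.back());  // null occupies zero bytes
        validity.push_back(false);
    }
};
using StringBuilder = BasicStringBuilder<int32_t, TypeId::STRING>;
using LargeStringBuilder = BasicStringBuilder<int64_t, TypeId::LARGE_STRING>;
struct Int64Builder : ArrayBuilder {
    TypeId type_id() const override { return TypeId::INT64; }
};

// Hypothetical minimal status type.
struct Status {
    bool ok;
    std::string msg;
    static Status OK() { return {true, ""}; }
    static Status InvalidArgument(std::string m) { return {false, std::move(m)}; }
};

// Shared append logic, instantiated once per concrete builder type.
template <typename Builder>
Status append_variants(const std::vector<const std::string*>& rows, Builder* builder) {
    assert(builder != nullptr);
    for (const std::string* row : rows) {
        if (row == nullptr) {
            builder->AppendNull();
        } else {
            builder->Append(*row);
        }
    }
    return Status::OK();
}

// Check the builder's type before downcasting; reject anything unsupported.
Status write_column_to_arrow(const std::vector<const std::string*>& rows,
                             ArrayBuilder* builder) {
    switch (builder->type_id()) {
    case TypeId::STRING:
        return append_variants(rows, static_cast<StringBuilder*>(builder));
    case TypeId::LARGE_STRING:
        return append_variants(rows, static_cast<LargeStringBuilder*>(builder));
    default:
        return Status::InvalidArgument("unsupported Arrow builder for VARIANT");
    }
}
```

Calling `write_column_to_arrow` with a `LargeStringBuilder` takes the 64-bit-offset path, while passing the `Int64Builder` stand-in yields a failed `Status` rather than undefined behavior from a mismatched cast.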

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| be/src/core/data_type_serde/data_type_variant_serde.cpp | Supports both STRING and LARGE_STRING Arrow builder types for VARIANT serialization and returns a clear error for unsupported builder types. |
| be/test/core/data_type_serde/data_type_serde_test.cpp | Adds a unit test that constructs a VARIANT scalar and verifies writing via arrow::LargeStringBuilder. |

@yiguolei
Contributor

yiguolei commented Jun 3, 2026

skip buildall

@yiguolei yiguolei merged commit 3ebad14 into apache:branch-4.1 Jun 3, 2026
32 of 34 checks passed