branch-4.1: [fix](serde) Support large string arrow builder for variant serde #64048
Merged
### What problem does this PR solve?

Issue Number: None

Related PR: apache#63718

Problem Summary:

Pick apache#63718 to branch-4.1.

`DataTypeVariantSerDe::write_column_to_arrow` previously assumed an Arrow string builder. Parquet OUTFILE can use `large_utf8` batches and pass an Arrow large string builder, which could crash the BE on a bad builder cast.

Support both string and large string builders for Variant Arrow serialization and report unsupported builders as invalid arguments.

### Release note

Fix BE crash when exporting VARIANT columns to Parquet OUTFILE with large Arrow string batches.

### Check List (For Author)

- Test: Unit Test
    - Unit Test: `./run-be-ut.sh --run --filter='DataTypeSerDeTest.VariantWriteColumnToArrowSupportsLargeString'`
    - Unit Test: `./run-be-ut.sh --run --filter='DataTypeSerDeTest.*'`
    - Format: `PATH=/mnt/disk1/claude-max/ldb_toolchain16/bin:$PATH ./build-support/check-format.sh`
    - Static check: `git diff --check origin/branch-4.1...HEAD`
- Behavior changed: Yes. VARIANT Arrow serialization now supports `large_utf8` builders instead of aborting on a bad builder cast.
- Does this need documentation: No

(cherry picked from commit 290c014)
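For context on the problem summary: Arrow's `utf8` layout stores 32-bit offsets, while `large_utf8` stores 64-bit offsets, so a single `utf8` array can address at most 2^31 - 1 bytes of string data. The PR does not say why Parquet OUTFILE picks `large_utf8`; the sketch below only illustrates the offset-width difference, using a hypothetical helper `fits_offset` that is not part of Doris or Arrow.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: can a string array whose values total `total_bytes`
// be indexed with offsets of type OffsetT? Offsets start at 0, so the final
// offset equals the total byte length and must not exceed OffsetT's max.
template <typename OffsetT>
constexpr bool fits_offset(int64_t total_bytes) {
    return total_bytes >= 0 &&
           total_bytes <= static_cast<int64_t>(std::numeric_limits<OffsetT>::max());
}

// utf8-style (int32_t) offsets top out at 2147483647 bytes per array.
static_assert(fits_offset<int32_t>(2147483647), "largest utf8-sized payload");
static_assert(!fits_offset<int32_t>(int64_t{1} << 31), "needs 64-bit offsets");
static_assert(fits_offset<int64_t>(int64_t{1} << 31), "large_utf8-sized offsets suffice");
```

A builder hard-coded for one offset width therefore cannot simply be reinterpreted as the other; the two are distinct types, which is why the cast in the old code path was unsafe.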
Author (Member): run buildall
Contributor
Pull request overview
This PR cherry-picks #63718 onto branch-4.1 to prevent BE crashes when serializing VARIANT columns to Arrow for Parquet OUTFILE exports. It extends DataTypeVariantSerDe::write_column_to_arrow to support both utf8 and large_utf8 Arrow builders, and adds a BE unit test covering the LargeStringBuilder path.
Changes:
- Handle both `arrow::StringBuilder` and `arrow::LargeStringBuilder` in VARIANT Arrow serialization.
- Refactor shared VARIANT→Arrow append logic into a templated helper.
- Add a BE unit test verifying serialization into a `LargeStringArray`.
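The change list describes two ideas: dispatching on the builder type instead of assuming one, and sharing the append logic through a template. The sketch below shows that pattern under stated assumptions. It does not link against Arrow: `FakeUtf8Builder`, `FakeLargeUtf8Builder`, `FakeBinaryBuilder`, `TypeId`, `Status`, `VariantRow`, `append_variant_rows` and `write_variant_rows` are all hypothetical stand-ins, and it does not claim to match the actual Doris implementation.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for Arrow builder classes; real code would use
// arrow::StringBuilder / arrow::LargeStringBuilder and arrow::Status.
enum class TypeId { STRING, LARGE_STRING, BINARY };

struct BuilderBase {
    virtual ~BuilderBase() = default;
    virtual TypeId type_id() const = 0;
};

// One template covers both string layouts; only the offset width differs
// (int32_t for the utf8-like builder, int64_t for the large_utf8-like one).
template <typename OffsetT, TypeId Id>
struct FakeStringBuilder : BuilderBase {
    std::vector<OffsetT> offsets{0};  // value i spans [offsets[i], offsets[i+1])
    std::string data;                 // concatenated value bytes
    std::vector<bool> valid;          // validity flag per value
    TypeId type_id() const override { return Id; }
    void append(const std::string& s) {
        data += s;
        offsets.push_back(static_cast<OffsetT>(data.size()));
        valid.push_back(true);
    }
    void append_null() {
        offsets.push_back(offsets.back());  // zero-length slot for a null
        valid.push_back(false);
    }
};

using FakeUtf8Builder = FakeStringBuilder<int32_t, TypeId::STRING>;
using FakeLargeUtf8Builder = FakeStringBuilder<int64_t, TypeId::LARGE_STRING>;

// A builder type the VARIANT path does not handle.
struct FakeBinaryBuilder : BuilderBase {
    TypeId type_id() const override { return TypeId::BINARY; }
};

struct Status {
    bool ok;
    std::string message;
};

// Hypothetical model of one VARIANT row already serialized to JSON text.
struct VariantRow {
    bool is_null;
    std::string json;
};

// Shared append logic, instantiated once per concrete builder type.
template <typename Builder>
Status append_variant_rows(const std::vector<VariantRow>& rows, Builder* builder) {
    for (const auto& row : rows) {
        if (row.is_null) {
            builder->append_null();
        } else {
            builder->append(row.json);
        }
    }
    return {true, ""};
}

// Check the builder's declared type before casting; an unsupported builder
// yields an invalid-argument style error instead of a bad cast.
Status write_variant_rows(const std::vector<VariantRow>& rows, BuilderBase* builder) {
    switch (builder->type_id()) {
    case TypeId::STRING:
        return append_variant_rows(rows, static_cast<FakeUtf8Builder*>(builder));
    case TypeId::LARGE_STRING:
        return append_variant_rows(rows, static_cast<FakeLargeUtf8Builder*>(builder));
    default:
        return {false, "invalid argument: unsupported builder type for VARIANT"};
    }
}
```

The `static_cast` is safe here only because each `TypeId` maps to exactly one stand-in class. In Arrow C++ the analogous check would read the builder's `type()->id()` and compare it against `arrow::Type::STRING` and `arrow::Type::LARGE_STRING` before downcasting.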
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| be/src/core/data_type_serde/data_type_variant_serde.cpp | Supports both STRING and LARGE_STRING Arrow builder types for VARIANT serialization and returns a clear error for unsupported builder types. |
| be/test/core/data_type_serde/data_type_serde_test.cpp | Adds a unit test that constructs a VARIANT scalar and verifies writing via arrow::LargeStringBuilder. |
Contributor: skip buildall
yiguolei approved these changes on Jun 3, 2026
cherry-pick #63718