IN LIST: add string-view filters for Utf8View and BinaryView #23016

Draft
geoffreyclaude wants to merge 9 commits into apache:main from geoffreyclaude:perf/in_list_string_view_filter

@geoffreyclaude geoffreyclaude commented Jun 18, 2026

Which issue does this PR close?

Rationale for this change

String and binary values differ from integers because an equality check has to compare the full bytes, which can be expensive: a value may be long, and many rows may not match any list entry at all.

Arrow's Utf8View and BinaryView layouts store useful summary information directly in each view value: the length and a prefix of the bytes. That gives us a cheap first question:

Could this value possibly match anything in the IN list?

If the length and prefix do not match, the answer is definitely no, and we avoid comparing the full value. If they do match, the value is only a candidate. Long values are then verified with an exact byte comparison before returning true.

For short Utf8View strings (at most 12 bytes), the whole value fits inline in the 16-byte view itself, so the primitive fast paths can be reused directly.
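The length/prefix check described above can be sketched as follows. This is a minimal illustration, not the PR's code: it assumes the Arrow view layout (4-byte length, then either up to 12 inline bytes or a 4-byte prefix followed by a buffer index and offset), and `make_view`, `masked_key`, and `could_match` are hypothetical names.

```rust
/// Largest value length that fits inline in a 16-byte view (Arrow view layout).
const MAX_INLINE_LEN: usize = 12;

/// Build a 16-byte view for `bytes` (hypothetical helper for illustration).
/// `buffer_index` and `offset` are only stored for long values.
fn make_view(bytes: &[u8], buffer_index: u32, offset: u32) -> u128 {
    let mut raw = [0u8; 16];
    raw[..4].copy_from_slice(&(bytes.len() as u32).to_le_bytes());
    if bytes.len() <= MAX_INLINE_LEN {
        // Short value: all bytes live inline, zero-padded.
        raw[4..4 + bytes.len()].copy_from_slice(bytes);
    } else {
        // Long value: 4-byte prefix, then where the full bytes are stored.
        raw[4..8].copy_from_slice(&bytes[..4]);
        raw[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        raw[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    u128::from_le_bytes(raw)
}

/// Key used for fast rejection: the whole view for inline values,
/// only length + prefix (the low 64 bits) for long values.
fn masked_key(view: u128) -> u128 {
    let len = (view as u32) as usize;
    if len <= MAX_INLINE_LEN {
        view
    } else {
        view & (u64::MAX as u128)
    }
}

/// `false` means "definitely different"; `true` means "candidate only"
/// for long values, which still need a full byte comparison.
fn could_match(a: u128, b: u128) -> bool {
    masked_key(a) == masked_key(b)
}
```

Two long values with the same length and the same first four bytes produce the same key even when their remaining bytes differ, which is why a later exact comparison is still required.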

What changes are included in this PR?

  • Adds short-string Utf8View branchless/hash filters by viewing inline string views as 16-byte values.
  • Adds ByteViewMaskedFilter for mixed-length Utf8View and BinaryView arrays.
  • Uses length/prefix-style masked view values for fast rejection.
  • Performs exact full-byte verification for long candidates before reporting a match.
  • Adds a result-building helper that skips expensive membership checks for null needles.
  • Adds focused coverage for short sliced views and long-string prefix collisions.
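The null-skipping result helper mentioned in the list above might look roughly like this. It is a simplified sketch under assumptions: needles are modeled as `Option<T>`, the result as `Vec<Option<bool>>`, and `build_results` is a hypothetical name. It also ignores SQL's handling of NULL entries inside the IN list itself.

```rust
/// Evaluate `contains` only for non-null needles; a null needle yields a
/// null result without running the (possibly expensive) membership check.
fn build_results<T>(
    needles: &[Option<T>],
    mut contains: impl FnMut(&T) -> bool,
) -> Vec<Option<bool>> {
    needles
        .iter()
        .map(|needle| needle.as_ref().map(|value| contains(value)))
        .collect()
}
```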

Are these changes tested?

Yes.

  • cargo fmt --all --check
  • cargo test -p datafusion-physical-expr reinterpreted_ --lib
  • cargo test -p datafusion-physical-expr utf8view_hash_filter_handles_short_slices --lib
  • cargo test -p datafusion-physical-expr byte_view_masked_filter_verifies_long_string_matches --lib
  • cargo test -p datafusion-physical-expr in_list_string_types --lib
  • cargo test -p datafusion-physical-expr in_list_binary_types --lib
  • cargo clippy -p datafusion-physical-expr --all-targets --all-features -- -D warnings

Are there any user-facing changes?

No. This is an internal performance optimization only.

Local benchmark snapshot

Benchmark command:

cargo bench -p datafusion-physical-expr --profile release-nonlto --bench in_list_strategy -- --save-baseline <name>

Method: compare adjacent saved baselines using raw Criterion sample minima (min(time / iters)). Lower is better; changes within +/-5% are treated as noise.
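As a rough illustration of that method (not the script actually used), the per-benchmark figure and the +/-5% noise band could be computed like this. The `(time, iters)` sample shape and both function names are assumptions.

```rust
/// Smallest per-iteration time across samples, where each sample is
/// (total time, iteration count). Samples with zero iterations are skipped.
fn min_per_iter(samples: &[(f64, u64)]) -> Option<f64> {
    samples
        .iter()
        .filter(|&&(_, iters)| iters > 0)
        .map(|&(time, iters)| time / iters as f64)
        .min_by(|a, b| a.total_cmp(b))
}

/// Classify a before/after pair, treating changes within +/-5% as noise.
fn classify(before: f64, after: f64) -> &'static str {
    let change = (after - before) / before;
    if change < -0.05 {
        "faster"
    } else if change > 0.05 {
        "slower"
    } else {
        "noise"
    }
}
```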

Compared baselines: #23015 -> #23016

Relevant scope: Utf8View and nullable Utf8View rows.

Summary: 34 relevant rows, 32 faster, 2 slower, 0 within +/-5%.

Largest relevant deltas:

| Benchmark | Before | After | Change |
| --- | --- | --- | --- |
| nulls/utf8view/short_8b/list=16/match=50%/nulls=20% | 61.49 us | 12.51 us | -79.6% (4.91x faster) |
| nulls/utf8view/short_8b/list=16/match=50%/nulls=50% | 60.93 us | 13.00 us | -78.7% (4.69x faster) |
| nulls/utf8view/short_8b/list=16/match=50%/nulls=20%/NOT_IN | 58.66 us | 13.08 us | -77.7% (4.49x faster) |
| utf8view/short_8b/list=256/match=50% | 48.84 us | 11.56 us | -76.3% (4.23x faster) |
| utf8view/short_8b/list=16/match=50% | 48.24 us | 11.49 us | -76.2% (4.20x faster) |
| utf8view/len_12b/list=16/match=50% | 48.64 us | 11.74 us | -75.9% (4.14x faster) |
| utf8view/len_12b/list=64/match=50% | 47.46 us | 11.53 us | -75.7% (4.12x faster) |
| utf8view/short_8b/list=64/match=50% | 47.34 us | 12.20 us | -74.2% (3.88x faster) |
| utf8view/short_8b/list=4/match=50% | 45.63 us | 12.19 us | -73.3% (3.74x faster) |
| utf8view/mixed_len/list=16/match=50% | 98.48 us | 27.32 us | -72.3% (3.60x faster) |
| utf8view/shared_prefix/pfx=12/list=32/match=0% | 38.92 us | 11.61 us | -70.2% (3.35x faster) |
| utf8view/shared_prefix/pfx=16/list=64/match=0% | 38.46 us | 11.68 us | -69.6% (3.29x faster) |
| utf8view/mixed_len/list=64/match=0% | 41.15 us | 13.53 us | -67.1% (3.04x faster) |
| utf8view/mixed_len/list=16/match=0% | 40.41 us | 13.59 us | -66.4% (2.97x faster) |
| utf8view/long_24b/list=4/match=0% | 37.46 us | 13.03 us | -65.2% (2.87x faster) |

Full relevant table (34 rows)

| Benchmark | Before | After | Change |
| --- | --- | --- | --- |
| nulls/utf8view/long_24b/list=16/match=50%/nulls=20% | 87.80 us | 64.71 us | -26.3% (1.36x faster) |
| nulls/utf8view/short_8b/list=16/match=50%/nulls=20% | 61.49 us | 12.51 us | -79.6% (4.91x faster) |
| nulls/utf8view/short_8b/list=16/match=50%/nulls=20%/NOT_IN | 58.66 us | 13.08 us | -77.7% (4.49x faster) |
| nulls/utf8view/short_8b/list=16/match=50%/nulls=50% | 60.93 us | 13.00 us | -78.7% (4.69x faster) |
| utf8view/len_12b/list=16/match=0% | 20.27 us | 10.97 us | -45.9% (1.85x faster) |
| utf8view/len_12b/list=16/match=50% | 48.64 us | 11.74 us | -75.9% (4.14x faster) |
| utf8view/len_12b/list=64/match=0% | 20.43 us | 10.72 us | -47.5% (1.91x faster) |
| utf8view/len_12b/list=64/match=50% | 47.46 us | 11.53 us | -75.7% (4.12x faster) |
| utf8view/long_24b/list=16/match=0% | 37.39 us | 14.08 us | -62.3% (2.65x faster) |
| utf8view/long_24b/list=16/match=50% | 87.66 us | 74.12 us | -15.5% (1.18x faster) |
| utf8view/long_24b/list=256/match=0% | 38.06 us | 18.09 us | -52.5% (2.10x faster) |
| utf8view/long_24b/list=256/match=50% | 88.59 us | 97.05 us | +9.5% (1.10x slower) |
| utf8view/long_24b/list=4/match=0% | 37.46 us | 13.03 us | -65.2% (2.87x faster) |
| utf8view/long_24b/list=4/match=50% | 86.98 us | 74.40 us | -14.5% (1.17x faster) |
| utf8view/long_24b/list=64/match=0% | 38.03 us | 19.04 us | -49.9% (2.00x faster) |
| utf8view/long_24b/list=64/match=50% | 87.42 us | 97.52 us | +11.6% (1.12x slower) |
| utf8view/mixed_len/list=16/match=0% | 40.41 us | 13.59 us | -66.4% (2.97x faster) |
| utf8view/mixed_len/list=16/match=50% | 98.48 us | 27.32 us | -72.3% (3.60x faster) |
| utf8view/mixed_len/list=64/match=0% | 41.15 us | 13.53 us | -67.1% (3.04x faster) |
| utf8view/mixed_len/list=64/match=50% | 111.64 us | 43.85 us | -60.7% (2.55x faster) |
| utf8view/shared_prefix/pfx=12/list=32/match=0% | 38.92 us | 11.61 us | -70.2% (3.35x faster) |
| utf8view/shared_prefix/pfx=12/list=32/match=50% | 91.20 us | 74.16 us | -18.7% (1.23x faster) |
| utf8view/shared_prefix/pfx=16/list=64/match=0% | 38.46 us | 11.68 us | -69.6% (3.29x faster) |
| utf8view/shared_prefix/pfx=16/list=64/match=50% | 88.41 us | 76.78 us | -13.2% (1.15x faster) |
| utf8view/shared_prefix/pfx=8/list=16/match=0% | 30.97 us | 11.68 us | -62.3% (2.65x faster) |
| utf8view/shared_prefix/pfx=8/list=16/match=50% | 79.36 us | 63.75 us | -19.7% (1.24x faster) |
| utf8view/short_8b/list=16/match=0% | 20.58 us | 10.78 us | -47.6% (1.91x faster) |
| utf8view/short_8b/list=16/match=50% | 48.24 us | 11.49 us | -76.2% (4.20x faster) |
| utf8view/short_8b/list=256/match=0% | 20.90 us | 10.82 us | -48.2% (1.93x faster) |
| utf8view/short_8b/list=256/match=50% | 48.84 us | 11.56 us | -76.3% (4.23x faster) |
| utf8view/short_8b/list=4/match=0% | 20.30 us | 12.15 us | -40.1% (1.67x faster) |
| utf8view/short_8b/list=4/match=50% | 45.63 us | 12.19 us | -73.3% (3.74x faster) |
| utf8view/short_8b/list=64/match=0% | 20.82 us | 11.11 us | -46.6% (1.87x faster) |
| utf8view/short_8b/list=64/match=50% | 47.34 us | 12.20 us | -74.2% (3.88x faster) |

@github-actions github-actions Bot added the physical-expr Changes to the physical-expr crates label Jun 18, 2026
@github-actions github-actions Bot added the auto detected api change Auto detected API change label Jun 18, 2026
Replaces HashSet<u8> with a 32-byte stack-allocated bitmap. Provides O(1) membership testing via bit-shifting, significantly reducing memory overhead and improving cache locality. Triggers for UInt8 arrays.
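A minimal sketch of such a bitmap, assuming four `u64` words cover the 256 possible `u8` values (the type and method names are hypothetical, not taken from the commit):

```rust
/// 256 bits = 32 bytes, one bit per possible u8 value, stored inline.
struct U8Bitmap {
    words: [u64; 4],
}

impl U8Bitmap {
    fn new(values: &[u8]) -> Self {
        let mut words = [0u64; 4];
        for &v in values {
            // High 2 bits pick the word, low 6 bits pick the bit within it.
            words[(v >> 6) as usize] |= 1u64 << (v & 63);
        }
        Self { words }
    }

    /// O(1) membership test: one index, one shift, one mask.
    fn contains(&self, v: u8) -> bool {
        (self.words[(v >> 6) as usize] >> (v & 63)) & 1 == 1
    }
}
```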
@geoffreyclaude geoffreyclaude force-pushed the perf/in_list_string_view_filter branch from 2834ab2 to 34307af Compare June 18, 2026 08:25
Implements an 8 KB heap-allocated bitmap for UInt16. Maintains O(1) performance while handling the larger value space. Triggers for UInt16 arrays.
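The same idea scales to `u16` with 65,536 bits, i.e. 1,024 `u64` words or 8 KB. A sketch under that assumption, with hypothetical names:

```rust
/// 65,536 bits = 8 KB, one bit per possible u16 value, kept on the heap.
struct U16Bitmap {
    words: Box<[u64]>,
}

impl U16Bitmap {
    fn new(values: &[u16]) -> Self {
        let mut words = vec![0u64; 1024].into_boxed_slice();
        for &v in values {
            // High 10 bits pick the word, low 6 bits pick the bit within it.
            words[(v >> 6) as usize] |= 1u64 << (v & 63);
        }
        Self { words }
    }

    fn contains(&self, v: u16) -> bool {
        (self.words[(v >> 6) as usize] >> (v & 63)) & 1 == 1
    }
}
```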
Introduces zero-copy buffer reinterpretation to allow signed integers and other 1 or 2-byte primitive types (e.g. Float16) to use the high-performance bitmap filters. Triggers for all types with 1-byte or 2-byte width.
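One way to picture the reinterpretation, shown here for `i8` only and on a plain slice rather than an Arrow buffer (the function name is hypothetical; the commit's actual mechanism may differ):

```rust
/// View a slice of i8 as u8 without copying, so a u8 bitmap filter can be
/// reused for signed values. Equality is preserved because identical bit
/// patterns map to identical u8 values.
fn reinterpret_i8_as_u8(values: &[i8]) -> &[u8] {
    // SAFETY: i8 and u8 have the same size and alignment, and every bit
    // pattern is valid for both, so the same memory can be read as either.
    unsafe { std::slice::from_raw_parts(values.as_ptr().cast::<u8>(), values.len()) }
}
```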
Adds a const-generic unrolled comparison chain that avoids CPU branching. Outperforms hash lookups for very small lists. Triggers for primitives when list size <= 32 (4-byte), 16 (8-byte), or 4 (16-byte).
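A sketch of the technique for a 4-byte primitive, assuming the list length is a const generic so the compiler can fully unroll the loop (whether it does is up to the optimizer, and `contains_branchless` is a hypothetical name):

```rust
/// Compare the needle against every list entry and OR the results together.
/// There is no early exit, so the work does not depend on the data.
fn contains_branchless<const N: usize>(list: &[u32; N], needle: u32) -> bool {
    let mut hit = false;
    for i in 0..N {
        // Non-short-circuiting `|=` keeps the loop free of data-dependent branches.
        hit |= list[i] == needle;
    }
    hit
}
```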
Implements a fast hash table using open addressing with linear probing and a 25% load factor. Replaces the legacy HashSet for primitives, reducing indirection. Triggers for primitives when list size exceeds branchless thresholds.
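A simplified sketch of open addressing with linear probing at a 25% maximum load factor follows. The hash function, the `Option<u64>` slot representation, and the names are illustrative assumptions rather than the commit's implementation.

```rust
/// Open-addressing set: capacity is a power of two at least 4x the number
/// of values, so at most 25% of slots are occupied.
struct ProbeSet {
    slots: Vec<Option<u64>>,
    mask: usize,
}

impl ProbeSet {
    fn new(values: &[u64]) -> Self {
        let capacity = (values.len().max(1) * 4).next_power_of_two();
        let mut set = Self { slots: vec![None; capacity], mask: capacity - 1 };
        for &v in values {
            set.insert(v);
        }
        set
    }

    /// Multiplicative hash (assumed for illustration), reduced to a slot index.
    fn slot_of(&self, v: u64) -> usize {
        ((v.wrapping_mul(0x9E37_79B9_7F4A_7C15) >> 32) as usize) & self.mask
    }

    fn insert(&mut self, v: u64) {
        let mut i = self.slot_of(v);
        loop {
            match self.slots[i] {
                None => {
                    self.slots[i] = Some(v);
                    return;
                }
                Some(existing) if existing == v => return,
                // Occupied by another value: try the next slot, wrapping around.
                Some(_) => i = (i + 1) & self.mask,
            }
        }
    }

    fn contains(&self, v: u64) -> bool {
        let mut i = self.slot_of(v);
        loop {
            match self.slots[i] {
                // An empty slot ends the probe sequence: the value is absent.
                None => return false,
                Some(existing) if existing == v => return true,
                Some(_) => i = (i + 1) & self.mask,
            }
        }
    }
}
```

Because at least three quarters of the slots stay empty, every probe sequence reaches an empty slot, so both loops terminate.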
@geoffreyclaude geoffreyclaude force-pushed the perf/in_list_string_view_filter branch from 34307af to 620f5e3 Compare June 18, 2026 09:31
@github-actions github-actions Bot removed the auto detected api change Auto detected API change label Jun 18, 2026
Introduces a two-stage filter for ByteView types. Stage 1 uses a fast DirectProbeFilter on masked views (len + prefix) for quick rejection; Stage 2 performs full verification only for potential long-string matches. Triggers for Utf8View and BinaryView.
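The two stages can be sketched as below. This is an illustration under assumptions: a standard `HashMap` stands in for `DirectProbeFilter`, values are plain byte slices rather than Arrow views, and `masked_key` and `TwoStageFilter` are hypothetical names.

```rust
use std::collections::HashMap;

/// Length + bytes packed into 16 bytes: the whole value when it is at most
/// 12 bytes long, otherwise only the length and the first 4 bytes.
fn masked_key(bytes: &[u8]) -> u128 {
    let mut raw = [0u8; 16];
    raw[..4].copy_from_slice(&(bytes.len() as u32).to_le_bytes());
    let keep = if bytes.len() <= 12 { bytes.len() } else { 4 };
    raw[4..4 + keep].copy_from_slice(&bytes[..keep]);
    u128::from_le_bytes(raw)
}

struct TwoStageFilter {
    /// Masked key -> full list values sharing that key.
    candidates: HashMap<u128, Vec<Vec<u8>>>,
}

impl TwoStageFilter {
    fn new(list: &[&[u8]]) -> Self {
        let mut candidates: HashMap<u128, Vec<Vec<u8>>> = HashMap::new();
        for value in list {
            candidates.entry(masked_key(value)).or_default().push(value.to_vec());
        }
        Self { candidates }
    }

    fn contains(&self, needle: &[u8]) -> bool {
        match self.candidates.get(&masked_key(needle)) {
            // Stage 1: no list entry shares the length and prefix.
            None => false,
            // Short values are fully encoded in the key, so a hit is exact.
            Some(_) if needle.len() <= 12 => true,
            // Stage 2: long candidates need a full byte comparison.
            Some(full_values) => full_values.iter().any(|v| v.as_slice() == needle),
        }
    }
}
```

A long needle that shares its length and first four bytes with a list entry passes stage 1 but is still rejected in stage 2 if the remaining bytes differ.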
@geoffreyclaude geoffreyclaude force-pushed the perf/in_list_string_view_filter branch from 620f5e3 to 0adb66e Compare June 18, 2026 10:29
@geoffreyclaude geoffreyclaude changed the title from "Implement String View (Utf8View/BinaryView) Optimizations" to "IN LIST: add string-view filters for Utf8View and BinaryView" Jun 18, 2026