IN LIST: add direct-probe hash filter for large primitive lists by geoffreyclaude · Pull Request #23015 · apache/datafusion

geoffreyclaude · 2026-06-18T07:32:07Z

Which issue does this PR close?

Part of "Further improve performance of IN list evaluation" (#19241).
Stacked on "IN LIST: add branchless filter for small primitive lists" (#23014).
Extracted from "Optimize IN performance with specialized implementations" (#19390).

Rationale for this change

#23014 handles tiny primitive IN lists by comparing each input value against every constant. That stops being a good tradeoff once the list gets larger, because the work per input value grows linearly with the list length.

For larger primitive lists, this PR uses a purpose-built lookup table. The mental model is:

Precompute a table from the constants in x IN (...).
For each input value, compute a cheap table slot from the value.
Check that slot; on a collision, step forward to the next slot until the value or an empty slot is found.

This is still a hash-table style lookup, but it is simpler than the generic fallback because primitive values are fixed-width and can be stored directly. There is no need for the generic Arrow comparator path for each candidate.
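The three steps above can be sketched as a small open-addressing set with linear probing. This is a minimal illustration, not the PR's DirectProbeFilter: the type name `DirectProbeSet`, the `Option<i32>` slot layout, and the multiplicative hash are assumptions made for the example. The 4x capacity follows the 25% load factor mentioned in the commit description.

```rust
/// Hypothetical direct-probe set for i32 constants (illustration only).
struct DirectProbeSet {
    slots: Vec<Option<i32>>, // None marks an empty slot
    mask: usize,             // capacity - 1; capacity is a power of two
}

impl DirectProbeSet {
    /// Step 1: precompute the table from the IN-list constants.
    fn new(values: &[i32]) -> Self {
        // At least 4 slots per constant keeps the load factor at or below 25%.
        let capacity = (values.len().max(1) * 4).next_power_of_two();
        let mut set = Self { slots: vec![None; capacity], mask: capacity - 1 };
        for &v in values {
            set.insert(v);
        }
        set
    }

    /// Step 2: compute a cheap slot index from the value.
    fn slot_for(&self, v: i32) -> usize {
        // Multiplicative hash (an assumed choice); take the high bits.
        let h = (v as u32 as u64).wrapping_mul(0x9E37_79B9_7F4A_7C15);
        ((h >> 32) as usize) & self.mask
    }

    fn insert(&mut self, v: i32) {
        let mut i = self.slot_for(v);
        loop {
            let slot = self.slots[i];
            match slot {
                Some(existing) if existing == v => return, // duplicate constant
                None => {
                    self.slots[i] = Some(v);
                    return;
                }
                Some(_) => i = (i + 1) & self.mask, // collision: next slot
            }
        }
    }

    /// Step 3: check the slot, moving forward on collisions.
    fn contains(&self, v: i32) -> bool {
        let mut i = self.slot_for(v);
        loop {
            let slot = self.slots[i];
            match slot {
                Some(existing) if existing == v => return true,
                None => return false, // empty slot ends the probe sequence
                Some(_) => i = (i + 1) & self.mask,
            }
        }
    }
}
```

Because the table is never more than a quarter full, every probe sequence reaches an empty slot, so both loops terminate.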

The earlier bitmap and branchless filters remain in place for the cases where they are cheaper.

What changes are included in this PR?

Adds DirectProbeFilter, a compact open-addressing lookup table with linear probing.
Routes larger primitive IN lists to direct probing after the branchless thresholds.
Supports zero-copy same-width reinterpretation for compatible primitive types.
Avoids extra temporary value copies when building the table.
Keeps slice and null handling on the raw-buffer fast path.

Are these changes tested?

Yes.

cargo fmt --all --check
cargo test -p datafusion-physical-expr direct_probe --lib
cargo clippy -p datafusion-physical-expr --all-targets --all-features -- -D warnings

Are there any user-facing changes?

No. This is an internal performance optimization only.

Local benchmark snapshot

Benchmark command:

cargo bench -p datafusion-physical-expr --profile release-nonlto --bench in_list_strategy -- --save-baseline <name>

Method: compare adjacent saved baselines using raw Criterion sample minima (min(time / iters)). Lower is better; changes within +/-5% are treated as noise.
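As a rough sketch of how the Change column can be derived from that method, the helpers below compute the sample minimum, the percent change, and the speedup factor. The function names and the `(time, iters)` tuple layout are assumptions for illustration; they are not part of Criterion's API or of the PR.

```rust
/// Smallest per-iteration time across samples, i.e. min(time / iters).
/// Each sample is a hypothetical (total_time, iterations) pair.
fn min_per_iter(samples: &[(f64, f64)]) -> f64 {
    samples
        .iter()
        .map(|&(time, iters)| time / iters)
        .fold(f64::INFINITY, f64::min)
}

/// Relative change in percent; negative means the new baseline is faster.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

/// Speedup factor, e.g. 2.0 means twice as fast.
fn speedup(before: f64, after: f64) -> f64 {
    before / after
}

/// Changes within +/-5% are treated as noise.
fn is_noise(change_pct: f64) -> bool {
    change_pct.abs() <= 5.0
}
```

For example, a before/after pair of 18.83 us and 7.91 us gives roughly -58.0% and 2.38x.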

Compared baselines: #23014 -> #23015

Relevant scope: large primitive-list rows.

Summary: 13 relevant rows, 13 faster, 0 slower, 0 within +/-5%.

Benchmark	Before	After	Change
`f32/large_list/list=64/match=0%`	18.83 us	7.91 us	-58.0% (2.38x faster)
`f32/large_list/list=64/match=50%`	33.00 us	10.27 us	-68.9% (3.21x faster)
`nulls/primitive/i32/large_list/list=64/match=50%/nulls=20%`	25.79 us	11.26 us	-56.3% (2.29x faster)
`primitive/i32/large_list/list=256/match=0%`	17.80 us	7.99 us	-55.1% (2.23x faster)
`primitive/i32/large_list/list=256/match=50%`	27.31 us	10.25 us	-62.5% (2.67x faster)
`primitive/i32/large_list/list=64/match=0%`	18.15 us	8.05 us	-55.7% (2.26x faster)
`primitive/i32/large_list/list=64/match=50%`	27.85 us	10.25 us	-63.2% (2.72x faster)
`primitive/i64/large_list/list=128/match=0%`	18.93 us	8.00 us	-57.7% (2.37x faster)
`primitive/i64/large_list/list=128/match=50%`	24.24 us	10.08 us	-58.4% (2.41x faster)
`primitive/i64/large_list/list=32/match=0%`	19.82 us	8.51 us	-57.1% (2.33x faster)
`primitive/i64/large_list/list=32/match=50%`	26.01 us	11.31 us	-56.5% (2.30x faster)
`timestamp_ns/large_list/list=32/match=0%`	19.38 us	8.49 us	-56.2% (2.28x faster)
`timestamp_ns/large_list/list=32/match=50%`	45.82 us	10.03 us	-78.1% (4.57x faster)

Implements a fast hash table using open addressing with linear probing and a 25% load factor. Replaces the legacy HashSet for primitives, reducing indirection. Triggers for primitives when list size exceeds branchless thresholds.

geoffreyclaude added 3 commits June 18, 2026 08:30

Refactor generic InList static filter helpers

e31fafe

Build InList results from bitmaps

afc196b

Optimize generic InList static filtering

a84579d

github-actions bot added the physical-expr ("Changes to the physical-expr crates") label Jun 18, 2026

This was referenced Jun 18, 2026

IN LIST: add string-view filters for Utf8View and BinaryView #23016 (Draft)

Further improve performance of IN list evaluation #19241 (Open)

Implement Bitmap Filter for UInt8 (Stack-based)

b910c6a

Replaces HashSet<u8> with a 32-byte stack-allocated bitmap. Provides O(1) membership testing via bit-shifting, significantly reducing memory overhead and improving cache locality. Triggers for UInt8 arrays.
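A 256-bit bitmap fits in four `u64` words (32 bytes), with one bit per possible `u8` value. The sketch below shows the bit-shifting membership test under that layout; the type name `U8Bitmap` and its methods are hypothetical and may differ from the commit's actual code.

```rust
/// Hypothetical 256-bit membership bitmap for u8 values: 4 x u64 = 32 bytes.
#[derive(Clone, Copy, Default)]
struct U8Bitmap {
    words: [u64; 4],
}

impl U8Bitmap {
    fn from_values(values: &[u8]) -> Self {
        let mut bitmap = Self::default();
        for &v in values {
            // The high 2 bits select the word, the low 6 bits select the bit.
            bitmap.words[(v >> 6) as usize] |= 1u64 << (v & 63);
        }
        bitmap
    }

    fn contains(&self, v: u8) -> bool {
        ((self.words[(v >> 6) as usize] >> (v & 63)) & 1) == 1
    }
}
```

The whole structure is a plain array, so it can live on the stack and needs no hashing or pointer chasing per lookup.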

geoffreyclaude force-pushed the perf/in_list_direct_probe_filter branch from c2625de to 12ca843 June 18, 2026 08:22

geoffreyclaude added 4 commits June 18, 2026 10:40

Extend Bitmap Filter to UInt16 (Heap-based)

81ec379

Implements an 8 KB heap-allocated bitmap for UInt16. Maintains O(1) performance while handling the larger value space. Triggers for UInt16 arrays.
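The 8 KB figure follows from the value space: 65,536 possible `u16` values at one bit each is 8,192 bytes, or 1,024 `u64` words. A minimal sketch under that layout follows; the type name `U16Bitmap` is hypothetical and the real implementation may allocate differently.

```rust
/// Hypothetical 65,536-bit membership bitmap for u16 values.
/// 1024 x u64 = 8192 bytes, boxed so the struct itself stays pointer-sized.
struct U16Bitmap {
    words: Box<[u64; 1024]>,
}

impl U16Bitmap {
    fn from_values(values: &[u16]) -> Self {
        let mut words = Box::new([0u64; 1024]);
        for &v in values {
            // The high 10 bits select the word, the low 6 bits select the bit.
            words[(v >> 6) as usize] |= 1u64 << (v & 63);
        }
        Self { words }
    }

    fn contains(&self, v: u16) -> bool {
        ((self.words[(v >> 6) as usize] >> (v & 63)) & 1) == 1
    }
}
```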

Implement Zero-Copy Reinterpretation and enable Int8/Int16 Bitmaps

9925e82

Introduces zero-copy buffer reinterpretation to allow signed integers and other 1 or 2-byte primitive types (e.g. Float16) to use the high-performance bitmap filters. Triggers for all types with 1-byte or 2-byte width.
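The idea of same-width reinterpretation can be shown with plain slices: membership depends only on the bit pattern, so a signed buffer can be viewed as unsigned without copying. The functions below are a stand-in written against `std` only; they are assumptions for illustration and do not show how the PR reinterprets Arrow buffers.

```rust
/// View a slice of i8 as u8 without copying (hypothetical helper).
fn reinterpret_i8_as_u8(values: &[i8]) -> &[u8] {
    // SAFETY: i8 and u8 have the same size and alignment, and every bit
    // pattern is valid for both, so the same memory can be read as either.
    unsafe { std::slice::from_raw_parts(values.as_ptr() as *const u8, values.len()) }
}

/// View a slice of i16 as u16 without copying (hypothetical helper).
fn reinterpret_i16_as_u16(values: &[i16]) -> &[u16] {
    // SAFETY: same reasoning as above for the 2-byte case.
    unsafe { std::slice::from_raw_parts(values.as_ptr() as *const u16, values.len()) }
}
```

With this view, `-1i8` is looked up as `255u8`, so a bitmap built from the reinterpreted constants answers membership for the signed column as well.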

Implement Branchless Filter for small primitive lists

eae4046

Adds a const-generic unrolled comparison chain that avoids CPU branching. Outperforms hash lookups for very small lists. Triggers for primitives when list size <= 32 (4-byte), 16 (8-byte), or 4 (16-byte).
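A comparison chain of this kind can be sketched with a const-generic array length, which lets the compiler unroll the loop, and a non-short-circuiting `|=` so no early exit is taken. The function names are hypothetical, and whether the generated machine code is actually branch-free depends on the optimizer and target.

```rust
/// Hypothetical branchless membership test over a fixed-size list.
fn contains_branchless<const N: usize>(list: &[i32; N], v: i32) -> bool {
    let mut found = false;
    for &constant in list {
        // `|=` evaluates every comparison; there is no early return.
        found |= constant == v;
    }
    found
}

/// Apply the test to a batch of input values.
fn filter_batch<const N: usize>(list: &[i32; N], input: &[i32]) -> Vec<bool> {
    input.iter().map(|&v| contains_branchless(list, v)).collect()
}
```

The cost is N comparisons per input value regardless of the data, which is why it only pays off below the list-size thresholds quoted above.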

geoffreyclaude force-pushed the perf/in_list_direct_probe_filter branch from 12ca843 to 0111ce5 June 18, 2026 09:12

geoffreyclaude changed the title from "Implement Direct Probe (Hash) Filter for large primitive lists" to "IN LIST: add direct-probe hash filter for large primitive lists" Jun 18, 2026

geoffreyclaude wants to merge 8 commits into apache:main from geoffreyclaude:perf/in_list_direct_probe_filter
