IN LIST: add direct-probe hash filter for large primitive lists #23015
Draft
geoffreyclaude wants to merge 8 commits into
This was referenced Jun 18, 2026
Replaces HashSet<u8> with a 32-byte stack-allocated bitmap. Provides O(1) membership testing via bit-shifting, significantly reducing memory overhead and improving cache locality. Triggers for UInt8 arrays.
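As a rough illustration of the idea in this commit, a 256-bit bitmap fits in four `u64` words (32 bytes) and answers membership with one shift and one mask. The type name `U8Bitmap` and its methods are hypothetical and are not taken from the PR.

```rust
/// Hypothetical sketch of a 256-bit membership bitmap for `u8` values.
/// Four 64-bit words cover all 256 possible values (4 * 8 = 32 bytes).
struct U8Bitmap {
    bits: [u64; 4],
}

impl U8Bitmap {
    /// Build the bitmap once from the list constants.
    fn new(values: &[u8]) -> Self {
        let mut bits = [0u64; 4];
        for &v in values {
            // The upper 2 bits select the word, the lower 6 bits select the bit.
            bits[(v >> 6) as usize] |= 1u64 << (v & 63);
        }
        Self { bits }
    }

    /// O(1) membership test: one index, one shift, one mask.
    fn contains(&self, v: u8) -> bool {
        (self.bits[(v >> 6) as usize] >> (v & 63)) & 1 == 1
    }
}
```

Because the struct holds a fixed-size array, it can live on the stack and needs no allocation.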
Implements an 8 KB heap-allocated bitmap for UInt16. Maintains O(1) performance while handling the larger value space. Triggers for UInt16 arrays.
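The 8 KB figure follows from the value space: 65,536 possible values at one bit each is 8,192 bytes, or 1,024 `u64` words. A minimal sketch, assuming a boxed slice for the heap allocation (the name `U16Bitmap` and the `byte_len` helper are hypothetical):

```rust
/// Hypothetical sketch of a heap-allocated bitmap covering every `u16` value.
struct U16Bitmap {
    words: Box<[u64]>,
}

impl U16Bitmap {
    /// 65,536 values / 64 bits per word = 1,024 words.
    const WORDS: usize = (u16::MAX as usize + 1) / 64;

    fn new(values: &[u16]) -> Self {
        let mut words = vec![0u64; Self::WORDS].into_boxed_slice();
        for &v in values {
            // The upper 10 bits select the word, the lower 6 bits select the bit.
            words[(v >> 6) as usize] |= 1u64 << (v & 63);
        }
        Self { words }
    }

    fn contains(&self, v: u16) -> bool {
        (self.words[(v >> 6) as usize] >> (v & 63)) & 1 == 1
    }

    /// Size of the bitmap payload in bytes (helper added for illustration).
    fn byte_len(&self) -> usize {
        self.words.len() * std::mem::size_of::<u64>()
    }
}
```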
Introduces zero-copy buffer reinterpretation to allow signed integers and other 1 or 2-byte primitive types (e.g. Float16) to use the high-performance bitmap filters. Triggers for all types with 1-byte or 2-byte width.
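The reinterpretation step can be pictured with plain slices: a signed 2-byte value has the same size, alignment, and set of valid bit patterns as `u16`, so the same memory can be viewed as `u16` without copying. The sketch below uses only the standard library and a hypothetical helper name; the PR itself works on Arrow buffers, which are not shown here.

```rust
/// Hypothetical helper: view a slice of `i16` as `u16` without copying.
fn as_u16_slice(values: &[i16]) -> &[u16] {
    // SAFETY: `i16` and `u16` have identical size and alignment, every bit
    // pattern is valid for both, and the returned slice borrows from `values`,
    // so it cannot outlive the original data.
    unsafe { std::slice::from_raw_parts(values.as_ptr().cast::<u16>(), values.len()) }
}
```

Equality on the reinterpreted bits matches equality on the original integers, which is what makes a bit-indexed bitmap usable for signed types. For floating-point types, bitwise equality is not the same as numeric equality in every case (for example `0.0` and `-0.0`, or NaN values), and the source does not say how that is handled.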
Adds a const-generic unrolled comparison chain that avoids CPU branching. Outperforms hash lookups for very small lists. Triggers for primitives when list size <= 32 (4-byte), 16 (8-byte), or 4 (16-byte).
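A minimal sketch of what a const-generic comparison chain might look like, assuming a fixed-size array of constants. The name `BranchlessFilter` is hypothetical, and whether the compiler actually emits branch-free, unrolled code depends on the optimization level and target.

```rust
/// Hypothetical sketch: compare a needle against N constants with no early exit.
struct BranchlessFilter<T, const N: usize> {
    values: [T; N],
}

impl<T: Copy + PartialEq, const N: usize> BranchlessFilter<T, N> {
    fn new(values: [T; N]) -> Self {
        Self { values }
    }

    #[inline]
    fn contains(&self, needle: T) -> bool {
        // `|` (not `||`) evaluates every comparison, so there is no
        // data-dependent short circuit. N is a compile-time constant,
        // which lets the optimizer unroll the loop.
        self.values
            .iter()
            .fold(false, |found, &v| found | (v == needle))
    }
}
```

The trade-off is that every lookup pays for all N comparisons, which is why this approach only makes sense for short lists.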
Implements a fast hash table using open addressing with linear probing and a 25% load factor. Replaces the legacy HashSet for primitives, reducing indirection. Triggers for primitives when list size exceeds branchless thresholds.
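The following sketch shows one way an open-addressing table with linear probing and a 25% load factor could look for `i64` values. The name `DirectProbeSketch`, the `Option<i64>` slot representation, and the multiplicative hash are all assumptions made for illustration; the PR's actual `DirectProbeFilter` layout and hash function are not shown in the source.

```rust
/// Hypothetical sketch of an open-addressing lookup table for `i64` values.
struct DirectProbeSketch {
    slots: Vec<Option<i64>>,
    mask: usize,
}

impl DirectProbeSketch {
    fn new(values: &[i64]) -> Self {
        // Four slots per value keeps the load factor at or below 25%.
        // A power-of-two capacity lets `& mask` replace a modulo.
        let capacity = (values.len().max(1) * 4).next_power_of_two();
        let mut table = Self { slots: vec![None; capacity], mask: capacity - 1 };
        for &v in values {
            table.insert(v);
        }
        table
    }

    fn slot_for(&self, v: i64) -> usize {
        // Multiplicative hash (an assumption, not the PR's hash function).
        let h = (v as u64).wrapping_mul(0x9E37_79B9_7F4A_7C15);
        ((h >> 32) as usize) & self.mask
    }

    fn insert(&mut self, v: i64) {
        let mut i = self.slot_for(v);
        loop {
            let slot = self.slots[i];
            match slot {
                Some(existing) if existing == v => return, // duplicate constant
                Some(_) => i = (i + 1) & self.mask,        // linear probe
                None => {
                    self.slots[i] = Some(v);
                    return;
                }
            }
        }
    }

    fn contains(&self, v: i64) -> bool {
        let mut i = self.slot_for(v);
        loop {
            let slot = self.slots[i];
            match slot {
                Some(existing) if existing == v => return true,
                Some(_) => i = (i + 1) & self.mask,
                // An empty slot ends the probe sequence. The low load factor
                // guarantees that empty slots exist, so the loop terminates.
                None => return false,
            }
        }
    }
}
```

Values are stored inline in the slot array, so a lookup touches one contiguous region of memory rather than following pointers.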
Which issue does this PR close?
IN performance with specialized implementations #19390.
Rationale for this change
#23014 handles tiny primitive IN lists by comparing against each constant. That stops being a good tradeoff once the list gets larger.
For larger primitive lists, this PR uses a purpose-built lookup table. The mental model is:
x IN (...).
This is still a hash-table style lookup, but it is simpler than the generic fallback because primitive values are fixed-width and can be stored directly. There is no need for the generic Arrow comparator path for each candidate.
The earlier bitmap and branchless filters remain in place for the cases where they are cheaper.
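Putting the thresholds from the commit descriptions together, the choice between filters might be sketched as a function of value width and list length. The enum, the function name, and the `None` result for other widths are hypothetical; only the width and size thresholds come from the descriptions above.

```rust
/// Hypothetical labels for the three specialized filters.
#[derive(Debug, PartialEq)]
enum Strategy {
    Bitmap,
    Branchless,
    DirectProbe,
}

/// Pick a filter from the primitive width (in bytes) and the IN list length.
/// Returns `None` for widths the descriptions do not cover (assumed to fall
/// back to a generic path).
fn choose_strategy(width_bytes: usize, list_len: usize) -> Option<Strategy> {
    let branchless_max = match width_bytes {
        // 1-byte and 2-byte types always use the bitmap filters.
        1 | 2 => return Some(Strategy::Bitmap),
        4 => 32,
        8 => 16,
        16 => 4,
        _ => return None,
    };
    if list_len <= branchless_max {
        Some(Strategy::Branchless)
    } else {
        Some(Strategy::DirectProbe)
    }
}
```

Under these thresholds, a 64-element `i32` list or a 32-element `i64` list lands on the direct-probe path.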
What changes are included in this PR?
DirectProbeFilter, a compact open-addressing lookup table with linear probing.
IN lists to direct probing after the branchless thresholds.
Are these changes tested?
Yes.
cargo fmt --all --check
cargo test -p datafusion-physical-expr direct_probe --lib
cargo clippy -p datafusion-physical-expr --all-targets --all-features -- -D warnings
Are there any user-facing changes?
No. This is an internal performance optimization only.
Local benchmark snapshot
Benchmark command:
Method: compare adjacent saved baselines using raw Criterion sample minima (min(time / iters)). Lower is better; changes within +/-5% are treated as noise.
Compared baselines: #23014 -> #23015
Relevant scope: large primitive-list rows.
Summary: 13 relevant rows, 13 faster, 0 slower, 0 within +/-5%.
f32/large_list/list=64/match=0%
f32/large_list/list=64/match=50%
nulls/primitive/i32/large_list/list=64/match=50%/nulls=20%
primitive/i32/large_list/list=256/match=0%
primitive/i32/large_list/list=256/match=50%
primitive/i32/large_list/list=64/match=0%
primitive/i32/large_list/list=64/match=50%
primitive/i64/large_list/list=128/match=0%
primitive/i64/large_list/list=128/match=50%
primitive/i64/large_list/list=32/match=0%
primitive/i64/large_list/list=32/match=50%
timestamp_ns/large_list/list=32/match=0%
timestamp_ns/large_list/list=32/match=50%
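The comparison method described in this section (minimum of time / iters per baseline, with a +/-5% noise band) can be sketched as follows. The function names, the `(total_time, iters)` sample shape, and the `Verdict` enum are hypothetical; this is not the script used to produce the snapshot.

```rust
/// Hypothetical outcome labels for one benchmark row.
#[derive(Debug, PartialEq)]
enum Verdict {
    Faster,
    Slower,
    Noise,
}

/// Minimum per-iteration time over raw samples of (total_time, iters).
/// Samples with zero iterations are skipped.
fn min_time_per_iter(samples: &[(f64, u64)]) -> Option<f64> {
    samples
        .iter()
        .filter(|&&(_, iters)| iters > 0)
        .map(|&(time, iters)| time / iters as f64)
        .reduce(f64::min)
}

/// Classify the relative change between two baselines; lower is better,
/// and anything within +/-5% counts as noise.
fn classify(before: f64, after: f64) -> Verdict {
    let change = (after - before) / before;
    if change < -0.05 {
        Verdict::Faster
    } else if change > 0.05 {
        Verdict::Slower
    } else {
        Verdict::Noise
    }
}
```

Taking the minimum rather than the mean reduces the effect of scheduler and cache noise on short benchmarks, at the cost of ignoring the spread of the samples.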