[core] Support btree global index with embedded file metadata #7563
lilei1128 wants to merge 2 commits into apache:master
Hi, thanks for this PR!
I'm just wondering whether it's necessary to reuse the current BTree codebase. For example:
- In the BTree index build topology, the current implementation decides the partition number from the number of records per range and splits ranges by partition, which may not be suitable for your case.
- Also, it seems that the BTREE_WITH_FILE_META option will create a totally different index type compared to BTree.
The "with-file-meta" is NOT a completely different index type. It's:
For the first question, you're right that the parallelism logic is designed for the key index.
To skip manifest reads, we need two capabilities:
If we don't reuse BTree:
For capability 2, alternatives like manifest caching still require manifest
That's why reusing BTree makes sense:
@lilei1128
int partitionNum = Math.max((int) (range.count() / recordsPerRange), 1);
partitionNum = Math.min(partitionNum, maxParallelism);
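To make the concern about record-count-driven parallelism concrete, here is a minimal sketch of the same arithmetic pulled out into a standalone method. The class name, the method name, and the sample numbers are hypothetical, and `range.count()` is assumed to return a `long`; this is not the actual Paimon code.

```java
final class PartitionCountSketch {

    // Same arithmetic as the two quoted lines: at least one partition per range,
    // never more than maxParallelism. Assumes recordsPerRange > 0 and maxParallelism >= 1.
    static int partitionNum(long rangeCount, long recordsPerRange, int maxParallelism) {
        int n = Math.max((int) (rangeCount / recordsPerRange), 1);
        return Math.min(n, maxParallelism);
    }

    public static void main(String[] args) {
        // Many records: 10 candidate partitions, capped by maxParallelism.
        System.out.println(partitionNum(10_000_000L, 1_000_000L, 4)); // prints 4
        // Few records: integer division yields 0, so the floor of 1 applies.
        System.out.println(partitionNum(500L, 1_000_000L, 4)); // prints 1
        // In between: 3 partitions, below the cap.
        System.out.println(partitionNum(3_000_000L, 1_000_000L, 8)); // prints 3
    }
}
```

With a small number of entries per range the result collapses to a single partition, which is the case the review comment is asking about.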
// Pre-serialize ManifestEntries for file-meta index (if withFileMeta is enabled)
It seems that the key (i.e. the file name) is just used for deduplication? Can I think of this index as actually a Range to Collection<ManifestEntry> index?
Yes, currently fileName is mainly for deduplication; the runtime path does not do fileName point lookup yet.
This is an optimization point that follows
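The "Range to Collection<ManifestEntry>" reading can be sketched as follows. Everything here is a hypothetical stand-in: `Entry` is not Paimon's ManifestEntry, the `from`/`to` bounds are an assumed range attached to each entry, and a linear scan replaces the BTree lookup. The only point is to show the file name acting as a deduplication key rather than a lookup key.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RangeToEntriesSketch {

    /** Hypothetical stand-in for a ManifestEntry with an assumed inclusive range. */
    static final class Entry {
        final String fileName;
        final long from;
        final long to;

        Entry(String fileName, long from, long to) {
            this.fileName = fileName;
            this.from = from;
            this.to = to;
        }
    }

    /** Returns the entries overlapping [from, to], keeping one entry per file name. */
    static Collection<Entry> lookup(List<Entry> indexed, long from, long to) {
        Map<String, Entry> byFile = new LinkedHashMap<>();
        for (Entry e : indexed) {
            boolean overlaps = e.from <= to && e.to >= from;
            if (overlaps) {
                // fileName is used only to drop duplicates, not for point lookup.
                byFile.putIfAbsent(e.fileName, e);
            }
        }
        return new ArrayList<>(byFile.values());
    }

    public static void main(String[] args) {
        List<Entry> indexed = new ArrayList<>();
        indexed.add(new Entry("a", 0, 99));
        indexed.add(new Entry("b", 100, 199));
        indexed.add(new Entry("a", 0, 99)); // same file indexed twice
        indexed.add(new Entry("c", 200, 299));
        for (Entry e : lookup(indexed, 50, 150)) {
            System.out.println(e.fileName); // prints a, then b
        }
    }
}
```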
Hi @lilei1128, thanks for the contribution! Do you have any benchmarks for this PR? I am curious about the performance comparison between the file-meta-based and row-id-based approaches in a big-data scenario.
Hi, this is my test result on a Mac: range queries perform better than point queries, and the effect would be better if the data were on OSS/S3.
@lilei1128 Your conclusion is too confusing to understand. Please only display the testing method and results.
Test methodology summary
Tables
Data load:
Test result:
Thanks @lilei1128 for your benchmark, but I think more convincing results are needed. If the improvement is only this large, it doesn't seem worth the cost.
Actually, if tested on OSS and with a sufficiently large volume of data, the effect should be quite noticeable. My test was only conducted on a MacBook with a relatively small amount of data.

Purpose
Add with-file-meta option for btree global index to embed ManifestEntry
data directly in index files, enabling manifest-skip query planning.
Key changes:
manifest reads, with staleness detection via fileIO.exists()
When enabled, query planning reads only:
See: https://cwiki.apache.org/confluence/display/PAIMON/PIP-41%3A+Introduce+FilePath+Global+Index+And+Optimizations+For+Lookup+In+Append+Table
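One plausible reading of "staleness detection via fileIO.exists()" is sketched below. It is an assumption, not the PR's implementation: the `Predicate<String>` stands in for an existence check such as `fileIO.exists()`, file paths are plain strings, and returning an empty `Optional` to signal "fall back to reading manifests" is a hypothetical convention chosen for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

final class StalenessCheckSketch {

    /**
     * Returns the indexed files if every one of them still exists.
     * Returns Optional.empty() if any file is missing, meaning the embedded
     * metadata is stale and the caller should plan from manifests instead.
     */
    static Optional<List<String>> planFromIndex(List<String> indexedFiles, Predicate<String> exists) {
        for (String file : indexedFiles) {
            if (!exists.test(file)) {
                return Optional.empty();
            }
        }
        return Optional.of(indexedFiles);
    }

    public static void main(String[] args) {
        // Hypothetical set of files that are currently present in storage.
        Set<String> live = new HashSet<>(Arrays.asList("data-1.parquet", "data-2.parquet"));

        // All indexed files exist: the index result can be used directly.
        System.out.println(
                planFromIndex(Arrays.asList("data-1.parquet", "data-2.parquet"), live::contains).isPresent()); // prints true

        // One indexed file was removed (for example by compaction): treat the index as stale.
        System.out.println(
                planFromIndex(Arrays.asList("data-1.parquet", "data-3.parquet"), live::contains).isPresent()); // prints false
    }
}
```

Note that each existence check is a storage call, so on an object store the check itself has a cost that grows with the number of indexed files returned.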
Tests
CI