Migrate to new stats falsification rules#8345
Conversation
## Summary Adds docs for `StatsRewriteRule` and its functions. Can be considered a follow-up for #8345, but can be individually merged. Signed-off-by: Adam Gutglick <adam@spiraldb.com>
033c73f to
5ed375f
Compare
Port file pruning to session stats rewrites Signed-off-by: "Nicholas Gates" <nick@nickgates.com> Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: "Nicholas Gates" <nick@nickgates.com> Signed-off-by: Nicholas Gates <nick@nickgates.com>
5ed375f to
e8dd011
Compare
Merging this PR will improve performance by 22.84%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ⚡ | Simulation | chunked_bool_canonical_into[(1000, 10)] |
35.6 µs | 20.5 µs | +73.69% |
| ⚡ | Simulation | chunked_varbinview_into_canonical[(1000, 10)] |
213.1 µs | 176.5 µs | +20.71% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[128] |
215.3 ns | 186.1 ns | +15.67% |
| ⚡ | Simulation | chunked_varbinview_canonical_into[(100, 100)] |
308.8 µs | 273.1 µs | +13.08% |
| ⚡ | Simulation | encode_varbin[(1000, 2)] |
176.7 µs | 157.7 µs | +12.05% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[1024] |
275.6 ns | 246.4 ns | +11.84% |
Tip
Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.
Comparing ngates/public-stats-rewrite-rules (118c995) with develop (7dbe5ac)
Footnotes
-
10 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
Signed-off-by: "Nicholas Gates" <nick@nickgates.com> Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: "Nicholas Gates" <nick@nickgates.com> Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: "Nicholas Gates" <nick@nickgates.com> Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
| use crate::scalar::Scalar; | ||
| use crate::scalar_fn::fns::stat::StatFn; | ||
|
|
||
| /// A target that can bind abstract statistics to concrete expressions. |
There was a problem hiding this comment.
What does it mean to bind a statistic to an expression? What makes statistics abstract?
There was a problem hiding this comment.
What does it mean to bind a statistic to an expression? What makes statistics abstract?
Yeah stats stuff I am intermingling with an expressions change, so I think the "stat" expression becomes explicitly an "Expression::Placeholder", meaning it must be replaced prior to execution.
It's the same logic as our StatCatalog. Except the current StatCatalog means you have to re-run falsification over the entire expression any time your stats come from a different place, e.g. FileStats vs ZoneMap vs ArrayStats.
Here you take the falsified expression, then "bind" the stats from wherever you get them from.
| @@ -115,14 +119,52 @@ impl FileStatsLayoutReader { | |||
| Ok(result.as_bool().value() == Some(true)) | |||
| refs | ||
| } | ||
|
|
||
| fn collect_referenced_stat_field_names(expr: &Expression, refs: &mut HashSet<FieldName>) { |
There was a problem hiding this comment.
we have Node implemented for Expression, you should be able to visit the tree using that instead of hand-rolling the recursion
| } | ||
| } | ||
|
|
||
| fn bool_literal(expr: &Expression) -> Option<Option<bool>> { |
| return Ok(None); | ||
| }; | ||
| let required_stats = filter_required_stats(&lowered, binder.required_stats); | ||
| if required_stats.map().is_empty() && !matches!(bool_literal(&lowered), Some(Some(true))) { |
There was a problem hiding this comment.
maybe add a comment explaining this if statement?
| available_stats, | ||
| required_stats: Relation::new(), | ||
| }; | ||
| let Some(lowered) = bind_stats(predicate, &mut binder)? else { |
There was a problem hiding this comment.
Maybe I'm just stupid - but I think both "lowering" and "bindings" are worth defining and explaining somewhere.
|
I think I mostly have a lot of questions 😅 |
Signed-off-by: "Nicholas Gates" <nick@nickgates.com>
Polar Signals Profiling ResultsLatest Run
Powered by Polar Signals Cloud |
Benchmarks: PolarSignals ProfilingVortex (geomean): 1.060x ➖ How to read Verdict and Engines
datafusion / vortex-file-compressed (1.060x ➖, 0↑ 4↓)
No file size changes detected. |
Benchmarks: TPC-H SF=1 on NVMEVerdict: No clear signal (low confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (0.922x ➖, 5↑ 0↓)
datafusion / vortex-compact (0.923x ➖, 3↑ 0↓)
datafusion / parquet (0.925x ➖, 4↑ 0↓)
datafusion / arrow (0.911x ➖, 8↑ 0↓)
duckdb / vortex-file-compressed (0.945x ➖, 2↑ 0↓)
duckdb / vortex-compact (0.945x ➖, 1↑ 0↓)
duckdb / parquet (0.946x ➖, 5↑ 0↓)
duckdb / duckdb (0.945x ➖, 2↑ 0↓)
File Size Changes (10 files changed, -0.1% overall, 4↑ 6↓)
Totals:
|
Benchmarks: FineWeb NVMeVerdict: No clear signal (low confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (0.938x ➖, 2↑ 0↓)
datafusion / vortex-compact (0.963x ➖, 1↑ 0↓)
datafusion / parquet (0.967x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.970x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.022x ➖, 1↑ 2↓)
duckdb / parquet (0.975x ➖, 1↑ 0↓)
File Size Changes (1 files changed, -0.0% overall, 0↑ 1↓)
Totals:
|
Benchmarks: TPC-DS SF=1 on NVMEVerdict: No clear signal (low confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (1.017x ➖, 3↑ 4↓)
datafusion / vortex-compact (1.015x ➖, 2↑ 1↓)
datafusion / parquet (1.016x ➖, 3↑ 3↓)
duckdb / vortex-file-compressed (1.030x ➖, 0↑ 7↓)
duckdb / vortex-compact (1.025x ➖, 0↑ 3↓)
duckdb / parquet (1.013x ➖, 2↑ 0↓)
duckdb / duckdb (1.005x ➖, 1↑ 2↓)
File Size Changes (6 files changed, +0.0% overall, 3↑ 3↓)
Totals:
|
🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨Benchmark |
Benchmarks: Statistical and Population GeneticsVerdict: No clear signal (low confidence) How to read Verdict and Engines
duckdb / vortex-file-compressed (0.971x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.975x ➖, 0↑ 0↓)
duckdb / parquet (0.955x ➖, 0↑ 0↓)
File Size Changes (1 files changed, +0.0% overall, 1↑ 0↓)
Totals:
|
Benchmarks: FineWeb S3Verdict: No clear signal (environment too noisy confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (0.795x ➖, 2↑ 0↓)
datafusion / vortex-compact (1.199x ➖, 0↑ 3↓)
datafusion / parquet (0.953x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.071x ➖, 0↑ 1↓)
duckdb / vortex-compact (1.051x ➖, 0↑ 2↓)
duckdb / parquet (1.078x ➖, 0↑ 0↓)
|
Benchmarks: Random AccessVortex (geomean): 0.946x ➖ How to read Verdict and Engines
unknown / unknown (1.005x ➖, 2↑ 1↓)
|
Benchmarks: TPC-H SF=10 on NVMEVerdict: No clear signal (low confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (1.059x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.050x ➖, 0↑ 1↓)
datafusion / parquet (1.047x ➖, 0↑ 0↓)
datafusion / arrow (1.067x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (1.034x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.032x ➖, 0↑ 0↓)
duckdb / parquet (1.023x ➖, 0↑ 1↓)
duckdb / duckdb (1.018x ➖, 0↑ 0↓)
File Size Changes (27 files changed, +0.0% overall, 13↑ 14↓)
Totals:
|
Benchmarks: Clickbench on NVMEVerdict: No clear signal (low confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (0.932x ➖, 7↑ 0↓)
datafusion / parquet (0.935x ➖, 6↑ 1↓)
duckdb / vortex-file-compressed (0.952x ➖, 6↑ 2↓)
duckdb / parquet (0.973x ➖, 0↑ 0↓)
duckdb / duckdb (0.966x ➖, 2↑ 0↓)
File Size Changes (106 files changed, +0.0% overall, 51↑ 55↓)
Totals:
|
Benchmarks: CompressionVortex (geomean): 0.997x ➖ How to read Verdict and Engines
unknown / unknown (0.977x ➖, 9↑ 1↓)
|
Benchmarks: Appian on NVMEVerdict: No clear signal (low confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (1.001x ➖, 0↑ 0↓)
datafusion / parquet (1.010x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.998x ➖, 0↑ 0↓)
duckdb / parquet (0.997x ➖, 0↑ 0↓)
duckdb / duckdb (0.992x ➖, 0↑ 0↓)
File Size Changes (4 files changed, -0.0% overall, 2↑ 2↓)
Totals:
|
Benchmarks: TPC-H SF=10 on S3Verdict: No clear signal (environment too noisy confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (1.203x ➖, 0↑ 8↓)
datafusion / vortex-compact (1.071x ➖, 0↑ 4↓)
datafusion / parquet (1.062x ➖, 4↑ 8↓)
duckdb / vortex-file-compressed (0.948x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.020x ➖, 0↑ 0↓)
duckdb / parquet (1.045x ➖, 0↑ 0↓)
|
Signed-off-by: "Nicholas Gates" <nick@nickgates.com>
Signed-off-by: "Nicholas Gates" <nick@nickgates.com>
Signed-off-by: "Nicholas Gates" <nick@nickgates.com>
Switch over to using the new rewrite rule registry from the session, instead of using the scalar fn vtable.