bench(spatialbench): SpatialBench Q1 on DataFusion-over-Vortex by HarukiMoriarty · Pull Request #8456 · vortex-data/vortex

HarukiMoriarty · 2026-06-16T21:47:33Z

Status:

Depends on a temporary fork, so this PR targets nemo/spatialbench-q1 rather than develop.

Summary

Adds a SpatialBench suite (Apache Sedona's geospatial ride-sharing benchmark) built on DataFusion. It runs Q1 four ways, {parquet, vortex} × {wkb, native}, to show where the speedup comes from.

Query: Q1 — trips within a radius of a point, ranked by distance.
Wired: the trip table and Q1; dimension tables and the other queries are catalog-ready but not yet wired.
Datasets: wkb.rs generates the canonical WKB base; native.rs derives the native encodings by decoding WKB into GeoArrow points in Arrow (via geoarrow_cast, so Vortex itself carries no WKB code), then writes a Vortex Point file and a GeoParquet file.
Supporting glue code.
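The WKB → point step that native.rs performs can be pictured with the minimal std-only sketch below. It is not geoarrow_cast: `decode_wkb_points` is a hypothetical helper, and it assumes every blob is a plain 2D OGC WKB Point (a 1-byte byte-order flag, a 4-byte geometry type equal to 1, then x and y as f64), rejecting Z/M and EWKB/SRID variants. It only shows the shape of the result: separate x and y columns, the columnar x/y layout the Vortex Point column uses.

```rust
/// Decode a batch of 2D WKB points into separate x and y columns.
/// Hypothetical helper; assumes plain 2D OGC WKB Points (21 bytes each).
pub fn decode_wkb_points(blobs: &[&[u8]]) -> Result<(Vec<f64>, Vec<f64>), String> {
    let mut xs = Vec::with_capacity(blobs.len());
    let mut ys = Vec::with_capacity(blobs.len());
    for (row, blob) in blobs.iter().enumerate() {
        if blob.len() != 21 {
            return Err(format!("row {}: expected 21 bytes, got {}", row, blob.len()));
        }
        // Byte 0: byte order, 0 = big-endian (XDR), 1 = little-endian (NDR).
        let little = match blob[0] {
            0 => false,
            1 => true,
            b => return Err(format!("row {}: bad byte-order flag {}", row, b)),
        };
        // Read a u32 / f64 at byte offset `i`, honoring the blob's byte order.
        let read_u32 = |i: usize| {
            let mut b = [0u8; 4];
            b.copy_from_slice(&blob[i..i + 4]);
            if little { u32::from_le_bytes(b) } else { u32::from_be_bytes(b) }
        };
        let read_f64 = |i: usize| {
            let mut b = [0u8; 8];
            b.copy_from_slice(&blob[i..i + 8]);
            if little { f64::from_le_bytes(b) } else { f64::from_be_bytes(b) }
        };
        // Bytes 1..5: geometry type; 1 = 2D Point (Z/M/EWKB variants are rejected).
        if read_u32(1) != 1 {
            return Err(format!("row {}: not a 2D WKB Point", row));
        }
        // Bytes 5..13 and 13..21: the x and y coordinates.
        xs.push(read_f64(5));
        ys.push(read_f64(13));
    }
    Ok((xs, ys))
}
```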

Results (Q1, scale factor 1, release)

| Variant | Geometry storage | Compute path | Time | vs fastest |
|---|---|---|---|---|
| vortex-native | native `Point` extension (columnar x/y) | `GeoDistance` fused into the Vortex scan | 5.7 ms | 1.0× |
| vortex-wkb | WKB bytes, Vortex format | geodatafusion above the scan (no pushdown) | 46 ms | 8.1× |
| parquet-native | GeoParquet `geoarrow.point` | geodatafusion per-row UDF, filter in Parquet scan | 52 ms | 9.1× |
| parquet-wkb | WKB bytes, Parquet | WKB parse + geodatafusion per-row UDF above the scan | 103 ms | 18× |
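The "vs fastest" column is each variant's time divided by the vortex-native time (5.7 ms). A tiny sketch of that arithmetic, with `slowdowns` as a hypothetical helper:

```rust
/// Hypothetical helper: divide each variant's time by the fastest time.
pub fn slowdowns(times_ms: &[(&'static str, f64)]) -> Vec<(&'static str, f64)> {
    // The fastest variant is the minimum time across all rows.
    let fastest = times_ms.iter().map(|&(_, t)| t).fold(f64::INFINITY, f64::min);
    times_ms.iter().map(|&(name, t)| (name, t / fastest)).collect()
}
```

With the numbers above this gives 46 / 5.7 ≈ 8.07, 52 / 5.7 ≈ 9.12 and 103 / 5.7 ≈ 18.07, which the table rounds to 8.1×, 9.1× and 18×.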

Why the others are slow

parquet-wkb (103 ms): the geometry is stored as raw WKB bytes, and the query wraps it in ST_GeomFromWKB(pickup). DataFusion pushes the radius filter down into the parquet reader as a row-filter, but to actually evaluate it the reader has to decode every WKB blob into a geometry and then call geodatafusion's ST_Distance UDF once per row. Both of those costs grow with the number of rows, which is why it's the slowest.
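The cost structure of this path can be sketched as a row-at-a-time loop. This is a hedged std-only illustration, not the Parquet reader's row-filter code: `filter_rows_wkb` and `parse_wkb_point` are hypothetical, the distance function is passed as a `dyn Fn` to stand in for a scalar UDF call, the predicate shape `distance <= radius` is a simplification of Q1's radius filter, and the planar Euclidean distance used in the usage is an assumption about the metric.

```rust
/// Parse one 2D WKB Point into (x, y); None if the blob is not a plain 2D Point.
fn parse_wkb_point(blob: &[u8]) -> Option<(f64, f64)> {
    if blob.len() != 21 {
        return None;
    }
    let little = match blob[0] {
        0 => false,
        1 => true,
        _ => return None,
    };
    let mut t = [0u8; 4];
    t.copy_from_slice(&blob[1..5]);
    let geom_type = if little { u32::from_le_bytes(t) } else { u32::from_be_bytes(t) };
    if geom_type != 1 {
        return None;
    }
    let coord = |i: usize| {
        let mut b = [0u8; 8];
        b.copy_from_slice(&blob[i..i + 8]);
        if little { f64::from_le_bytes(b) } else { f64::from_be_bytes(b) }
    };
    Some((coord(5), coord(13)))
}

/// Row-at-a-time `ST_Distance(ST_GeomFromWKB(col), center) <= radius`.
/// Returns indices of matching rows; unparsable rows are treated like NULL and dropped.
pub fn filter_rows_wkb(
    blobs: &[Vec<u8>],
    center: (f64, f64),
    radius: f64,
    st_distance: &dyn Fn((f64, f64), (f64, f64)) -> f64,
) -> Vec<usize> {
    let mut keep = Vec::new();
    for (row, blob) in blobs.iter().enumerate() {
        // Per-row cost 1: decode this row's WKB bytes into a point.
        if let Some(p) = parse_wkb_point(blob) {
            // Per-row cost 2: one dynamically dispatched "UDF" call for this row.
            if st_distance(p, center) <= radius {
                keep.push(row);
            }
        }
    }
    keep
}
```

Both the parse and the call sit inside the loop body, so their cost scales with the row count regardless of how many rows finally match.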

parquet-native (52 ms): GeoParquet stores the points natively as geoarrow.point, and skip_metadata=false preserves that extension type, so geodatafusion recognizes the column as a geometry and the WKB-parsing cost disappears. The distance, however, is still computed by geodatafusion's per-row UDF: Parquet has no geo kernel inside the scan, so even with filter pushdown it materializes the points into Arrow and dispatches the UDF row by row. Dropping the parse is why it beats parquet-wkb; the per-row compute is why it stays well behind vortex-native.
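To make the per-row versus columnar contrast concrete, here is a hedged std-only sketch of the two evaluation shapes over already-decoded points. Neither function is geodatafusion or Vortex code: `per_row_udf` and `columnar_within` are hypothetical, the UDF is modeled as a `dyn Fn` call, and planar Euclidean distance with a non-negative radius is assumed.

```rust
/// Per-row shape: points are already decoded (no WKB), but each row is still
/// handed to a dynamically dispatched UDF one at a time.
pub fn per_row_udf(
    points: &[(f64, f64)],
    center: (f64, f64),
    radius: f64,
    udf: &dyn Fn((f64, f64), (f64, f64)) -> f64,
) -> Vec<bool> {
    points.iter().map(|&p| udf(p, center) <= radius).collect()
}

/// Columnar shape: one tight loop over contiguous x and y slices, comparing
/// squared distances so there is no per-row call and no sqrt (assumes radius >= 0).
pub fn columnar_within(xs: &[f64], ys: &[f64], center: (f64, f64), radius: f64) -> Vec<bool> {
    assert_eq!(xs.len(), ys.len(), "x and y columns must have equal length");
    let r2 = radius * radius;
    xs.iter()
        .zip(ys)
        .map(|(&x, &y)| {
            let dx = x - center.0;
            let dy = y - center.1;
            dx * dx + dy * dy <= r2
        })
        .collect()
}
```

Both return the same selection; the columnar loop simply touches two contiguous f64 slices with no per-row call, a shape that is generally more amenable to auto-vectorization.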

vortex-wkb (46 ms): this is Vortex without the pushdown. The predicate can't be lowered into the scan because its operand is ST_GeomFromWKB(t_pickuploc), which the converter doesn't know how to translate, so the whole filter stays above the scan and geodatafusion evaluates it. The columnar format still reads faster than Parquet, which is why it beats parquet-wkb, but the geo work still happens above the scan.
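The lowering decision can be pictured as a pattern match on the predicate tree. The sketch below uses a hypothetical, simplified `Expr` enum (not DataFusion's `Expr`, and not the actual Vortex converter); it assumes the radius filter has the shape `ST_Distance(geom, point) <= radius` and lowers it only when `geom` is a bare column.

```rust
/// Hypothetical, simplified expression tree (not DataFusion's `Expr`).
#[derive(Debug, PartialEq)]
pub enum Expr {
    Column(String),
    PointLit(f64, f64),
    Float(f64),
    GeomFromWkb(Box<Expr>),
    StDistance(Box<Expr>, Box<Expr>),
    LtEq(Box<Expr>, Box<Expr>),
}

/// A predicate the scan could evaluate itself (hypothetical).
#[derive(Debug, PartialEq)]
pub enum ScanPredicate {
    DistanceLtEq { column: String, center: (f64, f64), radius: f64 },
}

/// Lower `ST_Distance(col, point) <= r` into the scan, or return None so the
/// filter stays above the scan.
pub fn try_lower(expr: &Expr) -> Option<ScanPredicate> {
    // Shape required: LtEq(StDistance(<geom>, PointLit), Float).
    if let Expr::LtEq(lhs, rhs) = expr {
        if let (Expr::StDistance(geom, center), Expr::Float(radius)) = (&**lhs, &**rhs) {
            // Only a bare column is translatable; a wrapper such as GeomFromWkb(col) is not.
            if let (Expr::Column(name), Expr::PointLit(x, y)) = (&**geom, &**center) {
                return Some(ScanPredicate::DistanceLtEq {
                    column: name.clone(),
                    center: (*x, *y),
                    radius: *radius,
                });
            }
        }
    }
    // Not recognized: the filter stays above the scan.
    None
}
```

Wrapping the column in `GeomFromWkb` is enough to make `try_lower` return `None`, which mirrors the vortex-wkb situation.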

vortex-native (5.7 ms): both blockers are gone. With points=native the query drops the ST_GeomFromWKB wrapper, so the operand is just a plain Point column the converter can translate, and the column itself is the native Point extension that GeoDistance knows how to compute over. The distance is fused right into the scan and evaluated over the columnar x and y as they're read, with the filter applied in-scan, so there's no WKB parsing and no UDF round-trip, and only the handful of matching rows ever materialize.
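What "fused into the scan" buys can be sketched as a loop that computes the distance chunk by chunk as x and y are read, filters in place, and keeps only (row, distance) pairs for matches before ranking them. This is a hedged std-only sketch, not Vortex's `GeoDistance` API: `PointChunk` and `scan_within_ranked` are hypothetical, decompression is not modeled, and planar Euclidean distance is assumed.

```rust
/// One chunk of a columnar Point column as the scan reads it (hypothetical type).
pub struct PointChunk {
    pub xs: Vec<f64>,
    pub ys: Vec<f64>,
}

/// Fused scan: compute the distance over each chunk's x/y as it is read,
/// apply the radius filter in-scan, and keep only (row, distance) for
/// matching rows; then rank the survivors by distance.
pub fn scan_within_ranked(chunks: &[PointChunk], center: (f64, f64), radius: f64) -> Vec<(usize, f64)> {
    let mut hits = Vec::new();
    let mut base = 0usize; // global row offset of the current chunk
    for chunk in chunks {
        for (i, (&x, &y)) in chunk.xs.iter().zip(&chunk.ys).enumerate() {
            let d = ((x - center.0).powi(2) + (y - center.1).powi(2)).sqrt();
            // NaN distances fail this comparison, so they never reach `hits`.
            if d <= radius {
                hits.push((base + i, d));
            }
        }
        base += chunk.xs.len();
    }
    // Ascending distance; ties broken by row index for a deterministic order.
    hits.sort_by(|a, b| a.1.total_cmp(&b.1).then(a.0.cmp(&b.0)));
    hits
}
```

Only the matching rows are ever pushed into `hits`, which is the sketch's analogue of "only the handful of matching rows ever materialize".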

Signed-off-by: Nemo Yu <zyu379@wisc.edu>

codspeed-hq · 2026-06-16T21:49:38Z

Merging this PR will improve performance by 18.72%

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚠️

Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments, which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 116 improved benchmarks
❌ 14 regressed benchmarks
✅ 1398 untouched benchmarks
⏩ 27 skipped benchmarks¹

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

|  | Mode | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | `bitwise_and_vortex_buffer[65536]` | 15.1 µs | 24.3 µs | -37.73% |
| ❌ | Simulation | `bitwise_or_vortex_buffer[65536]` | 15.2 µs | 24.3 µs | -37.41% |
| ❌ | Simulation | `bitwise_and_vortex_buffer[2048]` | 2.9 µs | 4.4 µs | -33.69% |
| ❌ | Simulation | `bitwise_and_vortex_buffer[1024]` | 2.8 µs | 4.1 µs | -32.46% |
| ❌ | Simulation | `bitwise_and_vortex_buffer[16384]` | 6.7 µs | 9.9 µs | -31.97% |
| ❌ | Simulation | `bitwise_or_vortex_buffer[1024]` | 2.8 µs | 4 µs | -31.48% |
| ❌ | Simulation | `bitwise_or_vortex_buffer[2048]` | 3 µs | 4.3 µs | -31.45% |
| ❌ | Simulation | `bitwise_or_vortex_buffer[16384]` | 6.8 µs | 9.9 µs | -31.28% |
| ❌ | Simulation | `bitwise_and_vortex_buffer[128]` | 3.5 µs | 4.7 µs | -26.53% |
| ❌ | Simulation | `bitwise_or_vortex_buffer[128]` | 3.5 µs | 4.7 µs | -25.45% |
| ❌ | Simulation | `bitwise_not_vortex_buffer_mut[128]` | 215.3 ns | 273.6 ns | -21.32% |
| ❌ | Simulation | `bitwise_not_vortex_buffer_mut[1024]` | 275.6 ns | 333.9 ns | -17.47% |
| ❌ | Simulation | `bitwise_not_vortex_buffer_mut[2048]` | 398.6 ns | 456.9 ns | -12.77% |
| ❌ | Simulation | `encode_varbin[(1000, 2)]` | 158 µs | 177 µs | -10.75% |
| ⚡ | Simulation | `compare[48]` | 300.9 µs | 213 µs | +41.29% |
| ⚡ | Simulation | `compare[50]` | 319.4 µs | 227.8 µs | +40.23% |
| ⚡ | Simulation | `compare[49]` | 318 µs | 228.2 µs | +39.34% |
| ⚡ | Simulation | `baseline_lt[16, 65536]` | 304.4 µs | 219.3 µs | +38.81% |
| ⚡ | Simulation | `compare[44]` | 287.9 µs | 207.5 µs | +38.72% |
| ⚡ | Simulation | `compare[46]` | 302.8 µs | 218.6 µs | +38.55% |
| ... | ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.

Comparing nemo/spatialgeo-benchmark-q1 (5638511) with nemo/spatialbench-q1 (d842e32)

¹ 27 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.

end to end q1

5638511

Signed-off-by: Nemo Yu <zyu379@wisc.edu>

HarukiMoriarty force-pushed the nemo/spatialgeo-benchmark-q1 branch from 8083787 to 5638511 on June 16, 2026 21:48

HarukiMoriarty wants to merge 1 commit into nemo/spatialbench-q1 from nemo/spatialgeo-benchmark-q1
