Benchmarks Website Version 3 #7643

Merged
connortsui20 merged 1 commit into develop from ct/benchmarks-v3 on May 4, 2026

@connortsui20 connortsui20 commented Apr 26, 2026

Summary

Rewrites the benchmarks website. Replaces the static data.json.gz model with a single Rust server binary that owns a DuckDB database and accepts POST /api/ingest from CI.

Design

  • Single binary: axum + maud (SSR HTML) + DuckDB + Chart.js. All static assets include_bytes!'d.
  • 5 fact tables (compression time, query measurement, vector search, RAG, random access). Backup is a file copy.
  • Ingest: versioned JSON envelopes, bearer-token gated.
  • Migrator ports v2 history forward via a classifier that routes each record to a fact table or skips it with a typed reason.
  • Charts/groups slug-addressed, URL round-trip with no DB lookup.
  • Routes: /, /chart/{slug}, /group/{slug}, GET /api/chart/{slug}.
  • Deploy: one binary, one DuckDB file, one INGEST_BEARER_TOKEN.
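As a rough illustration of the migrator bullet above, a classifier that either routes a record to a fact table or skips it with a typed reason could look like the sketch below. The five table names come from the design list; the skip reasons, the `name` field, and the prefix mapping are hypothetical and are not taken from the actual migrator.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class FactTable(Enum):
    # The five fact tables named in the design list; identifiers are assumed.
    COMPRESSION_TIME = "compression_time"
    QUERY_MEASUREMENT = "query_measurement"
    VECTOR_SEARCH = "vector_search"
    RAG = "rag"
    RANDOM_ACCESS = "random_access"


class SkipReason(Enum):
    # Hypothetical typed reasons; the real set is not shown in this PR.
    MISSING_NAME = "missing_name"
    UNKNOWN_BENCHMARK = "unknown_benchmark"


@dataclass(frozen=True)
class Routed:
    table: FactTable


@dataclass(frozen=True)
class Skipped:
    reason: SkipReason


# Hypothetical mapping from v2 record-name prefixes to fact tables.
PREFIXES = {
    "compress/": FactTable.COMPRESSION_TIME,
    "query/": FactTable.QUERY_MEASUREMENT,
    "vector-search/": FactTable.VECTOR_SEARCH,
    "rag/": FactTable.RAG,
    "random-access/": FactTable.RANDOM_ACCESS,
}


def classify(record: dict) -> Union[Routed, Skipped]:
    """Route a v2 record to a fact table, or skip it with a typed reason."""
    name = record.get("name")
    if not isinstance(name, str) or not name:
        return Skipped(SkipReason.MISSING_NAME)
    for prefix, table in PREFIXES.items():
        if name.startswith(prefix):
            return Routed(table)
    return Skipped(SkipReason.UNKNOWN_BENCHMARK)
```

Returning a value for the skip case, rather than raising or silently dropping the record, makes it possible to count skipped records per reason after a migration run.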

UI/UX is still TBD — the relational backend opens up options we didn't have before.

@connortsui20 connortsui20 added the changelog/skip Do not list PR in the changelog label Apr 26, 2026

codspeed-hq Bot commented Apr 26, 2026

Merging this PR will degrade performance by 26.23%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 5 improved benchmarks
❌ 3 regressed benchmarks
✅ 1161 untouched benchmarks
⏩ 138 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | patched_take_10k_first_chunk_only | 302.4 µs | 272 µs | +11.2% |
| Simulation | take_10k_dispersed | 284.8 µs | 239.7 µs | +18.8% |
| Simulation | patched_take_10k_adversarial | 259 µs | 228.6 µs | +13.32% |
| Simulation | patched_take_10k_dispersed | 316 µs | 285.5 µs | +10.69% |
| Simulation | take_10k_first_chunk_only | 270.8 µs | 225.8 µs | +19.92% |
| Simulation | bitwise_not_vortex_buffer_mut[1024] | 307.8 ns | 366.1 ns | -15.93% |
| Simulation | bitwise_not_vortex_buffer_mut[2048] | 371.4 ns | 429.7 ns | -13.57% |
| Simulation | bitwise_not_vortex_buffer_mut[128] | 246.1 ns | 333.6 ns | -26.23% |
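
The Efficiency figures above appear consistent with `BASE / HEAD - 1`. That formula is inferred from the reported numbers, not taken from CodSpeed documentation, and small differences in the last digit are expected because BASE and HEAD are shown rounded. A minimal check under that assumption:

```python
def efficiency_pct(base: float, head: float) -> float:
    """Relative change in percent, assuming efficiency = base / head - 1.

    Both arguments must be in the same unit (for example, both in ns).
    """
    return (base / head - 1.0) * 100.0


# bitwise_not_vortex_buffer_mut[128]: the headline regression in the report.
worst = efficiency_pct(246.1, 333.6)

# take_10k_dispersed: one of the reported improvements.
improved = efficiency_pct(284.8, 239.7)

print(f"{worst:.2f}% {improved:.1f}%")
```

Under this reading, a slower HEAD gives a negative value, which matches the sign convention in the table.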

Comparing ct/benchmarks-v3 (fbb0b39) with develop (d3ff1f1)

Open in CodSpeed

Footnotes

  1. 138 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.

@connortsui20 connortsui20 force-pushed the ct/benchmarks-v3 branch 7 times, most recently from 6444858 to 7113963 on April 29, 2026 21:05
connortsui20 added a commit that referenced this pull request May 4, 2026
…erver (#7780)

## Summary

Prototype website:
http://ec2-18-219-54-101.us-east-2.compute.amazonaws.com:3000/

This is the first step to take before we cut over to the new
benchmarks website in #7643.

This PR lets the CI actions additionally post data to a server (on
my EC2 instance for now). We want to check that this actually works
before we start using it for all of our CI.

Note that this does NOT change how the current benchmarks website works;
it only does a few extra things on top of that.

For reviewers: even though this looks like 1k LoC, I think the logic
here is not hard to review. A lot of it is boilerplate you can
skim over.

Below is an AI-generated description; read at your own
discretion.

<details>

Brings the v3 emitter and CI dual-write plumbing from `ct/benchmarks-v3`
onto `develop` **without** the v3 server/website code. CI continues to
write v2 results to S3 unchanged; v3 ingest is a side channel that
no-ops until the deploy track sets `vars.V3_INGEST_URL`.

This is item 2 ("CI ingestion wiring") of the v3 production-readiness
checklist in
[`benchmarks-website/planning/README.md`](https://github.com/vortex-data/vortex/blob/ct/benchmarks-v3/benchmarks-website/planning/README.md).
The v3 website itself ships in a separate PR off `ct/benchmarks-v3` once
dual-write is verified healthy in production.

### What's included

**Rust emitter (`vortex-bench`)**
- New `vortex-bench/src/v3.rs`: one record per `kind`
(`query_measurement`, `compression_time`, `compression_size`,
`random_access_time`, `vector_search_run`) plus a serde-tagged
`V3Record` enum, JSONL writer, and `insta` snapshot tests. Field shapes
match
[`02-contracts.md`](https://github.com/vortex-data/vortex/blob/ct/benchmarks-v3/benchmarks-website/planning/02-contracts.md).
- `Dataset::v3_dataset_dims()` (default `(name(), None)`) lets Public-BI
map to `(public-bi, <subset>)`.
- `compress` and `runner` capture per-iteration timings and provide
`SqlBenchmarkRunner::v3_records()`.
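
To illustrate the idea of tagged records written as JSONL, here is a minimal Python sketch. It is not the Rust emitter: the field names are hypothetical (the real shapes are defined in `02-contracts.md`), and it assumes the tag is stored in a `kind` field next to the record's own fields.

```python
import io
import json
from dataclasses import asdict, dataclass


@dataclass
class QueryMeasurement:
    # Hypothetical fields, for illustration only.
    dataset: str
    query_idx: int
    durations_ns: list


@dataclass
class CompressionTime:
    # Hypothetical fields, for illustration only.
    dataset: str
    durations_ns: list


# Maps each record type to its `kind` tag (two of the five kinds shown).
KINDS = {
    QueryMeasurement: "query_measurement",
    CompressionTime: "compression_time",
}


def to_tagged(record) -> dict:
    """Flatten a record into a dict with its `kind` tag alongside its fields."""
    return {"kind": KINDS[type(record)], **asdict(record)}


def write_jsonl(records, out) -> None:
    """Write one bare JSON object per line, with no surrounding envelope."""
    for record in records:
        out.write(json.dumps(to_tagged(record), sort_keys=True) + "\n")


buf = io.StringIO()
write_jsonl(
    [
        QueryMeasurement("tpch", 1, [1200, 1100]),
        CompressionTime("public-bi", [5000]),
    ],
    buf,
)
print(buf.getvalue(), end="")
```

Keeping the tag inside each line means a reader can dispatch on `kind` without any file-level header, which suits appending records from several benchmark runs.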

**Benchmark binaries**
- `compress-bench`, `datafusion-bench`, `duckdb-bench`, `lance-bench`,
`random-access-bench`, `vector-search-bench` all gain `--gh-json-v3
<path>`. Bare records, no envelope. The legacy `-d gh-json -o ...` flow
is untouched.

**`bench-orchestrator`**
- `vx-bench run --gh-json-v3 <path>` plumbs the flag through to the
underlying benchmark binary.

**`scripts/post-ingest.py`** (Python 3, stdlib only)
- Reads JSONL, fills the `commit` envelope from `git show`, wraps in
`{run_meta, commit, records}`, POSTs to `/api/ingest` with
`Authorization: Bearer ${INGEST_BEARER_TOKEN}`. Exits non-zero on
4xx/5xx. No retry/spool — deferred.
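
The wrapping and POST described above can be sketched with the standard library alone. This is not the contents of `scripts/post-ingest.py`: the function names are hypothetical, and it assumes the configured URL is a base URL to which `/api/ingest` is appended.

```python
import json
import urllib.request


def build_envelope(jsonl_text: str, run_meta: dict, commit: dict) -> dict:
    """Parse bare JSONL records and wrap them in {run_meta, commit, records}."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return {"run_meta": run_meta, "commit": commit, "records": records}


def build_request(base_url: str, token: str, envelope: dict) -> urllib.request.Request:
    """Build (but do not send) the bearer-authenticated POST to /api/ingest."""
    body = json.dumps(envelope).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/ingest",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; a real run would fill `commit` from git metadata.
envelope = build_envelope('{"kind": "compression_time"}\n', {}, {})
request = build_request("http://localhost:3000", "example-token", envelope)
print(request.get_method(), request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` raises `urllib.error.HTTPError` on 4xx/5xx responses, which a script can turn into a non-zero exit code.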

**Workflows**
- `.github/workflows/bench.yml` and `sql-benchmarks.yml` add
`--gh-json-v3 results.v3.jsonl` to the bench runs and a follow-up
"Ingest results to v3 server" step.
- New `.github/workflows/v3-commit-metadata.yml` POSTs an empty envelope
on every push to `develop` so the v3 `commits` dim stays populated even
when no benchmark ran.

### What's NOT included (intentionally)

- Anything under `benchmarks-website/` — the v2 React/Node app stays in
production unchanged.
- Workspace member additions for `benchmarks-website/server` and
`benchmarks-website/migrate` — those crates don't exist on `develop`
yet.
- `.github/workflows/ci.yml` and `publish-bench-server.yml` changes —
they reference `vortex-bench-server`, which is also v3-server-only.

## Risk

**Zero.** The v3 ingest step is gated on `vars.V3_INGEST_URL != ''` and
runs with `continue-on-error: true`. If the v3 server is down, the variable
is unset, or the bearer secret is missing, the ingest step is skipped or its
failure is ignored, and the v2 path keeps writing to S3 unchanged. The Rust
emitter writes JSONL to a local file only; there is no network egress from
the binaries themselves.
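
The gating described above can be modeled in a few lines of Python. This is a model of the step's behavior, not the workflow YAML: `run_ingest_step`, `Outcome`, and the `post` callback are hypothetical names, and the `try`/`except` stands in for `continue-on-error: true`.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    SKIPPED = "skipped"                # gate was false: the URL variable is empty
    INGESTED = "ingested"              # the POST succeeded
    FAILED_IGNORED = "failed_ignored"  # the POST failed, but the job continues


def run_ingest_step(ingest_url: str, post: Callable[[str], None]) -> Outcome:
    """Model the ingest step: skip on an empty URL, never propagate a failure."""
    if ingest_url == "":
        return Outcome.SKIPPED
    try:
        post(ingest_url)
    except Exception:
        # Stands in for continue-on-error: the failure is recorded, not raised.
        return Outcome.FAILED_IGNORED
    return Outcome.INGESTED


def server_down(url: str) -> None:
    raise ConnectionError("v3 server unreachable")


print(run_ingest_step("", server_down).value)
print(run_ingest_step("http://example.invalid", server_down).value)
```

None of the three outcomes raises, which is the property the v2 path relies on.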

## Verify

A CI run on this branch should show the new "Ingest results to v3
server" step running and POSTing successfully to the EC2 host at
`vars.V3_INGEST_URL`.

## Follow-up

The v3 website itself (server, migrator, web UI) ships in a separate PR
off `ct/benchmarks-v3` once dual-write is verified healthy in
production. Outbox-style retry on failed POSTs is also a follow-up — not
built until we observe a failure in the wild.

## Test plan

- [x] `cargo build -p vortex-bench` — clean.
- [x] `cargo nextest run -p vortex-bench` — 49/49 pass, including 7 new
v3 snapshot tests.
- [x] `cargo build -p compress-bench -p datafusion-bench -p duckdb-bench
-p lance-bench -p random-access-bench -p vector-search-bench` — clean.
- [x] All six benchmark binaries print `--gh-json-v3 <GH_JSON_V3>` in
`--help`.
- [x] `python3 scripts/post-ingest.py --help` — clean.
- [x] `pytest bench-orchestrator/tests/test_executor.py` — 5/5 pass,
including 2 new `gh_json_v3` tests.
- [x] `cargo +nightly fmt --all` — no diff.
- [x] `cargo clippy --all-targets --all-features -p vortex-bench` —
clean.
- [x] `cargo clippy --all-targets -p compress-bench -p datafusion-bench
-p lance-bench -p random-access-bench -p vector-search-bench` — clean.
`duckdb-bench` skipped (transitively triggers a pre-existing
`cognitive_complexity` lint in `vortex-duckdb/src/convert/expr.rs:47`,
present on `develop` and unrelated to these changes).
- [x] `yamllint --strict -c .yamllint.yaml` on the three changed/new
workflow files — clean.
- [x] `./scripts/public-api.sh` — N/A. All touched Rust crates have
`publish = false`.
- [ ] Real round-trip against the EC2 host — verifies once this branch
triggers a CI bench run with `V3_INGEST_URL` set.

---
_Generated by [Claude
Code](https://claude.ai/code/session_0154XbxhgQztmbrQfJ4ZSxVo)_

</details>

---------

Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Rewrites the benchmarks website. Replaces the static `data.json.gz` model
with a single Rust server binary that owns a DuckDB database and accepts
`POST /api/ingest` from CI.

Design:
- Single binary: axum + maud (SSR HTML) + DuckDB + Chart.js. All static
  assets `include_bytes!`'d.
- 5 fact tables (compression time, query measurement, vector search, RAG,
  random access). Backup is a file copy.
- Ingest: versioned JSON envelopes, bearer-token gated.
- Migrator ports v2 history forward via a classifier that routes each
  record to a fact table or skips it with a typed reason.
- Charts/groups slug-addressed, URL round-trip with no DB lookup.
- Routes: `/`, `/chart/{slug}`, `/group/{slug}`, `GET /api/chart/{slug}`.
- Deploy: one binary, one DuckDB file, one `INGEST_BEARER_TOKEN`.

Signed-off-by: Claude <noreply@anthropic.com>
@lwwmanning lwwmanning marked this pull request as ready for review May 4, 2026 22:17
@connortsui20 connortsui20 enabled auto-merge (squash) May 4, 2026 22:20
@connortsui20 connortsui20 merged commit e0a2bdf into develop May 4, 2026
60 of 62 checks passed
@connortsui20 connortsui20 deleted the ct/benchmarks-v3 branch May 4, 2026 22:21
Labels

changelog/skip Do not list PR in the changelog

3 participants