feat: sequence cast compute by siddarth2810 · Pull Request #8403 · vortex-data/vortex

siddarth2810 · 2026-06-13T18:06:57Z

Summary

This PR updates SequenceArray to track the arithmetic ptype separately from the declared output dtype.

Previously, SequenceArray used the same ptype for both calculation and output. As a result, casts failed in cases such as a signed sequence with a negative step whose generated values all still fit an unsigned output dtype.

This change stores calculation_ptype in the SequenceMetadata, validates generated values against the output dtype, and updates sequence paths to compute using calculation_ptype while emitting values using the array output ptype.

After casting, a SequenceArray can compute internally as i32 while exposing u8 as its array dtype. Added a test that catches scalar_at returning an i32 PValue inside a u8 scalar.
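
A minimal sketch of the case described above, written with plain Rust integers rather than the Vortex API. The helper name `materialize_u8` and its parameters are hypothetical; the sketch only illustrates computing `base + i * multiplier` in a signed calculation type while validating each value against an unsigned output type.

```rust
/// Hypothetical helper, not part of vortex-sequence.
/// Computes each element in i32 (the "calculation" type) and emits u8
/// (the "output" type). Returns None if the arithmetic overflows i32 or
/// if any generated value does not fit in u8.
fn materialize_u8(base: i32, multiplier: i32, len: usize) -> Option<Vec<u8>> {
    (0..len)
        .map(|i| {
            let i = i32::try_from(i).ok()?;
            // Arithmetic happens in the signed calculation type.
            let value = multiplier.checked_mul(i)?.checked_add(base)?;
            // Validation happens against the unsigned output type.
            u8::try_from(value).ok()
        })
        .collect()
}
```

With `base = 10` and `multiplier = -2`, the first six values (10 down to 0) all fit in u8 even though the step is negative; a seventh element would be -2 and is rejected.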

Closes: #5102

API Changes

Sequence::try_new and Sequence::new_unchecked now accept both calculation_ptype and output_ptype.

Testing

For testing code changes

cargo nextest run -p vortex-sequence
cargo +nightly fmt --all
cargo clippy --all-targets --all-features

For testing the build

cargo build --workspace

AI tools Disclosure

Used ChatGPT for understanding the code and OpenCode for updating callers and generating tests.

SequenceArray previously used the same ptype for arithmetic and for the declared output dtype, which made the model too narrow for casts. Store calculation_ptype separately from the output dtype, preserve it through metadata, and validate that generated values fit the declared output type. Update decompression, filter, take, slice, scalar access, and min/max paths to compute in calculation_ptype while emitting values using the array output ptype.

Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>

Sequence::try_new now accepts both the calculation ptype and output ptype.

Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>

connortsui20 · 2026-06-13T23:48:10Z

Don't worry about the rustsec issue, that is a known problem

codspeed-hq · 2026-06-13T23:51:12Z

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 4 improved benchmarks
❌ 2 regressed benchmarks
✅ 1539 untouched benchmarks
⏩ 10 skipped benchmarks¹

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| | Mode | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | `decompress_rd[f64, (100000, 0.01)]` | 845.9 µs | 981.6 µs | -13.83% |
| ❌ | Simulation | `decompress_rd[f64, (100000, 0.1)]` | 845.9 µs | 981.6 µs | -13.82% |
| ⚡ | Simulation | `decompress_rd[f64, (100000, 0.0)]` | 1,024.6 µs | 845.9 µs | +21.12% |
| ⚡ | Simulation | `decompress_rd[f32, (100000, 0.0)]` | 586.8 µs | 499.3 µs | +17.51% |
| ⚡ | Simulation | `bitwise_not_vortex_buffer_mut[128]` | 244.4 ns | 215.3 ns | +13.55% |
| ⚡ | Simulation | `bitwise_not_vortex_buffer_mut[1024]` | 304.7 ns | 275.6 ns | +10.58% |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.

Comparing siddarth2810:feat/sequence-cast-compute (c73544f) with develop (9444d20)

¹ 10 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, they can be archived to remove them from the performance reports.

gatesn

It's a good catch that this was broken before.

But why is it important to store this information? Vs just always computing in i64/u64 space?
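
The alternative raised here, always computing in a wide integer space, could be sketched as follows. This is an illustration with plain Rust integers, not Vortex code; the function name `scalar_at_wide` is hypothetical, and the sketch covers only the signed i64 case (a u64 sequence with values above `i64::MAX` would need a u64 or wider path).

```rust
/// Hypothetical sketch: compute every element in i64, then narrow the
/// result to the requested output type T. Returns None when the i64
/// arithmetic overflows or the value does not fit in T.
fn scalar_at_wide<T: TryFrom<i64>>(base: i64, multiplier: i64, index: usize) -> Option<T> {
    let index = i64::try_from(index).ok()?;
    // All arithmetic is done in the wide type, regardless of T.
    let wide = multiplier.checked_mul(index)?.checked_add(base)?;
    // Narrowing is the only step that depends on the output type.
    T::try_from(wide).ok()
}
```

Under this design no calculation type needs to be stored: the output type alone decides whether a generated value is valid.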

siddarth2810 · 2026-06-14T14:11:25Z

> It's a good catch that this was broken before.
>
> But why is it important to store this information? Vs just always computing in i64/u64 space?

Thanks for the quick review :)

The maintainer in the previous PR suggested using two ptypes, so I went ahead with that design in mind.

But after experimenting a bit with this, I think we could get calculation_ptype from base.ptype() instead of passing it around or storing it separately.

For deserialization, I think we can use scalar kind before decoding the scalar, so we may not need to store calculation_ptype in SequenceMetadata either.
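
A rough sketch of that idea, using stand-in types rather than the real Vortex `PType`, `PValue`, or `SequenceMetadata`. Every name below is hypothetical, and the requirement that base and multiplier share a type is an assumption made for the illustration.

```rust
/// Stand-in for a primitive type tag; not the real Vortex `PType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PTypeTag {
    I32,
    I64,
    U64,
}

/// Stand-in for a primitive scalar value; not the real Vortex `PValue`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    I32(i32),
    I64(i64),
    U64(u64),
}

impl Value {
    /// The type tag is recoverable from the variant itself.
    fn ptype(&self) -> PTypeTag {
        match self {
            Value::I32(_) => PTypeTag::I32,
            Value::I64(_) => PTypeTag::I64,
            Value::U64(_) => PTypeTag::U64,
        }
    }
}

/// Hypothetical sequence description that stores no calculation type.
struct SketchSequence {
    base: Value,
    multiplier: Value,
}

impl SketchSequence {
    /// Assumption: base and multiplier must share a type; None otherwise.
    fn try_new(base: Value, multiplier: Value) -> Option<Self> {
        (base.ptype() == multiplier.ptype()).then_some(Self { base, multiplier })
    }

    /// The calculation type is derived on demand from the base value.
    fn calculation_ptype(&self) -> PTypeTag {
        self.base.ptype()
    }

    /// Accessor so callers can read the step without a stored type tag.
    fn multiplier(&self) -> Value {
        self.multiplier
    }
}
```

Because the tag is derived from the stored values, nothing extra has to be serialized, which is what would let the metadata format stay unchanged.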

connortsui20 · 2026-06-15T13:51:35Z

    multiplier: PValue,
+    calculation_ptype: PType,


This is a bit outside of what this PR is trying to do, but could you document this now that we are adding a third type here that is different from the base and multiplier? With just base and multiplier it is obvious what this is doing, but with the addition of calculation_ptype it is harder to understand on a first read. If you could also document the other places that use this now, that would be great, thanks!

Edit: I am interested to see if your idea about not storing this at all works. That would probably be better for us since that is not a breaking change.

siddarth2810 · 2026-06-15

My understanding of this issue was to use two types: calculation_ptype for computing and output_ptype for the result type. But yeah, adding another type makes it less obvious on a first read, which makes sense.

I'll work on the idea of not storing this at all.

Thanks a lot!

joseph-isaacs · 2026-06-15T14:17:44Z

    #[prost(message, tag = "2")]
    multiplier: Option<vortex_proto::scalar::ScalarValue>,
+    #[prost(enumeration = "PType", optional, tag = "3")]
+    calculation_ptype: Option<i32>,


Can you explain in a doc comment why we need this, if we need it?

siddarth2810 · 2026-06-15

I was trying to decode the base and multiplier as calculation_ptype during deserialization, so I added this. But after Gates's comments, I found that I could use scalar_value::Kind to get the type instead. I'll remove this in the next change.

Thanks for the review :)

siddarth2810 added 2 commits June 13, 2026 22:12

sequence: update callers for explicit output ptype

c73544f

siddarth2810 requested a review from a team June 13, 2026 18:06

connortsui20 self-requested a review June 13, 2026 23:47

connortsui20 added the changelog/fix A bug fix label Jun 13, 2026

gatesn reviewed Jun 14, 2026

connortsui20 requested changes Jun 15, 2026

joseph-isaacs changed the title ~~Feat/sequence cast compute~~ feat: sequence cast compute Jun 15, 2026

joseph-isaacs reviewed Jun 15, 2026

siddarth2810 wants to merge 2 commits into vortex-data:develop from siddarth2810:feat/sequence-cast-compute

4 participants
