Skip to content

feat: Initial support for HTTP range requests#481

Open
lcian wants to merge 42 commits into
mainfrom
lcian/feat/range-requests
Open

feat: Initial support for HTTP range requests#481
lcian wants to merge 42 commits into
mainfrom
lcian/feat/range-requests

Conversation

@lcian
Copy link
Copy Markdown
Member

@lcian lcian commented May 26, 2026

Adds standard HTTP range requests (RFC) support for single-object GET operations.

Spec vs implementation:

  • This implementation only supports byte ranges. The spec doesn't specifically talk about bytes, so in theory it admits ranges of other "types", which just don't make sense for us.
  • This implementation only supports Single part ranges, meaning stuff like Range: bytes=0-1023. The spec allows for Multipart ranges too, i.e. fetching multiple ranges in a single request, which we don't support. We can add it in the future if needed.
  • We should probably implement Conditional range requests in a follow-up PR, as they're pretty much essential to guarantee that clients read consistent data (if the object is swapped between range reads, we would otherwise serve chunks from 2 potentially different objects). Note that Symbolicator doesn't use these, so we should also update Symbolicator and any other user that uses range reads to actually use these to prevent random failures/data inconsistency. (maybe in practice this is fine as data that Symbolicator fetches cannot really change due to the way it's written, I don't know yet)
    Looking for feedback on whether we think we need this now or not.

Implementation details:

  • All get_object paths now take Option<ByteRange> and return ContentRange alongside the payload. This seems to be the way to implement this with the least code duplication. Otherwise we would have to introduce a get_object_range, implement it for every backend and wrappers of the backends, etc.
  • Bigtable: we know that the objects there are small, so we just return the full payload. This behavior is compliant with the spec.
  • GCS/S3: objectstore-service reserializes the Range header and forwards it to the backend. On the response path, we proxy back the content-range provided by the backend. TODO:
  • Not supported for batch operations.
  • Not implemented in clients for now because Symbolicator (which would be the only user atm) doesn't use our client. We can easily implement it when needed.

Close FS-363

@codecov

This comment was marked as resolved.

@lcian

This comment was marked as spam.

@lcian

This comment was marked as spam.

cursor[bot]

This comment was marked as outdated.

@linear-code

This comment was marked as spam.

@lcian lcian changed the title feat: Support HTTP Range requests for single-object GETs feat: Range requests May 27, 2026
@lcian lcian changed the title feat: Range requests feat(server+service): Support HTTP range requests May 27, 2026
lcian added 9 commits May 29, 2026 10:38
Thread `Option<ByteRange>` through all `get_object` paths and return
`ContentRange` in every response so the HTTP handler can choose between
200, 206, and 416 with correct headers.

Backends that store objects remotely (GCS, S3) forward the Range header
and parse the response. Local-filesystem uses seek-based partial reads.
In-memory and Bigtable ignore the range and always return the full
payload with 200, which is RFC-compliant.
- Parse Range header unit case-insensitively per RFC 9110
  (e.g. `Bytes=0-4` now works instead of being ignored)
- Multi-range requests fall back to full 200 response instead
  of returning 400, matching RFC semantics for unsupported features
- Only set explicit Content-Length when content_range.total > 0, avoiding
  a Content-Length: 0 regression when metadata.size is unknown.
- Use response Content-Length (not metadata.size) in 206 fallback paths
  for GCS and S3 so Content-Length matches the body if Content-Range is
  missing from a 206 response.
Reduce test boilerplate by consolidating 20 unit tests into 7 with
identical coverage. Remove redundant integration test already covered
by unit tests. Use HeaderValue::from_static, delegate to Display impl,
and pass ByteRange directly instead of an intermediate HeaderMap.
Tests added to main after this branch diverged still used the old
get_object signature (1 arg, 2-tuple return). Update them to pass
None for the range parameter and destructure the 3-tuple.
@lcian lcian force-pushed the lcian/feat/range-requests branch from 85de81c to d12a378 Compare May 29, 2026 08:41
lcian and others added 4 commits May 29, 2026 13:37
…r S3 partial size

bytes=-0 (last 0 bytes) is semantically meaningless and should be
rejected with RangeError::Invalid rather than parsed as Last(0).

For S3 206 responses, Content-Length reflects the partial body length,
not the full object size. Read metadata.size from the Content-Range
total field instead so the metadata accurately represents the full
object.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lcian lcian changed the title feat(server+service): Support HTTP range requests feat: Support HTTP range requests May 29, 2026
@lcian lcian changed the title feat: Support HTTP range requests feat: Basic support for HTTP range requests May 29, 2026
@lcian lcian changed the title feat: Basic support for HTTP range requests feat: Initial support for HTTP range requests May 29, 2026
@lcian lcian force-pushed the lcian/feat/range-requests branch from 5e153aa to 61e7cec Compare June 1, 2026 14:07
@lcian lcian marked this pull request as ready for review June 1, 2026 14:24
@lcian lcian requested a review from a team as a code owner June 1, 2026 14:24
Comment thread objectstore-service/src/backend/s3_compatible.rs Outdated
Comment thread objectstore-service/src/backend/s3_compatible.rs Outdated
Comment thread objectstore-service/src/backend/gcs.rs
Comment thread objectstore-service/src/backend/bigtable.rs
@lcian lcian force-pushed the lcian/feat/range-requests branch 2 times, most recently from 93e1429 to 8d40bb4 Compare June 1, 2026 14:50
Comment thread objectstore-service/src/backend/local_fs.rs Outdated
cursor[bot]

This comment was marked as outdated.

match total {
Some(total) => return Err(Error::RangeNotSatisfiable { total }),
None => {
return Err(Error::Generic {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the GCS emulator was buggy and not returning the header for 406 responses.
Fixed it here getsentry/google-cloud-storage-testbench#3 and upstreaming it here googleapis/storage-testbench#802.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The S3 backend doesn't really work right now, we need #442, that's why I didn't add tests.

Comment thread objectstore-service/src/backend/tiered.rs Outdated
@lcian lcian force-pushed the lcian/feat/range-requests branch from 2b78ff4 to 0e21c59 Compare June 2, 2026 11:03
cursor[bot]

This comment was marked as outdated.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 0aa575c. Configure here.

Comment thread objectstore-service/src/backend/in_memory.rs
Copy link
Copy Markdown
Contributor

@matt-codecov matt-codecov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that i'm seeing it, i don't love how inlining range reads in our regular GET operations looks, but i also don't love duplicating all the GET code and just handling this one header and two response codes differently.

also, this is elaborated more in another comment, but it would be nice to add Accept-Ranges: bytes headers only for objects that live in backends that support ranges. but i don't have a clear idea of how to do that cleanly today

i thought about actionable feedback for about an hour and didn't come up with anything so i'm accepting to unblock, but please give them some thought on your own. re-request review if you think you have good solutions

Comment on lines +77 to +80
// We are free to decide whether we want to fall back to the whole object or error.
// Per RFC 9110:
// > A server that supports range requests MAY ignore or reject a Range header
// field that contains an invalid ranges-specifier [...]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to me, this RFC excerpt sounds a bit narrower in what it actually allows. if we advertise support for range reads and a client sends us a well-formed range specifier, it doesn't permit us to decide not to honor that range. if we want full spec compliance, i think we have to impl multi-range support and also build range reads ourselves atop backends that don't support it natively

a workable middle ground would be to only add Accept-Ranges: bytes to responses for objects stored in backends that support it. clients can HEAD req a specific object to learn in advance whether range reads are supported for it. any scenario i can think of can do what it needs to do that way:

  • a client that wants to download a huge file can check whether to download in parallel chunks (if range-reads are supported) or in a single request (otherwise)
  • a client that wants to resume an interrupted download can check whether to append (if range-reads are supported) or replace (otherwise) the incomplete download with the retry response.
  • a client that wants to download a specific file from within an archive (based on some out-of-band index query) can blindly attempt a range-read and just deal with it locally if the range wasn't respected
    • if they want multiple files within the archive, they may want to check with HEAD req and DL in parallel, or maybe they will try multi-range and can similarly just deal with it locally if the range wasn't respected. idk

unfortunately i'm not sure if there is a great way to do that. it's breaking abstraction to give the service impl a way to manipulate HTTP headers. an ugly kludge would be to add a range_read bool to Metadata or to return it on the side

if we did the backend refactor idea i wrote about, it might be cleaner. StorageService would absorb TieredBackend and could expose a BackendDescriptor or something that can carry this information. idk

Xt(id): Xt<ObjectId>,
headers: HeaderMap,
) -> ApiResult<Response> {
let byte_range = match headers.get(http::header::RANGE) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this all be an extractor?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants