feat: Initial support for HTTP range requests#481
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as spam.
This comment was marked as spam.
This comment was marked as spam.
This comment was marked as spam.
This comment was marked as spam.
This comment was marked as spam.
Thread `Option<ByteRange>` through all `get_object` paths and return `ContentRange` in every response so the HTTP handler can choose between 200, 206, and 416 with correct headers. Backends that store objects remotely (GCS, S3) forward the Range header and parse the response. Local-filesystem uses seek-based partial reads. In-memory and Bigtable ignore the range and always return the full payload with 200, which is RFC-compliant.
- Parse Range header unit case-insensitively per RFC 9110 (e.g. `Bytes=0-4` now works instead of being ignored) - Multi-range requests fall back to full 200 response instead of returning 400, matching RFC semantics for unsupported features
- Only set explicit Content-Length when content_range.total > 0, avoiding a Content-Length: 0 regression when metadata.size is unknown. - Use response Content-Length (not metadata.size) in 206 fallback paths for GCS and S3 so Content-Length matches the body if Content-Range is missing from a 206 response.
Reduce test boilerplate by consolidating 20 unit tests into 7 with identical coverage. Remove redundant integration test already covered by unit tests. Use HeaderValue::from_static, delegate to Display impl, and pass ByteRange directly instead of an intermediate HeaderMap.
Tests added to main after this branch diverged still used the old get_object signature (1 arg, 2-tuple return). Update them to pass None for the range parameter and destructure the 3-tuple.
85de81c to
d12a378
Compare
…r S3 partial size bytes=-0 (last 0 bytes) is semantically meaningless and should be rejected with RangeError::Invalid rather than parsed as Last(0). For S3 206 responses, Content-Length reflects the partial body length, not the full object size. Read metadata.size from the Content-Range total field instead so the metadata accurately represents the full object. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
5e153aa to
61e7cec
Compare
93e1429 to
8d40bb4
Compare
| match total { | ||
| Some(total) => return Err(Error::RangeNotSatisfiable { total }), | ||
| None => { | ||
| return Err(Error::Generic { |
There was a problem hiding this comment.
Seems like the GCS emulator was buggy and not returning the header for 406 responses.
Fixed it here getsentry/google-cloud-storage-testbench#3 and upstreaming it here googleapis/storage-testbench#802.
There was a problem hiding this comment.
The S3 backend doesn't really work right now, we need #442, that's why I didn't add tests.
2b78ff4 to
0e21c59
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 0aa575c. Configure here.
matt-codecov
left a comment
There was a problem hiding this comment.
now that i'm seeing it, i don't love how inlining range reads in our regular GET operations looks, but i also don't love duplicating all the GET code and just handling this one header and two response codes differently.
also, this is elaborated more in another comment, but it would be nice to add Accept-Ranges: bytes headers only for objects that live in backends that support ranges. but i don't have a clear idea of how to do that cleanly today
i thought about actionable feedback for about an hour and didn't come up with anything so i'm accepting to unblock, but please give them some thought on your own. re-request review if you think you have good solutions
| // We are free to decide whether we want to fall back to the whole object or error. | ||
| // Per RFC 9110: | ||
| // > A server that supports range requests MAY ignore or reject a Range header | ||
| // field that contains an invalid ranges-specifier [...] |
There was a problem hiding this comment.
to me, this RFC excerpt sounds a bit narrower in what it actually allows. if we advertise support for range reads and a client sends us a well-formed range specifier, it doesn't permit us to decide not to honor that range. if we want full spec compliance, i think we have to impl multi-range support and also build range reads ourselves atop backends that don't support it natively
a workable middle ground would be to only add Accept-Ranges: bytes to responses for objects stored in backends that support it. clients can HEAD req a specific object to learn in advance whether range reads are supported for it. any scenario i can think of can do what it needs to do that way:
- a client that wants to download a huge file can check whether to download in parallel chunks (if range-reads are supported) or in a single request (otherwise)
- a client that wants to resume an interrupted download can check whether to append (if range-reads are supported) or replace (otherwise) the incomplete download with the retry response.
- a client that wants to download a specific file from within an archive (based on some out-of-band index query) can blindly attempt a range-read and just deal with it locally if the range wasn't respected
- if they want multiple files within the archive, they may want to check with
HEADreq and DL in parallel, or maybe they will try multi-range and can similarly just deal with it locally if the range wasn't respected. idk
- if they want multiple files within the archive, they may want to check with
unfortunately i'm not sure if there is a great way to do that. it's breaking abstraction to give the service impl a way to manipulate HTTP headers. an ugly kludge would be to add a range_read bool to Metadata or to return it on the side
if we did the backend refactor idea i wrote about, it might be cleaner. StorageService would absorb TieredBackend and could expose a BackendDescriptor or something that can carry this information. idk
| Xt(id): Xt<ObjectId>, | ||
| headers: HeaderMap, | ||
| ) -> ApiResult<Response> { | ||
| let byte_range = match headers.get(http::header::RANGE) { |
There was a problem hiding this comment.
could this all be an extractor?

Adds standard HTTP range requests (RFC) support for single-object GET operations.
Spec vs implementation:
Range: bytes=0-1023. The spec allows for Multipart ranges too, i.e. fetching multiple ranges in a single request, which we don't support. We can add it in the future if needed.Looking for feedback on whether we think we need this now or not.
Implementation details:
get_objectpaths now takeOption<ByteRange>and returnContentRangealongside the payload. This seems to be the way to implement this with the least code duplication. Otherwise we would have to introduce aget_object_range, implement it for every backend and wrappers of the backends, etc.objectstore-servicereserializes theRangeheader and forwards it to the backend. On the response path, we proxy back thecontent-rangeprovided by the backend. TODO:Close FS-363