feat(service): Add metrics for multipart upload operations#485
Conversation
This comment has been minimized.
This comment has been minimized.
| usecase = id.usecase().to_owned(), | ||
| ); | ||
| if let Some(size) = metadata.size { | ||
| objectstore_metrics::record!( | ||
| "put.size" = size as u64, | ||
| usecase = id.usecase().to_owned(), | ||
| backend_choice = BackendChoice::LongTerm.as_str(), | ||
| backend_type = self.backend_type(&BackendChoice::LongTerm), | ||
| upload_type = "multipart", | ||
| ); | ||
| } |
There was a problem hiding this comment.
Bug: The complete_multipart function returns early on idempotent success paths, skipping the metrics recording block and causing undercounting.
Severity: MEDIUM
Suggested Fix
To ensure all successful operations are counted, metrics should be recorded before the early return on the idempotency success path (line 745). Consider using a defer-like pattern to ensure metrics are always recorded on all exit paths, or explicitly add metric recording calls before each return statement.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: objectstore-service/src/backend/tiered.rs#L833-L849
Potential issue: The `complete_multipart` function in `tiered.rs` has multiple early
return statements that bypass the metrics recording logic located at lines 833-849.
Specifically, the idempotency check on line 745, which handles successful retries,
returns without recording metrics like `multipart.complete.latency`. This contradicts
the PR's goal of adding metrics for "all multipart operations" and will lead to
significant undercounting of successful operations in production, as client retries are
a normal and expected part of the workflow. Various error paths also bypass the metrics
block.
Did we get this right? 👍 / 👎 to inform future reviews.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 737c745. Configure here.
| usecase = id.usecase().to_owned(), | ||
| ); | ||
|
|
||
| result |
There was a problem hiding this comment.
Inconsistent latency metrics recording on error paths
Medium Severity
upload_part, list_parts, and abort_multipart capture the result and record latency metrics regardless of success or failure, while initiate_multipart uses ? (skipping metrics on error) and complete_multipart has multiple early return paths that bypass metrics entirely. This inconsistency means latency data will include failed operations for some methods but not others, making cross-method comparisons unreliable.
Additional Locations (2)
Reviewed by Cursor Bugbot for commit 737c745. Configure here.
| objectstore_metrics::record!( | ||
| "multipart.upload_part.size" = content_length, | ||
| usecase = id.usecase().to_owned(), | ||
| ); |
There was a problem hiding this comment.
Upload part size recorded even on failure
Medium Severity
multipart.upload_part.size records content_length regardless of whether the upload actually succeeded. A failed upload stores no data, so counting its declared size inflates the size distribution. The analogous put.size for direct uploads is only recorded after a successful put (via ?), making this behavior inconsistent.
Reviewed by Cursor Bugbot for commit 737c745. Configure here.


This adds latency metrics for all multipart operations, all tagged with
usecase:multipart.initiate.latency,multipart.upload_part.latency,multipart.list_parts.latency,multipart.abort.latency, andmultipart.complete.latency.Note that these are distributions but we can also chart them as counts/rates to count such requests.
We also track two specific sizes:
multipart.upload_part.sizeandmultipart.complete.part_count.Additionally, we reuse the
put.sizewhich we use for normal PUTs to record the size of the assembled object uponmultipart_complete.The reason we reuse that metric is that right now we only use it to create histograms for the size we're storing, so I think it's actually good to have them together.
A new tag
upload_typelets us distinguish between the kind of upload that recorded aput.size(directormultipart).Close FS-360