Skip to content

feat(service): Add metrics for multipart upload operations#485

Open
lcian wants to merge 3 commits into
mainfrom
lcian/feat/proper-metrics
Open

feat(service): Add metrics for multipart upload operations#485
lcian wants to merge 3 commits into
mainfrom
lcian/feat/proper-metrics

Conversation

@lcian
Copy link
Copy Markdown
Member

@lcian lcian commented Jun 2, 2026

This adds latency metrics for all multipart operations, all tagged with usecase: multipart.initiate.latency, multipart.upload_part.latency, multipart.list_parts.latency, multipart.abort.latency, and multipart.complete.latency.
Note that these are distributions but we can also chart them as counts/rates to count such requests.
We also track two specific sizes: multipart.upload_part.size and multipart.complete.part_count.

Additionally, we reuse the put.size which we use for normal PUTs to record the size of the assembled object upon multipart_complete.
The reason we reuse that metric is that right now we only use it to create histograms for the size we're storing, so I think it's actually good to have them together.
A new tag upload_type lets us distinguish between the kind of upload that recorded a put.size (direct or multipart).

Close FS-360

@codecov

This comment has been minimized.

@lcian lcian marked this pull request as ready for review June 2, 2026 13:43
@lcian lcian requested a review from a team as a code owner June 2, 2026 13:43
@lcian lcian requested a review from matt-codecov June 2, 2026 13:44
@linear-code
Copy link
Copy Markdown

linear-code Bot commented Jun 2, 2026

FS-360

Comment on lines +839 to +849
usecase = id.usecase().to_owned(),
);
if let Some(size) = metadata.size {
objectstore_metrics::record!(
"put.size" = size as u64,
usecase = id.usecase().to_owned(),
backend_choice = BackendChoice::LongTerm.as_str(),
backend_type = self.backend_type(&BackendChoice::LongTerm),
upload_type = "multipart",
);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The complete_multipart function returns early on idempotent success paths, skipping the metrics recording block and causing undercounting.
Severity: MEDIUM

Suggested Fix

To ensure all successful operations are counted, metrics should be recorded before the early return on the idempotency success path (line 745). Consider using a defer-like pattern to ensure metrics are always recorded on all exit paths, or explicitly add metric recording calls before each return statement.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: objectstore-service/src/backend/tiered.rs#L833-L849

Potential issue: The `complete_multipart` function in `tiered.rs` has multiple early
return statements that bypass the metrics recording logic located at lines 833-849.
Specifically, the idempotency check on line 745, which handles successful retries,
returns without recording metrics like `multipart.complete.latency`. This contradicts
the PR's goal of adding metrics for "all multipart operations" and will lead to
significant undercounting of successful operations in production, as client retries are
a normal and expected part of the workflow. Various error paths also bypass the metrics
block.

Did we get this right? 👍 / 👎 to inform future reviews.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 737c745. Configure here.

usecase = id.usecase().to_owned(),
);

result
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent latency metrics recording on error paths

Medium Severity

upload_part, list_parts, and abort_multipart capture the result and record latency metrics regardless of success or failure, while initiate_multipart uses ? (skipping metrics on error) and complete_multipart has multiple early return paths that bypass metrics entirely. This inconsistency means latency data will include failed operations for some methods but not others, making cross-method comparisons unreliable.

Additional Locations (2)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 737c745. Configure here.

objectstore_metrics::record!(
"multipart.upload_part.size" = content_length,
usecase = id.usecase().to_owned(),
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upload part size recorded even on failure

Medium Severity

multipart.upload_part.size records content_length regardless of whether the upload actually succeeded. A failed upload stores no data, so counting its declared size inflates the size distribution. The analogous put.size for direct uploads is only recorded after a successful put (via ?), making this behavior inconsistent.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 737c745. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant