refactor(openai): Split token counting by API for easier deprecation #5930

ericapisani merged 10 commits into master from
Conversation
…Responses API functions

Replace the shared `_calculate_token_usage()` and `_get_usage()` with two API-specific functions: `_calculate_completions_token_usage()` and `_calculate_responses_token_usage()`. This makes it clear which token fields belong to which API and enables clean removal of Chat Completions support when it is deprecated.

- Completions function extracts `prompt_tokens`, `completion_tokens`, and `total_tokens`, and supports `streaming_message_token_usage` for `stream_options` include_usage
- Responses function extracts `input_tokens`, `output_tokens`, and `total_tokens`, plus the `cached_tokens` and `reasoning_tokens` details
- Add API section comments in `_set_common_output_data`
- Update all call sites to use the appropriate API-specific function
- Convert Completions call sites to use keyword arguments
- Update and rename unit tests; add Responses API token usage tests
- Add sync and async streaming tests for usage-in-stream

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…n_usage call sites

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Semver Impact of This PR: 🟢 Patch (bug fixes)

📋 Changelog Preview
This is how your changes will appear in the changelog.

New Features ✨
Internal Changes 🔧

🤖 This preview updates automatically when you update the PR.
bugbot run
Codecov Results 📊

✅ 2084 passed | ⏭️ 170 skipped | Total: 2254 | Pass Rate: 92.46% | Execution Time: 3m 8s

📊 Comparison with Base Branch
All tests are passing successfully.
❌ Patch coverage is 0.00%. Project has 12241 uncovered lines.

Files with missing lines (1)
Coverage diff

@@ Coverage Diff @@
## main #PR +/-##
==========================================
+ Coverage 40.47% 41.78% +1.31%
==========================================
Files 190 190 —
Lines 20963 21027 +64
Branches 7196 7345 +149
==========================================
+ Hits 8484 8786 +302
- Misses 12479 12241 -238
- Partials    434     475    +41

Generated by Codecov Action
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Response usage silently overrides streaming token usage
- Changed the second if statement to elif to ensure streaming_message_token_usage takes precedence over response.usage, preventing silent data loss.
Or push these changes by commenting:
@cursor push 6e65058f68
Preview (6e65058f68)
```diff
diff --git a/sentry_sdk/integrations/openai.py b/sentry_sdk/integrations/openai.py
--- a/sentry_sdk/integrations/openai.py
+++ b/sentry_sdk/integrations/openai.py
@@ -164,8 +164,7 @@
         if streaming_message_token_usage:
             usage = streaming_message_token_usage
-
-        if hasattr(response, "usage"):
+        elif hasattr(response, "usage"):
             usage = response.usage
         if usage is not None:
```

This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.
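The precedence bug is easy to reproduce with plain objects (the function and variable names below are illustrative stand-ins, not the integration's actual code):

```python
from types import SimpleNamespace


def pick_usage(response, streaming_message_token_usage=None):
    # With `elif`, usage supplied by the final streaming chunk wins over
    # response.usage; with two independent `if`s, response.usage would
    # silently overwrite the streaming value.
    usage = None
    if streaming_message_token_usage:
        usage = streaming_message_token_usage
    elif hasattr(response, "usage"):
        usage = response.usage
    return usage


streamed = SimpleNamespace(total_tokens=42)
response = SimpleNamespace(usage=SimpleNamespace(total_tokens=0))

assert pick_usage(response, streamed) is streamed   # streaming value wins
assert pick_usage(response) is response.usage       # fallback still works
```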
… usage

The refactor that split `_calculate_token_usage` into separate Completions and Responses functions dropped extraction of `prompt_tokens_details.cached_tokens` and `completion_tokens_details.reasoning_tokens` from the Completions path. This restores those fields so spans for cached prompts and reasoning models (e.g. o1/o3) report complete token usage metrics. Also fixes streaming usage priority: `streaming_message_token_usage` now correctly takes precedence over `response.usage` via `elif`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test used MagicMock(message="one") where message was a plain string, but the real OpenAI API returns Choice objects with message.content. The counting code checks hasattr(choice.message, "content"), which failed on strings, so manual token counting was never exercised. Use real Choice and ChatCompletionMessage objects and fix the expected output_tokens. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
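The failure mode is easy to demonstrate: a plain string has no `content` attribute, so the `hasattr` guard silently skips manual counting (the objects below mimic the SDK's `Choice`/`ChatCompletionMessage` shapes but are stand-ins):

```python
from types import SimpleNamespace


def count_choices_with_content(choices):
    # Mirrors the guard described above: only choices whose message
    # actually exposes a `content` attribute are counted.
    return sum(1 for c in choices if hasattr(c.message, "content"))


bad = [SimpleNamespace(message="one")]                          # message is a str
good = [SimpleNamespace(message=SimpleNamespace(content="one"))]  # message object

assert count_choices_with_content(bad) == 0   # guard fails, counting never runs
assert count_choices_with_content(good) == 1
```

This is why the mock-based test passed vacuously: the code under test short-circuited before the counting logic was exercised.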
bugbot run
The stream_options parameter was not available in early versions of the OpenAI Python SDK, causing TypeError on v1.0.1 CI runs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
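One defensive pattern for calling across SDK versions is to drop the keyword when the callable does not accept it. This is only a sketch of the idea (the helper name is made up; the PR itself adjusted the tests rather than the integration):

```python
import inspect


def call_with_optional_stream_options(create_fn, **kwargs):
    # Older OpenAI SDK releases do not accept `stream_options`; drop it
    # instead of letting the call raise TypeError. `create_fn` is any
    # create()-style callable.
    params = inspect.signature(create_fn).parameters
    accepts_var_kw = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if "stream_options" not in params and not accepts_var_kw:
        kwargs.pop("stream_options", None)
    return create_fn(**kwargs)


# An "old SDK" stand-in that predates stream_options:
def old_create(model, stream=False):
    return (model, stream)


result = call_with_optional_stream_options(
    old_create, model="m", stream=True, stream_options={"include_usage": True}
)
assert result == ("m", True)  # no TypeError; the unsupported kwarg was dropped
```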
…e_total_token_usage

Clarify that this variable holds the total token usage from streaming responses. Also fix the None check to use `is not None` instead of truthiness, preventing false negatives when usage has zero-valued fields.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
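The truthiness pitfall is worth spelling out: an object whose fields are all zero can be falsy if its class defines `__bool__` from its contents, so `if usage:` would silently drop a legitimate zero-usage record. The `Usage` class here is illustrative, not an SDK type:

```python
class Usage:
    def __init__(self, total_tokens):
        self.total_tokens = total_tokens

    def __bool__(self):
        # Some model classes derive truthiness from their contents.
        return self.total_tokens > 0


zero = Usage(0)

picked_truthy = zero if zero else None            # `if usage:` style
picked_is_not_none = zero if zero is not None else None  # fixed style

assert picked_truthy is None          # zero-usage object is lost
assert picked_is_not_none is zero     # zero-usage object is kept
```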
Replace inline hasattr+isinstance patterns in both _calculate_completions_token_usage and _calculate_responses_token_usage with a shared helper that uses getattr safely. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…aming responses

Move `_calculate_completions_token_usage` outside the `data_buf` check so token usage from stream metadata is recorded even when no content chunks are produced (e.g. content filter). Also count output tokens from `response.output` when `streaming_message_responses` is absent in the Responses API path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
alexander-alderman-webb
left a comment
There was a problem hiding this comment.
Super nice!
Pretty much all as expected and now handling more edge cases while also having simpler code in the end 🙏 .
See my comment regarding two new tests that rely on private functions. After that I can stamp.

Replace the shared `_calculate_token_usage()` and `_get_usage()` with two API-specific functions:

- `_calculate_completions_token_usage()` (Chat Completions / Embeddings) and `_calculate_responses_token_usage()` (Responses API)
- Each function reads only its own API's token fields instead of probing both (e.g. `["input_tokens", "prompt_tokens"]`)
- Extract `streaming_message_token_usage` from `stream_options={"include_usage": True}` in streaming Completions
- Add section comments in `_set_common_output_data` to clarify which branch handles which API

Motivation
When Chat Completions is deprecated, removing it should be a simple delete operation without auditing shared code. Before this change, `_calculate_token_usage` handled both APIs with interleaved logic, making it unclear what was safe to remove.

We also needed to ensure that, when handling streamed responses from the Chat Completions API, the `usage` in the final chunk of the message correctly made its way to the `_calculate_*_token_usage` method.

Other design decisions to note
There is some duplication between the Completions API and Responses API token-counting code. This is deliberate: it allows for easier removal of the Completions API logic once it's fully deprecated/removed from the OpenAI API.