feat(gemini): plumb through cache tokens in metadata events #2287
Open
yatszhash wants to merge 1 commit into
Surface `cached_content_token_count` from `usage_metadata` as `cacheReadInputTokens` on the metadata event emitted by `GeminiModel`. The existing telemetry pipeline picks it up automatically. Relates to strands-agents#1060, strands-agents#1140.
Motivation
When the Gemini model returns `usage_metadata.cached_content_token_count`, `GeminiModel` currently discards it. Users have no way to see whether their requests are benefiting from Gemini implicit (or explicit) caching, which is valuable for cost optimization and debugging.

The bidi Gemini provider (`experimental/bidi/models/gemini_live.py`) already plumbs this through; the production `GeminiModel` is the missing piece. The OpenAI provider received the same treatment in #2116.

Public API Changes
No public API changes. The `metadata` event emitted by `GeminiModel._format_chunk` now includes `cacheReadInputTokens` in the usage data when Gemini reports cached prompt tokens.

When `cached_content_token_count` is `None` or `0`, the field is omitted, preserving backward compatibility and matching the convention established by the OpenAI provider (#2116). The existing telemetry pipeline (tracer and metrics) already handles `cacheReadInputTokens`, so cache data flows through automatically.

Only `cacheReadInputTokens` is set because Gemini's `usage_metadata` does not expose a cache write token equivalent (consistent with the OpenAI provider).
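As an illustration, the mapping can be sketched roughly as below. `format_usage` and the dict-shaped `usage_metadata` here are stand-ins for exposition, not the exact code or event schema used by `GeminiModel._format_chunk`:

```python
def format_usage(usage_metadata: dict) -> dict:
    """Sketch: build the usage block of a metadata event from Gemini usage_metadata.

    Field names follow this PR's description; the real strands-agents
    event schema may differ in detail.
    """
    usage = {
        "inputTokens": usage_metadata.get("prompt_token_count", 0),
        "outputTokens": usage_metadata.get("candidates_token_count", 0),
        "totalTokens": usage_metadata.get("total_token_count", 0),
    }
    cached = usage_metadata.get("cached_content_token_count")
    # Omit the field when None or 0, matching the OpenAI provider convention.
    if cached:
        usage["cacheReadInputTokens"] = cached
    return usage
```

The falsy check covers both the `None` and `0` cases with one branch, which is what lets every existing response without cache data pass through unchanged.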
Related Issues

Relates to #1060 (Add explicit context caching support for Gemini models); this PR addresses only the visibility portion of that issue. The explicit cache lifecycle APIs proposed there (`enable_caching`, `cache_ttl`, `create_cache()`, `delete_cache()`) are a much larger surface and remain open under #1060 for a follow-up.

Also relates to #1140 (Caching support for all models).
Documentation PR
Not required: no public API changes.
Type of Change
New feature
Testing
Unit tests added in `tests/strands/models/test_gemini.py`:

- `test_format_chunk_metadata_with_cache_tokens`: `cached_content_token_count=25` → metadata exposes `cacheReadInputTokens=25`.
- `test_format_chunk_metadata_with_zero_cached_tokens`: `cached_content_token_count=0` → `cacheReadInputTokens` is omitted.
- The "field unset" case (Gemini's default response when no cache) is implicitly covered by every existing `test_stream_response_*` test that does not set `cached_content_token_count`.

Manually verified against Vertex AI Gemini 2.5 Flash: a second request with an identical ~18k-token prefix reports `cacheReadInputTokens=18010` where the pre-patch code reported no cache field. Accumulation through `EventLoopMetrics` was also confirmed end-to-end (the cumulative `cacheReadInputTokens` matches the sum of per-call values).

`hatch run prepare`
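The two added cases could be sketched along these lines; a minimal stand-in for the cache-token mapping is used here since the real tests exercise `GeminiModel._format_chunk` on full response chunks:

```python
def _usage_from_gemini(usage_metadata: dict) -> dict:
    # Stand-in for the cache-token mapping under test; field names follow
    # the PR description, not the exact repository code.
    usage = {"inputTokens": usage_metadata.get("prompt_token_count", 0)}
    cached = usage_metadata.get("cached_content_token_count")
    if cached:  # omitted when None or 0
        usage["cacheReadInputTokens"] = cached
    return usage


def test_format_chunk_metadata_with_cache_tokens():
    usage = _usage_from_gemini(
        {"prompt_token_count": 100, "cached_content_token_count": 25}
    )
    assert usage["cacheReadInputTokens"] == 25


def test_format_chunk_metadata_with_zero_cached_tokens():
    usage = _usage_from_gemini(
        {"prompt_token_count": 100, "cached_content_token_count": 0}
    )
    assert "cacheReadInputTokens" not in usage
```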
Checklist

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.