
feat(gemini): plumb through cache tokens in metadata events #2287

Open
yatszhash wants to merge 1 commit into strands-agents:main from yatszhash:feat/gemini-cache-tokens

Conversation


@yatszhash yatszhash commented May 13, 2026

Motivation

When the Gemini model returns usage_metadata.cached_content_token_count, GeminiModel currently discards it. Users have no way to see whether their requests benefit from Gemini's implicit (or explicit) caching, visibility that is valuable for cost optimization and debugging.

The bidi Gemini provider (experimental/bidi/models/gemini_live.py) already plumbs this through; the production GeminiModel is the missing piece. The OpenAI provider received the same treatment in #2116.

Public API Changes

No public API changes. The metadata event emitted by GeminiModel._format_chunk now includes cacheReadInputTokens in the usage data when Gemini reports cached prompt tokens:

# Before: metadata event usage
{"inputTokens": 18625, "outputTokens": 188, "totalTokens": 18813}

# After: metadata event usage (when cache hit occurs)
{"inputTokens": 18625, "outputTokens": 188, "totalTokens": 18813, "cacheReadInputTokens": 18010}

When cached_content_token_count is None or 0, the field is omitted — preserving backward compatibility and matching the convention established by the OpenAI provider (#2116). The existing telemetry pipeline (tracer and metrics) already handles cacheReadInputTokens, so cache data flows through automatically.

Only cacheReadInputTokens is set because Gemini's usage_metadata does not expose a cache write token equivalent (consistent with the OpenAI provider).
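For reference, the mapping is conceptually as follows. This is a minimal sketch, not the literal diff: the helper name and overall shape are illustrative, the field names on usage_metadata are Gemini's, and only the cacheReadInputTokens key and the cached_content_token_count source field are defined by this PR.

# Sketch only: illustrative helper, not GeminiModel's actual internals
def _usage_from_gemini_metadata(usage_metadata) -> dict:
    usage = {
        "inputTokens": usage_metadata.prompt_token_count,
        "outputTokens": usage_metadata.candidates_token_count,
        "totalTokens": usage_metadata.total_token_count,
    }
    cached = getattr(usage_metadata, "cached_content_token_count", None)
    if cached:  # omit the key when None or 0, keeping the pre-patch payload unchanged
        usage["cacheReadInputTokens"] = cached
    return usage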

Related Issues

Relates to #1060 (Add explicit context caching support for Gemini models) — this PR addresses only the visibility portion of that issue. The explicit cache lifecycle APIs proposed there (enable_caching, cache_ttl, create_cache(), delete_cache()) are a much larger surface and remain open under #1060 for a follow-up.

Also relates to #1140 (Caching support for all models).

Documentation PR

Not required — no public API changes.

Type of Change

New feature

Testing

Unit tests added in tests/strands/models/test_gemini.py:

  • test_format_chunk_metadata_with_cache_tokens: cached_content_token_count=25 → metadata exposes cacheReadInputTokens=25.
  • test_format_chunk_metadata_with_zero_cached_tokens: cached_content_token_count=0 → cacheReadInputTokens is omitted.

The "field unset" case (Gemini's default response when no cache) is implicitly covered by every existing test_stream_response_* test that does not set cached_content_token_count.

Manually verified against Vertex AI Gemini 2.5 Flash: a second request with an identical ~18k-token prefix reports cacheReadInputTokens=18010 where the pre-patch code reported no cache field. Accumulation through EventLoopMetrics was also confirmed end-to-end (cumulative cacheReadInputTokens matches the sum of per-call values).

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Surface cached_content_token_count from usage_metadata as
cacheReadInputTokens on the metadata event emitted by GeminiModel.
The existing telemetry pipeline picks it up automatically.

Relates to strands-agents#1060, strands-agents#1140.