Problem Statement
The current `GeminiModel` implementation does not support Gemini's explicit context caching feature, which offers up to a 90% cost reduction on cached tokens. While Gemini 2.5 models support implicit caching, it does not work reliably with Strands' request structure (system prompt and tools are sent in `config` rather than `contents`).
Current behavior:
- Every request sends full system prompt + tools (e.g., 13,494 tokens)
- No visibility into cached tokens
- No control over cache lifecycle
- `cached_content_token_count` always returns `None`
Expected behavior:
- Ability to explicitly cache system prompt + tools
- 75-90% discount on cached tokens
- Cache visibility via `usage_metadata.cached_content_token_count`
- Cache lifecycle management (create, delete, TTL)
Proposed Solution
Add explicit context caching support to `GeminiModel`, similar to how `BedrockModel` implements its `cache_prompt` parameter.
API design:

    from strands.models.gemini import GeminiModel

    model = GeminiModel(
        model_id="gemini-2.5-flash",
        client_args={"api_key": "..."},
        enable_caching=True,  # Enable auto-caching
        cache_ttl="3600s",    # Cache TTL (default 1 hour)
    )

    # Or manual cache management
    model.create_cache(system_prompt, tool_specs, ttl="7200s")
    model.delete_cache()
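A minimal sketch of how `create_cache()`/`delete_cache()` might wrap the google-genai SDK's caches API. The `client.caches.create` and `client.caches.delete` calls are real SDK methods; the `GeminiCacheManager` wrapper, its attribute names, and the dict-style config are illustrative assumptions, not Strands or Strands-proposed API.

```python
class GeminiCacheManager:
    """Hypothetical helper sketching the explicit-cache lifecycle.

    `client` is expected to be a google-genai `genai.Client`; only the
    `client.caches.create` / `client.caches.delete` calls are real SDK
    surface, everything else here is an illustrative assumption.
    """

    def __init__(self, client, model_id):
        self.client = client
        self.model_id = model_id
        self.cache = None  # handle to the server-side CachedContent

    def create_cache(self, system_prompt, tool_specs, ttl="3600s"):
        # Store the system prompt + tools server-side once; later requests
        # reference the cache by name instead of resending ~9K tokens.
        self.cache = self.client.caches.create(
            model=self.model_id,
            config={
                "system_instruction": system_prompt,
                "tools": tool_specs,
                "ttl": ttl,
            },
        )
        return self.cache.name

    def delete_cache(self):
        # Explicit deletion frees the cache before its TTL expires.
        if self.cache is not None:
            self.client.caches.delete(name=self.cache.name)
            self.cache = None
```

A test double for `client` is enough to exercise the lifecycle without network access, which is also how the eventual unit tests could work.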
Key features:
- Auto-cache creation: automatically creates a cache on the first request when `enable_caching=True`
- Cache validation: reuses the cache when the system prompt + tools match
- Visibility: exposes `cachedTokens` in `metadata.usage`
- Cache lifecycle: methods to create/delete/manage the cache
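Cache validation could be as simple as fingerprinting the cacheable request prefix; the helper below is a hypothetical sketch of that check, not proposed API.

```python
import hashlib
import json


def cache_fingerprint(system_prompt, tool_specs):
    """Stable hash of the cacheable request prefix (hypothetical helper).

    If this matches the fingerprint recorded when the cache was created,
    the existing cached_content can be reused; otherwise it is stale and
    a new cache must be created.
    """
    payload = json.dumps(
        {"system": system_prompt, "tools": tool_specs},
        sort_keys=True,  # key order must not change the hash
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```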
Implementation Details
Changes needed in `strands/models/gemini.py`:
- Add `enable_caching` and `cache_ttl` to `GeminiConfig`
- Add `create_cache()` and `delete_cache()` methods
- Modify `_format_request_config()` to accept a `cached_content` parameter
- Add cache validation logic in `_format_request()`
- Expose `cached_content_token_count` in metadata
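A rough sketch of the `_format_request_config()` change, assuming the config is assembled as a plain dict before being handed to the google-genai SDK (the real method and its signature may differ; `cached_content` is a real field on the SDK's `GenerateContentConfig`).

```python
def format_request_config(params, system_prompt, tool_specs, cached_content=None):
    """Hypothetical stand-in for GeminiModel._format_request_config().

    When cached_content is set, the system prompt and tools are omitted:
    the Gemini API does not accept a request that resends fields already
    stored in the referenced cache, and omitting them is precisely what
    yields the cached-token discount.
    """
    config = dict(params or {})
    if cached_content is not None:
        config["cached_content"] = cached_content  # e.g. "cachedContents/abc"
    else:
        if system_prompt:
            config["system_instruction"] = system_prompt
        if tool_specs:
            config["tools"] = tool_specs
    return config
```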
Alternative Solutions
- Do nothing: Users pay 5-10x more in token costs
- Rely on implicit caching: Unreliable, no visibility, no control
Additional Context
Tested implementation shows:
- 68% token reduction on real workload
- `cached_content_token_count`: 9,255 out of 13,564 total tokens
- Works with 30+ tools and complex system prompts
- Compatible with existing Strands agent loop
I'm happy to submit a PR with the implementation if this feature request is accepted.
Use Case
Agents with large system prompts or many tools (e.g., 30 tools = ~9K tokens) incur high costs on every request. For production workloads with 1,000+ messages/day, this becomes expensive quickly.
Example cost impact:
- Without caching: 13,564 tokens/msg × 30K msgs/month × $0.00000035 = ~$142/month
- With caching: ~6,600 effective tokens/msg (4,309 uncached + 9,255 cached tokens billed at 25% of the input rate) × 30K msgs/month = ~$69/month
- Savings: ~$73/month (~$876/year)
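The cost figures above can be checked directly, counting each cached token at 25% of a full input token (the 75% end of the quoted discount range):

```python
# Sanity-check of the cost example, using the figures from this issue.
PRICE_PER_TOKEN = 0.00000035  # $/input token
MSGS_PER_MONTH = 30_000
TOTAL_TOKENS = 13_564   # tokens per message
CACHED_TOKENS = 9_255   # cached portion, from the tested implementation

without_caching = TOTAL_TOKENS * MSGS_PER_MONTH * PRICE_PER_TOKEN

# Cached tokens still cost something, just 4x less: fold them into an
# "effective token" count so one multiplication gives the monthly bill.
effective_tokens = (TOTAL_TOKENS - CACHED_TOKENS) + CACHED_TOKENS * 0.25
with_caching = effective_tokens * MSGS_PER_MONTH * PRICE_PER_TOKEN

print(f"without: ${without_caching:.2f}/mo")  # $142.42/mo
print(f"with:    ${with_caching:.2f}/mo")     # $69.54/mo
print(f"savings: ${(without_caching - with_caching) * 12:.2f}/yr")  # $874.60/yr
```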