
Bedrock Invoke class for invoking models without Converse API (e.g. Bedrock Custom Model Import)#1217

Open
dgallitelli wants to merge 1 commit into strands-agents:main from dgallitelli:main

Conversation

@dgallitelli
Contributor

@dgallitelli dgallitelli commented Nov 19, 2025

Description

This PR adds a new BedrockModelInvoke class that uses AWS Bedrock's native InvokeModel and InvokeModelWithResponseStream APIs instead of the Converse/ConverseStream APIs used by the existing BedrockModel class. This is particularly useful for supporting Bedrock Custom Model Import.

Key Features:

  • Broader Model Support: Works with Bedrock models that don't support Converse APIs, including imported models
  • Multiple Format Support: Handles both Anthropic Messages API format and OpenAI ChatCompletion format
  • Streaming Support: Processes streaming responses from InvokeModelWithResponseStream
  • Tool Integration: Converts Strands tool specs to model-specific formats (OpenAI functions, Anthropic tools)
  • Error Handling: Maps Bedrock-specific errors to appropriate Strands exceptions

Use Case:
This implementation is particularly useful for imported models (via Bedrock's Custom Model Import feature) that may only support native InvokeModel APIs and expect specific request formats like OpenAI ChatCompletion.

Related Issues

N/A

Documentation PR

N/A

Type of Change

New feature

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@dgallitelli
Contributor Author

Note: hatch run prepare fails with the error: T-strings are only supported in Python 3.14 and greater [syntax]. However, my code does not use t-strings. Please verify whether someone else introduced this dependency.

@dgallitelli dgallitelli changed the title Bedrock Invoke class for invoking models without Converse API Bedrock Invoke class for invoking models without Converse API (e.g. Bedrock Custom Model Import) Nov 19, 2025
@westonbrown
Contributor

Do we have any update on when this PR will be merged?

This will unlock the new Opus 4.5 features in Bedrock, like tool search and lazy loading of tools.

Currently that's not possible, since those features are only supported via the Invoke API.

@sirianni

+1 any update from the Strands team on this?

@dgallitelli
Contributor Author

Updates since original submission

Rebased onto current main and fixed several issues found during review. Here's a summary:

Rebase / merge conflict (src/strands/models/__init__.py)

main refactored the models package to use lazy loading via __getattr__ and added CacheConfig. The conflict was resolved by keeping BedrockModelInvoke as an eager import alongside BedrockModel (both depend only on boto3, which is already a required dependency), while preserving all lazy-loaded providers and CacheConfig from main.

Bug fixes (src/strands/models/bedrock_invoke.py)

1. _format_request always used OpenAI format (functional bug)
The method called _format_openai_request unconditionally despite _get_model_family() existing. Native Anthropic model IDs (e.g. anthropic.claude-*) would have received an OpenAI-formatted request body, causing API errors. Fixed to route to _format_anthropic_request when the model family is "anthropic".

2. _extract_usage_from_response had an unreachable branch (functional bug)
The OpenAI-format usage block (prompt_tokens / completion_tokens) was under elif "usage" in response_body — the same key as the if above it, so it was never reachable. OpenAI-format token counts were silently dropped. Fixed by branching on the field names inside usage instead (input_tokens vs prompt_tokens).

3. _stream variable shadowed the streaming module import
streaming = self.config.get("streaming", True) shadowed the from ..event_loop import streaming used later in structured_output. Renamed to use_streaming.

4. Image bytes not base64-encoded (functional bug)
_format_anthropic_request was passing raw bytes from image_data["source"]["bytes"] directly as the data field. The Anthropic InvokeModel API requires a base64-encoded string. Fixed to base64.b64encode(...).decode("utf-8").
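
The field-name dispatch described in fix 2 can be sketched standalone as follows. This is an illustrative sketch, not the exact method from the PR: the helper name and the Converse-style return shape are assumptions; only the branching rule (dispatch on input_tokens vs prompt_tokens inside usage) comes from the fix itself.

```python
from typing import Any


def extract_usage(response_body: dict[str, Any]) -> dict[str, int]:
    """Normalize token usage from either wire format into Converse-style keys.

    Branches on the field names inside ``usage`` rather than on the presence
    of the ``usage`` key itself, since both formats share that outer key.
    """
    usage = response_body.get("usage", {})
    if "input_tokens" in usage:  # Anthropic Messages format
        input_tokens = usage["input_tokens"]
        output_tokens = usage["output_tokens"]
    elif "prompt_tokens" in usage:  # OpenAI ChatCompletion format
        input_tokens = usage["prompt_tokens"]
        output_tokens = usage["completion_tokens"]
    else:  # no usage reported
        input_tokens = output_tokens = 0
    return {
        "inputTokens": input_tokens,
        "outputTokens": output_tokens,
        "totalTokens": input_tokens + output_tokens,
    }
```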

mypy fixes

hatch run prepare (format + lint + mypy + pytest) now passes cleanly across Python 3.10–3.14 with 2386 tests passing.

  • Added # type: ignore[override] on update_config (same pattern as BedrockModel)
  • Annotated request: dict[str, Any] and content: list[dict[str, Any]] to allow mixed-value nested structures
  • Annotated events: list[StreamEvent] in _parse_anthropic_response
  • Fixed implicit Optional on response_body parameter
  • Added explicit str() cast in _extract_text_from_response

@dgallitelli
Contributor Author

Pushed a major rework of this PR (force-push, since the head ref is main on the fork). Single squashed commit: feat(bedrock): add BedrockModelInvoke for InvokeModel-only models.

What changed since the last review state

Rebased onto current main (the previous head was 91 commits behind / 4 ahead of merge-base; merge state was DIRTY).

Aligned with current SDK contracts:

  • BedrockInvokeConfig now subclasses BaseModelConfig so context_window_limit flows through (feat: add context_window_limit to model configs #2176).
  • Default model_id bumped to global.anthropic.claude-sonnet-4-6 (imported from bedrock.py) instead of the dated 3.5-sonnet-20241022 id (fix(bedrock): upgrade default model to Claude Sonnet 4.5 #2193).
  • __init__.py switched from eager to lazy __getattr__ import to match the lazy-load pattern used by every other provider.
  • Reuses BEDROCK_CONTEXT_WINDOW_OVERFLOW_MESSAGES, DEFAULT_BEDROCK_MODEL_ID, DEFAULT_BEDROCK_REGION, DEFAULT_READ_TIMEOUT from bedrock.py instead of redefining.

Streaming protocol (this is the bulk of the diff):

  • Anthropic Messages stream is fully translated: message_start → messageStart, content_block_start (text + tool_use) → contentBlockStart, content_block_delta (text and input_json_delta.partial_json) → contentBlockDelta, content_block_stop → contentBlockStop, message_delta.stop_reason/usage → messageStop + metadata.
  • OpenAI Chat Completions stream is translated symmetrically, including delta.tool_calls[] keyed by index (lazy-emit contentBlockStart once id/name is observed; concatenate arguments deltas as toolUse.input).
  • Stop reasons are mapped from each backend's vocabulary to the Strands set (tool_use, max_tokens, stop_sequence, end_turn) instead of being hard-coded to end_turn. This mirrors the fix from fix: override end_turn stop reason when streaming response contains toolUse blocks #1827 — without it, structured_output() and tool execution were unreachable on the streaming path.
  • Non-streaming responses go through the same translation, so tool use works in both modes.
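
The stop-reason normalization described above can be sketched as two lookup tables keyed by model family. The exact tables in the PR may cover more values; this shows the mapping rule, with unknown reasons falling back to end_turn:

```python
# Sketch of stop-reason translation into the Strands vocabulary
# (tool_use, max_tokens, stop_sequence, end_turn).
_ANTHROPIC_STOP_REASONS = {
    "end_turn": "end_turn",
    "max_tokens": "max_tokens",
    "stop_sequence": "stop_sequence",
    "tool_use": "tool_use",
}
_OPENAI_STOP_REASONS = {
    "stop": "end_turn",
    "length": "max_tokens",
    "tool_calls": "tool_use",
}


def map_stop_reason(raw: str, family: str) -> str:
    """Translate a backend finish/stop reason into the Strands set."""
    table = _ANTHROPIC_STOP_REASONS if family == "anthropic" else _OPENAI_STOP_REASONS
    return table.get(raw, "end_turn")
```

Without this mapping, every streamed response would report end_turn, which is why tool execution and structured_output() were unreachable on the streaming path.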

Tool use & ToolChoice:

  • ToolChoice is now translated and forwarded to both backends ({"type": "auto"|"any"|"tool"} for Anthropic, "auto"/"required"/{"type": "function", ...} for OpenAI). Previously the parameter was accepted and silently dropped, which made structured_output non-functional.
  • OpenAI request formatter now passes toolUse and toolResult blocks through (tool_calls on assistant turns, separate role: "tool" messages for results) instead of flattening multi-block messages to a single string.
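
The per-backend ToolChoice translation can be sketched as a single function (mirroring the merged _to_tool_choice helper; the function signature here is illustrative). Input is the Converse-style ToolChoice, which is one of {"auto": {}}, {"any": {}}, or {"tool": {"name": ...}}:

```python
from typing import Any, Optional


def to_tool_choice(tool_choice: Optional[dict[str, Any]], family: str) -> Any:
    """Translate a Converse-style ToolChoice into each backend's shape."""
    if tool_choice is None:
        return None
    if "auto" in tool_choice:
        return {"type": "auto"} if family == "anthropic" else "auto"
    if "any" in tool_choice:
        # "any"/"required" forces the model to call some tool.
        return {"type": "any"} if family == "anthropic" else "required"
    if "tool" in tool_choice:
        # Force a specific named tool.
        name = tool_choice["tool"]["name"]
        if family == "anthropic":
            return {"type": "tool", "name": name}
        return {"type": "function", "function": {"name": name}}
    return None
```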

Other correctness fixes:

  • Image media type now produced as image/<format> (Anthropic API requires the prefixed form).
  • Unknown model ids default to the OpenAI/imported-model format rather than Anthropic — the headline use case for this provider per the description. The behavior can be overridden with the new model_family config key.
  • _extract_text_from_response no longer falls back to json.dumps(response) (could leak full response bodies into assistant text).
  • tool_result.is_error set when status == "error" so error semantics survive the round trip.
  • Dropped the unused guardrail_id/guardrail_version config keys; they were declared but never wired into the Bedrock call.
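
The default-to-OpenAI behavior with a model_family override can be sketched roughly as below. The detection heuristic shown (substring match on the model id) is an assumption for illustration; only the default direction and the override key come from the description above:

```python
from typing import Optional


def get_model_family(model_id: str, override: Optional[str] = None) -> str:
    """Pick the wire format for a model id.

    Unknown ids (e.g. Custom Model Import ARNs) default to the
    OpenAI/imported-model format; the ``model_family`` config key wins.
    """
    if override is not None:
        return override
    if "anthropic" in model_id:
        return "anthropic"
    return "openai"
```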

Tests:

  • 24 unit tests (up from 8), now covering: init/default model id, family detection (parametrised) and override, request formatting for both families with image/tool_use/toolResult/tool_choice, full streaming traces for Anthropic text, Anthropic tool use, OpenAI text + tool use (with usage), non-streaming Anthropic, throttling and context-window-overflow error mapping, and end-to-end structured_output returning a Pydantic model.
  • Integration tests converted from @pytest.mark.skip to runnable. The imported-model test is now gated on STRANDS_BEDROCK_INVOKE_IMPORTED_MODEL_ARN since ARNs are account-specific.

Local verification

hatch run test-lint          → ruff check ✓ / mypy ✓
hatch test tests/strands     → 2974 passed
hatch test tests/strands/models/test_bedrock_invoke.py → 24 passed

About the still-failing checks

The three failing checks (Auto Strands Review / Trigger Strands Review, Secure Integration test / check-access-and-checkout, Secure Integration test / upload-metrics) are all gated on the pull_request_target protected-environment approval flow — they hit GitHub's 30-day waiting-job timeout (720h0m0s) without a maintainer clicking Approve, and the upload-metrics job then fails because the artifact from the unrun integ-tests step doesn't exist (if: always()). Could one of you kick those off?

cc @pgrayy @Unshure @zastrowm — would appreciate another look. Happy to split this into smaller commits if that's easier to review.

Adds a new ``BedrockModelInvoke`` provider that talks to Bedrock's native
``InvokeModel``/``InvokeModelWithResponseStream`` APIs instead of
``Converse``/``ConverseStream``. This makes Strands usable with models that
do not support Converse, most notably Bedrock Custom Model Import (Llama,
Mistral, Qwen, ...) and Anthropic models accessed via the Messages API.

Supports both Anthropic Messages and OpenAI Chat Completions request/response
formats; the wire format is auto-detected from the model id and can be
overridden via the ``model_family`` config key.

Streaming is fully wired through the Bedrock Converse-shaped event contract:
``messageStart``, ``contentBlockStart``/``contentBlockDelta``/``contentBlockStop``
for both text and tool-use blocks, ``messageStop`` with the mapped stop
reason, and a ``metadata`` event carrying token usage. Non-streaming responses
go through the same translation. Tool use, structured output, image inputs
(Anthropic family), tool results, ``ToolChoice``, and the standard Bedrock
error paths (throttling, context-window overflow, access-denied) are covered.

The provider is exposed via lazy ``__getattr__`` import to keep package
import time unchanged.

Tests: 24 unit tests covering init, family detection, request formatting for
both families, Anthropic and OpenAI streaming paths (text and tool use),
non-streaming path, error mapping, and structured output. The integration
test suite is converted from ``@pytest.mark.skip`` to runnable tests; the
imported-model test is gated on the ``STRANDS_BEDROCK_INVOKE_IMPORTED_MODEL_ARN``
environment variable since ARNs are account-specific.
@dgallitelli
Contributor Author

Trimmed and force-pushed (commit ac804f9) to pass the PR Size Labeler size/xl gate. Diff is now 989 lines (was 1466), all under the 1000-line threshold; size label is back on size/m.

What I cut to get there (no behavior or test-coverage loss):

  • Squashed the per-callback inline event dicts in the streaming/non-streaming emitters behind small _text_start/_tool_use_start/_tool_use_delta/_metadata/_BLOCK_STOP/_TEXT_START helpers, so the wire shape lives in one place.
  • Merged the two tool-choice translators into a single _to_tool_choice(tool_choice, family) (was _to_anthropic_tool_choice + _to_openai_tool_choice).
  • Tightened the BedrockInvokeConfig and class-level docstrings (kept the Attributes elsewhere; the field names + types speak for themselves).
  • Module-scoped pytestmark = pytest.mark.usefixtures("bedrock_client") removed the boilerplate _ = bedrock_client lines from format-only tests.
  • Inlined a few obvious local helpers, collapsed multi-line dict literals where they fit at 120 chars.
  • Dropped the test_bedrock_invoke_configuration integ test — it was a duplicate of unit coverage.

Verification: hatch run test-lint ✓ (ruff + mypy), hatch test tests/strands ✓ (2974 passed), and the same 24-test bedrock_invoke suite still passes (streaming text + tool use for both families, non-streaming, structured output, error mapping).

The three remaining red checks are still the protected-environment ones — same situation as before, only a maintainer can release them. cc @pgrayy @Unshure @zastrowm.

