feat(providers): add Google Vertex AI inference provider #1568

Open
maxamillion wants to merge 4 commits into NVIDIA:main from maxamillion:vertex-provider

Collaborator

@maxamillion maxamillion commented May 26, 2026

Summary

Add Google Vertex AI as a first-class inference provider, supporting both service account (JWT) and gcloud ADC (OAuth2 refresh token) credential flows. Routes Anthropic models through Vertex AI rawPredict and all other models (Gemini, Llama, Mistral, etc.) through the Vertex OpenAI-compatible endpoint. Includes a seccomp policy relaxation for NETLINK_ROUTE sockets required by Vertex client tooling.

Related Issue

Changes

Provider profile & discovery

  • New providers/google-vertex-ai.yaml with three credential entries: raw service account key (gateway-only, never injected into sandboxes), service account JWT-minted token, and gcloud ADC OAuth2-refreshed token.
  • ProviderTypeProfile::allows_gateway_refresh_bootstrap() and CredentialRefreshProfile::is_gateway_mintable() replace inline gateway-refresh logic in server and CLI.
  • normalize_inference_provider_type() in openshell-core is now the single source of truth for provider alias resolution (vertex, vertex-ai, google-vertex → google-vertex-ai).
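A minimal sketch of the alias resolution described above, not the code in this PR. The function name `normalize_provider_alias`, the trimming and lowercasing step, and the pass-through for unknown types are assumptions made for illustration; only the alias list comes from the description.

```rust
/// Hypothetical sketch: map user-facing aliases to the canonical provider type.
/// Trimming and lowercasing are assumptions, not confirmed behavior.
fn normalize_provider_alias(raw: &str) -> String {
    let lowered = raw.trim().to_ascii_lowercase();
    match lowered.as_str() {
        // Aliases listed in the PR description, plus the canonical name itself.
        "vertex" | "vertex-ai" | "google-vertex" | "google-vertex-ai" => {
            "google-vertex-ai".to_string()
        }
        // Assumption: any other provider type passes through unchanged.
        _ => lowered,
    }
}
```

Keeping this in one function means the server and CLI cannot drift apart on which aliases they accept.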

Inference routing (server)

  • resolve_vertex_ai_route() dispatches by publisher: Anthropic models get rawPredict URLs with model_in_path=true; all others get the OpenAI-compatible /chat/completions endpoint.
  • infer_vertex_publisher() maps model prefixes to publishers (6 families: Anthropic, Google, Meta, Mistral, AI21, DeepSeek).
  • Region-to-host mapping: regional → {region}-aiplatform.googleapis.com, global → aiplatform.googleapis.com, us/eu → aiplatform.{region}.rep.googleapis.com.
  • Base URL override escape hatch with strict validation (HTTPS, official Vertex hostname, no IP literals, no userinfo, no query/fragment, port 443 only; rejected outright for Anthropic models).
  • Model ID validation rejects path separators, URL delimiters, percent escapes, traversal segments, whitespace, and control characters.
  • CredentialLookup enum (PreferredOnly vs PreferredThenAny) prevents raw SA JSON from being picked up as a bearer token.
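The region-to-host mapping and model ID validation above can be sketched roughly as follows. Both function names are hypothetical, and the exact set of rejected characters is an assumption drawn from the categories listed in the description (path separators, URL delimiters, percent escapes, traversal segments, whitespace, control characters); the real validator may differ.

```rust
/// Hypothetical helper: choose the Vertex AI API host for a location string.
fn vertex_host_for_location(location: &str) -> String {
    match location {
        "global" => "aiplatform.googleapis.com".to_string(),
        // Multi-region locations use the `rep` host form.
        "us" | "eu" => format!("aiplatform.{location}.rep.googleapis.com"),
        // Anything else is treated as a regular region, e.g. `us-central1`.
        region => format!("{region}-aiplatform.googleapis.com"),
    }
}

/// Hypothetical validator: reject model IDs that could alter the request URL.
/// The rejected character set is an assumption for illustration only.
fn is_safe_model_id(model: &str) -> bool {
    // Empty IDs and traversal-like segments are refused outright.
    if model.is_empty() || model == "." || model.contains("..") {
        return false;
    }
    !model.chars().any(|c| {
        c.is_whitespace() || c.is_control() || matches!(c, '/' | '\\' | '?' | '#' | '%')
    })
}
```

Validating the model ID before it is placed in the path matters most for the rawPredict route, where the ID becomes part of the URL rather than the request body.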

Router backend

  • build_provider_url() handles four URL construction cases via model_in_path × request_path_override matrix. Streaming upgrades :rawPredict → :streamRawPredict.
  • For Vertex Anthropic rawPredict: strips model from request body (Vertex encodes it in path), injects anthropic_version: "vertex-2023-10-16", and strips anthropic-beta header (Vertex rejects unknown beta values).

Provider gRPC (server)

  • is_non_injectable_provider_credential() prevents raw service account JSON from reaching sandboxes.
  • Agent config env var injection for Vertex providers: injects ANTHROPIC_VERTEX_PROJECT_ID, GCP_PROJECT_ID, CLOUD_ML_REGION, GCP_LOCATION, GOOSE_PROVIDER=gcp_vertex_ai, etc. so Claude Code, Goose, and OpenCode work inside sandboxes. Explicit credential values take precedence.
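The "explicit credential values take precedence" rule can be illustrated with a small sketch. The function name and signature are hypothetical, and the pairing of each variable with the project ID or the region is inferred from the variable names rather than confirmed by the PR.

```rust
use std::collections::BTreeMap;

/// Hypothetical sketch: add Vertex-related defaults to a sandbox environment
/// without overwriting values that were set explicitly.
fn inject_vertex_agent_env(env: &mut BTreeMap<String, String>, project_id: &str, region: &str) {
    // Assumption: which variable carries the project and which the region
    // is inferred from the names listed in the PR description.
    let defaults = [
        ("ANTHROPIC_VERTEX_PROJECT_ID", project_id),
        ("GCP_PROJECT_ID", project_id),
        ("CLOUD_ML_REGION", region),
        ("GCP_LOCATION", region),
        ("GOOSE_PROVIDER", "gcp_vertex_ai"),
    ];
    for (key, value) in defaults {
        // `or_insert_with` leaves an existing (explicit) value untouched.
        env.entry(key.to_string()).or_insert_with(|| value.to_string());
    }
}
```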

Protobuf

  • ResolvedRoute gains model_in_path (field 8) and request_path_override (field 9).

CLI

  • --from-gcloud-adc flag on provider create (mutually exclusive with --from-existing and --credential). Reads gcloud ADC from GOOGLE_APPLICATION_CREDENTIALS, $CLOUDSDK_CONFIG/application_default_credentials.json, or ~/.config/gcloud/application_default_credentials.json; validates authorized_user type; configures OAuth2 refresh and mints the first token.
  • Rollback on failure: deletes orphaned provider, or warns with manual cleanup instructions if deletion also fails.
  • Vertex-specific config env var discovery (VERTEX_AI_PROJECT_ID, VERTEX_AI_REGION, base URL, publisher).
  • SandboxUploadPlan refactor consolidates upload existence-check + git-aware planning.
  • scrub_git_env() prevents inherited git env vars from breaking subprocess git calls.
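As a rough illustration of the ADC lookup order used by --from-gcloud-adc, the sketch below lists candidate paths in the order given above. The function name is hypothetical, and resolving `~` through a `HOME` variable is an assumption. The environment lookup is passed in as a closure so the ordering can be checked without touching the real environment.

```rust
use std::path::PathBuf;

/// Hypothetical sketch: candidate gcloud ADC file paths, highest priority first.
/// `get_env` is injected so the ordering can be tested in isolation.
fn adc_candidate_paths(get_env: impl Fn(&str) -> Option<String>) -> Vec<PathBuf> {
    const ADC_FILE: &str = "application_default_credentials.json";
    let mut paths = Vec::new();
    // 1. An explicit credentials file.
    if let Some(explicit) = get_env("GOOGLE_APPLICATION_CREDENTIALS") {
        paths.push(PathBuf::from(explicit));
    }
    // 2. The gcloud config directory override.
    if let Some(config_dir) = get_env("CLOUDSDK_CONFIG") {
        paths.push(PathBuf::from(config_dir).join(ADC_FILE));
    }
    // 3. The default location under the home directory (assumes HOME is set).
    if let Some(home) = get_env("HOME") {
        paths.push(PathBuf::from(home).join(".config").join("gcloud").join(ADC_FILE));
    }
    paths
}
```

A caller would try each path in order and then check that the parsed file has the authorized_user type, as the description states.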

Sandbox

  • NETLINK_ROUTE (protocol 0) now allowed through seccomp; all other netlink protocols remain blocked. Required because getifaddrs(3) on Linux uses NETLINK_ROUTE and is called by Node.js, Python, Go, and most HTTP/gRPC client libraries. Security is maintained by CAP_NET_ADMIN absence, network namespace isolation, and nftables rules.
  • Bundle-to-route conversion populates model_in_path and request_path_override.
  • enrich_sandbox_baseline_paths() refactored with injectable path_exists closure for testability.
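The NETLINK_ROUTE allowance amounts to a simple decision on the socket(2) arguments. The sketch below models only that netlink rule as a plain predicate; it is not the seccomp filter itself, and the function name is hypothetical. The constants are the Linux values for AF_NETLINK and NETLINK_ROUTE.

```rust
/// Linux address family for netlink sockets.
const AF_NETLINK: i32 = 16;
/// Netlink protocol used for routing and interface queries (e.g. by getifaddrs).
const NETLINK_ROUTE: i32 = 0;

/// Hypothetical model of the netlink rule only: a netlink socket is allowed
/// when its protocol is NETLINK_ROUTE; other netlink protocols are refused.
/// Non-netlink domains are outside this sketch and are simply passed through.
fn netlink_rule_allows(domain: i32, protocol: i32) -> bool {
    domain != AF_NETLINK || protocol == NETLINK_ROUTE
}
```

For example, NETLINK_SOCK_DIAG (protocol 4) stays blocked under this rule, which matches the behavioral test listed in the Testing section.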

Documentation

  • New docs/providers/google-vertex-ai.mdx: full provider setup guide covering both auth flows, configuration keys, region/host selection, supported models, sandbox usage with Claude Code and OpenCode, and policy proposals guidance.
  • Updated inference-routing.mdx, manage-providers.mdx, providers-v2.mdx, supported-agents.mdx, best-practices.mdx for Vertex references.
  • New architecture/gateway.md Inference Resolution section documenting bundle resolution, Vertex host selection, route shaping, header passthrough, and security model.

Testing

  • mise run pre-commit passes (lint, format, license headers)
  • Unit tests added/updated:
    • ~35 server inference tests (publisher inference, route resolution for all regions/overrides/validation, model ID validation, bundle integration)
    • ~15 router backend tests (URL construction, body rewriting, header stripping, wiremock integration for buffered + streaming)
    • ~8 provider gRPC tests (credential injection, agent config injection, SA key filtering, refresh bootstrap)
    • ~15 CLI integration tests (ADC happy path, SA rejection, missing file, wrong provider type, configure/rotate rollback, rollback-delete failure, config keys, mutual exclusion)
    • 3 seccomp tests (rule conditionality, behavioral NETLINK_ROUTE allowed, behavioral NETLINK_SOCK_DIAG blocked)
    • 3 router integration tests (full proxy for Vertex Gemini, Vertex Anthropic buffered, Vertex Anthropic streaming)
    • Provider profile, core inference, sandbox bundle, and config tests
  • E2E tests added/updated (requires live Vertex AI credentials; not run in CI without secrets)

Checklist

@maxamillion maxamillion requested review from a team, derekwaynecarr and mrunalp as code owners May 26, 2026 16:15

copy-pr-bot Bot commented May 26, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@maxamillion maxamillion marked this pull request as draft May 26, 2026 16:21
Comment thread crates/openshell-providers/src/providers/vertex.rs Outdated
Comment thread docs/providers/google-vertex-ai.mdx
@maxamillion maxamillion marked this pull request as ready for review May 27, 2026 02:52
@maxamillion maxamillion marked this pull request as draft May 27, 2026 03:28
Adds Vertex AI provider profiles, routing, credential refresh plumbing, CLI support, docs, and regression coverage. Keeps the related NETLINK_ROUTE seccomp allowance needed by Vertex client tooling that calls getifaddrs.
Cover the full end-to-end setup for running Claude Code and OpenCode
inside an OpenShell sandbox via inference.local with a Vertex AI backend:

- google-vertex-ai.mdx: add 'Use from a Sandbox' section with tabbed
  examples for Claude Code (--bare flag, no /v1 suffix) and OpenCode
  (/v1 suffix required). Add providers_v2_enabled prerequisite and
  --no-verify note for global region. Document policy proposals table
  covering metadata.google.internal (always blocked), downloads.claude.ai,
  and storage.googleapis.com.

- inference-routing.mdx: expand 'Use the Local Endpoint' section with
  tabbed examples for Claude Code, OpenCode, Python OpenAI SDK, and
  Python Anthropic SDK. Add notes explaining the /v1 path suffix
  difference between clients.

- supported-agents.mdx: update Claude Code and OpenCode rows to mention
  inference.local support and correct base URL requirements.
@maxamillion maxamillion marked this pull request as ready for review May 28, 2026 20:04
@TaylorMutch
Collaborator

/ok to test 09ddf58

On arm64 under heavy CI load, the /proc fd scan in
find_socket_inode_owners can transiently miss the parent process's
socket fd entry, returning only the child as an owner. This causes
resolve_process_identity to return Ok (single owner, no ambiguity
check fires) instead of the expected ambiguous-ownership Err.

Extend the retry loop to also handle unexpected Ok results, mirroring
the existing retry for transient Err results. 10 retries at 50ms gives
a 500ms settling window, which is sufficient for procfs to stabilize
on loaded arm64 runners.
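A minimal sketch of the retry shape this commit message describes: the probe is re-run both on unexpected Err and on unexpected Ok results until the expected outcome appears or the retries run out. The helper name and signature are hypothetical and do not reflect the actual test code.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical sketch: run `probe`, then retry up to `retries` times with a
/// fixed delay while the outcome (Ok or Err) is not the expected one.
fn retry_until<T, E>(
    retries: usize,
    delay: Duration,
    mut probe: impl FnMut() -> Result<T, E>,
    is_expected: impl Fn(&Result<T, E>) -> bool,
) -> Result<T, E> {
    let mut outcome = probe();
    for _ in 0..retries {
        if is_expected(&outcome) {
            break;
        }
        // Give procfs (or whatever is being observed) time to settle.
        sleep(delay);
        outcome = probe();
    }
    outcome
}
```

With `retries = 10` and a 50 ms delay, the worst case waits 10 × 50 ms = 500 ms, the settling window quoted above.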
 ) -> Result<reqwest::Response, RouterError> {
-    let (builder, url) = prepare_backend_request(client, route, method, path, &headers, body)?;
+    let (builder, url) =
+        prepare_backend_request(client, route, method, path, &headers, body, true)?;
Collaborator

Does this always force an upgrade to :streamRawPredict upstream? Is that intended?

Collaborator Author

No, prepare_backend_request calls build_provider_url which conditionally sets it.

let suffix = if stream_response
    && suffix == ":rawPredict"
    && is_vertex_anthropic_rawpredict_route(route)
{
    ":streamRawPredict"
} else {
    suffix.as_str()
};
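To make the three conditions concrete, here is a self-contained sketch of the same decision. `RouteSketch` and `effective_suffix` are hypothetical stand-ins: the boolean field replaces the real `is_vertex_anthropic_rawpredict_route(route)` check, whose implementation is not shown in this thread.

```rust
/// Hypothetical stand-in for the route; the real check is
/// `is_vertex_anthropic_rawpredict_route(route)`.
struct RouteSketch {
    vertex_anthropic_rawpredict: bool,
}

/// Upgrade the suffix only when all three conditions hold: the caller wants a
/// streamed response, the suffix is `:rawPredict`, and the route is a Vertex
/// Anthropic rawPredict route. Otherwise the suffix is returned unchanged.
fn effective_suffix<'a>(suffix: &'a str, stream_response: bool, route: &RouteSketch) -> &'a str {
    if stream_response && suffix == ":rawPredict" && route.vertex_anthropic_rawpredict {
        ":streamRawPredict"
    } else {
        suffix
    }
}
```

So passing `true` for the streaming flag changes nothing for routes such as the OpenAI-compatible /chat/completions endpoint.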

Comment on lines +10 to +13
  - name: service_account_key
    description: Google service account JSON refresh bootstrap material; not injected into sandboxes
    env_vars: [GOOGLE_SERVICE_ACCOUNT_KEY]
    required: false
Collaborator

Is this actually read after being written? I don't see it used in minting flows or anywhere else.

Collaborator Author

The service_account_key credential holds the raw service account JSON. It is not read by the token-minting code. It exists so that the raw key is blocked from sandbox injection and is never used as a bearer token (forcing an access token to be minted instead). The is_non_injectable_provider_credential function implements the former, and the resolve_vertex_ai_route_requires_minted_access_token() test validates the latter.

@Cali0707
Contributor

Hey @maxamillion I was looking at your vertex ai changes here and I was thinking that this problem isn't specific to vertex ai - we would run into this with supporting e.g. Azure OpenAI endpoints as well.

I'm wondering if instead of hardcoding these transforms per-provider (e.g. is_vertex_anthropic_rawpredict_route(), is_azure-*()), it would be better to have some way of attaching declarative transform rules, which the router applies mechanically without needing to know which provider it's talking to. ResolvedRoute already carries model_in_path and request_path_override, we could extend this with a small set of declarative fields for body/header transforms.

For standard providers (OpenAI, NVIDIA, Anthropic), these fields would all be empty so no extra config. For vertex ai, the route resolver would populate them the same way it already sets model_in_path.

As a concrete example of how we could set this in the provider profile:

inference:
  protocol: anthropic_messages # what api the client speaks to inference.local
  model_in_path: true # model ID goes in the URL, not the request body
  request_suffix: ":rawPredict" # append after model ID for buffered requests
  stream_suffix: ":streamRawPredict" # for streaming requests
  body_remove: [model] # fields to remove from the client's JSON body
  body_inject:
    anthropic_version: "vertex-2023-10-16" # k/v pair to add to JSON body if absent
  strip_headers: [anthropic-beta]

I think long term this will make it easier to support more providers/inference endpoints, and keep the router a generic request forwarder, while the provider awareness would just need to stay at route resolution rather than split across resolution + routing.
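One way the router side of this proposal could look, as a rough sketch only. `TransformRules` and `apply_transforms` are hypothetical names, and the body is modeled as a flat string map for brevity; a real implementation would operate on JSON values.

```rust
use std::collections::BTreeMap;

/// Hypothetical declarative rules mirroring the YAML fields proposed above.
#[derive(Default)]
struct TransformRules {
    body_remove: Vec<String>,
    body_inject: BTreeMap<String, String>,
    strip_headers: Vec<String>,
}

/// Apply the rules mechanically, with no knowledge of which provider is in use.
/// Simplification: the body is a flat key/value map rather than a JSON document.
fn apply_transforms(
    rules: &TransformRules,
    body: &mut BTreeMap<String, String>,
    headers: &mut Vec<(String, String)>,
) {
    for field in &rules.body_remove {
        body.remove(field);
    }
    for (key, value) in &rules.body_inject {
        // Inject only if absent, matching the comment in the YAML example.
        body.entry(key.clone()).or_insert_with(|| value.clone());
    }
    // Header names are compared case-insensitively.
    headers.retain(|(name, _)| {
        !rules.strip_headers.iter().any(|s| s.eq_ignore_ascii_case(name))
    });
}
```

With `TransformRules::default()` the function is a no-op, which matches the point that standard providers would need no extra configuration.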

WDYT?
