feat(providers): add Google Vertex AI inference provider by maxamillion · Pull Request #1568 · NVIDIA/OpenShell

maxamillion · 2026-05-26T16:15:51Z

Summary

Add Google Vertex AI as a first-class inference provider, supporting both service account (JWT) and gcloud ADC (OAuth2 refresh token) credential flows. Routes Anthropic models through Vertex AI rawPredict and all other models (Gemini, Llama, Mistral, etc.) through the Vertex OpenAI-compatible endpoint. Includes a seccomp policy relaxation for NETLINK_ROUTE sockets required by Vertex client tooling.

Related Issue

Changes

Provider profile & discovery

New providers/google-vertex-ai.yaml with three credential entries: raw service account key (gateway-only, never injected into sandboxes), service account JWT-minted token, and gcloud ADC OAuth2-refreshed token.
ProviderTypeProfile::allows_gateway_refresh_bootstrap() and CredentialRefreshProfile::is_gateway_mintable() replace inline gateway-refresh logic in server and CLI.
normalize_inference_provider_type() in openshell-core is now the single source of truth for provider alias resolution (vertex, vertex-ai, google-vertex → google-vertex-ai).
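
As a rough illustration of the alias resolution described above, the following is a minimal sketch and not the code in openshell-core. The function name `normalize_provider_type`, its signature, and the trimming and lowercasing behavior are assumptions; only the three aliases and the canonical name come from the description.

```rust
/// Hypothetical stand-in for `normalize_inference_provider_type()`.
/// Only the aliases listed in the PR description are mapped. Trimming and
/// lowercasing the input are assumptions made for this sketch.
pub fn normalize_provider_type(raw: &str) -> String {
    let lowered = raw.trim().to_ascii_lowercase();
    match lowered.as_str() {
        // Every known alias collapses to the canonical provider type.
        "vertex" | "vertex-ai" | "google-vertex" | "google-vertex-ai" => {
            "google-vertex-ai".to_string()
        }
        // Anything else passes through unchanged (after lowercasing).
        _ => lowered,
    }
}
```

Keeping the mapping in one function means the server and CLI cannot drift apart on which aliases they accept.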

Inference routing (server)

resolve_vertex_ai_route() dispatches by publisher: Anthropic models get rawPredict URLs with model_in_path=true; all others get the OpenAI-compatible /chat/completions endpoint.
infer_vertex_publisher() maps model prefixes to publishers (6 families: Anthropic, Google, Meta, Mistral, AI21, DeepSeek).
Region-to-host mapping: regional → {region}-aiplatform.googleapis.com, global → aiplatform.googleapis.com, us/eu → aiplatform.{region}.rep.googleapis.com.
Base URL override escape hatch with strict validation (HTTPS, official Vertex hostname, no IP literals, no userinfo, no query/fragment, port 443 only; rejected outright for Anthropic models).
Model ID validation rejects path separators, URL delimiters, percent escapes, traversal segments, whitespace, and control characters.
CredentialLookup enum (PreferredOnly vs PreferredThenAny) prevents raw SA JSON from being picked up as a bearer token.
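
The region-to-host mapping and the model ID validation listed above could look roughly like this. This is a sketch under stated assumptions: the function names `vertex_host_for_region` and `is_valid_model_id` are hypothetical, and the exact set of rejected characters in the real validator is not given in the description, so the set below is a guess at "path separators, URL delimiters, percent escapes".

```rust
/// Map a Vertex location to an API host, following the three cases in the
/// PR description. The function name is hypothetical.
pub fn vertex_host_for_region(region: &str) -> String {
    match region {
        "global" => "aiplatform.googleapis.com".to_string(),
        // Multi-region locations use the `rep` host form.
        "us" | "eu" => format!("aiplatform.{}.rep.googleapis.com", region),
        // Everything else is treated as a regular region such as us-central1.
        other => format!("{}-aiplatform.googleapis.com", other),
    }
}

/// Reject model IDs that could alter the request URL once placed in the path.
/// The rejected character set is an assumption for this sketch.
pub fn is_valid_model_id(model: &str) -> bool {
    if model.is_empty() || model == "." || model == ".." {
        return false; // empty IDs and traversal segments
    }
    !model.chars().any(|c| {
        c.is_whitespace()
            || c.is_control()
            || matches!(c, '/' | '\\' | '?' | '#' | '%')
    })
}
```

Note that `@` stays legal in this sketch, since versioned model IDs commonly use it as a separator.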

Router backend

build_provider_url() handles four URL construction cases via model_in_path × request_path_override matrix. Streaming upgrades :rawPredict → :streamRawPredict.
For Vertex Anthropic rawPredict: strips model from request body (Vertex encodes it in path), injects anthropic_version: "vertex-2023-10-16", and strips anthropic-beta header (Vertex rejects unknown beta values).
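
To make the model_in_path × request_path_override matrix more concrete, here is a simplified sketch. The `Route` struct and `build_url` function are hypothetical, and the behavior of the case with model_in_path set but no override is an assumption. The real build_provider_url() also checks that the route is a Vertex Anthropic rawPredict route before upgrading the suffix, which this sketch omits.

```rust
/// Hypothetical, trimmed-down view of a resolved route.
pub struct Route {
    pub base_url: String,
    pub model: String,
    pub model_in_path: bool,
    pub request_path_override: Option<String>,
}

/// Build the upstream URL for one request.
/// `client_path` is the path the client called; `stream` marks a streaming request.
pub fn build_url(route: &Route, client_path: &str, stream: bool) -> String {
    let base = route.base_url.trim_end_matches('/');
    // An override, when present, replaces the client's path.
    let path = route.request_path_override.as_deref().unwrap_or(client_path);
    if route.model_in_path {
        // Streaming upgrades the buffered suffix to its streaming counterpart.
        let suffix = if stream && path == ":rawPredict" {
            ":streamRawPredict"
        } else {
            path
        };
        format!("{}/{}{}", base, route.model, suffix)
    } else {
        format!("{}{}", base, path)
    }
}
```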

Provider gRPC (server)

is_non_injectable_provider_credential() prevents raw service account JSON from reaching sandboxes.
Agent config env var injection for Vertex providers: injects ANTHROPIC_VERTEX_PROJECT_ID, GCP_PROJECT_ID, CLOUD_ML_REGION, GCP_LOCATION, GOOSE_PROVIDER=gcp_vertex_ai, etc. so Claude Code, Goose, and OpenCode work inside sandboxes. Explicit credential values take precedence.

Protobuf

ResolvedRoute gains model_in_path (field 8) and request_path_override (field 9).

CLI

--from-gcloud-adc flag on provider create (mutually exclusive with --from-existing and --credential). Reads gcloud ADC from GOOGLE_APPLICATION_CREDENTIALS, $CLOUDSDK_CONFIG/application_default_credentials.json, or ~/.config/gcloud/application_default_credentials.json; validates authorized_user type; configures OAuth2 refresh and mints the first token.
Rollback on failure: deletes orphaned provider, or warns with manual cleanup instructions if deletion also fails.
Vertex-specific config env var discovery (VERTEX_AI_PROJECT_ID, VERTEX_AI_REGION, base URL, publisher).
SandboxUploadPlan refactor consolidates upload existence-check + git-aware planning.
scrub_git_env() prevents inherited git env vars from breaking subprocess git calls.
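
The ADC file lookup used by --from-gcloud-adc can be pictured as an ordered list of candidate paths. The sketch below is illustrative only: `adc_candidate_paths` is a hypothetical name, the environment lookup is injected as a closure so it can be tested, and resolving `~` through `HOME` and skipping empty variables are assumptions.

```rust
use std::path::PathBuf;

const ADC_FILE: &str = "application_default_credentials.json";

/// Return the candidate ADC paths in the lookup order from the PR description.
/// `get_env` is injected so tests do not depend on the process environment.
pub fn adc_candidate_paths(get_env: impl Fn(&str) -> Option<String>) -> Vec<PathBuf> {
    // Treat empty variables as unset (an assumption for this sketch).
    let non_empty = |key: &str| get_env(key).filter(|value| !value.is_empty());
    let mut paths = Vec::new();
    // 1. An explicit credentials file.
    if let Some(file) = non_empty("GOOGLE_APPLICATION_CREDENTIALS") {
        paths.push(PathBuf::from(file));
    }
    // 2. A custom gcloud config directory.
    if let Some(dir) = non_empty("CLOUDSDK_CONFIG") {
        paths.push(PathBuf::from(dir).join(ADC_FILE));
    }
    // 3. The default gcloud location under the home directory.
    if let Some(home) = non_empty("HOME") {
        paths.push(PathBuf::from(home).join(".config").join("gcloud").join(ADC_FILE));
    }
    paths
}
```

A caller would then read the first candidate that exists and check that its type field is authorized_user before configuring the OAuth2 refresh.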

Sandbox

NETLINK_ROUTE (protocol 0) now allowed through seccomp; all other netlink protocols remain blocked. Required because getifaddrs(3) on Linux uses NETLINK_ROUTE and is called by Node.js, Python, Go, and most HTTP/gRPC client libraries. Security is maintained by CAP_NET_ADMIN absence, network namespace isolation, and nftables rules.
Bundle-to-route conversion populates model_in_path and request_path_override.
enrich_sandbox_baseline_paths() refactored with injectable path_exists closure for testability.
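
The NETLINK_ROUTE rule in this list reduces to a simple decision on the arguments of socket(2). The predicate below models only that decision; the real policy is a seccomp BPF filter, and `netlink_socket_allowed` is a hypothetical name. The numeric constants are the standard Linux values.

```rust
// Standard Linux values for the relevant socket(2) arguments.
pub const AF_INET: i32 = 2;
pub const AF_NETLINK: i32 = 16;
pub const NETLINK_ROUTE: i32 = 0;
pub const NETLINK_SOCK_DIAG: i32 = 4;

/// Decide whether the netlink rule permits socket(domain, _, protocol).
/// Non-netlink domains are outside this rule, so the sketch lets them through;
/// other parts of the policy would govern them.
pub fn netlink_socket_allowed(domain: i32, protocol: i32) -> bool {
    domain != AF_NETLINK || protocol == NETLINK_ROUTE
}
```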

Documentation

New docs/providers/google-vertex-ai.mdx: full provider setup guide covering both auth flows, configuration keys, region/host selection, supported models, sandbox usage with Claude Code and OpenCode, and policy proposals guidance.
Updated inference-routing.mdx, manage-providers.mdx, providers-v2.mdx, supported-agents.mdx, best-practices.mdx for Vertex references.
New architecture/gateway.md Inference Resolution section documenting bundle resolution, Vertex host selection, route shaping, header passthrough, and security model.

Testing

mise run pre-commit passes (lint, format, license headers)
Unit tests added/updated:
- ~35 server inference tests (publisher inference, route resolution for all regions/overrides/validation, model ID validation, bundle integration)
- ~15 router backend tests (URL construction, body rewriting, header stripping, wiremock integration for buffered + streaming)
- ~8 provider gRPC tests (credential injection, agent config injection, SA key filtering, refresh bootstrap)
- ~15 CLI integration tests (ADC happy path, SA rejection, missing file, wrong provider type, configure/rotate rollback, rollback-delete failure, config keys, mutual exclusion)
- 3 seccomp tests (rule conditionality, behavioral NETLINK_ROUTE allowed, behavioral NETLINK_SOCK_DIAG blocked)
- 3 router integration tests (full proxy for Vertex Gemini, Vertex Anthropic buffered, Vertex Anthropic streaming)
- Provider profile, core inference, sandbox bundle, and config tests
E2E tests added/updated (requires live Vertex AI credentials; not run in CI without secrets)

Checklist

Follows Conventional Commits
Commits are signed off (DCO)
Architecture docs updated

copy-pr-bot · 2026-05-26T16:15:55Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Adds Vertex AI provider profiles, routing, credential refresh plumbing, CLI support, docs, and regression coverage. Keeps the related NETLINK_ROUTE seccomp allowance needed by Vertex client tooling that calls getifaddrs.

Cover the full end-to-end setup for running Claude Code and OpenCode inside an OpenShell sandbox via inference.local with a Vertex AI backend:
- google-vertex-ai.mdx: add 'Use from a Sandbox' section with tabbed examples for Claude Code (--bare flag, no /v1 suffix) and OpenCode (/v1 suffix required). Add providers_v2_enabled prerequisite and --no-verify note for global region. Document policy proposals table covering metadata.google.internal (always blocked), downloads.claude.ai, and storage.googleapis.com.
- inference-routing.mdx: expand 'Use the Local Endpoint' section with tabbed examples for Claude Code, OpenCode, Python OpenAI SDK, and Python Anthropic SDK. Add notes explaining the /v1 path suffix difference between clients.
- supported-agents.mdx: update Claude Code and OpenCode rows to mention inference.local support and correct base URL requirements.

TaylorMutch · 2026-05-28T21:32:25Z

/ok to test 09ddf58

On arm64 under heavy CI load, the /proc fd scan in find_socket_inode_owners can transiently miss the parent process's socket fd entry, returning only the child as an owner. This causes resolve_process_identity to return Ok (single owner, no ambiguity check fires) instead of the expected ambiguous-ownership Err. Extend the retry loop to also handle unexpected Ok results, mirroring the existing retry for transient Err results. 10 retries at 50ms gives a 500ms settling window, which is sufficient for procfs to stabilize on loaded arm64 runners.
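
The retry pattern described in this commit message can be sketched generically as follows. This is not the test code from the PR: `retry_until` is a hypothetical helper that re-runs a probe until its result matches the expected outcome, whether the transient result was an Ok or an Err.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Run `probe` once, then retry up to `max_retries` times, sleeping `delay`
/// between attempts, until `is_expected` accepts the result.
/// Returns the last observed result either way.
pub fn retry_until<T>(
    max_retries: u32,
    delay: Duration,
    mut probe: impl FnMut() -> T,
    mut is_expected: impl FnMut(&T) -> bool,
) -> T {
    let mut last = probe();
    for _ in 0..max_retries {
        if is_expected(&last) {
            break; // the scan has settled on the expected outcome
        }
        sleep(delay);
        last = probe();
    }
    last
}
```

With max_retries set to 10 and a 50 ms delay, the loop sleeps at most 10 times, which matches the 500 ms settling window mentioned above.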

johntmyers · 2026-05-29T17:35:55Z

 ) -> Result<reqwest::Response, RouterError> {
-    let (builder, url) = prepare_backend_request(client, route, method, path, &headers, body)?;
+    let (builder, url) =
+        prepare_backend_request(client, route, method, path, &headers, body, true)?;


Does this always force an upgrade to :streamRawPredict upstream? Is that intended?

No, prepare_backend_request calls build_provider_url which conditionally sets it.

let suffix = if stream_response
    && suffix == ":rawPredict"
    && is_vertex_anthropic_rawpredict_route(route)
{
    ":streamRawPredict"
} else {
    suffix.as_str()
};

johntmyers · 2026-05-29T18:23:13Z

+  - name: service_account_key
+    description: Google service account JSON refresh bootstrap material; not injected into sandboxes
+    env_vars: [GOOGLE_SERVICE_ACCOUNT_KEY]
+    required: false


Is this actually read after being written? I don't see it used in minting flows or anywhere else.

The service_account_key credential holds the raw service account JSON. The token-minting code does not read it. It exists to block sandbox injection and to ensure the raw JSON is never used as a bearer token, which forces an access token to be minted instead. The is_non_injectable_provider_credential() function implements the former, and the resolve_vertex_ai_route_requires_minted_access_token() test validates the latter.

Cali0707 · 2026-05-29T21:42:02Z

Hey @maxamillion I was looking at your vertex ai changes here and I was thinking that this problem isn't specific to vertex ai - we would run into this with supporting e.g. Azure OpenAI endpoints as well.

I'm wondering if instead of hardcoding these transforms per-provider (e.g. is_vertex_anthropic_rawpredict_route(), is_azure-*()), it would be better to have some way of attaching declarative transform rules, which the router applies mechanically without needing to know which provider it's talking to. ResolvedRoute already carries model_in_path and request_path_override, we could extend this with a small set of declarative fields for body/header transforms.

For standard providers (OpenAI, NVIDIA, Anthropic), these fields would all be empty so no extra config. For vertex ai, the route resolver would populate them the same way it already sets model_in_path.

As a concrete example of how we could set this in the provider profile:

inference:
  protocol: anthropic_messages # what api the client speaks to inference.local
  model_in_path: true # model ID goes in the URL, not the request body
  request_suffix: ":rawPredict" # append after model ID for buffered requests
  stream_suffix: ":streamRawPredict" # for streaming requests
  body_remove: [model] # fields to remove from the client's JSON body
  body_inject:
    anthropic_version: "vertex-2023-10-16" # k/v pair to add to JSON body if absent
  strip_headers: [anthropic-beta]

I think long term this will make it easier to support more providers/inference endpoints, and keep the router a generic request forwarder, while the provider awareness would just need to stay at route resolution rather than split across resolution + routing.
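
To illustrate how a router might apply such rules mechanically, here is a rough sketch. None of this exists in the PR: `RouteTransforms` and `apply_transforms` are hypothetical names, and the request body is simplified to a flat string map standing in for the top-level JSON object.

```rust
use std::collections::BTreeMap;

/// Hypothetical declarative transform rules attached to a resolved route.
#[derive(Default)]
pub struct RouteTransforms {
    pub body_remove: Vec<String>,
    pub body_inject: BTreeMap<String, String>,
    pub strip_headers: Vec<String>,
}

/// Apply the rules without knowing which provider the route targets.
pub fn apply_transforms(
    rules: &RouteTransforms,
    body: &mut BTreeMap<String, String>,
    headers: &mut Vec<(String, String)>,
) {
    // Remove fields the upstream does not accept in the body.
    for key in &rules.body_remove {
        body.remove(key);
    }
    // Inject defaults only when the client did not set the field.
    for (key, value) in &rules.body_inject {
        body.entry(key.clone()).or_insert_with(|| value.clone());
    }
    // Header names are compared case-insensitively.
    headers.retain(|(name, _)| {
        !rules.strip_headers.iter().any(|s| s.eq_ignore_ascii_case(name))
    });
}
```

For standard providers the rules would be empty and the function would leave the request untouched.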

WDYT?

maxamillion requested review from a team, derekwaynecarr and mrunalp as code owners May 26, 2026 16:15

maxamillion marked this pull request as draft May 26, 2026 16:21

johntmyers reviewed May 26, 2026

Comment thread crates/openshell-providers/src/providers/vertex.rs Outdated

johntmyers reviewed May 26, 2026

Comment thread docs/providers/google-vertex-ai.mdx

maxamillion marked this pull request as ready for review May 27, 2026 02:52

maxamillion marked this pull request as draft May 27, 2026 03:28

feat(providers): add Google Vertex AI provider

fe3b147

maxamillion force-pushed the vertex-provider branch from fb581ba to fe3b147 Compare May 28, 2026 16:27

maxamillion added 2 commits May 28, 2026 12:29

fix: address vertex review findings

09ddf58

maxamillion marked this pull request as ready for review May 28, 2026 20:04

jhjaggars mentioned this pull request May 29, 2026

feat(sandbox): proxy-side AWS SigV4 credential signing for CONNECT tunnels #1631

Open

johntmyers reviewed May 29, 2026

This was referenced May 29, 2026

Sigv4 credential signing #1630

Closed

feat(sandbox): proxy-side AWS SigV4 credential signing for CONNECT tunnels #1638

Draft

maxamillion wants to merge 4 commits into NVIDIA:main from maxamillion:vertex-provider

4 participants
