Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ description: >-

Warp supports **Bring Your Own API Key (BYOK)** for users who want to connect Warp's agents to their own Anthropic, OpenAI, or Google API accounts.

This lets you use your own API keys to access models directly, giving you full control over model selection, billing, and data routing. See [Model Choice](/agent-platform/inference/model-choice/) for a list of supported models.
This lets you use your own API keys for model access, giving you control over model selection, billing, and data routing. See [Model Choice](/agent-platform/inference/model-choice/) for a list of supported models.

BYOK provides greater flexibility in model access and ensures Warp **never consumes your** [AI credits](/support-and-community/plans-and-billing/credits/) for requests routed through your own keys.

Expand All @@ -31,9 +31,19 @@ Platform credits apply to every cloud agent run on any plan, and to local agent

## How BYOK works

When you add your own model API keys in Warp, those keys are stored **locally on your device** and are **never synced to the cloud**.
When you add your own model API keys in Warp, those keys are stored **only on your device** (in your OS keychain or equivalent secure storage), never on Warp's servers. They're used to make requests to your chosen model provider.

Warp uses these API keys when routing your agent requests to the model provider you've configured.
When you send a prompt using a model with the **key icon**:

1. Your local Warp client pulls your API key from your device's secure storage and sends it up to Warp's backend along with your prompt.
2. Warp's agent harness, which runs on Warp's backend, assembles the full request (system instructions, conversation context, tools) and uses your key in-flight to call your chosen model provider (Anthropic, OpenAI, or Google).
3. The provider's response streams back through Warp's backend to your client.

Your API key passes through Warp's servers each time you send a request, but Warp never stores it there — it's used only in-flight to call the provider, then discarded.

:::note
**Why does the request route through Warp's backend?** Warp's agent harness runs server-side — the same runtime that powers [Agent Mode](/agent-platform/local-agents/interacting-with-agents/terminal-and-agent-modes/) with Warp-billed models. BYOK swaps the credential used to call the provider; it does not change where the harness runs.
:::

:::caution
BYOK does not apply to [Cloud Agents](/agent-platform/cloud-agents/overview/). Because your API keys are stored locally on your device, they are not available to cloud-hosted agent runs. Cloud agent runs always consume [Warp credits](/support-and-community/plans-and-billing/credits/).
Expand All @@ -45,7 +55,7 @@ When a model is selected using your own key:
* Costs are billed directly through your model provider account.
* Warp does not retain or store your API key on any of its servers.

![Diagram showing how Warp routes BYOK agent requests directly through your provider API key, bypassing Warp credits.](../../../../assets/support-and-community/Pricing-Blog-BYOK.png)
![Diagram showing how Warp authenticates BYOK agent requests with your provider API key, bypassing Warp credits.](../../../../assets/support-and-community/Pricing-Blog-BYOK.png)

## Enabling BYOK

Expand Down Expand Up @@ -117,9 +127,11 @@ You can choose to enable **Warp credit fallback**. When enabled, if an agent req

Warp is **SOC 2 compliant** and has **Zero Data Retention (ZDR)** policies with all of its contracted LLM providers. No customer AI data is retained, stored, or used for training by the model providers.

BYOK prompts and responses transit Warp's backend (see [How BYOK works](#how-byok-works)). Warp does not use this content for training; retention and analytics handling follow the same account-level privacy and telemetry settings that apply to Warp-billed traffic.

However, when you use your own API key:

* Data retention policies depend on your provider’s account settings.
* Data retention policies on the **provider side** depend on your provider’s account settings.
* Warp cannot enforce ZDR for requests sent through your API keys.
* If your Anthropic, OpenAI, or Google account does not have ZDR enabled, your requests may be retained by the provider according to their terms.

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ Custom inference endpoints are available on Free and all eligible paid plans for
* **OpenAI-compatible** - Works with any endpoint that implements the OpenAI Chat Completions API.
* **Provider flexibility** - Use a model router (OpenRouter, LiteLLM), a model provider with an OpenAI-compatible surface (z.ai), or your own internal gateway.
* **No AI credits consumed for inference** - Inference is billed directly by your endpoint provider. On Business and Enterprise, local agent runs that route through a custom inference endpoint still consume [platform credits](/support-and-community/plans-and-billing/platform-credits/) for Warp's platform infrastructure.
* **Local configuration** - Endpoint URLs and credentials are stored locally on your device and never synced to the cloud.
* **Local API key storage** - Your endpoint API key is stored **only on your device** (in your OS keychain or equivalent secure storage), never on Warp's servers. It's used to make requests to your configured endpoint.

## How it works

Expand All @@ -29,7 +29,19 @@ A custom inference endpoint expects your endpoint to implement the **OpenAI Chat
* **z.ai** - A model provider with an OpenAI-compatible API surface for its models.
* **Internal gateways** - Any in-house service that fronts model providers behind an OpenAI-compatible endpoint (for example, a corporate AI gateway with logging, redaction, or access control).

When you configure a custom inference endpoint, Warp stores the endpoint URL, model identifiers, and credentials **locally on your device**. They are never synced to Warp's servers.
When you configure a custom inference endpoint, your endpoint URL, model identifiers, and API key are stored **only on your device**, never on Warp's servers. Your API key is used to make requests to your configured endpoint.

When you send a prompt using an endpoint-routed model:

1. Your local Warp client pulls your endpoint URL and API key from your device's secure storage and sends them up to Warp's backend along with your prompt.
2. Warp's agent harness, which runs on Warp's backend, assembles the full request (system instructions, conversation context, tools) and uses your key in-flight to call your configured endpoint.
3. Your endpoint's response streams back through Warp's backend to your client.

Your API key passes through Warp's servers each time you send a request, but Warp never stores it there — it's used only in-flight to call your endpoint, then discarded.

:::note
**Why does the request route through Warp's backend?** Warp's agent harness runs server-side — the same runtime that powers [Agent Mode](/agent-platform/local-agents/interacting-with-agents/terminal-and-agent-modes/) with Warp-billed models and [BYOK](/agent-platform/inference/bring-your-own-api-key/). A custom inference endpoint swaps the upstream destination and credential; it does not change where the harness runs.
:::

:::caution
Custom inference endpoints don't apply to [Cloud Agents](/agent-platform/cloud-agents/overview/). Because the configuration is stored locally, it isn't available to cloud-hosted agent runs. Cloud agent runs always consume [Warp credits](/support-and-community/plans-and-billing/credits/).
Expand All @@ -39,7 +51,7 @@ When a model routed through your endpoint is selected:

* Warp **doesn't consume** your [AI credits](/support-and-community/plans-and-billing/credits/) for that request.
* Costs are billed directly by your endpoint provider.
* Warp doesn't retain or store your endpoint credentials on any of its servers.
* Warp doesn't retain or store your API key on any of its servers.

## Enabling a custom inference endpoint

Expand Down Expand Up @@ -86,13 +98,15 @@ Some AI-powered features (Codebase Context, Active AI recommendations, cloud age

Warp is **SOC 2 compliant** and has **Zero Data Retention (ZDR)** agreements with all of its contracted LLM providers.

Custom inference endpoint prompts and responses transit Warp's backend (see [How it works](#how-it-works)). Warp does not use this content for training; retention and analytics handling follow the same account-level privacy and telemetry settings that apply to Warp-billed traffic.

When you use a custom inference endpoint:

* Data retention is determined by **your endpoint provider** and any upstream model providers they route to.
* Data retention on the **provider side** is determined by your endpoint provider and any upstream model providers they route to.
* Warp **cannot enforce ZDR** for requests sent through a custom inference endpoint.
* If your endpoint provider does not have ZDR with the underlying model provider, your requests may be retained according to their terms.

Review your endpoint provider's data handling and retention policies before routing sensitive prompts through a custom inference endpoint.
Warp itself never stores your endpoint API key. Review your endpoint provider's data handling and retention policies before routing sensitive prompts through a custom inference endpoint.

## Centrally managed configuration

Expand Down
Loading