warpdotdev · hongyi-chen · May 26, 2026 · May 26, 2026 · May 26, 2026 · May 26, 2026
diff --git a/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx b/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
@@ -7,7 +7,7 @@ description: >-
 
 Warp supports **Bring Your Own API Key (BYOK)** for users who want to connect Warp's agents to their own Anthropic, OpenAI, or Google API accounts.
 
-This lets you use your own API keys to access models directly, giving you full control over model selection, billing, and data routing. See [Model Choice](/agent-platform/inference/model-choice/) for a list of supported models.
+This lets you use your own API keys for model access, giving you control over model selection, billing, and data routing. See [Model Choice](/agent-platform/inference/model-choice/) for a list of supported models.
 
 BYOK provides greater flexibility in model access and ensures Warp **never consumes your** [AI credits](/support-and-community/plans-and-billing/credits/) for requests routed through your own keys.
 
@@ -31,9 +31,19 @@ Platform credits apply to every cloud agent run on any plan, and to local agent
 
 ## How BYOK works
 
-When you add your own model API keys in Warp, those keys are stored **locally on your device** and are **never synced to the cloud**.
+When you add your own model API keys in Warp, those keys are stored **only on your device** (in your OS keychain or equivalent secure storage), never on Warp's servers. They're used to make requests to your chosen model provider.
 
-Warp uses these API keys when routing your agent requests to the model provider you've configured.
+When you send a prompt using a model with the **key icon**:
+
+1. Your local Warp client pulls your API key from your device's secure storage and sends it up to Warp's backend along with your prompt.
+2. Warp's agent harness, which runs on Warp's backend, assembles the full request (system instructions, conversation context, tools) and uses your key in-flight to call your chosen model provider (Anthropic, OpenAI, or Google).
+3. The provider's response streams back through Warp's backend to your client.
+
+Your API key passes through Warp's servers each time you send a request, but Warp never stores it there — it's used only in-flight to call the provider, then discarded.
+
+:::note
+**Why does the request route through Warp's backend?** Warp's agent harness runs server-side — the same runtime that powers [Agent Mode](/agent-platform/local-agents/interacting-with-agents/terminal-and-agent-modes/) with Warp-billed models. BYOK swaps the credential used to call the provider; it does not change where the harness runs.
+:::
 
 :::caution
 BYOK does not apply to [Cloud Agents](/agent-platform/cloud-agents/overview/). Because your API keys are stored locally on your device, they are not available to cloud-hosted agent runs. Cloud agent runs always consume [Warp credits](/support-and-community/plans-and-billing/credits/).
@@ -45,7 +55,7 @@ When a model is selected using your own key:
 * Costs are billed directly through your model provider account.
 * Warp does not retain or store your API key on any of its servers.
 
-![Diagram showing how Warp routes BYOK agent requests directly through your provider API key, bypassing Warp credits.](../../../../assets/support-and-community/Pricing-Blog-BYOK.png)
+![Diagram showing how Warp authenticates BYOK agent requests with your provider API key, bypassing Warp credits.](../../../../assets/support-and-community/Pricing-Blog-BYOK.png)
 
 ## Enabling BYOK
 
@@ -117,9 +127,11 @@ You can choose to enable **Warp credit fallback**. When enabled, if an agent req
 
 Warp is **SOC 2 compliant** and has **Zero Data Retention (ZDR)** policies with all of its contracted LLM providers. No customer AI data is retained, stored, or used for training by the model providers.
 
+BYOK prompts and responses transit Warp's backend (see [How BYOK works](#how-byok-works)). Warp does not use this content for training; retention and analytics handling follow the same account-level privacy and telemetry settings that apply to Warp-billed traffic.
+
 However, when you use your own API key:
 
-* Data retention policies depend on your provider’s account settings.
+* Data retention policies on the **provider side** depend on your provider’s account settings.
 * Warp cannot enforce ZDR for requests sent through your API keys.
 * If your Anthropic, OpenAI, or Google account does not have ZDR enabled, your requests may be retained by the provider according to their terms.
 

diff --git a/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx b/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
@@ -18,7 +18,7 @@ Custom inference endpoints are available on Free and all eligible paid plans for
 * **OpenAI-compatible** - Works with any endpoint that implements the OpenAI Chat Completions API.
 * **Provider flexibility** - Use a model router (OpenRouter, LiteLLM), a model provider with an OpenAI-compatible surface (z.ai), or your own internal gateway.
 * **No AI credits consumed for inference** - Inference is billed directly by your endpoint provider. On Business and Enterprise, local agent runs that route through a custom inference endpoint still consume [platform credits](/support-and-community/plans-and-billing/platform-credits/) for Warp's platform infrastructure.
-* **Local configuration** - Endpoint URLs and credentials are stored locally on your device and never synced to the cloud.
+* **Local API key storage** - Your endpoint API key is stored **only on your device** (in your OS keychain or equivalent secure storage), never on Warp's servers. It's used to make requests to your configured endpoint.
 
 ## How it works
 
@@ -29,7 +29,19 @@ A custom inference endpoint expects your endpoint to implement the **OpenAI Chat
 * **z.ai** - A model provider with an OpenAI-compatible API surface for its models.
 * **Internal gateways** - Any in-house service that fronts model providers behind an OpenAI-compatible endpoint (for example, a corporate AI gateway with logging, redaction, or access control).
 
-When you configure a custom inference endpoint, Warp stores the endpoint URL, model identifiers, and credentials **locally on your device**. They are never synced to Warp's servers.
+When you configure a custom inference endpoint, your endpoint URL, model identifiers, and API key are stored **only on your device**, never on Warp's servers. Your API key is used to make requests to your configured endpoint.
+
+When you send a prompt using an endpoint-routed model:
+
+1. Your local Warp client pulls your endpoint URL and API key from your device's secure storage and sends them up to Warp's backend along with your prompt.
+2. Warp's agent harness, which runs on Warp's backend, assembles the full request (system instructions, conversation context, tools) and uses your key in-flight to call your configured endpoint.
+3. Your endpoint's response streams back through Warp's backend to your client.
+
+Your API key passes through Warp's servers each time you send a request, but Warp never stores it there — it's used only in-flight to call your endpoint, then discarded.
+
+:::note
+**Why does the request route through Warp's backend?** Warp's agent harness runs server-side — the same runtime that powers [Agent Mode](/agent-platform/local-agents/interacting-with-agents/terminal-and-agent-modes/) with Warp-billed models and [BYOK](/agent-platform/inference/bring-your-own-api-key/). A custom inference endpoint swaps the upstream destination and credential; it does not change where the harness runs.
+:::
 
 :::caution
 Custom inference endpoints don't apply to [Cloud Agents](/agent-platform/cloud-agents/overview/). Because the configuration is stored locally, it isn't available to cloud-hosted agent runs. Cloud agent runs always consume [Warp credits](/support-and-community/plans-and-billing/credits/).
@@ -39,7 +51,7 @@ When a model routed through your endpoint is selected:
 
 * Warp **doesn't consume** your [AI credits](/support-and-community/plans-and-billing/credits/) for that request.
 * Costs are billed directly by your endpoint provider.
-* Warp doesn't retain or store your endpoint credentials on any of its servers.
+* Warp doesn't retain or store your API key on any of its servers.
 
 ## Enabling a custom inference endpoint
 
@@ -86,13 +98,15 @@ Some AI-powered features (Codebase Context, Active AI recommendations, cloud age
 
 Warp is **SOC 2 compliant** and has **Zero Data Retention (ZDR)** agreements with all of its contracted LLM providers.
 
+Custom inference endpoint prompts and responses transit Warp's backend (see [How it works](#how-it-works)). Warp does not use this content for training; retention and analytics handling follow the same account-level privacy and telemetry settings that apply to Warp-billed traffic.
+
 When you use a custom inference endpoint:
 
-* Data retention is determined by **your endpoint provider** and any upstream model providers they route to.
+* Data retention on the **provider side** is determined by your endpoint provider and any upstream model providers they route to.
 * Warp **cannot enforce ZDR** for requests sent through a custom inference endpoint.
 * If your endpoint provider does not have ZDR with the underlying model provider, your requests may be retained according to their terms.
 
-Review your endpoint provider's data handling and retention policies before routing sensitive prompts through a custom inference endpoint.
+Warp itself never stores your endpoint API key. Review your endpoint provider's data handling and retention policies before routing sensitive prompts through a custom inference endpoint.
 
 ## Centrally managed configuration