diff --git a/docs.json b/docs.json index 6d49a81..69498a2 100644 --- a/docs.json +++ b/docs.json @@ -17,6 +17,7 @@ "group": "Tracing", "pages": [ "tracing/introduction", + "tracing/pii-redaction", { "group": "SDKs", "pages": [ @@ -128,6 +129,11 @@ ] }, "contextual": { - "options": ["copy", "view", "chatgpt", "claude"] + "options": [ + "copy", + "view", + "chatgpt", + "claude" + ] } } diff --git a/tracing/introduction.mdx b/tracing/introduction.mdx index 9b55260..aacb88e 100644 --- a/tracing/introduction.mdx +++ b/tracing/introduction.mdx @@ -63,3 +63,9 @@ Create an API key from [Settings → API Keys](https://app.zeroeval.com/settings skill](/integrations/skills) can handle SDK setup, first trace, and prompt migration for you. + + + Need source-side PII redaction before span data is buffered, logged, or sent? + ZeroEval supports opt-in redaction in both SDKs. See [PII + redaction](/tracing/pii-redaction). + diff --git a/tracing/pii-redaction.mdx b/tracing/pii-redaction.mdx new file mode 100644 index 0000000..e72bb8e --- /dev/null +++ b/tracing/pii-redaction.mdx @@ -0,0 +1,183 @@ +--- +title: "PII redaction" +description: "Opt-in SDK-side redaction for tracing data in the Python and TypeScript SDKs" +--- + +ZeroEval supports opt-in source-side PII redaction in the Python and TypeScript +SDKs. When enabled, sensitive values are redacted before spans are buffered, +logged, or sent by the SDK. + + + This is an SDK-side feature. ZeroEval does not rely on backend-side redaction + for this behavior. 
+ + +## Enable redaction + + + + ```python + import zeroeval as ze + + ze.init( + api_key="YOUR_API_KEY", + redaction={"enabled": True}, + ) + ``` + + + ```typescript + import * as ze from "zeroeval"; + + ze.init({ + apiKey: "YOUR_API_KEY", + redaction: { enabled: true }, + }); + ``` + + + +You can also enable the default redaction toggle with: + +```bash +export ZEROEVAL_REDACT_PII=true +``` + +Python uses snake_case nested keys such as `redact_inputs`, +`redact_session_names`, `sensitive_keys`, and `custom_patterns`. TypeScript uses +the camelCase equivalents `redactInputs`, `redactSessionNames`, +`sensitiveKeys`, and `customPatterns`. + +## What gets redacted + +By default, the SDKs redact sensitive values found in: + +- inputs and outputs +- attributes +- error messages and stacks +- session names +- tag values + +Built-in detectors cover: + +- email addresses +- phone numbers +- SSN-style identifiers +- PAN / credit card numbers +- bearer tokens, JWTs, and common API key formats +- cookie and authorization header values +- IP addresses + +Key-based detection also force-redacts common sensitive fields such as +`email`, `phone`, `password`, `token`, `authorization`, `cookie`, and +`api_key` / `apiKey`. + +## What stays intact + +Redaction is meant to preserve trace structure. 
The SDKs keep: + +- span names +- trace IDs and span IDs +- timing and status fields +- token counts, model/provider metadata, and cost metadata unless the value + itself is sensitive + +## Placeholder behavior + +Sensitive values are replaced with stable placeholders inside a single trace: + +```text +alice@example.com -> [REDACTED_EMAIL_A] +bob@example.com -> [REDACTED_EMAIL_B] +alice@example.com -> [REDACTED_EMAIL_A] +``` + +- the same normalized sensitive value in one trace gets the same placeholder +- different values in the same trace get different placeholders +- placeholder assignment resets per trace +- matching is exact after normalization only +- this is not fuzzy identity resolution + +That means repeated references across parent and child spans stay joinable +without preserving the original value. + +## Normalization and limitations + +| Type | Normalization used for placeholder reuse | +| --- | --- | +| Email | Trim + lowercase | +| Phone | Digits only | +| SSN / PAN | Digits only | +| IP | Canonical or lowercase string as implemented by each SDK | +| Secrets | Exact trimmed string | + +Important limitations: + +- there is no reversible backend token vault +- there is no de-anonymization support +- there is no fuzzy entity resolution across semantically related strings +- bypassing SDK capture paths bypasses this protection + +## Examples + +The SDK repositories include runnable examples that match the implemented +behavior: + + + + From `zeroeval-sdk/examples/pii_redaction.py`: + + ```python + import zeroeval as ze + + ze.init( + api_key="sk_ze_demo_local", + redaction={"enabled": True}, + ) + + with ze.span( + name="pii-redaction-demo", + session={"id": "alice@example.com", "name": "Alice alice@example.com"}, + tags={"customer_email": "alice@example.com"}, + ) as span: + span.set_io( + input_data={"email": "alice@example.com", "phone": "+1 (415) 555-1234"}, + output_data={"result": "Reach alice@example.com"}, + ) + span.set_error( + 
code="ValueError", + message="Failed for alice@example.com with Bearer secret-demo-token", + ) + ``` + + + From `zeroeval-ts/examples/10-pii-redaction.ts`: + + ```typescript + import * as ze from "zeroeval"; + + ze.init({ + apiKey: "demo-api-key", + redaction: { enabled: true }, + }); + + const span = ze.tracer.startSpan("demo.pii_redaction", { + sessionId: "alice@example.com", + sessionName: "Alice Example ", + tags: { customer_email: "alice@example.com" }, + }); + + span.setIO( + { + email: "alice@example.com", + phone: "+1 (415) 555-1212", + apiKey: "sk-live-abcdef1234567890", + }, + "Send follow-up to alice@example.com and bob@example.com" + ); + ``` + + + +In both SDKs, repeated exact values such as the same email address will reuse +the same placeholder within the trace. diff --git a/tracing/sdks/python/reference.mdx b/tracing/sdks/python/reference.mdx index 1146f28..5a7bfa9 100644 --- a/tracing/sdks/python/reference.mdx +++ b/tracing/sdks/python/reference.mdx @@ -24,10 +24,11 @@ def init( api_url: str = None, disabled_integrations: list[str] = None, enabled_integrations: list[str] = None, - setup_otlp: bool = True, + setup_otlp: bool = False, service_name: str = "zeroeval-app", tags: dict[str, str] = None, - sampling_rate: float = None + sampling_rate: float = None, + redaction: dict[str, object] = None ) -> None ``` @@ -40,10 +41,11 @@ def init( | `api_url` | `str` | `"https://api.zeroeval.com"` | API endpoint URL | | `disabled_integrations` | `list[str]` | `None` | Integrations to disable (e.g. 
`["langchain"]`) | | `enabled_integrations` | `list[str]` | `None` | Only enable these integrations | -| `setup_otlp` | `bool` | `True` | Configure OpenTelemetry OTLP export | +| `setup_otlp` | `bool` | `False` | Configure OpenTelemetry OTLP export | | `service_name` | `str` | `"zeroeval-app"` | OTLP service name | | `tags` | `dict[str, str]` | `None` | Global tags applied to all spans | | `sampling_rate` | `float` | `None` | Sampling rate 0.0-1.0 (1.0 = sample all) | +| `redaction` | `dict[str, object]` | `None` | Source-side PII redaction settings | **Example:** @@ -54,10 +56,30 @@ ze.init( api_key="your-api-key", sampling_rate=0.1, disabled_integrations=["langchain"], + redaction={"enabled": True}, debug=True ) ``` +`redaction` uses snake_case keys in Python: + +| Key | Type | Default | Description | +| --- | --- | --- | --- | +| `enabled` | `bool` | `False` | Turn source-side redaction on | +| `redact_inputs` | `bool` | `True` | Redact `input_data` | +| `redact_outputs` | `bool` | `True` | Redact `output_data` | +| `redact_attributes` | `bool` | `True` | Redact span attributes | +| `redact_errors` | `bool` | `True` | Redact error messages and stacks | +| `redact_session_names` | `bool` | `True` | Redact session names | +| `redact_tag_values` | `bool` | `True` | Redact span, trace, and session tag values | +| `sensitive_keys` | `list[str]` | built-in list | Force-redact matching keys such as `email`, `api_key`, `authorization`, and `cookie` | +| `custom_patterns` | `list[Pattern[str] \| str]` | `[]` | Additional regex patterns to redact | + + + See [PII redaction](/tracing/pii-redaction) for scope, placeholder behavior, + normalization rules, and examples. + + ## Decorators ### `@span` @@ -895,6 +917,7 @@ Set before importing ZeroEval to configure default behavior. 
| `ZEROEVAL_SAMPLING_RATE` | float | `"1.0"` | Sampling rate (0.0-1.0) | | `ZEROEVAL_DISABLED_INTEGRATIONS` | string | `""` | Comma-separated integrations to disable | | `ZEROEVAL_DEBUG` | boolean | `"false"` | Enable debug logging | +| `ZEROEVAL_REDACT_PII` | boolean | `"false"` | Enable SDK-side PII redaction | ```bash export ZEROEVAL_API_KEY="ze_1234567890abcdef" @@ -919,6 +942,7 @@ ze.tracer.configure( flush_interval=0.5, max_spans=100, sampling_rate=0.05, + redaction={"enabled": True}, integrations={"openai": True, "langchain": False} ) ``` diff --git a/tracing/sdks/python/setup.mdx b/tracing/sdks/python/setup.mdx index f130d44..491b34d 100644 --- a/tracing/sdks/python/setup.mdx +++ b/tracing/sdks/python/setup.mdx @@ -34,6 +34,12 @@ ze.init(api_key="YOUR_API_KEY") `~/.config/zeroeval/config.json` + + Need source-side PII redaction? Pass `redaction={"enabled": True}` to + `ze.init(...)` or set `ZEROEVAL_REDACT_PII=true`. See [PII + redaction](/tracing/pii-redaction). + + ## Patterns ### Decorators @@ -301,4 +307,4 @@ zeroeval setup # Run scripts with automatic tracing zeroeval run my_script.py -``` \ No newline at end of file +``` diff --git a/tracing/sdks/typescript/reference.mdx b/tracing/sdks/typescript/reference.mdx index ec85d01..20206c3 100644 --- a/tracing/sdks/typescript/reference.mdx +++ b/tracing/sdks/typescript/reference.mdx @@ -25,12 +25,13 @@ function init(opts?: InitOptions): void; | -------------------- | ------------------------- | -------------------------- | --------------------------------------- | | `apiKey` | `string` | `ZEROEVAL_API_KEY` env | Your ZeroEval API key | | `apiUrl` | `string` | `https://api.zeroeval.com` | Custom API URL | -| `workspaceName` | `string` | `"Personal Organization"` | Workspace/organization name | +| `workspaceName` | `string` | `"Personal Workspace"` | Workspace/organization name | | `flushInterval` | `number` | `10` | Interval in seconds to flush spans | | `maxSpans` | `number` | `100` | Maximum spans to buffer 
before flushing | | `collectCodeDetails` | `boolean` | `true` | Capture source code context | | `integrations` | `Record` | — | Enable/disable specific integrations | | `debug` | `boolean` | `false` | Enable debug logging | +| `redaction` | `Partial` | — | Source-side PII redaction settings | #### Example @@ -39,10 +40,30 @@ import * as ze from "zeroeval"; ze.init({ apiKey: "your-api-key", + redaction: { enabled: true }, debug: true, }); ``` +`redaction` uses camelCase keys in TypeScript: + +| Key | Type | Default | Description | +| --- | --- | --- | --- | +| `enabled` | `boolean` | `false` | Turn source-side redaction on | +| `redactInputs` | `boolean` | `true` | Redact `input_data` | +| `redactOutputs` | `boolean` | `true` | Redact `output_data` | +| `redactAttributes` | `boolean` | `true` | Redact span attributes | +| `redactErrors` | `boolean` | `true` | Redact error messages and stacks | +| `redactSessionNames` | `boolean` | `true` | Redact session names | +| `redactTagValues` | `boolean` | `true` | Redact span, trace, and session tag values | +| `sensitiveKeys` | `string[]` | built-in list | Force-redact matching keys such as `email`, `apiKey`, `authorization`, and `cookie` | +| `customPatterns` | `Array` | `[]` | Additional patterns to redact | + + + See [PII redaction](/tracing/pii-redaction) for scope, placeholder behavior, + normalization rules, and examples. + + --- ## Wrapper Functions @@ -408,6 +429,7 @@ Set before importing ZeroEval to configure default behavior. 
| `ZEROEVAL_SAMPLING_RATE` | float | `"1.0"` | Sampling rate (0.0-1.0) | | `ZEROEVAL_DISABLED_INTEGRATIONS` | string | `""` | Comma-separated integrations to disable | | `ZEROEVAL_DEBUG` | boolean | `"false"` | Enable debug logging | +| `ZEROEVAL_REDACT_PII` | boolean | `"false"` | Enable SDK-side PII redaction | ```bash export ZEROEVAL_API_KEY="ze_1234567890abcdef" diff --git a/tracing/sdks/typescript/setup.mdx b/tracing/sdks/typescript/setup.mdx index 8725ef3..409ebad 100644 --- a/tracing/sdks/typescript/setup.mdx +++ b/tracing/sdks/typescript/setup.mdx @@ -41,6 +41,12 @@ ze.init({ }); ``` + + Need source-side PII redaction? Pass `redaction: { enabled: true }` to + `init(...)` or set `ZEROEVAL_REDACT_PII=true`. See [PII + redaction](/tracing/pii-redaction). + + ## Patterns The SDK offers two ways to add tracing to your TypeScript/JavaScript code: @@ -251,6 +257,7 @@ ze.init({ maxSpans: 200, // Buffer up to 200 spans collectCodeDetails: true, // Capture source code context debug: false, // Enable debug logging + redaction: { enabled: true }, integrations: { openai: true, // Enable OpenAI integration vercelAI: true, // Enable Vercel AI SDK integration
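The new `tracing/pii-redaction.mdx` page describes three placeholder rules:

- Placeholders are scoped to one trace.
- Matching is exact after normalization.
- Distinct values get distinct letters.

The standalone sketch below illustrates those rules. It is not the ZeroEval SDK implementation. The following are assumptions made for illustration:

- the `TracePlaceholders` class, `normalize`, and `letter_suffix` helpers
- the simplified email and phone regexes
- separate letter counters per value type
- `AA`-style suffixes after `Z`

```python
import re
from dataclasses import dataclass, field

# Simplified detectors for illustration only (assumption): the real SDK
# detectors cover more formats (SSNs, PANs, tokens, IPs, ...).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def normalize(kind: str, value: str) -> str:
    """Normalize a detected value before placeholder lookup."""
    if kind == "EMAIL":
        return value.strip().lower()      # trim + lowercase
    if kind == "PHONE":
        return re.sub(r"\D", "", value)   # digits only
    return value.strip()                  # exact trimmed string


def letter_suffix(index: int) -> str:
    """Map 0 -> A, 25 -> Z, 26 -> AA (spreadsheet-style; assumed after Z)."""
    suffix = ""
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        suffix = chr(ord("A") + rem) + suffix
    return suffix


@dataclass
class TracePlaceholders:
    """Hypothetical placeholder table for ONE trace; use a new one per trace."""
    seen: dict = field(default_factory=dict)    # (kind, normalized) -> placeholder
    counts: dict = field(default_factory=dict)  # kind -> distinct values so far

    def placeholder(self, kind: str, value: str) -> str:
        key = (kind, normalize(kind, value))
        if key not in self.seen:
            n = self.counts.get(kind, 0)
            self.seen[key] = f"[REDACTED_{kind}_{letter_suffix(n)}]"
            self.counts[kind] = n + 1
        return self.seen[key]

    def redact(self, text: str) -> str:
        # Emails first so their digits are never re-matched as phone numbers.
        for kind, pattern in (("EMAIL", EMAIL_RE), ("PHONE", PHONE_RE)):
            text = pattern.sub(lambda m, k=kind: self.placeholder(k, m.group(0)), text)
        return text


if __name__ == "__main__":
    trace = TracePlaceholders()
    print(trace.redact("Reach alice@example.com or bob@example.com"))
    # Reach [REDACTED_EMAIL_A] or [REDACTED_EMAIL_B]
    print(trace.redact("Retry ALICE@example.com at +1 (415) 555-1234"))
    # Retry [REDACTED_EMAIL_A] at [REDACTED_PHONE_A]
```

The table is keyed by (type, normalized value). That is why `alice@example.com` and `ALICE@example.com` share a placeholder while `bob@example.com` gets the next letter. Discarding the instance when the trace ends gives the per-trace reset.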