feat(agent): ship cloud agent logs to PostHog Logs via OTEL #2566
pauldambra wants to merge 3 commits into
The cloud AgentServer previously shipped nothing to PostHog: its logs only went to console and S3, so cloud runs were hard to debug.

Wire the (previously unused) OtelLogWriter into AgentServer:

- Add OtelLogWriter.emitLog(level, scope, message, data) that maps the agent's log level to an OTEL severity and stores the scope as a log.scope attribute, mirroring the desktop electron-log OTEL transport.
- Route every server log line through a single onLog choke point that emits to OTEL (and still mirrors to the SSE console stream).
- Flush on fatal error and shut down on stop so buffered logs are exported before the process exits.
- Default the endpoint to /i/v1/logs (matches desktop and the existing OtelTransportConfig; the old /i/v1/agent-logs default was never used).
- Add service.version to the resource attributes.

Export is enabled only when the sandbox injects POSTHOG_OTEL_LOGS_HOST and POSTHOG_OTEL_LOGS_API_KEY; otherwise it is a no-op, exactly like the desktop transport. This keeps customer task logs out of customer projects until infra provides a dedicated internal logs host + key.

Generated-By: PostHog Code
Task-Id: 70cb1c35-2d0e-4f15-a0d5-052f1e9e3572
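The level-to-severity mapping described in the first bullet can be sketched as follows. This is a minimal illustration, not the PR's code: `toLogRecord`, `LogRecordSketch`, and the body format (message followed by JSON-encoded data) are assumptions. The severity numbers follow the OpenTelemetry logs data model.

```typescript
// Hypothetical sketch; names and body format are assumptions, not the PR's code.
type AgentLogLevel = "debug" | "info" | "warn" | "error";

interface LogRecordSketch {
  severityNumber: number;
  severityText: string;
  body: string;
  attributes: Record<string, string>;
}

// SeverityNumber values from the OpenTelemetry logs data model.
const SEVERITY: Record<AgentLogLevel, { severityNumber: number; severityText: string }> = {
  debug: { severityNumber: 5, severityText: "DEBUG" },
  info: { severityNumber: 9, severityText: "INFO" },
  warn: { severityNumber: 13, severityText: "WARN" },
  error: { severityNumber: 17, severityText: "ERROR" },
};

function toLogRecord(
  level: AgentLogLevel,
  scope: string,
  message: string,
  data?: unknown,
): LogRecordSketch {
  // Assumed formatting: append JSON-encoded data only when data is present.
  const body = data === undefined ? message : `${message} ${JSON.stringify(data)}`;
  return {
    severityNumber: SEVERITY[level].severityNumber,
    severityText: SEVERITY[level].severityText,
    body,
    // The scope travels as an attribute rather than being folded into the body.
    attributes: { "log.scope": scope },
  };
}
```

Keeping the scope in `log.scope` rather than in the body means records can be filtered by scope without parsing message text.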
React Doctor found no issues in the changed files. 🎉
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
packages/agent/src/otel-log-writer.test.ts:49-72
**Non-parameterised severity/body tests**
The two new `emitLog` tests only exercise `warn` (with data) and `error` (without data), leaving `debug` and `info` untested. The team's convention is to prefer parameterised tests: an `it.each` over all four levels (`debug`/`info`/`warn`/`error`) would cover the mapping table exhaustively in a single block, and a second `it.each` for the body-formatting cases (data present vs. absent) would remove the duplication between the two tests.
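A table-driven shape for these cases might look like the sketch below. To stay dependency-free it uses a plain loop instead of a test runner; the same two tables could be passed to `it.each`. `severityTextFor` and `formatBody` are hypothetical stand-ins for the code under test, not the functions in `otel-log-writer.ts`.

```typescript
// Hypothetical stand-ins for the code under test.
type Level = "debug" | "info" | "warn" | "error";

function severityTextFor(level: Level): string {
  return level.toUpperCase();
}

function formatBody(message: string, data?: unknown): string {
  return data === undefined ? message : `${message} ${JSON.stringify(data)}`;
}

// One row per level: covers the whole mapping table in a single block.
const severityCases: Array<[Level, string]> = [
  ["debug", "DEBUG"],
  ["info", "INFO"],
  ["warn", "WARN"],
  ["error", "ERROR"],
];

// One row per body-formatting case: data present vs. absent.
const bodyCases: Array<[string, unknown, string]> = [
  ["with data", { ms: 5 }, 'slow {"ms":5}'],
  ["without data", undefined, "slow"],
];

// Runs every row and returns a description of each failing case.
function runCases(): string[] {
  const failures: string[] = [];
  for (const [level, expected] of severityCases) {
    if (severityTextFor(level) !== expected) failures.push(`severity: ${level}`);
  }
  for (const [name, data, expected] of bodyCases) {
    if (formatBody("slow", data) !== expected) failures.push(`body: ${name}`);
  }
  return failures;
}
```

Adding a new level or formatting case then means adding one row, not another test body.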
Reviews (1): Last reviewed commit: "feat(agent): ship cloud agent logs to Po..."
Wiring onLog onto the constructor logger routed every pre-session log line through emitConsoleLog, which calls session.logWriter.appendRawLine. Tests that inject a session with a partial logWriter mock then crashed with "appendRawLine is not a function" (13 unit-test failures).

Send pre-session logs to OTEL only (emitOtelLog, which never touches session state) and keep the full handleLog (OTEL + SSE console) on the post-init logger, exactly where emitConsoleLog ran before this PR.

Also parameterise the emitLog tests (it.each over all four levels and both body-formatting cases) per review feedback.

Generated-By: PostHog Code
Task-Id: 70cb1c35-2d0e-4f15-a0d5-052f1e9e3572
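The split between the two logging paths can be illustrated roughly as below. The method names mirror the commit message, but the class, the method bodies, and the in-memory `otelLines` array (standing in for the OTEL exporter) are assumptions made for the sketch.

```typescript
// Hypothetical sketch of the two logging paths; bodies are assumptions.
type Level = "debug" | "info" | "warn" | "error";

interface SessionSketch {
  logWriter: { appendRawLine(line: string): void };
}

class AgentServerSketch {
  session: SessionSketch | null = null;
  readonly otelLines: string[] = []; // stands in for the OTEL exporter

  // OTEL only: safe before a session exists because it never reads this.session.
  emitOtelLog = (level: Level, scope: string, message: string): void => {
    this.otelLines.push(`${level} ${scope} ${message}`);
  };

  // OTEL + SSE console: intended for the post-init logger only.
  handleLog = (level: Level, scope: string, message: string): void => {
    this.emitOtelLog(level, scope, message);
    this.emitConsoleLog(level, scope, message);
  };

  // Requires an attached session with a working logWriter.
  private emitConsoleLog(level: Level, scope: string, message: string): void {
    if (!this.session) throw new Error("no active session");
    this.session.logWriter.appendRawLine(`[${level}] ${scope} ${message}`);
  }
}
```

In this sketch the constructor logger would receive `emitOtelLog` and the post-init logger `handleLog`, so nothing on the pre-session path depends on session state.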
Reviews (2): Last reviewed commit: "Merge branch 'main' into posthog-code/ag..."
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6a3ac26f24
    prefix: "[AgentServer]",
    // Pre-session logs go to OTEL only; the SSE console stream needs an
    // active session, which the post-init logger wires in via handleLog.
    onLog: this.emitOtelLog,
Preserve fallback logging when OTEL is disabled
When POSTHOG_OTEL_LOGS_HOST/POSTHOG_OTEL_LOGS_API_KEY are absent or the OTEL writer fails to initialize, this onLog callback still causes Logger.emitLog to return before its stdout/stderr fallback, while emitOtelLog is a no-op because otelLogWriter is null. In local/dev runs or misconfigured sandboxes, all constructor/startup/pre-session errors are silently dropped instead of being visible in process logs, making early failures much harder to diagnose; keep the normal console path when OTEL is not actually active.
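One possible way to address this, sketched under assumptions: pass `onLog` only when an OTEL writer actually exists, so the logger's normal fallback still runs otherwise. `LoggerSketch`, `OtelWriterSketch`, and `createPreSessionLogger` are hypothetical; the early return in `log` models the behaviour the review describes and is not the repo's `Logger` code.

```typescript
// Hypothetical sketch; models the early return described in the review.
type Level = "debug" | "info" | "warn" | "error";
type OnLog = (level: Level, scope: string, message: string) => void;

class LoggerSketch {
  private readonly prefix: string;
  private readonly onLog: OnLog | undefined;
  private readonly fallback: (line: string) => void;

  constructor(prefix: string, onLog: OnLog | undefined, fallback: (line: string) => void) {
    this.prefix = prefix;
    this.onLog = onLog;
    this.fallback = fallback;
  }

  log(level: Level, message: string): void {
    if (this.onLog) {
      // With a callback set, the console fallback below is skipped.
      this.onLog(level, this.prefix, message);
      return;
    }
    this.fallback(`${this.prefix} [${level}] ${message}`);
  }
}

interface OtelWriterSketch {
  emitLog: OnLog;
}

// Only install onLog when a writer exists; a null writer keeps the fallback path.
function createPreSessionLogger(
  writer: OtelWriterSketch | null,
  fallback: (line: string) => void = (line) => console.error(line),
): LoggerSketch {
  const onLog: OnLog | undefined = writer
    ? (level, scope, message) => writer.emitLog(level, scope, message)
    : undefined;
  return new LoggerSketch("[AgentServer]", onLog, fallback);
}
```

An alternative design would call both paths unconditionally; the sketch above only shows the narrower change of not shadowing the fallback when OTEL is inactive.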
Problem
The cloud `AgentServer` (the process that runs each task in a sandbox) shipped no telemetry to PostHog: its logs only went to stdout and S3. When something goes wrong with a cloud run (e.g. the recent Slack reports), there's no way to query what happened by `run_id`/`task_id`. The `OtelLogWriter` and `otelTransport` config existed but were never wired to anything.

This makes the desktop's existing pattern (logs → PostHog Logs via OTEL, indexed by service/run) work for the cloud agent too.
Changes
- `OtelLogWriter.emitLog(level, scope, message, data)`: maps the agent log level to an OTEL severity and stores the scope as a `log.scope` attribute, mirroring the desktop electron-log OTEL transport. Adds `service.version` to the resource attributes and defaults the endpoint to `/i/v1/logs` (matches desktop + the existing `OtelTransportConfig`; the old `/i/v1/agent-logs` default was never used).
- `AgentServer` routes every server log line through one `onLog` choke point that emits to OTEL and still mirrors to the SSE console stream; flushes on fatal error and shuts down on stop.
- `POSTHOG_OTEL_LOGS_HOST` / `POSTHOG_OTEL_LOGS_API_KEY` / `POSTHOG_OTEL_LOGS_PATH` env vars (plumbed via `bin.ts` → `AgentServerConfig.otelLogs`).

Export is disabled (no-op) unless the sandbox injects the host + key, the same behavior as the desktop transport. This is deliberate: the sandbox only has the customer's API URL + personal key, so exporting there would dump customer task logs into customer projects (and a personal key may not even authorize log ingest). A follow-up on the infra side needs to inject a dedicated PostHog-internal logs host + ingest key to turn this on.
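The env-var gating can be sketched as a small resolver. The env var names and the `/i/v1/logs` default come from the description above; `resolveOtelLogs`, `OtelLogsConfigSketch`, and its field names are assumptions, and the real `AgentServerConfig.otelLogs` shape may differ.

```typescript
// Hypothetical sketch; the real AgentServerConfig.otelLogs shape may differ.
interface OtelLogsConfigSketch {
  host: string;
  apiKey: string;
  path: string;
}

// Returns undefined (export stays a no-op) unless both host and key are injected.
function resolveOtelLogs(
  env: Record<string, string | undefined>,
): OtelLogsConfigSketch | undefined {
  const host = env.POSTHOG_OTEL_LOGS_HOST;
  const apiKey = env.POSTHOG_OTEL_LOGS_API_KEY;
  if (!host || !apiKey) return undefined;
  // The path is optional and falls back to the default endpoint.
  return { host, apiKey, path: env.POSTHOG_OTEL_LOGS_PATH ?? "/i/v1/logs" };
}
```

Requiring both values means a sandbox that only carries the customer's API URL and personal key never exports, which matches the intent stated above.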
Scope note: this is logs-only. Errors land in PostHog Logs as `ERROR`-severity records; wiring exceptions into the dedicated Error Tracking product (`posthog-node`) is a deliberate follow-up.

How did you test this?
- `pnpm --filter @posthog/agent typecheck`: passes.
- `pnpm --filter @posthog/agent test otel-log-writer`: 5/5 pass (added tests for `emitLog` severity/scope/body mapping and the `/i/v1/logs` default endpoint).
- `biome check` on all changed files: clean.
- `agent-server.test.ts` has 55 pre-existing failures in this environment (`msw` `beforeAll` cascade), identical on a clean `main` checkout and unrelated to this change.

Automatic notifications
Created with PostHog Code from a Slack thread