Fix run attribution loss after waitpoint resume in managed workers by darshitp091 · Pull Request #3724 · triggerdotdev/trigger.dev

darshitp091 · 2026-05-23T10:51:07Z

This PR fixes the telemetry attribution bug reported in issue #3672, where metrics emitted after a task resumes from a waitpoint were being ingested with empty RUN_ID, TASK_SLUG, and ATTEMPT_NUMBER fields.

Problem

On managed Trigger Cloud, long-running tasks that hit a waitpoint and then resume in the same worker process would continue emitting process and Node.js auto-metrics, but those metrics were no longer attributable to the original run. They still reached ClickHouse, but only the environment, project, and machine-level attributes were preserved, so per-run queries like this would miss the post-waitpoint portion of the execution:

SELECT * FROM metrics WHERE run_id = 'run_...'

Root Cause

When the worker flushes telemetry at a waitpoint, it sends FLUSH { disableContext: true }, and the task context marks the run as disabled. That behavior is intentional for the between-runs state so the exporter can strip run-specific fields.

The problem is that when the waitpoint is resolved, the worker only resolves the promise for the runtime and never re-enables the task context. Because the same Node process continues executing the resumed task, the exporter stays in the disabled branch for the rest of that run and keeps dropping the run-specific attributes.
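
The failure mode can be reproduced in a toy model. The sketch below is not the Trigger.dev implementation: `ToyTaskContext`, the attribute keys, and the sample IDs are hypothetical names chosen for illustration. The only behavior taken from the description above is that a disabled context causes run-scoped attributes to be dropped while environment-level attributes survive.

```typescript
// Toy model of the bug. All names are hypothetical, not the real Trigger.dev API.
type RunInfo = { runId: string; taskSlug: string; attemptNumber: number };

class ToyTaskContext {
  private runDisabled = false;

  constructor(
    private readonly run: RunInfo,
    private readonly envId: string
  ) {}

  // Stands in for what happens when telemetry is flushed at a waitpoint.
  disable(): void {
    this.runDisabled = true;
  }

  // Attributes that would be attached to each exported metric.
  get attributes(): Record<string, string | number> {
    const base: Record<string, string | number> = { env_id: this.envId };
    if (this.runDisabled) {
      // Disabled branch: run-scoped fields are stripped.
      return base;
    }
    return {
      ...base,
      run_id: this.run.runId,
      task_slug: this.run.taskSlug,
      attempt_number: this.run.attemptNumber,
    };
  }
}

const ctx = new ToyTaskContext(
  { runId: "run_123", taskSlug: "my-task", attemptNumber: 1 },
  "env_1"
);

const beforeWaitpoint = ctx.attributes; // includes run_id, task_slug, attempt_number
ctx.disable(); // flush at the waitpoint
// The waitpoint resolves here, but nothing re-enables the context...
const afterResume = ctx.attributes; // ...so only env_id is left
```

In this model there is no way to leave the disabled state short of installing a whole new context, which is the gap the PR describes.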

Fix

This change makes the task context explicitly re-enable when a waitpoint is resolved:

- Added a taskContext.enable() method alongside the existing disable() method.
- Called taskContext.enable() in both managed and dev worker RESOLVE_WAITPOINT handlers.
- Added a regression test proving that disable() followed by enable() restores run attribution, including RUN_ID.
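
The shape of the fix can be sketched as a small message handler. This is a minimal illustration under assumed names, not the actual worker code: `ResumableTaskContext`, `WorkerMessage`, `handleMessage`, and the `waitpointId` field are hypothetical, and the real handlers in the managed and dev workers carry considerably more state.

```typescript
// Sketch of the fix. All names are hypothetical, not the real worker code.
class ResumableTaskContext {
  private runDisabled = false;

  constructor(private readonly runId: string) {}

  disable(): void {
    this.runDisabled = true;
  }

  // The inverse of disable(): restores run-scoped attribution.
  enable(): void {
    this.runDisabled = false;
  }

  get isRunDisabled(): boolean {
    return this.runDisabled;
  }

  get attributes(): Record<string, string> {
    return this.runDisabled ? {} : { run_id: this.runId };
  }
}

type WorkerMessage =
  | { type: "FLUSH"; disableContext: boolean }
  | { type: "RESOLVE_WAITPOINT"; waitpointId: string };

function handleMessage(
  ctx: ResumableTaskContext,
  msg: WorkerMessage,
  resolveWaitpoint: (id: string) => void
): void {
  switch (msg.type) {
    case "FLUSH":
      if (msg.disableContext) ctx.disable();
      break;
    case "RESOLVE_WAITPOINT":
      // Re-enable first, so metrics emitted by the resumed task are attributed.
      ctx.enable();
      resolveWaitpoint(msg.waitpointId);
      break;
  }
}

const toyCtx = new ResumableTaskContext("run_123");
const resolved: string[] = [];
const onResolve = (id: string): void => {
  resolved.push(id);
};

handleMessage(toyCtx, { type: "FLUSH", disableContext: true }, onResolve);
const whileSuspended = toyCtx.attributes; // run_id stripped during suspension

handleMessage(toyCtx, { type: "RESOLVE_WAITPOINT", waitpointId: "wp_1" }, onResolve);
const afterFix = toyCtx.attributes; // run_id restored on resume
```

Calling `enable()` before resolving the waitpoint means the context is already restored by the time the resumed task can emit anything.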

Why this fixes it

The waitpoint flush still disables context during the suspension boundary, which preserves the original behavior for between-runs telemetry. The new enable step restores the run-scoped context immediately when execution resumes, so all subsequent metrics emitted by the resumed task keep their original attribution fields and remain queryable by run.

Validation

- Added a unit regression test for task-context re-enable behavior.
- Verified the touched files have no syntax or type errors in the workspace.

Addresses issue triggerdotdev#3674 by avoiding unconditional secure=true on HTTP ClickHouse URLs in the webapp entrypoint.

Addresses issue triggerdotdev#3672 by re-enabling task context when a waitpoint resumes in the same process, so metrics emitted after resumption keep RUN_ID, TASK_SLUG, and ATTEMPT_NUMBER attributes.

changeset-bot · 2026-05-23T10:51:11Z

⚠️ No Changeset found

Latest commit: 151bb67

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


github-actions · 2026-05-23T10:51:20Z

Hi @darshitp091, thanks for your interest in contributing!

This project requires that pull request authors are vouched, and you are not in the list of vouched users.

This PR will be closed automatically. See https://github.com/triggerdotdev/trigger.dev/blob/main/CONTRIBUTING.md for more details.

coderabbitai · 2026-05-23T10:51:27Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 717b9dac-a521-4285-b32e-cea326502315

📥 Commits

Reviewing files that changed from the base of the PR and between 61ca40b and 151bb67.

📒 Files selected for processing (5)

docker/scripts/entrypoint.sh
packages/cli-v3/src/entryPoints/dev-run-worker.ts
packages/cli-v3/src/entryPoints/managed-run-worker.ts
packages/core/src/v3/taskContext/index.test.ts
packages/core/src/v3/taskContext/index.ts

Walkthrough

This PR extracts explicit task context enablement into a dedicated enable() method within TaskContextAPI, refactoring setGlobalTaskContext() to call it instead of setting _runDisabled = false directly. Worker handlers for RESOLVE_WAITPOINT are updated to call taskContext.enable() before resolving waitpoints. A test verifies that disable() followed by enable() correctly restores run attribution. Additionally, the ClickHouse migration setup in the entrypoint script is refactored to use Node-based URL parsing for constructing the GOOSE_DBSTRING connection parameter, replacing bash string manipulation logic.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes


Copilot

Pull request overview

Fixes a telemetry attribution bug in Trigger.dev managed/dev workers where metrics emitted after resuming from a waitpoint could lose run-scoped attributes (e.g. RUN_ID) by ensuring the task context is re-enabled when a waitpoint resolves.

Changes:

- Added TaskContextAPI.enable() and reused it in setGlobalTaskContext() to explicitly re-enable run attribution.
- Re-enabled task context in both managed and dev worker RESOLVE_WAITPOINT IPC handlers.
- Added a unit regression test around disable() → enable() behavior (but see review comment re: coverage strength).
- Updated Docker ClickHouse migration DSN normalization logic in the entrypoint script.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file

| File | Description |
| --- | --- |
| packages/core/src/v3/taskContext/index.ts | Adds `enable()` and uses it to restore run attribution state. |
| packages/core/src/v3/taskContext/index.test.ts | Adds a regression test for disable/enable flow. |
| packages/cli-v3/src/entryPoints/managed-run-worker.ts | Calls `taskContext.enable()` when handling `RESOLVE_WAITPOINT`. |
| packages/cli-v3/src/entryPoints/dev-run-worker.ts | Calls `taskContext.enable()` when handling `RESOLVE_WAITPOINT`. |
| docker/scripts/entrypoint.sh | Changes ClickHouse migration connection string handling. |

+  it("re-enables run attribution after disable()", () => {
+    const api = TaskContextAPI.getInstance();
+    api.setGlobalTaskContext({ ctx: FAKE_CTX, worker: FAKE_WORKER });
+
+    api.disable();
+
+    expect(api.isRunDisabled).toBe(true);
+
+    api.enable();
+
+    expect(api.isRunDisabled).toBe(false);
+    expect(api.attributes[SemanticInternalAttributes.RUN_ID]).toBe(FAKE_CTX.run.id);
+  });


  # Run ClickHouse migrations
  echo "Running ClickHouse migrations..."
  export GOOSE_DRIVER=clickhouse
-
-  # Ensure secure=true is in the connection string
-  if echo "$CLICKHOUSE_URL" | grep -q "secure="; then
-    # secure parameter already exists, use as is
-    export GOOSE_DBSTRING="$CLICKHOUSE_URL"
-  elif echo "$CLICKHOUSE_URL" | grep -q "?"; then
-    # URL has query parameters, append secure=true
-    export GOOSE_DBSTRING="${CLICKHOUSE_URL}&secure=true"
-  else
-    # URL has no query parameters, add secure=true
-    export GOOSE_DBSTRING="${CLICKHOUSE_URL}?secure=true"
-  fi
+
+  # Goose derives TLS from the URL scheme. Strip any existing secure query
+  # parameter and only set secure=true for https URLs.
+  export GOOSE_DBSTRING="$(node -e 'const url = new URL(process.env.CLICKHOUSE_URL); url.searchParams.delete("secure"); if (url.protocol === "https:") { url.searchParams.set("secure", "true"); } process.stdout.write(url.toString());')"
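
The inline Node script in the added line is dense, so here is the same logic unrolled as a standalone function. `normalizeClickhouseDsn` and the sample URLs are hypothetical names and values for illustration; the function uses only the standard WHATWG `URL` API and mirrors the steps in the diff rather than adding any.

```typescript
// Readable equivalent of the one-liner in the diff.
// normalizeClickhouseDsn and the sample URLs are hypothetical.
function normalizeClickhouseDsn(rawUrl: string): string {
  const url = new URL(rawUrl);

  // Drop whatever secure=... the caller supplied.
  url.searchParams.delete("secure");

  // Only request TLS when the scheme itself is https.
  if (url.protocol === "https:") {
    url.searchParams.set("secure", "true");
  }

  return url.toString();
}

const httpDsn = normalizeClickhouseDsn(
  "http://default:password@clickhouse:8123?secure=true"
);
const httpsDsn = normalizeClickhouseDsn(
  "https://default:password@clickhouse.example.com:8443?database=default"
);
```

Two behaviors are worth knowing when reading the diff. First, `new URL()` throws on an empty or malformed input, so an unset `CLICKHOUSE_URL` would fail at this step rather than pass through silently. Second, `URL` serialization adds a root path, so an input with no path comes back with a trailing `/` before any query string.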


darshitp091 added 2 commits May 23, 2026 15:46

fix: scheme-aware ClickHouse DSN for migrations

7b9c274

Addresses issue triggerdotdev#3674 by avoiding unconditional secure=true on HTTP ClickHouse URLs in the webapp entrypoint.

fix: restore run attribution after waitpoints

151bb67

Addresses issue triggerdotdev#3672 by re-enabling task context when a waitpoint resumes in the same process, so metrics emitted after resumption keep RUN_ID, TASK_SLUG, and ATTEMPT_NUMBER attributes.

Copilot AI review requested due to automatic review settings May 23, 2026 10:51

github-actions Bot closed this May 23, 2026

Copilot started reviewing on behalf of darshitp091 May 23, 2026 10:51 View session

Copilot AI reviewed May 23, 2026

darshitp091 wants to merge 2 commits into triggerdotdev:main from darshitp091:fix/waitpoint-attribution
