Skip to content

fix(large-refs): cleanup based on table read#4716

Merged
icecrasher321 merged 7 commits into
stagingfrom
fix/cleanup-logs-query
May 22, 2026
Merged

fix(large-refs): cleanup based on table read#4716
icecrasher321 merged 7 commits into
stagingfrom
fix/cleanup-logs-query

Conversation

@icecrasher321
Copy link
Copy Markdown
Collaborator

@icecrasher321 icecrasher321 commented May 22, 2026

Summary

Cleanup large refs based on table row tracking state rather than scanning execution_data

Type of Change

  • Other: Performance

Testing

Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel
Copy link
Copy Markdown

vercel Bot commented May 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped May 22, 2026 7:01pm

Request Review

@cursor
Copy link
Copy Markdown

cursor Bot commented May 22, 2026

PR Summary

Medium Risk
Introduces new DB-backed metadata for large execution values and rewires cleanup/deletion logic to rely on it, which affects data retention and could lead to premature or missed deletions if predicates are wrong. Changes also touch background cleanup batching/concurrency and storage deletion behavior, so operational behavior should be reviewed carefully.

Overview
Shifts large execution value lifecycle management from execution_data scanning to DB-tracked metadata. This adds new tables/migration for execution_large_values, execution_large_value_references, and execution_large_value_dependencies, plus a new large-value-metadata module to register owners, track per-execution references (for logs and paused snapshots), and prune stale metadata/tombstones.

Updates execution write paths to populate this metadata: storeLargeValue registers the persisted value owner (and cleans up storage on failed durable metadata writes), materializeLargeValueRef records references before returning cached/durable values, workflow execution log finalization replaces the full reference set, and pause/snapshot persistence does the same for paused_snapshot.

Reworks cleanup-logs to avoid selecting executionData and instead delete unreferenced large values by querying the new tables (plus a legacy cleanup path via workspaceFiles), marks deleted values in DB, prunes metadata, tightens batching limits, and caps Trigger.dev task concurrency. Also switches Azure blob deletion to deleteIfExists() and adjusts/extends tests accordingly.

Reviewed by Cursor Bugbot for commit f5fa8ef. Configure here.

Comment thread apps/sim/lib/execution/payloads/large-value-metadata.ts
Comment thread apps/sim/background/cleanup-logs.ts
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 22, 2026

Greptile Summary

This PR replaces the previous JSONB-scan approach for large execution value cleanup with a proper table-driven system: three new tables (execution_large_values, execution_large_value_references, execution_large_value_dependencies) track ownership and live references, and the cleanup job queries these tables directly instead of scanning execution_data columns.

  • New large-value-metadata.ts handles registration of large value owners (with transitive dependency closure), reference tracking per execution scope (execution log and paused snapshot sources), and metadata pruning (stale refs, orphaned dependencies, tombstone GC).
  • cleanup-logs.ts gains two new cleanup functions — one for registered keys via execution_large_values and one legacy path for pre-migration keys via workspaceFiles — along with a pruneLargeValueMetadata sweep at the end of each cleanup run.
  • store.ts and logger.ts / human-in-the-loop-manager.ts wire in the registration and reference-replacement calls at the appropriate write points (persist, materialize, completion, pause/resume).

Confidence Score: 4/5

The core cleanup logic is sound, but store.ts has a narrow path where a thrown exception from the registration transaction leaves a blob key permanently untracked.

When registerLargeValueOwner throws (e.g., the dependency closure exceeds 5000 or a DB error rolls back the transaction), storeLargeValue never reaches the if (!registered) branch, so deleteUntrackedPersistedValue is never called and the blob key written by persistValue is orphaned.

apps/sim/lib/execution/payloads/store.ts — the registration+cleanup block around line 165 needs a try/catch so exceptions from registerPersistedValueOwner also trigger key deletion.

Important Files Changed

Filename Overview
apps/sim/lib/execution/payloads/store.ts Adds owner registration and reference tracking on large value store/materialize paths; contains a storage leak when registerPersistedValueOwner throws instead of returning false.
apps/sim/lib/execution/payloads/large-value-metadata.ts New module implementing ownership/reference/dependency tables for large execution values; BFS dependency closure can generate very large NOT IN clauses near the 5000-item cap.
apps/sim/background/cleanup-logs.ts Replaces JSONB-scan-based large value cleanup with table-driven queries against the new execution_large_values / execution_large_value_references tables; adds legacy path for pre-migration keys via workspaceFiles.
packages/db/schema.ts Adds three new tables with appropriate PKs, FK constraints, and partial indexes for cleanup queries.
apps/sim/lib/logs/execution/logger.ts Adds execution_log reference replacement inside the completion transaction, correctly computed before the update and applied atomically with the log write.
apps/sim/lib/cleanup/batch-delete.ts Fixes totalRowLimit tracking to use an attempted counter rather than deleted+failed, preventing over-selection at end of run.
apps/sim/lib/uploads/providers/blob/client.ts Switches deleteFromBlob to deleteIfExists(), making Azure Blob deletions idempotent.

Reviews (6): Last reviewed commit: "do not attempt blob deletion for infra o..." | Re-trigger Greptile

Comment thread apps/sim/background/cleanup-logs.ts
Comment thread apps/sim/background/cleanup-logs.ts
Comment thread packages/db/migrations/0212_sturdy_guardsmen.sql
Comment thread packages/db/schema.ts
@icecrasher321
Copy link
Copy Markdown
Collaborator Author

@greptile

@icecrasher321
Copy link
Copy Markdown
Collaborator Author

bugbot run

Comment thread apps/sim/lib/execution/payloads/store.ts
@icecrasher321
Copy link
Copy Markdown
Collaborator Author

bugbot run

@icecrasher321
Copy link
Copy Markdown
Collaborator Author

@greptile

@icecrasher321
Copy link
Copy Markdown
Collaborator Author

bugbot run

@icecrasher321
Copy link
Copy Markdown
Collaborator Author

@greptile

Comment thread apps/sim/lib/execution/payloads/store.ts
Comment thread apps/sim/lib/execution/payloads/store.ts
@icecrasher321
Copy link
Copy Markdown
Collaborator Author

bugbot run

@icecrasher321
Copy link
Copy Markdown
Collaborator Author

@greptile

@icecrasher321
Copy link
Copy Markdown
Collaborator Author

bugbot run

@icecrasher321
Copy link
Copy Markdown
Collaborator Author

@greptile

Comment thread apps/sim/lib/execution/payloads/large-value-metadata.ts Outdated
Comment thread apps/sim/lib/execution/payloads/store.ts
@icecrasher321
Copy link
Copy Markdown
Collaborator Author

bugbot run

@icecrasher321 icecrasher321 merged commit 209ca5f into staging May 22, 2026
14 checks passed
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit f5fa8ef. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant