feat: add bulk-disable-import-audit-processor task handler #244

Open
prithipalpatwal wants to merge 2 commits into main from feat/bulk-disable-import-audit-processor

Conversation

@prithipalpatwal
Contributor

Introduces a new bulk handler that processes N sites in a single invocation — disabling imports per site and calling configuration.save() once for all sites — eliminating the race condition that occurred when N parallel disable-import-audit-processor invocations each wrote the shared Configuration object concurrently.
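The single-save idea can be sketched as below. This is a hypothetical illustration, not the actual spacecat API: `configuration.disableImport` and the site shape are assumptions standing in for the real per-site disable logic.

```javascript
// Sketch of the bulk approach: mutate the shared Configuration in memory
// for every site, then persist it with exactly one save() call, so there
// are no N concurrent writers racing on the same record.
async function bulkDisableImports(sites, configuration) {
  for (const site of sites) {
    // Hypothetical per-site mutation (stand-in for the real disable logic).
    configuration.disableImport(site.siteId);
  }
  // Single write for all N sites eliminates the concurrent-save race.
  await configuration.save();
  return { processed: sites.length };
}
```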

Please ensure your pull request adheres to the following guidelines:

  • make sure to link the related issues in this description
  • when merging / squashing, make sure the fixed issue references are visible in the commits, for easy compilation of release notes

Related Issues

Thanks for contributing!

@codecov

codecov bot commented Apr 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@tathagat2241

High Severity Issues

  • Configuration.findLatest() is unhandled - No try-catch around the initial config load. A DynamoDB failure crashes the entire handler — all N sites fail with an unhandled exception.

  • configuration.save() failure returns HTTP 200 - The catch block calls return ok(...) on save failure. Any upstream caller or monitor sees success even though no audit types were actually disabled. Should be a 5xx.

  • No proof the handler is actually wired up - The PR doesn't show the message producer (scheduler/Step Functions) being updated to send bulk-disable-import-audit-processor messages. If no caller sends to this new handler, the original race condition still exists in production.
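The first two issues can be sketched together. This is illustrative only: `ok` and `internalServerError` are minimal stand-ins for the real HTTP-response helpers, and `Configuration.findLatest()` is assumed to reject on a DynamoDB failure.

```javascript
// Minimal response helpers (stand-ins for the project's real utilities).
const ok = (message) => ({ status: 200, message });
const internalServerError = (message) => ({ status: 500, message });

async function handler(context) {
  let configuration;
  try {
    configuration = await context.dataAccess.Configuration.findLatest();
  } catch (e) {
    // A config-load failure should surface as a 500, not an unhandled crash.
    return internalServerError('Failed to load configuration');
  }
  // ... disable imports per site here ...
  try {
    await configuration.save();
  } catch (e) {
    // Returning ok() here would report success to upstream monitors even
    // though nothing was persisted; a 5xx lets them detect the failure.
    return internalServerError('Failed to save configuration');
  }
  return ok('Disabled imports for all sites');
}
```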

Medium Severity Issues

  • Sequential site.save() with no batch size cap — a large sites array processes one DynamoDB write at a time, risking Lambda timeout.
  • Per-site scheduledRun field is silently ignored — JSDoc documents it as a per-site field, but the handler only reads taskContext.scheduledRun at the top level.
  • say() in success path is outside try-catch — if Slack API fails on the final summary message, the function throws even though all data was written. The SQS processor would retry, causing duplicate site.save() calls.
  • Partial state on configuration.save() failure — when site.save() succeeds for some sites but configuration.save() fails, those sites have imports disabled in their DB record but the global configuration is not updated. No rollback mechanism.
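The batching fix for the first medium issue can be sketched as follows; `processSiteEntry` is a hypothetical stand-in for the real per-site work, and the batch size of 10 mirrors the suggestion above.

```javascript
const SITE_BATCH_SIZE = 10;

// Process sites in parallel chunks of 10 via Promise.allSettled, so a
// large sites array is neither strictly sequential (Lambda timeout risk)
// nor unbounded parallelism, and one failed save doesn't abort the rest.
async function processInBatches(sites, processSiteEntry) {
  const results = [];
  for (let i = 0; i < sites.length; i += SITE_BATCH_SIZE) {
    const batch = sites.slice(i, i + SITE_BATCH_SIZE);
    // allSettled never rejects: per-site failures become recorded results.
    const settled = await Promise.allSettled(batch.map(processSiteEntry));
    results.push(...settled);
  }
  return results;
}
```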

Low Severity Issues

  • siteUrl not validated before passing to findByBaseURL — null/undefined entry in the sites array can throw unexpectedly.
  • Error messages forwarded verbatim to Slack (could contain internal details like table names, ARNs).

- Wrap Configuration.findLatest() in try/catch; return 500 on failure
- Return internalServerError (500) instead of ok (200) on configuration.save() failure
- Process sites in parallel batches of 10 (SITE_BATCH_SIZE) via Promise.allSettled
- Handle per-site scheduledRun field; skip individual sites when true
- Validate siteUrl before calling findByBaseURL; record error for missing/null values
- Wrap Slack summary calls in try/catch to prevent SQS retries on Slack failures
- Sanitize error messages sent to Slack (no internal details like ARNs or table names)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

github-actions bot commented Apr 2, 2026

This PR will trigger a minor release when merged.

@prithipalpatwal
Contributor Author


High Severity — all fixed:

  • Configuration.findLatest() unhandled — wrapped in try/catch; returns
    internalServerError('Failed to load configuration') on failure and sends a Slack alert.
  • configuration.save() returns 200 on failure — changed to internalServerError('Failed to
    save configuration') (HTTP 500).
  • Handler not wired up — the handler was already registered in src/index.js:36-38. No change
    needed.

Medium Severity — all fixed:

  • Sequential site.save() with no batch size cap — extracted processSiteEntry and process
    sites in parallel batches of 10 via Promise.allSettled.
  • Per-site scheduledRun silently ignored — processSiteEntry now reads scheduledRun from each
    site entry and returns { status: 'skipped' } if true.
  • Slack summary outside try/catch — the entire summary block (say calls after
    configuration.save()) is now wrapped in try/catch; a Slack failure logs the error but doesn't
    throw or cause SQS retries.
  • Partial state on configuration.save() failure — now returns a hard 500, so the SQS
    processor retries the message. The site.save() calls are idempotent (re-disabling an
    already-disabled import is a no-op), so retries are safe.

Low Severity — all fixed:

  • siteUrl not validated — processSiteEntry guards against null/missing siteUrl before calling
    findByBaseURL, returning { status: 'error', error: 'Missing siteUrl' }.
  • Error messages forwarded verbatim to Slack — per-site errors report 'Site processing
    failed' (not error.message); Configuration.findLatest and configuration.save failures send
    generic messages without internal details.
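The per-site guards described above can be sketched as below. This is a hypothetical reconstruction: the `dataAccess.Site.findByBaseURL` call and the result shapes are assumptions based on the discussion, not the merged code.

```javascript
// Sketch of processSiteEntry with the three guards: siteUrl validation,
// per-site scheduledRun skip, and sanitized error reporting.
async function processSiteEntry(entry, dataAccess) {
  if (!entry || !entry.siteUrl) {
    // Guard: a null/missing siteUrl is recorded, not thrown.
    return { status: 'error', error: 'Missing siteUrl' };
  }
  if (entry.scheduledRun === true) {
    // Per-site scheduledRun is honored rather than silently ignored.
    return { status: 'skipped', siteUrl: entry.siteUrl };
  }
  try {
    const site = await dataAccess.Site.findByBaseURL(entry.siteUrl);
    // ... disable imports on the site here ...
    await site.save();
    return { status: 'disabled', siteUrl: entry.siteUrl };
  } catch (e) {
    // Sanitized: never forward e.message (may leak ARNs/table names) to Slack.
    return { status: 'error', siteUrl: entry.siteUrl, error: 'Site processing failed' };
  }
}
```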
