Skip to content

test(kernel-node-runtime): Fix flaky remote-comms E2E tests#927

Merged
sirtimid merged 1 commit intomainfrom
sirtimid/fix-flaky-remote-comms-tests
Apr 8, 2026
Merged

test(kernel-node-runtime): Fix flaky remote-comms E2E tests#927
sirtimid merged 1 commit intomainfrom
sirtimid/fix-flaky-remote-comms-tests

Conversation

@sirtimid
Copy link
Copy Markdown
Contributor

@sirtimid sirtimid commented Apr 8, 2026

Description

Two E2E tests in remote-comms.test.ts were consistently flaky on main, failing more often than passing:

  1. "rejects new messages when queue reaches MAX_QUEUE limit" — timed out at 120s
  2. "resolves promise after reconnection when retries have not been exhausted" — failed with URL redemption timed out after 8000ms

Root cause: Both tests used testBackoffOptions with ackTimeoutMs: 2_000. The RemoteHandle gives up after ackTimeoutMs × (MAX_RETRIES + 1) = 8s, which is too short for kernel2 to restart (new DB connection + kernel init + libp2p + relay connection + vat launch).

Changes

  • Increase ackTimeoutMs to 5_000 for both tests, giving a 20s give-up window instead of 8s
  • Extract named options objects (queueTestOptions, reconnectOptions) for clarity and consistency between setupAliceAndBob and restartKernelAndReloadVat calls
  • Restructure the reconnection test to establish the connection first (completing URL redemption), then stop kernel2 before sending the test message — eliminating both the URL redemption race and the non-deterministic message delivery race
  • Remove unnecessary delay() calls that were masking the underlying timing issue

Testing

Both tests pass reliably after the fix. Verified by running the full remote-comms.test.ts E2E suite twice — all 67 E2E tests pass across all 11 test files (including both previously-flaky tests).

🤖 Generated with Claude Code


Note

Low Risk
Low risk: changes are limited to E2E test timing/configuration and message sequencing; main risk is masking real regressions by increasing timeouts or altering test ordering.

Overview
Stabilizes remote-comms.test.ts E2E coverage by introducing per-test option objects that override ackTimeoutMs (to 5_000) and related rate-limit settings, and then using those same options consistently across setupAliceAndBob and restartKernelAndReloadVat.

Reworks the "reconnect without exhausting retries" test to first establish a connection (complete URL redemption), then stop the remote kernel and queue the message while it’s definitely offline, removing race-prone delays and making the queued-message reconnection behavior deterministic.

Reviewed by Cursor Bugbot for commit 11d93ec. Bugbot is set up for automated code reviews on this repo. Configure here.

Increase ackTimeoutMs from 2s to 5s for two tests whose RemoteHandle
was giving up (after 4×ackTimeoutMs = 8s) before kernel2 could
restart + reconnect through the relay. Also restructure the
"resolves promise after reconnection" test to establish the
connection before the disconnect/reconnect cycle, eliminating a URL
redemption race condition.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@sirtimid sirtimid requested a review from a team as a code owner April 8, 2026 21:01
@sirtimid sirtimid added the no-changelog Indicates that no changelog updates are required, and that related CI checks should be skipped. label Apr 8, 2026
@sirtimid sirtimid enabled auto-merge April 8, 2026 21:08
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 8, 2026

Coverage Report

Status Category Percentage Covered / Total
🔵 Lines 78.59%
🟰 ±0%
8630 / 10981
🔵 Statements 78.4%
🟰 ±0%
8766 / 11181
🔵 Functions 76.13%
🟰 ±0%
2019 / 2652
🔵 Branches 76.27%
🟰 ±0%
3714 / 4869
File CoverageNo changed files found.
Generated in workflow #4253 for commit 11d93ec by the Vitest Coverage Report Action

Copy link
Copy Markdown
Member

@rekmarks rekmarks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@sirtimid sirtimid added this pull request to the merge queue Apr 8, 2026
Merged via the queue into main with commit b6dc2be Apr 8, 2026
34 checks passed
@sirtimid sirtimid deleted the sirtimid/fix-flaky-remote-comms-tests branch April 8, 2026 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

no-changelog Indicates that no changelog updates are required, and that related CI checks should be skipped.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants