test(e2e): replace flaky Python live policy update tests with Rust #742
johntmyers merged 1 commit into main on Apr 2, 2026
drew previously approved these changes on Apr 2, 2026
Remove test_live_policy_update_and_logs and test_live_policy_update_from_empty_network_policies from the Python e2e suite. Both used a manual 90s poll loop against GetSandboxPolicyStatus that flaked in CI with 'Policy v2 was not loaded within 90s'. Add e2e/rust/tests/live_policy_update.rs with two replacement tests that exercise the same policy lifecycle (version bumping, hash idempotency, policy list history) through the CLI using the built-in --wait flag for reliable synchronization.
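To illustrate the two synchronization styles contrasted above, here is a minimal Rust sketch. It is not the code from this PR: the helper names `poll_until` and `policy_set_wait_command` are hypothetical, and only the subcommand and flags `openshell policy set --wait --timeout` come from the PR description. Any further arguments the real command needs are left to the caller via `extra_args`, which is an assumption of this sketch.

```rust
use std::process::Command;
use std::time::{Duration, Instant};

/// Fixed-deadline poll loop, the general pattern the removed Python tests
/// used (there: 2s interval, 90s deadline). Hypothetical helper.
/// Returns the number of polls taken on success, or an error message if
/// the deadline passes before `ready` returns true.
fn poll_until<F: FnMut() -> bool>(
    mut ready: F,
    interval: Duration,
    deadline: Duration,
) -> Result<u32, String> {
    let start = Instant::now();
    let mut polls = 0;
    loop {
        polls += 1;
        if ready() {
            return Ok(polls);
        }
        // A slow CI runner can exceed the deadline even though the
        // condition would have become true shortly afterwards.
        if start.elapsed() >= deadline {
            return Err(format!("condition not met within {:?}", deadline));
        }
        std::thread::sleep(interval);
    }
}

/// Builds (but does not run) a CLI invocation that delegates waiting to
/// the CLI itself. Hypothetical helper; `extra_args` stands in for
/// whatever else the real command requires.
fn policy_set_wait_command(extra_args: &[&str], timeout_secs: u64) -> Command {
    let mut cmd = Command::new("openshell");
    cmd.args(["policy", "set", "--wait", "--timeout"])
        .arg(timeout_secs.to_string())
        .args(extra_args);
    cmd
}
```

With the second style, a test would call something like `policy_set_wait_command(&[], 120).status()` and assert on the exit status, so the test itself carries no sleep interval or deadline arithmetic.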
64a46ec to 4501568
Summary
Replace two flaky Python e2e tests for live policy updates with Rust e2e tests that use the CLI's built-in --wait flag for reliable synchronization instead of manual 90s poll loops.

Related Issue
Fixes flaky E2E (python) job failure: https://github.com/NVIDIA/OpenShell/actions/runs/23920278132/job/69765280000?pr=740

Changes
- Removed from e2e/python/test_sandbox_policy.py:
  - test_live_policy_update_and_logs: flaked with "Policy v2 was not loaded within 90s" due to a manual time.sleep(2) poll loop with a hard 90s deadline
  - test_live_policy_update_from_empty_network_policies: same poll pattern, same flake risk
- Added e2e/rust/tests/live_policy_update.rs with two tests:
  - live_policy_update_round_trip: set policy A, verify version, re-push A (idempotent), push B (version bump via --wait), re-push B (idempotent), verify policy list history
  - live_policy_update_from_empty_network_policies: set empty network policy, push policy with rules, verify version bumps

Why Rust tests are more reliable
The Python tests polled the GetSandboxPolicyStatus RPC every 2s with a hard 90s deadline. The Rust tests use openshell policy set --wait --timeout 120, which delegates synchronization to the CLI's own wait logic, eliminating the timing sensitivity.

Coverage notes
- GetSandboxLogs RPC: was tested in the old test but not replaced; known gap for follow-up

Testing
- cargo clippy passes on new test (no warnings)
- cargo check --features e2e passes for all Rust e2e tests
- mise run pre-commit (CI will verify)

Checklist