Skip to content

fix(cmd/docker): prevent race between force-exit goroutine and plugin wait#6878

Open
zampani-docker wants to merge 1 commit intodocker:masterfrom
zampani-docker:zampani/fix-plugin-force-exit-race
Open

fix(cmd/docker): prevent race between force-exit goroutine and plugin wait#6878
zampani-docker wants to merge 1 commit intodocker:masterfrom
zampani-docker:zampani/fix-plugin-force-exit-race

Conversation

@zampani-docker
Copy link
Copy Markdown

Summary

  • Fixes a race condition in tryPluginRun where two goroutines could both call os.Exit with different exit codes when force-killing a plugin after 3 SIGINT/SIGTERM signals
  • Moves os.Exit(1) ownership to the main goroutine; the signal goroutine now only closes a channel and calls Kill()
  • Adds a return after the force-kill to prevent additional signals from causing a close on an already-closed channel (panic)

Root cause: when a plugin ignored context cancellation and 3 SIGINTs were sent, the signal goroutine called plugincmd.Process.Kill() then os.Exit(1). On a loaded runner, plugincmd.Run() in the main goroutine could return first — the SIGKILL'd plugin has ws.ExitStatus() = -1, leading to os.Exit(-1) = exit code 255 instead of 1.

Fix: forceExitCh is closed before Kill(). Since the plugin can only die after SIGKILL is delivered and Run() only returns after the process is reaped, forceExitCh is guaranteed to be closed before Run() returns in the force-kill path. The main goroutine checks the channel and owns os.Exit(1).

Test plan

  • Existing test TestPluginSocketCommunication/detached/the_main_CLI_exits_after_3_signals covers the fixed race — was flaky (exit code 255) on loaded CI runners with RC Docker on Alpine; should now be reliable
  • No new tests required: the fix is a concurrency correctness change to existing behaviour with existing test coverage

🤖 Generated with Claude Code

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 0% with 10 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
cmd/docker/docker.go 0.00% 10 Missing ⚠️

📢 Thoughts on this report? Let us know!

… wait

When a plugin ignores context cancellation and the user sends 3 SIGINTs,
the CLI kills the plugin with SIGKILL. Previously the signal goroutine
called os.Exit(1) directly; a race existed where plugincmd.Run() could
return first (plugin was SIGKILL'd, so ws.ExitStatus() = -1) and the
main goroutine would call os.Exit(-1) = exit code 255 before the
goroutine reached os.Exit(1).

Fix by moving exit-code ownership to the main goroutine. The signal
goroutine closes forceExitCh before calling Kill(), guaranteeing the
channel is closed before plugincmd.Run() returns (the plugin can only
die after Kill() delivers SIGKILL; Run() only returns after the process
is reaped). The main goroutine checks forceExitCh after Run() returns
and performs the print + os.Exit(1) itself.

Also return from the signal goroutine after the force-kill to prevent
further loop iterations from calling close(forceExitCh) a second time
(which would panic), in case additional signals arrive while the kill
is in flight.

Fixes a flaky failure in TestPluginSocketCommunication/detached/
the_main_CLI_exits_after_3_signals where exit code 255 was observed
instead of 1 on loaded CI runners (RC Docker on Alpine).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Michael Zampani <michael.zampani@docker.com>
@zampani-docker zampani-docker force-pushed the zampani/fix-plugin-force-exit-race branch from c52c101 to 2bc4307 Compare March 21, 2026 22:31
@zampani-docker zampani-docker marked this pull request as ready for review March 21, 2026 22:40
@thaJeztah thaJeztah added this to the 29.4.0 milestone Apr 2, 2026
@thaJeztah thaJeztah requested a review from Copilot April 2, 2026 00:59
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes an exit-code race in tryPluginRun when force-terminating a plugin after multiple termination signals, ensuring the CLI consistently exits with code 1 (instead of occasionally 255) in the force-kill path.

Changes:

  • Introduces a forceExitCh channel to signal the force-kill path from the signal goroutine to the main goroutine.
  • Moves the os.Exit(1) call (and related message/terminal restore) into the main goroutine after plugincmd.Run() returns.
  • Prevents double-closing forceExitCh by returning from the signal loop after initiating a force kill.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants