gh-146313: Fix multiprocessing ResourceTracker deadlock after os.fork() #146316
gpshead merged 6 commits into python:main
Problem

`ResourceTracker.__del__` (added in gh-88887) calls `os.waitpid(pid, 0)` which blocks indefinitely if a process created via `os.fork()` still holds the tracker pipe's write end. The tracker never sees EOF, never exits, and the parent hangs at interpreter shutdown.

Root cause

Three requirements conflict:

- gh-88887 wants the parent to reap the tracker to prevent zombies
- gh-80849 wants `mp.Process(fork)` children to reuse the parent's tracker via the inherited pipe fd
- gh-146313 shows the parent can't block in `waitpid()` if a child holds the fd -- the tracker won't see EOF until all copies close

Fix

Two layers:

- **Timeout safety-net.** `_stop_locked()` gains a `wait_timeout` parameter. When called from `__del__`, it polls with `WNOHANG` using exponential backoff for up to 1 second instead of blocking indefinitely.
- **At-fork handler.** An `os.register_at_fork(after_in_child=...)` handler closes the inherited pipe fd in the child unless a preserve flag is set. `popen_fork.Popen._launch()` sets the flag before its fork so `mp.Process(fork)` children keep the fd and reuse the parent's tracker (preserving gh-80849). Raw `os.fork()` children close the fd, letting the parent reap promptly.

Result

| Scenario | Before | After |
| --- | --- | --- |
| Raw `os.fork()`, parent exits while child alive | deadlock | ~30ms reap |
| `mp.Process(fork)`, parent joins then exits | ~30ms reap | ~30ms reap |
| `mp.Process(fork)`, parent exits abnormally | deadlock | 1s bounded wait |
| No fork (gh-88887 scenario) | ~30ms reap | ~30ms reap |

The at-fork handler makes the timeout unreachable in all well-behaved paths. The timeout remains as a safety net for abnormal shutdowns.
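For readers who want to see the at-fork layer in isolation, here is a minimal sketch of the idea, not the code from this PR. The module-level names `_tracker_write_fd` and `_preserve_fd_in_child`, and the helpers `fork_preserving_fd()` and `child_keeps_fd()`, are hypothetical stand-ins; only `os.register_at_fork`, `os.fork`, `os.pipe`, `os.waitpid`, and `os.waitstatus_to_exitcode` are real stdlib calls. It assumes a POSIX platform and Python 3.9 or later.

```python
import os

# Hypothetical stand-ins for the tracker's pipe write end and the
# "preserve" flag described above; the real attribute names may differ.
_tracker_write_fd = None
_preserve_fd_in_child = False


def _after_fork_in_child():
    """Close the inherited write end unless the forking code asked to keep it."""
    global _tracker_write_fd
    if _tracker_write_fd is not None and not _preserve_fd_in_child:
        os.close(_tracker_write_fd)
        _tracker_write_fd = None


os.register_at_fork(after_in_child=_after_fork_in_child)


def fork_preserving_fd():
    """Fork like a managed launcher would: set the flag so the child keeps the fd."""
    global _preserve_fd_in_child
    _preserve_fd_in_child = True
    try:
        return os.fork()
    finally:
        # Runs in both parent and child once fork() has returned.
        _preserve_fd_in_child = False


def child_keeps_fd(fork_func):
    """Return True if a child created by fork_func still holds the write end."""
    global _tracker_write_fd
    read_fd, _tracker_write_fd = os.pipe()
    pid = fork_func()
    if pid == 0:
        # Child: report through the exit code whether the handler closed the fd.
        os._exit(0 if _tracker_write_fd is None else 1)
    _, status = os.waitpid(pid, 0)
    os.close(read_fd)
    os.close(_tracker_write_fd)
    _tracker_write_fd = None
    return os.waitstatus_to_exitcode(status) == 1
```

In this sketch, `child_keeps_fd(os.fork)` models the raw `os.fork()` row of the table (the child drops its copy, so the reader can see EOF once the parent closes its end), while `child_keeps_fd(fork_preserving_fd)` models the `mp.Process(fork)` path where the child deliberately keeps the fd.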
itamaro
left a comment
mostly nits, overall looks good, thanks!
Co-authored-by: Itamar Oren <itamarost@gmail.com>
🤖 New build scheduled with the buildbot fleet by @gpshead for commit b7d3b38 🤖 Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F146316%2Fmerge If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.
while waiting for buildbots before hitting merge... stashing my proposed commit message:
commit message body
Fix with two layers:
Co-authored-by: Itamar Oren <itamarost@gmail.com>
Thanks @gpshead for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13, 3.14.
…s.fork() (GH-146316)

`ResourceTracker.__del__` (added in gh-88887 circa Python 3.12) calls os.waitpid(pid, 0) which blocks indefinitely if a process created via os.fork() still holds the tracker pipe's write end. The tracker never sees EOF, never exits, and the parent hangs at interpreter shutdown.

Fix with two layers:

- **At-fork handler.** An os.register_at_fork(after_in_child=...) handler closes the inherited pipe fd in the child unless a preserve flag is set. popen_fork.Popen._launch() sets the flag before its fork so mp.Process(fork) children keep the fd and reuse the parent's tracker (preserving gh-80849). Raw os.fork() children close the fd, letting the parent reap promptly.
- **Timeout safety-net.** _stop_locked() gains a wait_timeout parameter. When called from `__del__`, it polls with WNOHANG using exponential backoff for up to 1 second instead of blocking indefinitely. The at-fork handler makes this unreachable in well-behaved paths; it remains for abnormal shutdowns.

(cherry picked from commit 3a7df63)

Co-authored-by: Gregory P. Smith <68491+gpshead@users.noreply.github.com>
Co-authored-by: Itamar Oren <itamarost@gmail.com>
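The timeout safety-net in the commit message above amounts to a bounded polling loop around `os.waitpid(pid, os.WNOHANG)`. The following is a rough sketch of that pattern only; `wait_with_timeout()`, its defaults, and its return convention are assumptions made for illustration and are not the signature of `_stop_locked()`. It assumes a POSIX platform.

```python
import os
import time


def wait_with_timeout(pid, timeout=1.0, initial_delay=0.001, max_delay=0.05):
    """Poll for a child's exit; return its wait status, or None on timeout.

    Hypothetical helper: name, defaults, and return value are illustrative.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        # With WNOHANG, waitpid returns (0, 0) while the child is still running.
        reaped_pid, status = os.waitpid(pid, os.WNOHANG)
        if reaped_pid == pid:
            return status
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None
        # Exponential backoff, capped by max_delay and by the time remaining.
        time.sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)
```

A caller that gets `None` back has not reaped the child and has to decide what to do next; in the PR's framing this branch is only expected during abnormal shutdowns, since the at-fork handler normally lets the tracker exit promptly.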
GH-148425 is a backport of this pull request to the 3.14 branch.
GH-148426 is a backport of this pull request to the 3.13 branch.
…os.fork() (GH-146316) (#148425)
…os.fork() (GH-146316) (#148426)