Skip to content

gh-148259: make list.remove atomic for non-trivial __eq__ comparison#148440

Draft
KowalskiThomas wants to merge 4 commits intopython:mainfrom
KowalskiThomas:kowalski/bug-prevent-race-condition-in-list-remove
Draft

gh-148259: make list.remove atomic for non-trivial __eq__ comparison#148440
KowalskiThomas wants to merge 4 commits intopython:mainfrom
KowalskiThomas:kowalski/bug-prevent-race-condition-in-list-remove

Conversation

@KowalskiThomas
Copy link
Copy Markdown
Contributor

@KowalskiThomas KowalskiThomas commented Apr 12, 2026

What is this PR?

This PR addresses the bug reported in #148259 where calling rich comparison may result in releasing the GIL which may result in list.remove removing the wrong item from the list.

The PR also updates the tests added for #126033.

Copy link
Copy Markdown
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KowalskiThomas KowalskiThomas force-pushed the kowalski/bug-prevent-race-condition-in-list-remove branch from 858e505 to c30ea14 Compare April 13, 2026 21:21
Copy link
Copy Markdown
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The raises parameter is now misleading. Please correct it. In addition, please add tests to list itself. And please also update the docs if they need to.

Overall, I'm not convinced by this change. Mutating the list while doing the comparison is unsafe. We just don't want to crash. But making it raise a RuntimeError does not necessarily makes it better. Otherwise, we need to do it for all other mutable sequences in C to have more or less the same behavior.

cc @rhettinger

@bedevere-app
Copy link
Copy Markdown

bedevere-app bot commented Apr 13, 2026

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@KowalskiThomas
Copy link
Copy Markdown
Contributor Author

The raises parameter is now misleading. Please correct it. In addition, please add tests to list itself. And please also update the docs if they need to.

Yes, I had to stop in the middle of my work earlier -- I didn't plan to leave it in that state, sorry.

Otherwise, we need to do it for all other mutable sequences in C to have more or less the same behavior.

I think some sequences already raise RuntimeError when incompatible things are done to them concurrently, right?

@KowalskiThomas
Copy link
Copy Markdown
Contributor Author

KowalskiThomas commented Apr 14, 2026

I just had the time to give this another look. Updated the tests again and now everything passes I believe.

EDIT: just realised you also asked for tests on list; I'll add them later today!

@picnixz
Copy link
Copy Markdown
Member

picnixz commented Apr 14, 2026

I think some sequences already raise RuntimeError when incompatible things are done to them concurrently, right?

Yes but if it is not always consistent. I see this as a garbage-in/garbage-out case. Do not mutate your list while doing other things concurrently. We prefer not raising and leaving it as a GIGO rather than adding complexity and changing existing behaviors.

We do fix cases where there hard crashes (segfaults) but ortherwise we try to retain existing behaviors. Anyway, I think we need first to discuss this, perhaps even standardizing this into a PEP, that is, make sure that concurrent mutations consistently raise whatever collection you are using (not just lists or dicts) and how.

@picnixz
Copy link
Copy Markdown
Member

picnixz commented Apr 14, 2026

Yes, I had to stop in the middle of my work earlier -- I didn't plan to leave it in that state, sorry.

In this case, please make it a draft.

@KowalskiThomas KowalskiThomas marked this pull request as draft April 17, 2026 22:16
@KowalskiThomas
Copy link
Copy Markdown
Contributor Author

Anyway, I think we need first to discuss this, perhaps even standardizing this into a PEP, that is, make sure that concurrent mutations consistently raise whatever collection you are using (not just lists or dicts) and how.

This makes sense to me. I admit I didn't check what other collections do outside deque (which does raise RuntimeError in case it is mutated concurrently); someone on the original issue did for set and dict (which from what I can tell properly handle those cases).
I'll look more into this and maybe check what the process for a PEP is.

I see this as a garbage-in/garbage-out case. Do not mutate your list while doing other things concurrently. We prefer not raising and leaving it as a GIGO rather than adding complexity and changing existing behaviors.

While I generally understand the GIGO case/rationale, I beg to differ on that one.

The way this can "currently" (with the GIL) happen is a bit convoluted and unlikely to happen in a real life scenario (custom equality operator releasing the GIL at the same time as another thread resumes and mutates the list), so I wouldn't worry too much about it.

However, with the GIL disabled, I can see it happening more commonly (it really just takes two threads working on the same list at the same time, I believe?).
On top of that, many operations on list are atomic in practice (and PEP-703 ensures they stay so without the GIL), so as a normal user who doesn't know anything about how list (or CPython, for that matter) works internally, I would be very confused if sometimes my list.remove didn't remove the right item when all the other list operations always work as expected.

It would seem (at least to me?) much more helpful to get an exception loudly saying that I'm doing something wrong than to have my code randomly fail to do what I want it to every once in a while (which I could potentially only realise because some downstream result of that list.remove isn't behaving as expected, making reproducing and debugging the issue very complicated).

What is your take on that? Happy to be told so if I'm missing part of the picture here, of course.

@picnixz
Copy link
Copy Markdown
Member

picnixz commented Apr 18, 2026

My take is:

  • equality is not meant to mutate the list, whether it is with or without the GIL.
  • legitimate use cases that make it unsage without the GIL must be fixed
  • convoluted ones are GIGO. There is no point in making it slower for legitimate use cases just to protect unlikely cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants