gh-128942: make arraymodule.c free-thread safe (lock-free) #130771
base: main
ping @colesbury |
Nice! Disclaimer: I'm not an expert on the FT list implementation, so take some of my comments with a grain of salt.
Seeing good single-threaded performance is nice, but what about multi-threaded scaling? The number of locks that are still here scares me a little. It would be nice if this scaled well for concurrent use as well, especially for operations that don't require concurrent writes (e.g., comparisons and copies).
Note: this is not ready to go; there is the memory issue, which needs resolving. |
@ZeroIntensity you can remove the do-not-merge, it's not an |
The main thing here for acceptance is a benchmark run which I am not able to start (I only did local pyperformance check against main), so someone with access will have to initiate that to compare with main. |
I haven't gotten a chance to look through arraymodule.c yet. I'll review that later this week.
The overall approach here seems good. A few comments below.
The actual
Are there any other places where this needs to take place? It's the test and trying to run it with
Which is not Left the bad |
I'd like
Yes, |
TL;DR: If you are fine with a data race on resize (because the memcpy from the current array buffer to the new array buffer is not atomic), then this PR is ready to go (or at least ready for proper review and benchmarking). If you want to eliminate the race on resize, then it's either atomic memcpy or locked writes. With respect to this PR:
That's fine, I only added it because I was under the impression you wanted to eliminate ALL data races. Removed.
Data race on memcpy or no, I am fairly confident the resize here happens correctly. New
So ... Seeing as the elementwise atomic reads / writes as they are here don't appear to affect performance I would say they are fine to leave in. Other points:
EDIT: If you mean vs. non-free-thread-safe, updated scimark numbers are in header of this PR. I was surprised that the aggregate atomic memcpy had the same performance as vanilla memcpy (at least according to the quick timings I did). For reference the atomic memcpy inner loop compiled to the asm below, and the code preceding it appears to align this block to a cache line:
So maybe performance is not a worry there when/if we decide to do atomic memory ops eventually.
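For reference, an element-wise relaxed copy of the kind discussed above might look like the sketch below in C11. This is not the code from the PR: the function name `copy_words_relaxed` is hypothetical, and it assumes both buffers are 8-byte aligned and a whole number of 64-bit words long. Each word is loaded and stored without tearing, but the copy as a whole is still not atomic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the PR): copy `nwords` 64-bit words using
 * relaxed atomic loads and stores. A concurrent reader or writer can still
 * observe a partially copied buffer, but never a torn word. */
static void
copy_words_relaxed(_Atomic uint64_t *dst, _Atomic uint64_t *src, size_t nwords)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < nwords; i++) {
        uint64_t w = atomic_load_explicit(&src[i], memory_order_relaxed);
        atomic_store_explicit(&dst[i], w, memory_order_relaxed);
    }
}
```

Relaxed ordering gives no ordering guarantees between words, so a resize would still need a release store when it publishes the new buffer pointer.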
Wrt tests added here, that big blob of tests in |
In the meantime, is there any particular test or module which needs attention for your clean tsan project or just everything? |
I think ctypes may be the most important right now. See: I think @ZeroIntensity started working on it, but I don't think he's working on it right now. I would really like to fix:
I am less concerned about concurrent modifications to the same ctypes instances. I don't really want to deal with that for now. |
Yeah, sorry, that's on my list of things to do. I got burnt out a little while ago and haven't thought of picking it up again. I want to do a couple of things with #129824, and then I'll get back to fixing ctypes in the next week or two. |
Most of the changes look good, but I am still pretty unsure about changing the code that loads and stores items to use FT_ATOMIC_ macros. I think we may be opening a can of worms with all of the different data types.
I think we may want to stick with normal pointer loads and stores. If someone concurrently modifies the same array from multiple threads, that will be a data race and UB, but the mixing of data types here is also technically UB.
I'm sorry for the conflicting advice I've given. It's hard to know how we should pursue this without seeing the actual code changes. |
Have a look at the possible
IMHO I think you are overly worried about that phrase "undefined behavior" which in practice really just means "undefined (stale) value", otherwise probably no modern system would boot. Also I have never had a problem writing an In any case changed back to non-atomic get/set. Also removed tsan-specific test. |
Small detail, you want to leave or remove |
Mostly minor comments below, but I think the alignment of items is important.
And:
for |
Use something like: https://gist.github.com/colesbury/96f27e2ddf6b151adeeb4c28ed7554d8 The alignment specifier has to be on |
Just a minor nit "MS_WINDOWS" or "_MSC_VER"? The latter is used in the codebase for MSVC-specific directives, and you can run gcc under Windows. |
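To make the alignment point concrete, here is a minimal sketch of a buffer whose element storage is forced to 8-byte alignment, so that 8-byte items such as `double` or `int64_t` are never misaligned even when the header is only 4 bytes wide. It is an assumption-laden illustration, not the struct from the PR or the gist: `sketch_buf` and `sketch_buf_new` are hypothetical names, and it uses the C11 `_Alignas` specifier where a compiler-specific directive may be needed on older toolchains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the array's data buffer (not the PR's struct). */
typedef struct {
    ptrdiff_t allocated;        /* capacity in bytes; stand-in for Py_ssize_t */
    _Alignas(8) char items[];   /* element storage, forced to an 8-byte offset */
} sketch_buf;

/* Allocate a buffer with room for `nbytes` of element storage. */
static sketch_buf *
sketch_buf_new(size_t nbytes)
{
    sketch_buf *b = malloc(offsetof(sketch_buf, items) + nbytes);
    if (b == NULL) {
        return NULL;
    }
    b->allocated = (ptrdiff_t)nbytes;
    /* malloc returns storage aligned for max_align_t (at least 8 on
     * mainstream platforms), so an 8-aligned offset gives an 8-aligned
     * address. */
    assert((uintptr_t)b->items % 8 == 0);
    return b;
}
```

The specifier sits on the flexible array member itself, which raises the alignment of the whole struct and pads the header as needed.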
I added lock-free single-element reads and writes by mostly copying the list object's homework. TL;DR: pyperformance scimark seems to be back to about what it was without the free-thread safe stuff (pending confirmation of course). Tried a few other things but the list strategy seems good enough (except for the negative index thing I mentioned in #130744, if that is an issue).
Timings, the relevant ones are "OLD" - non free-thread safe arraymodule, "SLOW" - the previous slower PR and the last two "LFREERW".
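A rough sketch of the list-style lock-free read described above, as I understand the pattern, is given below. It is not the PR's code: every name (`sketch_data`, `sketch_array`, `sketch_try_get`, `sketch_resize`) is hypothetical, items are fixed to `int64_t` for brevity, the writer is assumed to hold the per-object lock, and old buffers are assumed to be freed only once no reader can still hold them. The capacity stored inside the buffer lets a reader bounds-check against the buffer it actually loaded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t allocated;             /* capacity of this buffer, in items */
    int64_t items[];
} sketch_data;

typedef struct {
    _Atomic size_t size;          /* logical length, in items */
    _Atomic(sketch_data *) data;  /* current buffer, replaced on resize */
} sketch_array;

/* Reader: no lock taken. Returns false when the index is out of range for
 * either the logical size or the buffer that was actually loaded. */
static bool
sketch_try_get(sketch_array *a, size_t i, int64_t *out)
{
    assert(a != NULL && out != NULL);
    size_t size = atomic_load_explicit(&a->size, memory_order_relaxed);
    if (i >= size) {
        return false;
    }
    sketch_data *d = atomic_load_explicit(&a->data, memory_order_acquire);
    if (d == NULL || i >= d->allocated) {
        return false;
    }
    *out = d->items[i];  /* plain load: races with a concurrent item write */
    return true;
}

/* Writer (assumed to hold the object's lock): build the new buffer, then
 * publish it. The old buffer is handed back through `old_out` because it
 * must not be freed while a reader may still be using it. */
static bool
sketch_resize(sketch_array *a, size_t newsize, sketch_data **old_out)
{
    sketch_data *old = atomic_load_explicit(&a->data, memory_order_relaxed);
    size_t oldsize = atomic_load_explicit(&a->size, memory_order_relaxed);
    sketch_data *d = calloc(1, sizeof(sketch_data) + newsize * sizeof(int64_t));
    if (d == NULL) {
        return false;
    }
    d->allocated = newsize;
    size_t ncopy = oldsize < newsize ? oldsize : newsize;
    if (old != NULL && ncopy > 0) {
        /* Non-atomic copy: this is the resize race discussed in the thread. */
        memcpy(d->items, old->items, ncopy * sizeof(int64_t));
    }
    atomic_store_explicit(&a->data, d, memory_order_release);
    atomic_store_explicit(&a->size, newsize, memory_order_release);
    *old_out = old;
    return true;
}
```

A reader that sees a stale size together with a newer, smaller buffer is still caught by the `allocated` check, which is why the capacity lives in the buffer rather than in the array object.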
array module is not free-thread safe. #128942