-
-
Notifications
You must be signed in to change notification settings - Fork 31.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simple racing class attribute read-write crashes on free-threaded builds #118362
Comments
I'm not assigning myself because I'm still unable to figure this out at the moment. So anyone's free to take it up. |
The modification to the type's dictionary happens before There's a somewaht similar issue with the GIL because the destructors can call arbitrary code before the cache is invalidated, but it's much harder to trigger a crash because the old value finalizer is called before the memory is freed. cc @DinoV Lines 5364 to 5379 in 8b56d82
|
A starting point would be to call For example, something like the following pseudo-code: BEGIN_TYPE_LOCK();
PyInterpreterState_Get()->types.cache_enabled = false;
type_modified_unlocked(type);
res = _PyObject_GenericSetAttrWithDict((PyObject *)type, name, value, NULL);
if (res == 0) {
...
}
PyInterpreterState_Get()->types.cache_enabled = true;
END_TYPE_LOCK(); The caching logic in |
I don't think we can mark the type as being modified before doing the set, as it delivers an event that may care about seeing the new value. There's also other potential issues around re-entrancy in the set, e.g. if there's a meta-type with a descriptor that then does a lookup against the type it'll have a valid version tag re-assigned before the mutation happens. We could clear this cache entry first if it's the type and still use the flag to not update it. Another possibility would be that we lookup the current attribute (via the cache or MRO) and incref it and then decref it after the set. I'm kind of leaning towards the later just because the type cache is so hot and mutation of types is so rare. |
I think there's an additional issue here though which is that |
Ok, we'd need to split up
I'm not sure I understand if there's some issue specific to meta-types here. I'm suggesting explicitly disabling all type cache updates for essentially the duration we hold the type lock.
This doesn't sound robust to me. One of the big problems is that the type lock is a critical section that doesn't really exclude concurrent updates because there are many opportunities for it to be suspended.
Yeah that's an additional issue. |
My comment here was specifically about the pseudo-code moving the
Not updating the type cache during this process would mean we're not corrupting the type cache with a stale value but a type-watcher would still become out of sync. So at the very least I think the
But if the object is kept alive by |
Ok, I think I understand now. I'd revise my suggestion so that
But they'll get stale cache entries -- that seems bad to me. To a certain extent, this already happens in CPython with the GIL, but it's going to be more common and weirder with the GIL disabled. |
It seems like logically though there's no other synchronization happening and so the write isn't actually done until the version is invalidated. I guess it's possible that the re-entrancy could cause a 2nd write to happen that could be observed while still seeing the old value in the cache though breaking sequential consistency (although I guess to your point that's probably possible w/ the GIL already too). |
Here's a single-threaded example of seeing a stale cached value (even with the GIL): I basically have two concerns:
|
This is what I was thinking for keeping it alive in
|
I think #118454 will fix it. I went with something closer to what @colesbury described with the multi-phase type version invalidation. But to split the invalidation into 2 phases could potentially leave the types in an inconsistent state, or we'd need to remember what types we actually invalidated. I've instead made the update of the dict atomic with the invalidation of the type. This isn't very difficult to guarantee because other than the descriptor case the only thing that can cause re-entrancy below the lock is the finalizer on the object. The keys we're using against the dict are all exact unicode, and types only contain exact unicode, so we know the lookups are otherwise safe. |
And that wouldn't have worked if the value was cached for a subclass but not for the defining class. |
…e face of concurrent mutators (#118454) Add _PyType_LookupRef and use incref before setting attribute on type Makes setting an attribute on a class and signaling type modified atomic Avoid adding re-entrancy exposing the type cache in an inconsistent state by decrefing after type is updated
* Skip tests when threads aren't available * Use ThreadPoolExecutor
This is fixed now |
… in the face of concurrent mutators (python#118454) Add _PyType_LookupRef and use incref before setting attribute on type Makes setting an attribute on a class and signaling type modified atomic Avoid adding re-entrancy exposing the type cache in an inconsistent state by decrefing after type is updated
…8666) * Skip tests when threads aren't available * Use ThreadPoolExecutor
Crash report
What happened?
The following segfaults:
The first issue is that we have
assert(_PyType_HasFeature(type, Py_TPFLAGS_VALID_VERSION_TAG));
intypeobject.c
, which may not be true as another thread can modify the attribute in the meantime. We should remove that assertion.The next issue, which I have not yet been able to solve, is that the attribute evaporates midway in
_Py_type_getattro_impl
. This is likely caused by a decref of another thread. So far I've tried changing_PyType_Lookup
andfind_name_in_mro
to return strong references, but they don't seem to solve it for some reason.CPython versions tested on:
CPython main branch
Operating systems tested on:
No response
Output from running 'python -VV' on the command line:
No response
Linked PRs
The text was updated successfully, but these errors were encountered: