-
-
Notifications
You must be signed in to change notification settings - Fork 31.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-117398: Add datetime Module State #119810
gh-117398: Add datetime Module State #119810
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some extra notes about the implementation here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried adding attribute getters to the IsoCalendarDate
type to see the performance of GET_CURRENT_STATE()
:
from timeit import timeit
cmd = 'import _datetime; C = _datetime.date.today().isocalendar()'
print('base:', timeit('C.base', cmd)) # PyType_GetModuleByDef()
print('test:', timeit('C.test', cmd)) # GET_CURRENT_STATE()
Windows non-debug build:
base: 0.0286023854278028 # PyType_GetModuleByDef()
test: 0.0937444182112813 3.27x # INTERP_KEY: "cached-datetime-module"
test: 0.0406443682964891 1.42x # INTERP_KEY: &_Py_ID(cached_datetime_module)
getset implementation
static PyObject *
bench_base(PyDateTime_IsoCalendarDate *self, void *unused)
{
PyObject *mod = PyType_GetModuleByDef(Py_TYPE(self), &datetimemodule);
datetime_state *st = get_module_state(mod);
Py_RETURN_NONE;
}
static PyObject *
bench_test(PyDateTime_IsoCalendarDate *self, void *unused)
{
datetime_state *st = GET_CURRENT_STATE(st);
RELEASE_CURRENT_STATE(st);
Py_RETURN_NONE;
}
static PyGetSetDef iso_calendar_date_getset[] = {
{"base", (getter)bench_base},
{"test", (getter)bench_test},
{NULL}
};
#define DATETIME_ADD_MACRO(dict, c, value_expr) \ | ||
do { \ | ||
if (PyDict_GetItemString(dict, c) == NULL) { \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This macro needs Py_DECREF(value_expr)
if the condition is not true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The expression isn't evaluated until the second statement inside this branch. Perhaps I've missed something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#define DATETIME_ADD_MACRO(dict, c, value_expr) \
do { \
if (PyDict_GetItemString(dict, c) == NULL) { \
... \
Py_DECREF(value); \
} \
+ else { \
+ Py_DECREF(value_expr); \
+ } \
} while(0)
If this example fixes the refleaks test, then the leak comes from:
cpython/Modules/_datetimemodule.c
Lines 7173 to 7179 in 3de1cd3
/* -23:59 */ | |
PyObject *min = create_timezone_from_delta(-1, 60, 0, 1); | |
DATETIME_ADD_MACRO(d, "min", min); | |
/* +23:59 */ | |
PyObject *max = create_timezone_from_delta(0, (23 * 60 + 59) * 60, 0, 0); | |
DATETIME_ADD_MACRO(d, "max", max); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I understand now. Yeah, fixing that does resolve the refleak. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Modules/_datetimemodule.c
Outdated
copy_state(datetime_state *st, datetime_state *from) | ||
{ | ||
*st = (datetime_state){ | ||
.isocalendar_date_type | ||
= (PyTypeObject *)Py_NewRef(from->isocalendar_date_type), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ht_module
field of the heaptype needs to be updated here to keep the module alive even after del sys.modules['_datetime']
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. It would probably be best to create a new type object in this case rather than add a ref.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think this type is easier to maintain if it is static as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps. However, I'd rather avoid using static types for a multi-phase init module if possible.
Thanks for the review, @neonene! I have everything sorted out and the free-threaded/WASI builds should be passing now. |
Does it make sense for this PR to drop the |
Good question. I'll take a look. |
Let's not block this PR on that. We can drop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is getting out of hand; I con't stop thinking of the suggestion in that other thread about allowing immortal singleton modules.
I'm also reminded of the single-phase-init behaviour of copying __dict__
contents to reuse types/singletons...
On the other hand, if this is the last (so, presumably, most complicated) stdlib module that needs to support isolated state, I guess things aren't all that bad.
Modules/_datetimemodule.c
Outdated
#define PyTZInfo_Check(op) PyObject_TypeCheck(op, get_datetime_state()->tzinfo_type) | ||
#define PyTZInfo_CheckExact(op) Py_IS_TYPE(op, get_datetime_state()->tzinfo_type) | ||
#define GET_CURRENT_STATE(ST_VAR) \ | ||
NULL; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why NULL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's there so the macro can transparently declare the current_mod
local var. However, that made more sense in an earlier iteration of the PR. I'll simplify this for clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
Yeah, this is the last one and the only one with the complexity of its own C-API (which exposes objects directly). FWIW, I'm not especially satisfied with the "current module" stuff here, but consider it to be a straight-forward solution for this immediate, isolated situation. I expect we can find a better approach for 3.14+, but didn't want to take those sorts of risks in 3.13. |
Any objections to merging this PR at this point? |
In the interest of landing this for beta 2, I'm going to merge this now. I'm happy to apply tweaks in a follow-up PR. |
Thanks @ericsnowcurrently for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13. |
I was able to make use of the existing datetime_state struct, but there was one tricky thing I had to sort out. We mostly aren't converting to heap types, so we can't use things like PyType_GetModuleByDef() to look up the module state. The solution I came up with is somewhat novel, but I consider it straightforward. Also, it shouldn't have much impact on performance. In summary, this main changes here are: * I've added some macros to help hide how various objects relate to module state * as a solution to the module state lookup problem, I've stored the last loaded module on the current interpreter's internal dict (actually a weakref) * if the static type method is used after the module has been deleted, it is reloaded * to avoid extra work when loading the module, we directly copy the objects (new refs only) from the old module state into the new state if the old module hasn't been deleted yet * during module init we set various objects on the static types' __dict__s; to simplify things, we only do that the first time; once those static types have a separate __dict__ per interpreter, we'll do it every time * we now clear the module state when the module is destroyed (before, we were leaking everything in _datetime_global_state) (cherry picked from commit d82a7ba) Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
GH-120004 is a backport of this pull request to the 3.13 branch. |
I was able to make use of the existing datetime_state struct, but there was one tricky thing I had to sort out. We mostly aren't converting to heap types, so we can't use things like PyType_GetModuleByDef() to look up the module state. The solution I came up with is somewhat novel, but I consider it straightforward. Also, it shouldn't have much impact on performance. In summary, this main changes here are: * I've added some macros to help hide how various objects relate to module state * as a solution to the module state lookup problem, I've stored the last loaded module on the current interpreter's internal dict (actually a weakref) * if the static type method is used after the module has been deleted, it is reloaded * to avoid extra work when loading the module, we directly copy the objects (new refs only) from the old module state into the new state if the old module hasn't been deleted yet * during module init we set various objects on the static types' __dict__s; to simplify things, we only do that the first time; once those static types have a separate __dict__ per interpreter, we'll do it every time * we now clear the module state when the module is destroyed (before, we were leaking everything in _datetime_global_state)
I was able to make use of the existing datetime_state struct, but there was one tricky thing I had to sort out. We mostly aren't converting to heap types, so we can't use things like PyType_GetModuleByDef() to look up the module state. The solution I came up with is somewhat novel, but I consider it straightforward. Also, it shouldn't have much impact on performance. In summary, this main changes here are: * I've added some macros to help hide how various objects relate to module state * as a solution to the module state lookup problem, I've stored the last loaded module on the current interpreter's internal dict (actually a weakref) * if the static type method is used after the module has been deleted, it is reloaded * to avoid extra work when loading the module, we directly copy the objects (new refs only) from the old module state into the new state if the old module hasn't been deleted yet * during module init we set various objects on the static types' __dict__s; to simplify things, we only do that the first time; once those static types have a separate __dict__ per interpreter, we'll do it every time * we now clear the module state when the module is destroyed (before, we were leaking everything in _datetime_global_state) (cherry picked from commit d82a7ba, AKA gh-119810) Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
I was able to make use of the existing datetime_state struct, but there was one tricky thing I had to sort out. We mostly aren't converting to heap types, so we can't use things like PyType_GetModuleByDef() to look up the module state. The solution I came up with is somewhat novel, but I consider it straightforward. Also, it shouldn't have much impact on performance. In summary, this main changes here are: * I've added some macros to help hide how various objects relate to module state * as a solution to the module state lookup problem, I've stored the last loaded module on the current interpreter's internal dict (actually a weakref) * if the static type method is used after the module has been deleted, it is reloaded * to avoid extra work when loading the module, we directly copy the objects (new refs only) from the old module state into the new state if the old module hasn't been deleted yet * during module init we set various objects on the static types' __dict__s; to simplify things, we only do that the first time; once those static types have a separate __dict__ per interpreter, we'll do it every time * we now clear the module state when the module is destroyed (before, we were leaking everything in _datetime_global_state)
I was able to make use of the existing datetime_state struct, but there was one tricky thing I had to sort out. We mostly aren't converting to heap types, so we can't use things like PyType_GetModuleByDef() to look up the module state. The solution I came up with is somewhat novel, but I consider it straightforward. Also, it shouldn't have much impact on performance. In summary, this main changes here are: * I've added some macros to help hide how various objects relate to module state * as a solution to the module state lookup problem, I've stored the last loaded module on the current interpreter's internal dict (actually a weakref) * if the static type method is used after the module has been deleted, it is reloaded * to avoid extra work when loading the module, we directly copy the objects (new refs only) from the old module state into the new state if the old module hasn't been deleted yet * during module init we set various objects on the static types' __dict__s; to simplify things, we only do that the first time; once those static types have a separate __dict__ per interpreter, we'll do it every time * we now clear the module state when the module is destroyed (before, we were leaking everything in _datetime_global_state)
I was able to make use of the existing datetime_state struct, but there was one tricky thing I had to sort out. We mostly aren't converting to heap types, so we can't use things like PyType_GetModuleByDef() to look up the module state. The solution I came up with is somewhat novel, but I consider it straightforward. Also, it shouldn't have much impact on performance. In summary, this main changes here are: * I've added some macros to help hide how various objects relate to module state * as a solution to the module state lookup problem, I've stored the last loaded module on the current interpreter's internal dict (actually a weakref) * if the static type method is used after the module has been deleted, it is reloaded * to avoid extra work when loading the module, we directly copy the objects (new refs only) from the old module state into the new state if the old module hasn't been deleted yet * during module init we set various objects on the static types' __dict__s; to simplify things, we only do that the first time; once those static types have a separate __dict__ per interpreter, we'll do it every time * we now clear the module state when the module is destroyed (before, we were leaking everything in _datetime_global_state)
I was able to make use of the existing
datetime_state
struct, but there was one tricky thing I had to sort out. We mostly aren't converting to heap types, so we can't use things likePyType_GetModuleByDef()
to look up the module state. The solution I came up with is somewhat novel, but I consider it straightforward. Also, it shouldn't have much impact on performance.In summary, this main changes of this PR are:
__dict__
s; to simplify things, we only do that the first time; once those static types have a separate__dict__
per interpreter, we'll do it every time_datetime_global_state
)