Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-117398: Add datetime Module State #119810

Merged

Conversation

ericsnowcurrently
Copy link
Member

@ericsnowcurrently ericsnowcurrently commented May 30, 2024

I was able to make use of the existing datetime_state struct, but there was one tricky thing I had to sort out. We mostly aren't converting to heap types, so we can't use things like PyType_GetModuleByDef() to look up the module state. The solution I came up with is somewhat novel, but I consider it straightforward. Also, it shouldn't have much impact on performance.

In summary, this main changes of this PR are:

  • I've added some macros to help hide how various objects relate to module state
  • as a solution to the module state lookup problem, I've stored the last loaded module on the current interpreter's internal dict (actually a weakref)
  • if the static type method is used after the module has been deleted, it is reloaded
  • to avoid extra work when loading the module, we directly copy the objects (new refs only) from the old module state into the new state if the old module hasn't been deleted yet
  • during module init we set various objects on the static types' __dict__s; to simplify things, we only do that the first time; once those static types have a separate __dict__ per interpreter, we'll do it every time
  • we now clear the module state when the module is destroyed (before, we were leaking everything in _datetime_global_state)

@ericsnowcurrently
Copy link
Member Author

ericsnowcurrently commented May 30, 2024

FYI, I'm still tracking down a small refleak. I hope to have that figured out soon.

Copy link
Member Author

@ericsnowcurrently ericsnowcurrently left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some extra notes about the implementation here.

@ericsnowcurrently ericsnowcurrently requested a review from encukou May 30, 2024 21:32
Copy link
Contributor

@neonene neonene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried adding attribute getters to the IsoCalendarDate type to see the performance of GET_CURRENT_STATE():

from timeit import timeit
cmd = 'import _datetime; C = _datetime.date.today().isocalendar()'
print('base:', timeit('C.base', cmd))  # PyType_GetModuleByDef()
print('test:', timeit('C.test', cmd))  # GET_CURRENT_STATE()

Windows non-debug build:

base: 0.0286023854278028  # PyType_GetModuleByDef()

test: 0.0937444182112813 3.27x  # INTERP_KEY: "cached-datetime-module"
test: 0.0406443682964891 1.42x  # INTERP_KEY: &_Py_ID(cached_datetime_module)
getset implementation
static PyObject *
bench_base(PyDateTime_IsoCalendarDate *self, void *unused)
{
    PyObject *mod = PyType_GetModuleByDef(Py_TYPE(self), &datetimemodule);
    datetime_state *st = get_module_state(mod);
    Py_RETURN_NONE;
}

static PyObject *
bench_test(PyDateTime_IsoCalendarDate *self, void *unused)
{
    datetime_state *st = GET_CURRENT_STATE(st);
    RELEASE_CURRENT_STATE(st);
    Py_RETURN_NONE;
}

static PyGetSetDef iso_calendar_date_getset[] = {
    {"base", (getter)bench_base},
    {"test", (getter)bench_test},
    {NULL}
};

Comment on lines 7120 to +7122
#define DATETIME_ADD_MACRO(dict, c, value_expr) \
do { \
if (PyDict_GetItemString(dict, c) == NULL) { \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This macro needs Py_DECREF(value_expr) if the condition is not true.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expression isn't evaluated until the second statement inside this branch. Perhaps I've missed something?

Copy link
Contributor

@neonene neonene May 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#define DATETIME_ADD_MACRO(dict, c, value_expr)         \
    do {                                                \
      if (PyDict_GetItemString(dict, c) == NULL) {      \
        ...                                             \
        Py_DECREF(value);                               \
      }                                                 \
+     else {                                            \
+       Py_DECREF(value_expr);                          \
+     }                                                 \
    } while(0)

If this example fixes the refleaks test, then the leak comes from:

/* -23:59 */
PyObject *min = create_timezone_from_delta(-1, 60, 0, 1);
DATETIME_ADD_MACRO(d, "min", min);
/* +23:59 */
PyObject *max = create_timezone_from_delta(0, (23 * 60 + 59) * 60, 0, 0);
DATETIME_ADD_MACRO(d, "max", max);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I understand now. Yeah, fixing that does resolve the refleak. Thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 6969 to 6973
copy_state(datetime_state *st, datetime_state *from)
{
*st = (datetime_state){
.isocalendar_date_type
= (PyTypeObject *)Py_NewRef(from->isocalendar_date_type),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ht_module field of the heaptype needs to be updated here to keep the module alive even after del sys.modules['_datetime']?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. It would probably be best to create a new type object in this case rather than add a ref.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this type is easier to maintain if it is static as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps. However, I'd rather avoid using static types for a multi-phase init module if possible.

@ericsnowcurrently
Copy link
Member Author

Thanks for the review, @neonene! I have everything sorted out and the free-threaded/WASI builds should be passing now.

@neonene
Copy link
Contributor

neonene commented Jun 1, 2024

Does it make sense for this PR to drop the no_rerun decorator in datetimetester.py?

@ericsnowcurrently
Copy link
Member Author

Good question. I'll take a look.

@ericsnowcurrently
Copy link
Member Author

Let's not block this PR on that. We can drop @no_rerun separately. I tried it really quickly and there were no refleaks, but the same is true of main. We can figure it out later though.

Copy link
Member

@encukou encukou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting out of hand; I con't stop thinking of the suggestion in that other thread about allowing immortal singleton modules.
I'm also reminded of the single-phase-init behaviour of copying __dict__ contents to reuse types/singletons...

On the other hand, if this is the last (so, presumably, most complicated) stdlib module that needs to support isolated state, I guess things aren't all that bad.

#define PyTZInfo_Check(op) PyObject_TypeCheck(op, get_datetime_state()->tzinfo_type)
#define PyTZInfo_CheckExact(op) Py_IS_TYPE(op, get_datetime_state()->tzinfo_type)
#define GET_CURRENT_STATE(ST_VAR) \
NULL; \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why NULL?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's there so the macro can transparently declare the current_mod local var. However, that made more sense in an earlier iteration of the PR. I'll simplify this for clarity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@ericsnowcurrently
Copy link
Member Author

This is getting out of hand; I con't stop thinking of the suggestion in that other thread about allowing immortal singleton modules. I'm also reminded of the single-phase-init behaviour of copying __dict__ contents to reuse types/singletons...

On the other hand, if this is the last (so, presumably, most complicated) stdlib module that needs to support isolated state, I guess things aren't all that bad.

Yeah, this is the last one and the only one with the complexity of its own C-API (which exposes objects directly).

FWIW, I'm not especially satisfied with the "current module" stuff here, but consider it to be a straight-forward solution for this immediate, isolated situation. I expect we can find a better approach for 3.14+, but didn't want to take those sorts of risks in 3.13.

@ericsnowcurrently
Copy link
Member Author

Any objections to merging this PR at this point?

@ericsnowcurrently ericsnowcurrently added the needs backport to 3.13 bugs and security fixes label Jun 3, 2024
@ericsnowcurrently
Copy link
Member Author

In the interest of landing this for beta 2, I'm going to merge this now. I'm happy to apply tweaks in a follow-up PR.

@ericsnowcurrently ericsnowcurrently merged commit d82a7ba into python:main Jun 3, 2024
35 checks passed
@miss-islington-app
Copy link

Thanks @ericsnowcurrently for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

@ericsnowcurrently ericsnowcurrently deleted the datetime-module-state-2 branch June 3, 2024 21:56
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 3, 2024
I was able to make use of the existing datetime_state struct, but there was one tricky thing I had to sort out.  We mostly aren't converting to heap types, so we can't use things like PyType_GetModuleByDef() to look up the module state.  The solution I came up with is somewhat novel, but I consider it straightforward.  Also, it shouldn't have much impact on performance.

In summary, this main changes here are:

* I've added some macros to help hide how various objects relate to module state
* as a solution to the module state lookup problem, I've stored the last loaded module on the current interpreter's internal dict (actually a weakref)
* if the static type method is used after the module has been deleted, it is reloaded
* to avoid extra work when loading the module, we directly copy the objects (new refs only) from the old module state into the new state if the old module hasn't been deleted yet
* during module init we set various objects on the static types' __dict__s; to simplify things, we only do that the first time; once those static types have a separate __dict__ per interpreter, we'll do it every time
* we now clear the module state when the module is destroyed (before, we were leaking everything in _datetime_global_state)
(cherry picked from commit d82a7ba)

Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
@bedevere-app
Copy link

bedevere-app bot commented Jun 3, 2024

GH-120004 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Jun 3, 2024
mliezun pushed a commit to mliezun/cpython that referenced this pull request Jun 3, 2024
I was able to make use of the existing datetime_state struct, but there was one tricky thing I had to sort out.  We mostly aren't converting to heap types, so we can't use things like PyType_GetModuleByDef() to look up the module state.  The solution I came up with is somewhat novel, but I consider it straightforward.  Also, it shouldn't have much impact on performance.

In summary, this main changes here are:

* I've added some macros to help hide how various objects relate to module state
* as a solution to the module state lookup problem, I've stored the last loaded module on the current interpreter's internal dict (actually a weakref)
* if the static type method is used after the module has been deleted, it is reloaded
* to avoid extra work when loading the module, we directly copy the objects (new refs only) from the old module state into the new state if the old module hasn't been deleted yet
* during module init we set various objects on the static types' __dict__s; to simplify things, we only do that the first time; once those static types have a separate __dict__ per interpreter, we'll do it every time
* we now clear the module state when the module is destroyed (before, we were leaking everything in _datetime_global_state)
ericsnowcurrently added a commit that referenced this pull request Jun 3, 2024
I was able to make use of the existing datetime_state struct, but there was one tricky thing I had to sort out.  We mostly aren't converting to heap types, so we can't use things like PyType_GetModuleByDef() to look up the module state.  The solution I came up with is somewhat novel, but I consider it straightforward.  Also, it shouldn't have much impact on performance.

In summary, this main changes here are:

* I've added some macros to help hide how various objects relate to module state
* as a solution to the module state lookup problem, I've stored the last loaded module on the current interpreter's internal dict (actually a weakref)
* if the static type method is used after the module has been deleted, it is reloaded
* to avoid extra work when loading the module, we directly copy the objects (new refs only) from the old module state into the new state if the old module hasn't been deleted yet
* during module init we set various objects on the static types' __dict__s; to simplify things, we only do that the first time; once those static types have a separate __dict__ per interpreter, we'll do it every time
* we now clear the module state when the module is destroyed (before, we were leaking everything in _datetime_global_state)

(cherry picked from commit d82a7ba, AKA gh-119810)

Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
barneygale pushed a commit to barneygale/cpython that referenced this pull request Jun 5, 2024
I was able to make use of the existing datetime_state struct, but there was one tricky thing I had to sort out.  We mostly aren't converting to heap types, so we can't use things like PyType_GetModuleByDef() to look up the module state.  The solution I came up with is somewhat novel, but I consider it straightforward.  Also, it shouldn't have much impact on performance.

In summary, this main changes here are:

* I've added some macros to help hide how various objects relate to module state
* as a solution to the module state lookup problem, I've stored the last loaded module on the current interpreter's internal dict (actually a weakref)
* if the static type method is used after the module has been deleted, it is reloaded
* to avoid extra work when loading the module, we directly copy the objects (new refs only) from the old module state into the new state if the old module hasn't been deleted yet
* during module init we set various objects on the static types' __dict__s; to simplify things, we only do that the first time; once those static types have a separate __dict__ per interpreter, we'll do it every time
* we now clear the module state when the module is destroyed (before, we were leaking everything in _datetime_global_state)
noahbkim pushed a commit to hudson-trading/cpython that referenced this pull request Jul 11, 2024
I was able to make use of the existing datetime_state struct, but there was one tricky thing I had to sort out.  We mostly aren't converting to heap types, so we can't use things like PyType_GetModuleByDef() to look up the module state.  The solution I came up with is somewhat novel, but I consider it straightforward.  Also, it shouldn't have much impact on performance.

In summary, this main changes here are:

* I've added some macros to help hide how various objects relate to module state
* as a solution to the module state lookup problem, I've stored the last loaded module on the current interpreter's internal dict (actually a weakref)
* if the static type method is used after the module has been deleted, it is reloaded
* to avoid extra work when loading the module, we directly copy the objects (new refs only) from the old module state into the new state if the old module hasn't been deleted yet
* during module init we set various objects on the static types' __dict__s; to simplify things, we only do that the first time; once those static types have a separate __dict__ per interpreter, we'll do it every time
* we now clear the module state when the module is destroyed (before, we were leaking everything in _datetime_global_state)
estyxx pushed a commit to estyxx/cpython that referenced this pull request Jul 17, 2024
I was able to make use of the existing datetime_state struct, but there was one tricky thing I had to sort out.  We mostly aren't converting to heap types, so we can't use things like PyType_GetModuleByDef() to look up the module state.  The solution I came up with is somewhat novel, but I consider it straightforward.  Also, it shouldn't have much impact on performance.

In summary, this main changes here are:

* I've added some macros to help hide how various objects relate to module state
* as a solution to the module state lookup problem, I've stored the last loaded module on the current interpreter's internal dict (actually a weakref)
* if the static type method is used after the module has been deleted, it is reloaded
* to avoid extra work when loading the module, we directly copy the objects (new refs only) from the old module state into the new state if the old module hasn't been deleted yet
* during module init we set various objects on the static types' __dict__s; to simplify things, we only do that the first time; once those static types have a separate __dict__ per interpreter, we'll do it every time
* we now clear the module state when the module is destroyed (before, we were leaking everything in _datetime_global_state)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants