GH-113710: Add a "globals to constants" pass #114592
Conversation
Hey @carljm, could you review the changes to the dict watcher here please? I've never seen that code before (sorry).
This looks pretty good, I mostly have suggestions for clarifying comments and a few style nits.
The dict watchers changes look good to me! Nice to see this in use.
Go for it!
The JIT tests are failing. We should probably force enable uops and hammer the test suite.
```
@@ -609,7 +610,11 @@ init_interp_create_gil(PyThreadState *tstate, int gil)
 static int
 builtins_dict_watcher(PyDict_WatchEvent event, PyObject *dict, PyObject *key, PyObject *new_value)
 {
     RARE_EVENT_INC(builtin_dict);
     PyInterpreterState *interp = _PyInterpreterState_GET();
     if (event != PyDict_EVENT_CLONED && interp->rare_events.builtin_dict < _Py_MAX_ALLOWED_BUILTINS_MODIFICATIONS) {
```
Maybe I'm just misunderstanding, but shouldn't this be `>`? As in, if we've modified builtins too many times, we invalidate everything?
Also, what is the purpose of excluding `PyDict_EVENT_CLONED` here?
> Maybe I'm just misunderstanding, but shouldn't this be `>`? As in, if we've modified builtins too many times, we invalidate everything?

Every time the builtins dict changes, any executors that depend on it not changing must be invalidated.
So we invalidate the executor(s) every time the builtins change.
Upon reaching the threshold we no longer perform those optimizations, so the executors no longer depend on the builtins not changing, and we no longer need to invalidate.

> Also, what is the purpose of excluding `PyDict_EVENT_CLONED` here?

Cloning doesn't modify the dict, so we don't care about it.
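For reference, a filled-in sketch of the control flow being described (the truncated diff above stops at the `if`). `_Py_Executors_InvalidateAll()` is assumed as the invalidation helper; this is illustrative, not the exact patch:

```c
/* Sketch of the behaviour described above, not the exact patch. */
static int
builtins_dict_watcher(PyDict_WatchEvent event, PyObject *dict,
                      PyObject *key, PyObject *new_value)
{
    PyInterpreterState *interp = _PyInterpreterState_GET();
    if (event != PyDict_EVENT_CLONED &&
        interp->rare_events.builtin_dict < _Py_MAX_ALLOWED_BUILTINS_MODIFICATIONS)
    {
        /* Below the threshold, executors are still being created that assume
         * the builtins are stable, so every change must invalidate them. */
        _Py_Executors_InvalidateAll(interp);
    }
    /* Past the threshold, the optimizer stops relying on the builtins being
     * stable, so changes are merely counted and nothing is invalidated. */
    RARE_EVENT_INC(builtin_dict);
    return 0;
}
```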
@carljm OOI, why does Cinder care about dict cloning?
`PyDict_EVENT_CLONED` is probably a bad name, because grammatically it implies the subject dict was cloned. What it actually means is that the subject dict was empty, and another dict was cloned into it, via e.g. `.update()`. So it does represent a change in the subject dict; all dict watcher events do.
I think it is probably irrelevant here, since it is (at least in practice) impossible(?) for builtins or module globals to be empty when code is running.
And I don't think Cinder cares about it in practice either, for that same reason; it was just for completeness in the dict watcher implementation, since all changes to a dict should be notified by dict watchers.
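A small standalone illustration of that behaviour using the public dict-watcher C API (the callback and the `demo()` wrapper exist only for this example, and error checking is omitted): per the explanation above, updating a watched *empty* dict from another dict reports a single `PyDict_EVENT_CLONED` rather than per-key `ADDED` events.

```c
#include <Python.h>

/* Fires once per event on a watched dict; we only log the CLONED case. */
static int
log_dict_event(PyDict_WatchEvent event, PyObject *dict,
               PyObject *key, PyObject *new_value)
{
    if (event == PyDict_EVENT_CLONED) {
        printf("dict was empty and another dict was copied into it\n");
    }
    return 0;
}

static void
demo(void)
{
    PyObject *src = Py_BuildValue("{s:i}", "answer", 42);
    PyObject *dst = PyDict_New();    /* empty, so update() takes the clone path */
    int wid = PyDict_AddWatcher(log_dict_event);
    PyDict_Watch(wid, dst);
    PyDict_Update(dst, src);         /* reports PyDict_EVENT_CLONED, not per-key ADDED */
    PyDict_Unwatch(wid, dst);
    Py_DECREF(src);
    Py_DECREF(dst);
}
```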
```
if (PyDict_Watch(interp->rare_events.builtins_dict_watcher_id, interp->builtins) != 0) {
interp->dict_state.watchers[0] = &builtins_dict_watcher;
```
Since the first two slots are "reserved", it might be clearer to have `#define`s for these slots, e.g. `BUILTINS_DICT_WATCHER_ID` for 0 here.
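Something along these lines (the names are only the suggestion above; they don't exist in the tree, and the second one is a guess at what a globals slot would be called):

```c
/* Hypothetical names for the reserved dict-watcher slots, per the suggestion. */
#define BUILTINS_DICT_WATCHER_ID 0
#define GLOBALS_DICT_WATCHER_ID  1

interp->dict_state.watchers[BUILTINS_DICT_WATCHER_ID] = &builtins_dict_watcher;
```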
What is the rationale for bypassing `PyDict_AddWatcher` here and special-casing? Just a small amount of startup performance? Here, consistency and encapsulating management of the `dict_state.watchers` array might be better maintenance-wise.
It keeps things simpler if we are guaranteed to have IDs. Otherwise we need to worry about what happens if we can't add either or both of the watchers.
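To illustrate the trade-off: with dynamic registration, interpreter init would need a failure path for each watcher, roughly as in the sketch below. The `_PyStatus_ERR` return is an assumption about where this code would live; the field name comes from the snippet above.

```c
/* Sketch of what dynamic registration would require. PyDict_AddWatcher()
 * returns -1 when no watcher slot is available, so each registration needs
 * its own failure handling; reserving fixed slots makes the IDs
 * unconditional. */
int wid = PyDict_AddWatcher(builtins_dict_watcher);
if (wid < 0) {
    /* Out of watcher slots: either fail interpreter init here, or fall
     * back to running without the globals-to-constants optimization. */
    return _PyStatus_ERR("failed to add builtins dict watcher");
}
interp->rare_events.builtins_dict_watcher_id = wid;
```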
Confirmed locally that the same failures happen with tier two (not just the JIT).
Just curious: why is the error optimizer being called now?
Maybe we aren't updating a threshold when we change the optimizer?
Converts specializations of `LOAD_GLOBAL` into constants during tier 2 optimization.
Adds a tier 2 optimizer pass that converts the micro-ops for loading globals and builtins to constants.
This should have two benefits:
- the guards on the globals and builtins dicts are replaced with cheaper guards, and
- the `LOAD_GLOBAL` micro-ops themselves become inline constants.
Benchmarking and stats show a ~1% speedup, but it is the stats that are interesting.
The tier 2 stats show that we have replaced 3 billion guards with 1.2 billion cheaper guards, and that all `LOAD_GLOBAL_MODULE` and `LOAD_GLOBAL_BUILTINS` have been replaced with inline constants.
There are some changes to the tier 1 stats, which I think are a result of not optimizing when the global/builtin keys version doesn't match, so that we avoid poor optimization. This drops us back into tier 1 for later re-optimization to tier 2, hopefully once the set of global variables has stabilized.
This shows up in the tier 1 stats and optimization attempts.
What this suggests is that we should be de-optimizing faster in tier 1 in this case, as once the keys version has changed it will never go back to the original value. But that's for another PR.
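To make the idea concrete, here is a heavily simplified sketch of what such a pass does. The uop names, the instruction struct, and the rewrite rule are illustrative assumptions for this example, not the actual uops or data structures in the PR:

```c
#include <Python.h>

/* Illustrative uop names and trace layout; not CPython's real definitions. */
enum { OP_NOP, OP_GUARD_GLOBALS_VERSION, OP_LOAD_GLOBAL_MODULE, OP_LOAD_CONST_INLINE };

typedef struct {
    int       opcode;
    PyObject *operand;   /* the name to look up before the rewrite,
                            a borrowed constant value after it */
} uop_t;

/* Rewrite "guard the globals dict, then load a global by name" into
 * "push this constant". The value is resolved once, at optimization time;
 * the dict watcher discussed above invalidates the executor if the globals
 * dict is modified later, which is what keeps the borrowed value valid. */
static void
globals_to_constants(uop_t *trace, int len, PyObject *globals)
{
    for (int i = 0; i + 1 < len; i++) {
        if (trace[i].opcode == OP_GUARD_GLOBALS_VERSION &&
            trace[i + 1].opcode == OP_LOAD_GLOBAL_MODULE)
        {
            PyObject *value = PyDict_GetItem(globals, trace[i + 1].operand);
            if (value == NULL) {
                continue;   /* key missing: keep the guard and the load */
            }
            trace[i].opcode = OP_NOP;                 /* guard no longer needed */
            trace[i + 1].opcode = OP_LOAD_CONST_INLINE;
            trace[i + 1].operand = value;             /* borrowed reference */
        }
    }
}
```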