gh-115999: Add free-threaded specialization for UNPACK_SEQUENCE
#126600
Conversation
    def test_unpack_sequence(self):
        def f():
            for _ in range(100):
                a, b = 1, 2
Sort of unrelated to this PR, but I'm surprised constant unpacking like this isn't peepholed into LOAD_CONST, STORE_FAST, LOAD_CONST, STORE_FAST. Maybe because it's actually a longer instruction sequence than the original? It certainly does less work.
@iritkatriel, thoughts?
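(For reference, here is a minimal sketch of how to check what the compiler actually emits today with the dis module; the exact bytecode layout varies across CPython versions, but the constant tuple currently survives into an UNPACK_SEQUENCE rather than being split into per-target stores:)

```python
import dis

def f():
    # Constant unpacking: the RHS tuple (1, 2) is folded into a single
    # constant, then taken apart at runtime with UNPACK_SEQUENCE.
    a, b = 1, 2
    return a, b

# Disassemble f to inspect the emitted instruction sequence.
dis.dis(f)
```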
Thanks for doing this! The changes look good. I think the current implementations of UNPACK_SEQUENCE_{TUPLE,TWO_TUPLE} are already thread-safe since tuples are immutable. The implementation of UNPACK_SEQUENCE_LIST isn't thread-safe (there is nothing preventing another thread from adding or removing items from the list while the instruction is executing), so we're going to need a slightly different implementation in the free-threaded build.
I would suggest taking a critical section around the list size check and the code that pushes items onto the stack. Something like:
    inst(UNPACK_SEQUENCE_LIST, (unused/1, seq -- values[oparg])) {
        PyObject *seq_o = PyStackRef_AsPyObjectBorrow(seq);
        DEOPT_IF(!PyList_CheckExact(seq_o));
        int should_deopt = 0;
        Py_BEGIN_CRITICAL_SECTION(seq_o);
        should_deopt = PyList_GET_SIZE(seq_o) != oparg;
        if (!should_deopt) {
            STAT_INC(UNPACK_SEQUENCE, hit);
            PyObject **items = _PyList_ITEMS(seq_o);
            for (int i = oparg; --i >= 0; ) {
                *values++ = PyStackRef_FromPyObjectNew(items[i]);
            }
        }
        Py_END_CRITICAL_SECTION();
        DEOPT_IF(should_deopt);
        DECREF_INPUTS();
    }
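(As an aside, the length guard being protected here mirrors the check Python code observes directly: unpacking raises ValueError on a size mismatch, which is why a concurrent resize must not slip in between the size check and the pushes. A minimal Python-level illustration:)

```python
# Unpacking checks the sequence length against the number of targets.
a, b = [1, 2]          # lengths match: a == 1, b == 2
assert (a, b) == (1, 2)

try:
    a, b = [1, 2, 3]   # too many values: raises ValueError at runtime
except ValueError as exc:
    print(exc)         # e.g. "too many values to unpack (expected 2)"
```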
LGTM!
Python/bytecodes.c (outdated):
    *values++ = PyStackRef_FromPyObjectNew(items[i]);
    int should_deopt = 0;
    Py_BEGIN_CRITICAL_SECTION(seq_o);
    should_deopt = PyList_GET_SIZE(seq_o) != oparg;
Could this be rewritten as something like:
    if (PyList_GET_SIZE(seq_o) != oparg) {
        END_CRITICAL_SECTION
        DEOPT_IF(true);
    }
We can't use the macros because they introduce and close a new scope.
I don't think there's precedent in CPython for using the critical section functions manually (everything uses the macros), but if it's important that we maintain the structure you're suggesting, we could do something like:
    #ifdef Py_GIL_DISABLED
        PyCriticalSection _py_cs;
        PyCriticalSection_Begin(&_py_cs, seq_o);
    #endif
        if (PyList_GET_SIZE(seq_o) != oparg) {
    #ifdef Py_GIL_DISABLED
            PyCriticalSection_End(&_py_cs);
    #endif
            DEOPT_IF(true);
        }
        STAT_INC(UNPACK_SEQUENCE, hit);
        PyObject **items = _PyList_ITEMS(seq_o);
        for (int i = oparg; --i >= 0; ) {
            *values++ = PyStackRef_FromPyObjectNew(items[i]);
        }
    #ifdef Py_GIL_DISABLED
        PyCriticalSection_End(&_py_cs);
    #endif
        DECREF_INPUTS();
Note that the preprocessor guards are not necessary for correctness; the critical section functions are a no-op in default builds.
This doesn't fit into the general pattern of specialized instructions:
- One or more guards
- Zero or one actions
See the docs for why this matters.
Has this been benchmarked?
Performance is neutral overall on the default build, though, strangely, the results for the unpack sequence benchmark suggest this change is significantly faster; I'm not sure how reliable that benchmark is. Performance is improved by 2% overall on the free-threaded build.
@markshannon - Please have another look at this.
LGTM!
Co-authored-by: mpage <mpage@cs.stanford.edu>
@Eclips4 - I think it would be fine to merge this PR. We can put up follow-up PRs if any subsequent changes are required.
gh-115999: Add free-threaded specialization for `UNPACK_SEQUENCE` (python#126600)

Add free-threaded specialization for the `UNPACK_SEQUENCE` opcode. `UNPACK_SEQUENCE_TUPLE`/`UNPACK_SEQUENCE_TWO_TUPLE` are already thread-safe since tuples are immutable. `UNPACK_SEQUENCE_LIST` is not thread-safe because of the mutable nature of lists (there is nothing preventing another thread from adding items to or removing them from the list while the instruction is executing). To achieve thread safety we add a critical section to the implementation of `UNPACK_SEQUENCE_LIST`, around the parts where we check the size of the list and push items onto the stack.

---------

Co-authored-by: Matt Page <mpage@meta.com>
Co-authored-by: mpage <mpage@cs.stanford.edu>
Linked issue: --disable-gil builds #115999