Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-115999: Add free-threaded specialization for UNPACK_SEQUENCE #126600

Merged
merged 7 commits into from
Nov 22, 2024

Conversation

Eclips4
Copy link
Member

@Eclips4 Eclips4 commented Nov 8, 2024

def test_unpack_sequence(self):
def f():
for _ in range(100):
a, b = 1, 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort of unrelated to this PR, but I'm surprised constant unpacking like this isn't peepholed into LOAD_CONST, STORE_FAST, LOAD_CONST, STORE_FAST. Maybe because it's actually a longer instruction sequence than the original? It certainly does less work.

@iritkatriel, thoughts?

Copy link
Contributor

@mpage mpage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this! The changes look good. I think the current implementations of UNPACK_SEQUENCE_{TUPLE,TWO_TUPLE} are already thread-safe since tuples are immutable. The implementation of UNPACK_SEQUENCE_LIST isn't thread-safe (there is nothing preventing another thread from adding or removing items from the list while the instruction is executing) so we're going to need a slightly different implementation in the free-threaded build.

I would suggest taking a critical section around the list size check and the code that pushes items onto the stack. Something like:

inst(UNPACK_SEQUENCE_LIST, (unused/1, seq -- values[oparg])) {
    PyObject *seq_o = PyStackRef_AsPyObjectBorrow(seq);
    DEOPT_IF(!PyList_CheckExact(seq_o));
    int should_deopt = 0;
    Py_BEGIN_CRITICAL_SECTION(seq_o);
    should_deopt = PyList_GET_SIZE(seq_o) != oparg;
    if (!should_deopt) {
        STAT_INC(UNPACK_SEQUENCE, hit);
        PyObject **items = _PyList_ITEMS(seq_o);
        for (int i = oparg; --i >= 0; ) {
            *values++ = PyStackRef_FromPyObjectNew(items[i]);
        }
    }
    Py_END_CRITICAL_SECTION();
    DEOPT_IF(should_deopt);
    DECREF_INPUTS();
}

@bedevere-app
Copy link

bedevere-app bot commented Nov 8, 2024

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@Eclips4 Eclips4 requested a review from mpage November 8, 2024 21:57
Copy link
Contributor

@mpage mpage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

*values++ = PyStackRef_FromPyObjectNew(items[i]);
int should_deopt = 0;
Py_BEGIN_CRITICAL_SECTION(seq_o);
should_deopt = PyList_GET_SIZE(seq_o) != oparg;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be rewritten as something like:

    if (PyList_GET_SIZE(seq_o) != oparg) {
         END_CRITICAL_SECTION
         DEOPT_IF(true);
    }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't use the macros because they introduce and close a new scope.

I don't think there's precedence in CPython for using the critical section functions manually (everything uses the macros), but if it's important that we maintain the structure you're suggesting we could do something like

#ifdef Py_GIL_DISABLED
PyCriticalSection _py_cs;
PyCriticalSection_Begin(&_py_cs, seq_o);
#endif
if (PyList_GET_SIZE(seq_o) != oparg) {
    #ifdef Py_GIL_DISABLED
    PyCriticalSection_End(&_py_cs);
    #endif
    DEOPT_IF(true);
}
STAT_INC(UNPACK_SEQUENCE, hit);
PyObject **items = _PyList_ITEMS(seq_o);
for (int i = oparg; --i >= 0; ) {
    *values++ = PyStackRef_FromPyObjectNew(items[i]);
}
#ifdef Py_GIL_DISABLED
PyCriticalSection_End(&_py_cs);
#endif
DECREF_INPUTS();

Note that the preprocessor guards are not necessary for correctness; the critical section functions are a no-op in default builds.

Copy link
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't fit into the general pattern of specialized instructions:

  • One or more guards
  • Zero or one actions

See the docs for why this matters.

Has this been benchmarked?

@mpage
Copy link
Contributor

mpage commented Nov 13, 2024

Has this been benchmarked?

Performance is neutral overall on the default build. Though strangely the results for the unpack sequence benchmark look like this change is significantly faster. I'm not sure how reliable that benchmark is, however.

Performance is improved by 2% overall on the free-threaded build.

@mpage
Copy link
Contributor

mpage commented Nov 16, 2024

@markshannon - Please have another look at this.

@mpage mpage requested a review from markshannon November 16, 2024 00:42
@Eclips4 Eclips4 requested a review from mpage November 21, 2024 12:01
Copy link
Contributor

@mpage mpage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Co-authored-by: mpage <mpage@cs.stanford.edu>
@mpage
Copy link
Contributor

mpage commented Nov 22, 2024

@Eclips4 - I think it would be fine to merge this PR. We can put up follow up PRs if any subsequent changes are required.

@Eclips4 Eclips4 merged commit 27486c3 into python:main Nov 22, 2024
57 checks passed
@Eclips4 Eclips4 deleted the ft-specialize-unpack-sequence branch November 22, 2024 17:00
ebonnal pushed a commit to ebonnal/cpython that referenced this pull request Jan 12, 2025
…E` (python#126600)

Add free-threaded specialization for `UNPACK_SEQUENCE` opcode.
`UNPACK_SEQUENCE_TUPLE/UNPACK_SEQUENCE_TWO_TUPLE` are already thread safe since tuples are immutable.
`UNPACK_SEQUENCE_LIST` is not thread safe because of nature of lists (there is nothing preventing another thread from adding items to or removing them the list while the instruction is executing). To achieve thread safety we add a critical section to the implementation of `UNPACK_SEQUENCE_LIST`, especially around the parts where we check the size of the list and push items onto the stack.


---------

Co-authored-by: Matt Page <mpage@meta.com>
Co-authored-by: mpage <mpage@cs.stanford.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants