Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-115999: Specialize LOAD_GLOBAL in free-threaded builds #126607

Merged
merged 19 commits into from
Nov 21, 2024

Conversation

mpage
Copy link
Contributor

@mpage mpage commented Nov 9, 2024

Enable specialization of LOAD_GLOBAL in free-threaded builds.

Thread-safety of specialization in free-threaded builds is provided by the following:

  • A critical section is held on both the globals and builtins objects during specialization. This ensures we get an atomic view of both builtins and globals during specialization.
  • Generation of new keys versions is made atomic in free-threaded builds.
  • Existing helpers are used to atomically modify the opcode.

Thread-safety of specialized instructions in free-threaded builds is provided by the following:

  • Relaxed atomics are used when loading and storing dict keys versions. This avoids potential data races as the dict keys versions are read without holding the dictionary's per-object lock in version guards.
  • Dicts keys objects are passed from keys version guards to the downstream uops. This ensures that we are loading from the correct offset in the keys object. Once a unicode key has been stored in a keys object for a combined dictionary in free-threaded builds, the offset that it is stored in will never be reused for a different key. Once the version guard passes, we know that we are reading from the correct offset.
  • The dictionary read fast-path is used to read values from the dictionary once we know the correct offset.

Performance-wise, this looks like a 3-4% improvement on free-threaded builds on pyperformance.

mpage added 6 commits November 8, 2024 12:47
Thread-safety of specialization in free-threaded builds:

- A critical section is held on both the globals and builtins objects during
  specialization. This ensures we get an atomic view of both builtins and globals
  during specialization.
- Generation of new keys versions is made atomic in free-threaded builds.
- We use helpers safely modify the bytecode.

Thread-safety of specialized instructions in free-threaded builds:

- Dicts keys objects are passed from keys version guards to the downstream uops.
  This ensures that we are loading from the correct offset in the keys object.
  Once a unicode key has been stored in a keys object for a combined dictionary
  in free-threaded builds, the offset that it is stored in will never be reaused
  for a different key.
- The dictionary read fast-path is used to read values from the dictionary.
- Relaxed atomics are used when loading and storing dict keys versions. This
  avoids potential data races as the dict keys versions may be read without
  holding the dictionary's per-object lock while the instructions are executing.
This handles the case where another thread is instrumenting the bytecode.
@mpage mpage marked this pull request as ready for review November 9, 2024 02:00
@mpage mpage requested review from colesbury and Yhg1s November 9, 2024 02:00
Copy link
Member

@Fidget-Spinner Fidget-Spinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was less complicated than I initially anticipated. Nice!

LGTM overall, but I'll withold from approval to let others review it first.

Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! A few comments below.

Keys are freed using QSBR. Nothing occurs in between loading the keys
and loading the value that would cause the QSBR version to advance,
ensuring that the keys object cannot be freed and reused.
@mpage mpage requested a review from colesbury November 19, 2024 01:10
@mpage mpage requested a review from colesbury November 19, 2024 06:24
@mpage
Copy link
Contributor Author

mpage commented Nov 21, 2024

!buildbot nogil refleak

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @mpage for commit 01f4143 🤖

The command will test the builders whose names match following regular expression: nogil refleak

The builders matched are:

  • AMD64 Fedora Rawhide NoGIL refleaks PR
  • PPC64LE Fedora Rawhide NoGIL refleaks PR
  • aarch64 Fedora Rawhide NoGIL refleaks PR
  • AMD64 CentOS9 NoGIL Refleaks PR

@mpage mpage merged commit 09c240f into python:main Nov 21, 2024
65 checks passed
@mpage mpage deleted the gh-115999-tlbc-load-global branch November 21, 2024 19:22
ebonnal pushed a commit to ebonnal/cpython that referenced this pull request Jan 12, 2025
…thon#126607)

Enable specialization of LOAD_GLOBAL in free-threaded builds.

Thread-safety of specialization in free-threaded builds is provided by the following:

A critical section is held on both the globals and builtins objects during specialization. This ensures we get an atomic view of both builtins and globals during specialization.
Generation of new keys versions is made atomic in free-threaded builds.
Existing helpers are used to atomically modify the opcode.
Thread-safety of specialized instructions in free-threaded builds is provided by the following:

Relaxed atomics are used when loading and storing dict keys versions. This avoids potential data races as the dict keys versions are read without holding the dictionary's per-object lock in version guards.
Dicts keys objects are passed from keys version guards to the downstream uops. This ensures that we are loading from the correct offset in the keys object. Once a unicode key has been stored in a keys object for a combined dictionary in free-threaded builds, the offset that it is stored in will never be reused for a different key. Once the version guard passes, we know that we are reading from the correct offset.
The dictionary read fast-path is used to read values from the dictionary once we know the correct offset.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants