Skip to content

gh-118527: Intern code name and filename on default build #118576

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 6, 2024

Conversation

colesbury
Copy link
Contributor

@colesbury colesbury commented May 4, 2024

Interned and non-interned strings are treated differently by marshal, so be consistent between the default and free-threaded build.

Interned and non-interned strings are treated differently by `marshal`,
so be consistent between the default and free-threaded build.
@brandtbucher
Copy link
Member

brandtbucher commented May 4, 2024

So to clarify, the risk here is that the default build will leak a bunch of names if code is created dynamically (by using exec or whatever), each time using a different name, then freed?

How significant is the scaling improvement on the free-threading build?

@brandtbucher
Copy link
Member

Honestly, I'm not too worried since we already intern names and constants and stuff.

@markshannon
Copy link
Member

Do you have any numbers on how many extra strings this will intern?

@colesbury
Copy link
Contributor Author

colesbury commented May 4, 2024

How significant is the scaling improvement on the free-threading build?

It's critical for some common patterns, although it's not the only critical bottleneck (there are other remaining bottlenecks too).

For example:

def run_in_parallel():
    def foo():
        pass
    return foo

If run_in_parallel() is run from multiple threads concurrently, then the construction of different foo functions would be a bottleneck due to the functions taking their names from the code object. (The names are refcounted).

Do you have any numbers on how many extra strings this will intern?

Running test_threading (chosen arbitrarily) I see the number of interned unicode strings change:

before this PR: 19419
after this PR: 19721

(in the default build, as reported by sys.getunicodeinternedsize())

@markshannon
Copy link
Member

Running test_threading (chosen arbitrarily) I see the number of interned unicode strings change:

before this PR: 19419 after this PR: 19721

(in the default build, as reported by sys.getunicodeinternedsize())

The increase from 19419 to 19721 doesn't seem so bad, but it depends on the baseline.
As an extreme example, if 19418 of those are interned by python -c "pass" then it is an increase from 1 to 300 which is bad.

Can you get numbers for something like the mypy benchmark or a similar "realistic" program?

@colesbury
Copy link
Contributor Author

./python -m mypy --config-file Lib/test/libregrtest/mypy.ini
before: 25643
after: 25789

@colesbury
Copy link
Contributor Author

And ./python -c "pass"
before: 3926
after: 4070

@colesbury
Copy link
Contributor Author

I'm going to go ahead and merge this as the increase seems small, especially for larger programs.

@colesbury colesbury merged commit 2ba2c14 into python:main May 6, 2024
39 checks passed
@colesbury colesbury deleted the gh-118527-code branch May 6, 2024 21:24
SonicField pushed a commit to SonicField/cpython that referenced this pull request May 8, 2024
…on#118576)

Interned and non-interned strings are treated differently by `marshal`,
so be consistent between the default and free-threaded build.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants