
gh-131435: random.randint optimization #131436

Merged (2 commits) Mar 20, 2025
Conversation

@dg-pb (Contributor) commented Mar 18, 2025

    S="from random import randint; from itertools import starmap, repeat"
    $PYEXE -m timeit -s $S "list(starmap(randint, repeat((1, 2), 100_000)))"
    # Current: 50 ms
    # New:     40 ms  20% decrease
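For context, the change amounts to having randint() skip randrange()'s argument normalization and call the bit generator directly. Roughly (a paraphrase of the idea, not the exact merged diff; `rng` stands in for `self`):

```python
import random

rng = random.Random(42)  # stands in for `self`

def randint_fast(a, b):
    # Bypass randrange()'s argument handling and go straight to
    # _randbelow(), which does the actual random bit generation.
    return a + rng._randbelow(b - a + 1)

samples = [randint_fast(1, 6) for _ in range(1_000)]
assert all(1 <= s <= 6 for s in samples)
```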

@tim-one (Member) commented Mar 19, 2025

The code seems clearly correct, so what's to review? Intent? +0 from me - I never use randint, so can't say I care much 😉.

Offhand, I think randint() was a mistake to begin with, and so was allowing a non-+1 step for randrange(). "The right way" to get a random value from a general range is from a range object. Like so:

r = range(start, stop, step)
val = r[random._randbelow(len(r))]

It's even possible that changing the guts of randrange() to do that itself would be faster, because all the annoying special-case arithmetic would be done "at C speed" (by the range() implementation).
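A runnable version of that sketch (note that _randbelow is a method on Random instances rather than a module-level function, so a concrete rng instance stands in here):

```python
import random

rng = random.Random(123)

def rand_from_range(start, stop, step=1):
    # Let range() do all the special-case arithmetic at C speed,
    # then pick a uniformly random index into it.
    r = range(start, stop, step)
    return r[rng._randbelow(len(r))]

vals = [rand_from_range(0, 100, 7) for _ in range(500)]
assert all(v in range(0, 100, 7) for v in vals)
```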

@dg-pb (Contributor, Author) commented Mar 19, 2025

It's even possible that changing the guts of randrange() to do that itself would be faster, because all the annoying special-case arithmetic would be done "at C speed" (by the range() implementation).

    def randrange(self, start, stop=None, step=_ONE):
        if stop is None:
            stop = start
            start = 0
        rge = range(start, stop, step)
        return rge[self._randbelow(len(rge))]

Tried this quickly. The range instantiation, the len() call, and the __getitem__ overhead offset the gains and make it slightly worse.

It would have value for an iterator-style random-number producer.
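The overhead dg-pb describes can be seen with a quick micro-benchmark sketch (numbers are illustrative and machine-dependent; `rng` is a local stand-in for the module's instance):

```python
import random
import timeit

rng = random.Random(1)

def via_range(start, stop):
    # randrange-through-range(): construct, len(), then index.
    r = range(start, stop)
    return r[rng._randbelow(len(r))]

def direct(start, stop):
    # Direct arithmetic on _randbelow(), no range object at all.
    return start + rng._randbelow(stop - start)

t_range = timeit.timeit(lambda: via_range(1, 3), number=50_000)
t_direct = timeit.timeit(lambda: direct(1, 3), number=50_000)
# dg-pb found the range version slightly slower; exact numbers
# depend on build and platform.
```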

@dg-pb (Contributor, Author) commented Mar 19, 2025

I think randint() was a mistake to begin with

I reach for randint from time to time, probably because numpy has it too and I am used to it.

I don't think I have ever used randrange, though. And now that I know I can do it more efficiently with range, I don't think I ever will; I rarely need a single random value.

@StanFromIreland (Contributor):

$ python3.14 -m timeit 'from random import randint; randint(0,10)'
1000000 loops, best of 5: 330 nsec per loop
$ ./python -m timeit 'from random import randint; randint(0,10)'
1000000 loops, best of 5: 288 nsec per loop

Only ~13% on a build without optimizations for me.

This would need a NEWS entry.

@dg-pb (Contributor, Author) commented Mar 19, 2025

Only ~13% on a build without optimizations for me.

I did mine on a build with optimizations (macOS).

@dg-pb (Contributor, Author) commented Mar 19, 2025

Only ~13% on a build without optimizations for me.

Also, try doing imports in setup, not as part of timing:

$PYEXE -m timeit -s 'from random import randint' 'randint(0,10)'
500000 loops, best of 5: 418 nsec per loop
1000000 loops, best of 5: 342 nsec per loop
342 / 418 = 82%
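The point generalizes; a quick sketch using the timeit module directly (numbers are illustrative and machine-dependent) of how an import inside the timed statement inflates the per-call figure:

```python
import timeit

# Timing with the import inside the statement charges the (cached)
# import machinery to every single loop iteration.
inline = timeit.timeit("from random import randint; randint(0, 10)",
                       number=100_000)

# Timing with the import in setup= measures only the call itself.
in_setup = timeit.timeit("randint(0, 10)",
                         setup="from random import randint",
                         number=100_000)

# in_setup is normally noticeably smaller than inline.
```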

@ZeroIntensity (Member) left a comment

Hmm, I'm not sure what to think of this. On one hand, a 10% performance improvement is great! On the other hand, most pure-Python optimizations are stale within a few years, because the interpreter changes. Shouldn't we leave optimizations to the interpreter itself?

So:

  • Do we have precedent for doing this in the standard library?
  • If so, how do we keep the cases limited enough to prevent copycat PRs that add silly micro-ops everywhere?

cc @picnixz (you've crunched through more PRs than I have 😄)

@dg-pb (Contributor, Author) commented Mar 20, 2025

10%

20 :)

because the interpreter changes

That doesn't apply here. This change bypasses unnecessary code rather than replacing one pattern with another, so it will hold up across interpreter changes.

@ZeroIntensity (Member):

Ideally some magical future interpreter could do that dead-code elimination on the fly. But I see your point; maybe that kind of magic is too magic for anything soon ;)

Thinking out loud: seemingly, another option would be to add a fast-path to randrange that checks if step is _ONE to skip the extra __index__ call.
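A sketch of what that fast path might look like (hypothetical, not the merged code; _ONE mirrors the module-level sentinel in Lib/random.py, and `rng` stands in for `self`):

```python
import random

_ONE = 1  # default-step sentinel, mirroring Lib/random.py
rng = random.Random(7)  # stands in for `self`

def randrange_fastpath(start, stop=None, step=_ONE):
    # Hypothetical fast path: an identity check against the sentinel
    # lets the common no-step case skip the step.__index__() call.
    if step is _ONE:
        if stop is None:
            return rng._randbelow(start)
        return start + rng._randbelow(stop - start)
    # Uncommon case: defer to the full, slow-path implementation.
    return rng.randrange(start, stop, step)

assert all(0 <= randrange_fastpath(10) < 10 for _ in range(200))
assert all(randrange_fastpath(0, 100, 10) % 10 == 0 for _ in range(50))
```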

@dg-pb (Contributor, Author) commented Mar 20, 2025

seemingly, another option would be to add a fast-path to randrange that checks if step is _ONE to skip the extra __index__ call.

Yup, could do that as well.

But even after that, the inlined version would still save the self.randrange( call and the if stop is None: check.

Well, unless randint were deprecated and randrange used directly; then only the extra if stop is None check would remain, which is pretty much the same. But I don't know if that is a good idea. As I said, randint exists in many places (numpy, pytorch, ...), so it might be good to keep it and make it a bit faster.


Do people use randrange with step=_ONE? If yes, could do fast-path as you said.

GitHub code search:

  • /random\.randint/ Language:Python - 1.6M files
  • /random\.randrange/ Language:Python - 193k files
  • /random\.randrange\(\d+,\d+,\d+\)/ Language:Python - 2.3k files
  • /random\.randrange\(.*step=.*\)/ Language:Python 277 files

So it seems that ~99% of the time people just need a random integer without a step, and randrange is used only about a tenth as often as randint.


I guess it is safe to say that these 2 are not going anywhere any time soon. Given these are minimal effort non-negligible improvement optimizations I think both are reasonable.


Also, just to put it in perspective: some optimizations of up to 100% that are cheered upon are 100+ lines long. By that ratio, if this change were 100 lines long, it would be a 300% speedup.

@rhettinger (Contributor):

In general, you almost always get a small speed-up by inlining a function call and duplicating code. In this case, it is only a small code expansion and is easy to read. I'll mark this as approved and apply it. But please don't take this as a license to unfactor code elsewhere.

@rhettinger rhettinger merged commit c83efa7 into python:main Mar 20, 2025
39 checks passed
@dg-pb dg-pb deleted the random_randint_work branch March 20, 2025 23:01
@picnixz (Member) commented Mar 22, 2025

@ZeroIntensity

you've crunched through more PRs than I have 😄

IMO it's always a matter of who the maintainer is and how much we gain. I've recently refactored uuid constructors because I was able to gain a 30-50% speedup without adding code complexity (heck I think I even simplified it!). OTOH, I wrote a fnmatch C version to gain a 7x speedup for translate() vs a 3x speedup with a pure Python version but I decided to just keep the Python version as the maintenance cost would have been 1000x more otherwise.

In the first case, I removed some redundant checks by adding dedicated factories. In the latter, it was just a smarter algorithm. I don't think I've ever inlined functions or added fast paths like that because I'm never sure about the usage ratio nor whether this could be a future antipattern.

So, I would say, the improvements brought by such PRs should definitely be significant, not just small (usually small is about 10%), should be easy to read (no convoluted code, no code expansion ending up doubling the number of "effective" lines), and ideally should be discussed before opening a PR. One way to do it is for instance to gather a bunch of possible improvements and present PoCs on dpo and then we could selectively choose which ones to apply instead of opening lots of issues that could be closed the day after or directly rejected.

Personally, I'm used to maintaining and reading unreadable code (teaching experience, you know...), but I wouldn't accept optimizations to hashlib if they could hide security holes that we might have missed because of corner cases. So such optimizations should be carefully reviewed.

6 participants