gh-131435: random.randint optimization #131436
Conversation
dg-pb commented Mar 18, 2025 (edited)
- Issue: random.randint performance improvement #131435
The code seems clearly correct, so what's to review? Intent? +0 from me. I never use … Offhand, I think:

```python
r = range(start, stop, step)
val = r[random._randbelow(len(r))]
```

It's even possible that changing the guts of …
```python
def randrange(self, start, stop=None, step=_ONE):
    if stop is None:
        stop = start
        start = 0
    rge = range(start, stop, step)
    return rge[self._randbelow(len(rge))]
```

Tried this quickly. It would have value for an iterator-style random number producer.
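The range-based idea sketched above can be written as a standalone helper. This is only an illustration of the technique, not the stdlib's implementation; the function name is hypothetical, and `_randbelow` is a private CPython method that could change without notice:

```python
import random

_rng = random.Random()

def randrange_via_range(start, stop=None, step=1):
    """Hypothetical helper: let range() do all the argument
    validation, then index a uniformly random element.
    Note: Random._randbelow is a private CPython method."""
    if stop is None:
        start, stop = 0, start
    rge = range(start, stop, step)
    n = len(rge)
    if n == 0:
        raise ValueError("empty range for randrange_via_range()")
    return rge[_rng._randbelow(n)]
```

Because `range` already validates its arguments and computes its own length, the helper stays short; the cost is constructing a `range` object on every call.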
I go for … I don't think I have ever used …
Only ~13% on a build without optimizations for me. This would need a NEWS entry.
I did mine on a build with optimizations (macOS).
Also, try doing imports in setup, not as part of the timing:

```shell
$PYEXE -m timeit -s 'from random import randint' 'randint(0,10)'
500000 loops, best of 5: 418 nsec per loop
1000000 loops, best of 5: 342 nsec per loop
```

342 / 418 = 82%
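The same measurement can be reproduced from Python with the `timeit` module, putting the import in the setup string as suggested above. The absolute numbers vary by machine, interpreter version, and build flags:

```python
import timeit

# Time randint(0, 10) with the import done in setup, not in the timed statement.
n = 100_000
t = timeit.timeit("randint(0, 10)",
                  setup="from random import randint",
                  number=n)
print(f"{t / n * 1e9:.0f} ns per call")  # absolute value is machine-dependent
```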
Hmm, I'm not sure what to think of this. On one hand, a 10% performance improvement is great! On the other hand, most pure-Python optimizations are stale within a few years, because the interpreter changes. Shouldn't we leave optimizations to the interpreter itself?
So:
- Do we have precedent for doing this in the standard library?
- If so, how do we keep the cases limited enough to prevent copycat PRs that add silly micro-ops everywhere?
cc @picnixz (you've crunched through more PRs than I have 😄)
20 :)
It doesn't apply here. This is bypassing unnecessary code, as opposed to replacing one pattern with another, so it will hold across interpreter changes.
Ideally some magical future interpreter could do that dead-code elimination on the fly. But I see your point; maybe that kind of magic is too magic for anything soon ;) Thinking out loud: another option, seemingly, would be to add a fast path to …
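A fast path of the kind floated here could look roughly like the sketch below. This is a hypothetical illustration, not the PR's actual change or CPython's code: it special-cases the common two-argument integer call before falling back to the general argument-handling machinery:

```python
import random

class FastRandom(random.Random):
    """Hypothetical sketch of a randrange() fast path (not CPython's
    actual implementation): handle the common randrange(a, b) int case
    first, delegate everything else to the stock method."""

    def randrange(self, start, stop=None, step=1):
        # Fast path: two plain-int arguments with the default step.
        if stop is not None and step == 1 \
                and type(start) is int and type(stop) is int:
            width = stop - start
            if width > 0:
                return start + self._randbelow(width)
        # Slow path: full argument validation in the base class.
        if stop is None:
            return super().randrange(start)
        return super().randrange(start, stop, step)
```

The trade-off discussed in the thread applies directly: the fast path duplicates a little logic for a speed-up on the dominant call pattern, at the cost of two code paths to keep in sync.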
Yup, could do that as well. But even after that, it would still be minus … Well, unless there would be … Do people use …? GitHub:

So it seems that 99% of the time people just need … I guess it is safe to say that these two are not going anywhere any time soon. Given that these are minimal-effort optimizations with non-negligible improvements, I think both are reasonable.

Also, just to put it in perspective: some optimizations of up to 100% that are cheered on are 100+ lines long. In comparison, if this one were 100 lines long it would be 300%.
In general, you almost always get a small speed-up by inlining a function call and duplicating code. In this case, it is only a small code expansion and is easy to read. I'll mark this as approved and apply it. But please don't take this as a license to unfactor code elsewhere.
IMO it's always a matter of who the maintainer is and how much we gain. I recently refactored the uuid constructors because I was able to gain a 30-50% speedup without adding code complexity (heck, I think I even simplified it!). OTOH, I wrote a C version of fnmatch to gain a 7x speedup for translate() vs a 3x speedup with a pure-Python version, but I decided to just keep the Python version, as the maintenance cost would have been 1000x more otherwise. In the first case, I removed some redundant checks by adding dedicated factories. In the latter, it was just a smarter algorithm. I don't think I've ever inlined functions or added fast paths like that, because I'm never sure about the usage ratio, nor whether this could become a future antipattern.

So, I would say, the improvements brought by such PRs should definitely be significant, not just small (usually small is about 10%), should be easy to read (no convoluted code, no code expansion that ends up doubling the number of "effective" lines), and ideally should be discussed before opening a PR. One way to do it is, for instance, to gather a bunch of possible improvements and present PoCs on dpo, and then we could selectively choose which ones to apply, instead of opening lots of issues that could be closed the day after or directly rejected.

Personally, I'm used to maintaining and reading unreadable code (teaching experience, you know...), but I wouldn't accept optimizations to hashlib if they could hide security holes that we could have missed because of corner cases. So such optimizations should be carefully reviewed.