-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
Correct design.rst dictionaries section #11272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The section I propose to change implies that constant time means that all keys have to have different hash values--i.e. no collisions. This is not the case.
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. You can check yourself to see if the CLA has been received. Thanks again for your contribution, we look forward to reviewing it! |
The original documentation is correct. If all keys have the same hash, retrieve a key will take linear time |
@serhiy-storchaka the original documentation says that if all keys have a different hash then retrieve is constant time. But it would still be constant time even if not all keys had a different hash. |
Doc/faq/design.rst
Outdated
@@ -499,8 +499,7 @@ using the :func:`hash` built-in function. The hash code varies widely depending | |||
on the key and a per-process seed; for example, "Python" could hash to | |||
-539294296 while "python", a string that differs by a single bit, could hash | |||
to 1142331976. The hash code is then used to calculate a location in an | |||
internal array where the value will be stored. Assuming that you're storing | |||
keys that all have different hash values, this means that dictionaries take | |||
internal array where the value will be stored. Hash tables, and therefore dictionaries, take | |||
constant time -- O(1), in Big-O notation -- to retrieve a key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sooner or later, someone will complain that the new docs are incorrect because dictionaries have O(n) worst case performance., so some sort of qualifier is still necessary. We get O(1) performance in the typical case when the hash values are evenly distributed; however, it is easy to construct degenerate cases with a catastrophic pile-ups at a single value.
Consider using the wikipedia wording where we mention both the average case and worst case.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Modified the section "How are Dictionaries implemented in CPython?" to mention a dictionary's average-case and worst-case time complexity.
I have made the requested changes; please review again |
Thanks for making the requested changes! @rhettinger: please review the changes made to this pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the first line in this section CPython's dictionaries are implemented as resizable hash tables.
, please add a link to hash tables
to the wikipedia page that Raymond referenced. Thanks!
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
I have made the requested changes; please review again |
Thanks for making the requested changes! @csabella, @rhettinger: please review the changes made to this pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!
@viagluck, the CI builds are failing because of trailing whitespace. Please make sure you don't have any trailing whitespace on the lines you've added/changed. Thanks! |
After further review. I'm going to decline this change. The current wording is accurate and sticks with the essential point that if the hash keys are different (which is generally the case), the lookups at O(1). There proposed wording seems to emphasize a case that almost never occurs and would create unnecessary worry. |
The section I propose to change implies that constant time means that all keys have to have different hash values--i.e. no collisions. This is not the case.