Skip to content

Correct design.rst dictionaries section #11272

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

Conversation

viagluck
Copy link

The section I propose to change implies that constant time means that all keys have to have different hash values--i.e. no collisions. This is not the case.

The section I propose to change implies that constant time means that all keys have to have different hash values--i.e. no collisions. This is not the case.
@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

@serhiy-storchaka
Copy link
Member

The original documentation is correct. If all keys have the same hash, retrieve a key will take linear time

@viagluck
Copy link
Author

@serhiy-storchaka the original documentation says that if all keys have a different hash then retrieve is constant time. But it would still be constant time even if not all keys had a different hash.

@@ -499,8 +499,7 @@ using the :func:`hash` built-in function. The hash code varies widely depending
on the key and a per-process seed; for example, "Python" could hash to
-539294296 while "python", a string that differs by a single bit, could hash
to 1142331976. The hash code is then used to calculate a location in an
internal array where the value will be stored. Assuming that you're storing
keys that all have different hash values, this means that dictionaries take
internal array where the value will be stored. Hash tables, and therefore dictionaries, take
constant time -- O(1), in Big-O notation -- to retrieve a key.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sooner or later, someone will complain that the new docs are incorrect because dictionaries have O(n) worst case performance., so some sort of qualifier is still necessary. We get O(1) performance in the typical case when the hash values are evenly distributed; however, it is easy to construct degenerate cases with a catastrophic pile-ups at a single value.

Consider using the wikipedia wording where we mention both the average case and worst case.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Modified the section "How are Dictionaries implemented in CPython?" to mention a dictionary's average-case and worst-case time complexity.
@viagluck
Copy link
Author

viagluck commented Jan 4, 2019

I have made the requested changes; please review again

@bedevere-bot
Copy link

Thanks for making the requested changes!

@rhettinger: please review the changes made to this pull request.

Copy link
Contributor

@csabella csabella left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the first line in this section CPython's dictionaries are implemented as resizable hash tables., please add a link to hash tables to the wikipedia page that Raymond referenced. Thanks!

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@viagluck
Copy link
Author

I have made the requested changes; please review again

@bedevere-bot
Copy link

Thanks for making the requested changes!

@csabella, @rhettinger: please review the changes made to this pull request.

Copy link
Contributor

@csabella csabella left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@csabella
Copy link
Contributor

@viagluck, the CI builds are failing because of trailing whitespace. Please make sure you don't have any trailing whitespace on the lines you've added/changed. Thanks!

@rhettinger
Copy link
Contributor

After further review. I'm going to decline this change. The current wording is accurate and sticks with the essential point that if the hash keys are different (which is generally the case), the lookups at O(1). There proposed wording seems to emphasize a case that almost never occurs and would create unnecessary worry.

@rhettinger rhettinger closed this Jun 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants