
Commit 31d5deb

Committed Mar 21, 2019
fix hashmap docs and code
1 parent 5d25f04 commit 31d5deb


10 files changed: +84 / -989 lines changed


book/book.adoc

Lines changed: 5 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -24,12 +24,16 @@ endif::[]
2424
:pdf-stylesdir: ./_resources/pdfstyles
2525
:pdf-style: adrian-screen
2626
:title-logo-image: image:logo.png[Logo,50,50]
27-
2827
// custom variables
2928
:codedir: ../../src
3029
:datadir: {docdir}/data
3130
:source-highlighter: pygments
3231
:pygments-style: xcode
32+
:stem:
33+
:plantuml-config: {docdir}/_conf/umlconfig.txt
34+
// :hide-uri-scheme:
35+
// :chapter-label: Chapter
36+
// :appendix-caption: Appendix
3337
// :chapter-label:
3438

3539

book/chapters/map-hashmap.adoc

Lines changed: 47 additions & 31 deletions
@@ -4,7 +4,7 @@ A HashMap is a Map implementation. HashMaps are composed of two things:
44
1) a hash function and
55
2) a bucket array to store values.
66

7-
Before going into the implementation details let’s give an overview of how it works. Let’s say we want to keep a tally of things:
7+
Before going into the implementation details let’s give an overview of how it works. Let’s say we want to keep a tally of things and animals:
88

99
.HashMap example
1010
[source, javascript]
@@ -15,7 +15,7 @@ include::{codedir}/data-structures/maps/hash-maps/hash-map.js[tag=snippet, inden
1515
How are the keys mapped to their values?
1616
Using a hash function. Here’s an illustration:
1717

18-
.HashMap representation. Keys are mapped to values using a hash function.
18+
.Internal HashMap representation
1919
image:image41.png[image,width=528,height=299]
2020

2121

@@ -74,7 +74,7 @@ include::{codedir}/data-structures/maps/hash-maps/hashing.js[tag=naiveHashCodeEx
7474

7575
Notice that `rat` and `art` have the same hash code! These are collisions that we need to solve.
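To see why `rat` and `art` collide, here is a minimal sketch of a hash code that only sums character codes. The name `hashCodeNaive` matches the function in `hashing.js`, but this body is an assumption for illustration, not the repository's code:

```javascript
// Naive hash code (sketch): add up the UTF-16 code units of the key's text.
// Order is ignored, so anagrams such as 'rat' and 'art' get the same number.
function hashCodeNaive(key) {
  const text = String(key); // the type is lost here: 10 and '10' become the same text
  let sum = 0;
  for (let i = 0; i < text.length; i += 1) {
    sum += text.charCodeAt(i);
  }
  return sum;
}

console.log(hashCodeNaive('rat')); // 327
console.log(hashCodeNaive('art')); // 327
console.log(hashCodeNaive(10)); // 97
console.log(hashCodeNaive('10')); // 97
```

Both kinds of collision (same letters in a different order, same text with a different type) come from information the sum throws away.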
7676

77-
Collisions happened because we are just summing the character's codes and are not taking the order into account nor the type. We can do better by offsetting the character value based on their position in the string. We can also add the object type, so number `10` produce different output than string `'10'`.
77+
Collisions happen because we are adding the letters' Unicode values without taking into account either their order or the key's type. We can do better by offsetting each character value based on its position in the string. We can also add the object type, so that the number `10` produces a different output than the string `'10'`.
7878

7979
.Hashing function implementation that offsets character values based on their position
8080
[source, javascript]
@@ -86,20 +86,32 @@ Since Unicode uses 20 bits, we can offset each character by 20 bits based on the
8686

8787
.JavaScript built-in `BigInt`
8888
****
89-
BigInt allows to operate beyond the maximum safe limit of integers (Number.MAX_SAFE_INTEGER => 9,007,199,254,740,991). BigInt uses the suffix n, e.g. 1n + 3n === 4n.
89+
BigInt allows operating on integers beyond the maximum safe integer limit.
90+
91+
[source, javascript]
92+
----
93+
Number.MAX_SAFE_INTEGER // => 9007199254740991 (2^53 - 1)
94+
----
95+
96+
BigInt has virtually no limits (until you run out of memory). BigInt literals use the suffix `n`.
97+
98+
[source, javascript]
99+
----
100+
1n + 3n === 4n
101+
----
90102
****
91103

92-
As you can imagine the output is a humongous number! We are using `BigInt` that doesn’t overflow.
104+
As you can imagine, offsetting each letter by 20 bits leads to a humongous number, even for 3-letter words! We are using `BigInt` so it doesn’t overflow.
93105

94106
.Verifying there are no hash code duplicates
95107
[source, javascript]
96108
----
97109
include::{codedir}/data-structures/maps/hash-maps/hashing.js[tag=hashCodeOffsetExample, indent=0]
98110
----
99111

100-
As you can see We don’t have duplicates if the keys have different content or type. However, we need to represent these unbounded integers. We do that using *compression function* they can be as simple as `% BUCKET_SIZE`.
112+
We don’t have duplicates anymore! If the keys have different content or type, they have different hash codes. However, we need to map these unbounded integers to a finite number of buckets in an array. We do that using a *compression function*. This function can be as simple as `% BUCKET_SIZE`.
101113

102-
However, there’s an issue with the last implementation. It doesn’t matter how humongous is the number if we at the end use the modulus to get an array index. The part of the hash code that truly matters is the last bits.
114+
However, there’s an issue with the last implementation. It doesn’t matter how big (and different) the hash codes are if, in the end, we use the modulus to get an array index. The part of the hash code that truly matters is the last bits.
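The positional offset and the compression step can be sketched together. `hashCodeOffset` and `compress` are hypothetical names, and prefixing the key with its type name is one possible way to make `10` and `'10'` differ, not necessarily what `hashing.js` does. The sketch also shows the "last bits" problem: with a bucket size of 4, only the last character decides the bucket.

```javascript
// Positional hash code (sketch): shift the running value 20 bits to the left
// before adding each UTF-16 code unit (at most 16 bits), so every character
// gets its own 20-bit slot and the order of characters matters.
// Assumption: the type name is hashed first so that 10 and '10' differ.
function hashCodeOffset(key) {
  const text = `${typeof key}:${String(key)}`;
  let hash = 0n;
  for (let i = 0; i < text.length; i += 1) {
    hash = (hash << 20n) + BigInt(text.charCodeAt(i));
  }
  return hash;
}

// Compression function: map the unbounded BigInt to a bucket index.
function compress(hash, bucketSize) {
  return Number(hash % BigInt(bucketSize));
}

console.log(hashCodeOffset('rat') === hashCodeOffset('art')); // false
console.log(hashCodeOffset(10) === hashCodeOffset('10')); // false

// Every earlier character was shifted by a multiple of 2^20, which is
// divisible by 4, so `% 4` only sees the last character ('t' is 116).
console.log(compress(hashCodeOffset('rat'), 4)); // 0
console.log(compress(hashCodeOffset('cat'), 4)); // 0
```

`rat` and `cat` have very different hash codes, yet they land in the same bucket because they end with the same letter.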
103115

104116
.Look at this example with a bucket size of 4.
105117
[source, javascript]
@@ -111,9 +123,9 @@ However, there’s an issue with the last implementation. It doesn’t matter ho
111123
50 % 4 //↪️ 2
112124
----
113125

114-
We get many collisions. [big]#😱#
126+
All the hash codes are different and still we get many collisions! [big]#😱#
115127

116-
Based on statistical data, using a prime number as the modulus produce fewer collisions.
128+
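Hash codes that share a common factor with the bucket size can only land in a fraction of the buckets. A prime bucket size has no factors other than 1 and itself, so using a prime number as the modulus tends to produce fewer collisions.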
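The effect of a shared factor can be checked with a small collision counter. The hash codes below are hypothetical (all different, but all multiples of 10) and are not the values from the book's example:

```javascript
// Count how many keys fall into a bucket that is already occupied.
function countCollisions(hashCodes, bucketSize) {
  const usedBuckets = new Set();
  let collisions = 0;
  for (const code of hashCodes) {
    const index = code % bucketSize; // compression function
    if (usedBuckets.has(index)) {
      collisions += 1;
    } else {
      usedBuckets.add(index);
    }
  }
  return collisions;
}

// Hypothetical hash codes: all different, but all multiples of 10.
const hashCodes = [10, 20, 30, 40, 50];

console.log(countCollisions(hashCodes, 4)); // 3 (indices 2, 0, 2, 0, 2)
console.log(countCollisions(hashCodes, 7)); // 0 (indices 3, 6, 2, 5, 1)
```

Since 10 and 4 share the factor 2, the codes can only reach the even buckets 0 and 2. With 7 there is no shared factor, so all five codes spread out. A prime size does not prevent collisions in general; it only avoids this particular pattern.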
117129

118130
.Let’s see what happens if the bucket size is a prime number:
119131
[source, javascript]
@@ -135,6 +147,14 @@ Let’s design a better HashMap with what we learned.
135147

136148
=== Implementing an optimized hash function
137149

150+
We are going to use a battle-tested non-cryptographic hash function called FNV Hash.
151+
152+
.FNV (Fowler/Noll/Vo) Hash
153+
****
154+
It is a non-cryptographic hash function designed to be fast while maintaining a low collision rate. The high dispersion of the FNV hashes makes them well suited for hashing nearly identical strings such as URLs, keys, IP addresses, zip codes, and others.
155+
****
156+
157+
138158
Take a look at the following function:
139159

140160
.Optimal Hash function
@@ -145,7 +165,7 @@ include::{codedir}/data-structures/maps/hash-maps/hash-map.js[tag=hashFunction,
145165

146166
It is somewhat similar to what we did before, in the sense that each letter’s Unicode value is used to compute the hash. The difference is:
147167

148-
1. We are using the XOR bitwise operation (^) to produce an *avalanche effect*, where a small change in two strings produces completely different hash codes. E.g.
168+
1. We are using the XOR bitwise operation (`^`) to produce an *avalanche effect*, where a small change in two strings produces completely different hash codes. E.g.
149169

150170
.Hash Code example using FNV-1a
151171
[source, javascript]
@@ -154,20 +174,17 @@ hashCode('cat') //↪️ 4201630708
154174
hashCode('cats') //↪️ 3304940933
155175
----
156176

157-
.Fowler/Noll/Vo (FNV) Hash
158-
****
159-
It is a non-cryptographic hash function designed to be fast while maintaining a low collision rate. The high dispersion of the FNV hashes makes them well suited for hashing nearly identical strings such as URLs, keys, IP addresses, zip codes, and others.
160-
****
177+
Adding a single letter produces a totally different output.
161178

162-
We are using the FVN-1a prime number (16777619) and offset (2166136261) to reduce collisions even further. If you are curious where these numbers come from check out this https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function[link].
179+
We are using the FNV-1a prime number (`16777619`) and offset basis (`2166136261`) to reduce collisions even further. If you are curious where these numbers come from, check out this http://bit.ly/fvn-1a[link].
163180

164181
The FNV-1a hash function is a good trade-off between speed and collision prevention.
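A minimal sketch of 32-bit FNV-1a with the prime and offset basis quoted above. `fnv1a32` and `bucketIndex` are hypothetical names, not the repository's `hashFunction`. Reference FNV-1a works on bytes; this sketch hashes UTF-16 code units, which gives the same result as byte-wise FNV-1a only for ASCII text.

```javascript
// FNV-1a, 32-bit: XOR each unit into the hash, then multiply by the FNV prime.
const FNV_OFFSET_BASIS = 2166136261;
const FNV_PRIME = 16777619;

function fnv1a32(text) {
  let hash = FNV_OFFSET_BASIS;
  for (let i = 0; i < text.length; i += 1) {
    hash ^= text.charCodeAt(i); // XOR the UTF-16 code unit into the hash
    hash = Math.imul(hash, FNV_PRIME); // 32-bit multiplication (wraps around)
  }
  return hash >>> 0; // read the 32 bits as an unsigned integer
}

// Compression function on top of the hash code.
function bucketIndex(key, bucketCount) {
  return fnv1a32(String(key)) % bucketCount;
}

console.log(fnv1a32('a').toString(16)); // e40c292c
console.log(fnv1a32('cat') === fnv1a32('cats')); // false
```

`Math.imul` keeps the multiplication within 32 bits, so no `BigInt` is needed, and `>>> 0` turns the signed result back into a non-negative number before the modulus.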
165182

166183
Now that we have a proper hash function, let’s move on with the rest of the HashMap implementation.
167184

168185
== Implementing a HashMap in JavaScript
169186

170-
Let’s start by creating a class and its constructor to initialize the hash map. We are going to have an array called *buckets* to hold all the data as below:
187+
Let’s start by creating a class and its constructor to initialize the hash map. We are going to have an array called `buckets` to hold all the data.
171188

172189
.HashMap's constructor
173190
[source, javascript]
@@ -183,17 +200,11 @@ include::{codedir}/data-structures/maps/hash-maps/hash-map.js[tag=getLoadFactor,
183200
}
184201
----
185202

186-
Notice that we are also keeping track of collisions (just for benchmarking purposes) and a load factor. *The load factor* measures how full the hash map is. We don’t want to be fuller than 75%. If the HashMap is getting too full, then we are going to fix it doing a *rehash* (more on that later).
203+
Notice that we are also keeping track of collisions (for benchmarking purposes) and a load factor. *The load factor* measures how full the hash map is. We don’t want it to go beyond 75%. If the HashMap gets too full, we fix it by doing a *rehash* (more on that later).
187204

188205
=== Inserting elements in a HashMap
189206

190-
To insert values into a HashMap, we first convert the *key* into *an array index* using the hash function. Each bucket of the array will have an object `{key, value}`.
191-
192-
There are multiple scenarios for inserting key/values in a HashMap:
193-
194-
1. Key doesn’t exist yet, so we create the new key/value pair.
195-
2. Key already exists, then we will replace the value.
196-
3. Key doesn’t exist, but the bucket already has other data, this is a collision! We push the new element to the bucket.
207+
To insert values into a HashMap, we first convert the *key* into an *array index* using the hash and compression functions. Each bucket of the array will have an object with the shape of `{key, value}`.
197208

198209
In code, it looks like this:
199210

@@ -202,12 +213,17 @@ In code, it looks like this:
202213
----
203214
include::{codedir}/data-structures/maps/hash-maps/hash-map.js[tag=set, indent=0]
204215
----
216+
// There are multiple scenarios for inserting key/values in a HashMap:
217+
<1> Key doesn’t exist yet, so we create the new key/value pair.
218+
<2> Key already exists, so we replace the value.
219+
<3> Key doesn’t exist, but the bucket already has other data: this is a collision! We push the new element to the bucket.
220+
<4> To keep insertion order, we keep track of the order of the keys using `keysTrackerArray` and `keysTrackerIndex`.
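Callout <4> can be illustrated on its own. The hypothetical `KeyOrderTracker` class below keeps keys in a sparse array indexed by `keysTrackerIndex`; deleting a key leaves a hole that iteration skips (the JSDoc of `keys()` mentions these holes). Remembering each key's slot in a separate `Map` is an assumption that stands in for the `order` field that `set` stores on each entry:

```javascript
// Sketch of insertion-order bookkeeping with a sparse array.
class KeyOrderTracker {
  constructor() {
    this.keysTrackerArray = [];
    this.keysTrackerIndex = 0;
    this.positions = new Map(); // key -> slot (stand-in for entry.order)
  }

  add(key) {
    if (this.positions.has(key)) return; // updating a key keeps its position
    this.positions.set(key, this.keysTrackerIndex);
    this.keysTrackerArray[this.keysTrackerIndex] = key;
    this.keysTrackerIndex += 1;
  }

  remove(key) {
    if (!this.positions.has(key)) return false;
    delete this.keysTrackerArray[this.positions.get(key)]; // leaves a hole
    this.positions.delete(key);
    return true;
  }

  // Yield keys in insertion order, skipping holes left by deletions.
  * keys() {
    for (let i = 0; i < this.keysTrackerArray.length; i += 1) {
      if (i in this.keysTrackerArray) yield this.keysTrackerArray[i];
    }
  }
}

const tracker = new KeyOrderTracker();
['a', 'b', 'c'].forEach((key) => tracker.add(key));
tracker.remove('b');
console.log([...tracker.keys()]); // [ 'a', 'c' ]
```

Re-adding a deleted key appends it at the end, which matches how a built-in `Map` orders a key that was deleted and set again.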
205221

206222
Notice that we are using a function called `getEntry` to check if the key already exists. It gets the index of the bucket corresponding to the key and then checks whether an entry with the given key exists. We are going to implement this function in a bit.
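The three insertion scenarios and the load-factor check can be sketched without the rest of the class. `SimpleHashMap` and `isBeyondLoadFactor` are hypothetical names; buckets are plain arrays instead of the book's LinkedLists, the lookup is inlined instead of calling `getEntry`, and rehashing is left out, so the sketch only reports when a rehash would be needed:

```javascript
// Sketch: HashMap insertion with array buckets and a load-factor check.
class SimpleHashMap {
  constructor(initialCapacity = 19, loadFactor = 0.75) {
    this.buckets = Array.from({ length: initialCapacity }, () => []);
    this.loadFactor = loadFactor;
    this.size = 0;
    this.collisions = 0;
  }

  // FNV-1a (32-bit) over UTF-16 code units, compressed with modulo.
  hashFunction(key) {
    const text = String(key);
    let hash = 2166136261;
    for (let i = 0; i < text.length; i += 1) {
      hash = Math.imul(hash ^ text.charCodeAt(i), 16777619);
    }
    return (hash >>> 0) % this.buckets.length;
  }

  getLoadFactor() {
    return this.size / this.buckets.length;
  }

  isBeyondLoadFactor() {
    return this.getLoadFactor() > this.loadFactor;
  }

  set(key, value) {
    const bucket = this.buckets[this.hashFunction(key)];
    const entry = bucket.find((e) => e.key === key);
    if (entry) {
      entry.value = value; // key already exists: replace the value
    } else {
      bucket.push({ key, value }); // new key
      this.size += 1;
      if (bucket.length > 1) this.collisions += 1; // bucket already had data
    }
    return this; // allow chaining
  }
}

const map = new SimpleHashMap(3);
map.set('cat', 1).set('dog', 2).set('cat', 3);
console.log(map.size, map.getLoadFactor().toFixed(2), map.isBeyondLoadFactor()); // 2 0.67 false
```

Setting `'cat'` twice only replaces its value, so the size stays at 2. One more new key would bring the load factor to 1, above the 0.75 threshold.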
207223

208224
=== Getting values out of a HashMap
209225

210-
For getting values out of the Map, we do something similar to inserting. We convert the key into an index using the hash function.
226+
To get values out of the Map, we do something similar to inserting: we convert the key into an `index` using the hash function, then we use that `index` to look for the value in the bucket.
211227

212228
.HashMap's getEntry method
213229
[source, javascript]
@@ -217,7 +233,7 @@ include::{codedir}/data-structures/maps/hash-maps/hash-map.js[tag=getEntry, inde
217233
<1> Convert key to an array index.
218234
<2> If the bucket is empty, create a new linked list.
219235
<3> Use the linked list's <<Searching by value>> method to find the value in the bucket.
220-
<4> Return bucket and entry if found.
236+
<4> Return `bucket` and `entry` if found.
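The four steps of `getEntry`, and the `get`/`has` methods built on it, can be sketched with plain arrays standing in for the book's LinkedList buckets (an assumption to keep the example self-contained; `LookupHashMap` is a hypothetical name):

```javascript
// Sketch of getEntry/get/has over array buckets.
class LookupHashMap {
  constructor(capacity = 19) {
    this.buckets = new Array(capacity); // buckets are created lazily
  }

  // FNV-1a (32-bit) over UTF-16 code units, compressed with modulo.
  hashFunction(key) {
    const text = String(key);
    let hash = 2166136261;
    for (let i = 0; i < text.length; i += 1) {
      hash = Math.imul(hash ^ text.charCodeAt(i), 16777619);
    }
    return (hash >>> 0) % this.buckets.length;
  }

  getEntry(key) {
    const index = this.hashFunction(key); // 1. key -> array index
    this.buckets[index] = this.buckets[index] || []; // 2. create an empty bucket
    const bucket = this.buckets[index];
    const entry = bucket.find((e) => e.key === key); // 3. search the bucket
    return { bucket, entry }; // 4. entry is undefined when not found
  }

  set(key, value) {
    const { bucket, entry } = this.getEntry(key);
    if (entry) entry.value = value;
    else bucket.push({ key, value });
    return this;
  }

  get(key) {
    const { entry } = this.getEntry(key);
    return entry ? entry.value : undefined;
  }

  has(key) {
    return this.getEntry(key).entry !== undefined;
  }
}

const lookup = new LookupHashMap();
lookup.set('cat', 1);
console.log(lookup.get('cat'), lookup.has('dog')); // 1 false
```

Because the bucket search compares keys with `===`, the number `10` and the string `'10'` can share a bucket but remain different keys.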
221237

222238
With the help of the `getEntry` method, we can implement the `HashMap.get` and `HashMap.has` methods:
223239

@@ -239,7 +255,7 @@ For `HashMap.has` we only care if the value exists or not, while that for `HashM
239255

240256
=== Deleting from a HashMap
241257

242-
Removing items from a HashMap is not too different from what we did before:
258+
Removing items from a HashMap is not too different from what we did before.
243259

244260
.HashMap's delete method
245261
[source, javascript]
@@ -251,9 +267,9 @@ If the bucket doesn’t exist or is empty, we don't have to do anything else. If
251267
https://github.com/amejiarosario/dsa.js/blob/7694c20d13f6c53457ee24fbdfd3c0ac57139ff4/src/data-structures/linked-lists/linked-list.js#L218[`LinkedList.remove` ]
252268
method.
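A minimal sketch of the delete logic described above, again with plain arrays as buckets. `Array#splice` plays the role of `LinkedList.remove` here, which is an assumption, and `DeletableHashMap` is a hypothetical name:

```javascript
// Sketch of HashMap.delete over array buckets.
class DeletableHashMap {
  constructor(capacity = 19) {
    this.buckets = Array.from({ length: capacity }, () => []);
    this.size = 0;
  }

  // FNV-1a (32-bit) over UTF-16 code units, compressed with modulo.
  hashFunction(key) {
    const text = String(key);
    let hash = 2166136261;
    for (let i = 0; i < text.length; i += 1) {
      hash = Math.imul(hash ^ text.charCodeAt(i), 16777619);
    }
    return (hash >>> 0) % this.buckets.length;
  }

  set(key, value) {
    const bucket = this.buckets[this.hashFunction(key)];
    const entry = bucket.find((e) => e.key === key);
    if (entry) {
      entry.value = value;
    } else {
      bucket.push({ key, value });
      this.size += 1;
    }
    return this;
  }

  // Returns true if the key existed and was removed, false otherwise.
  delete(key) {
    const bucket = this.buckets[this.hashFunction(key)];
    const position = bucket.findIndex((e) => e.key === key);
    if (position === -1) return false; // empty bucket or key not present
    bucket.splice(position, 1); // remove the matching entry
    this.size -= 1;
    return true;
  }
}

const deletable = new DeletableHashMap().set('a', 1).set('b', 2);
console.log(deletable.delete('a'), deletable.delete('a'), deletable.size); // true false 1
```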
253269

254-
== Rehashing the HashMap
270+
== Rehashing a HashMap
255271

256-
Rehashing is a technique to minimize collisions when a hash map is getting full. It doubles the size of the map and recomputes all the hash codes and insert data in the new bucket.
272+
Rehashing is a technique to minimize collisions when a hash map is getting full. It doubles the size of the map, recomputes all the hash codes, and inserts the data into the new buckets.
257273

258274
When we increase the map size, we look for the next prime number. As explained before, keeping the bucket size a prime number helps minimize collisions.
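The growth rule can be sketched with a trial-division `nextPrime`. Whether the repository's `nextPrime` in `primes.js` returns `n` itself when `n` is prime is not shown here; this sketch assumes it returns the smallest prime greater than or equal to `n`. Unlike the book's `rehash`, which re-inserts keys through `set` in insertion order, this sketch walks the old buckets directly:

```javascript
// Trial-division primality test; fine for bucket-sized numbers.
function isPrime(n) {
  if (n < 2) return false;
  for (let d = 2; d * d <= n; d += 1) {
    if (n % d === 0) return false;
  }
  return true;
}

// Smallest prime >= n (assumed behavior).
function nextPrime(n) {
  let candidate = Math.max(2, Math.ceil(n));
  while (!isPrime(candidate)) candidate += 1;
  return candidate;
}

// FNV-1a (32-bit) compressed to a bucket index.
function bucketIndex(key, bucketCount) {
  const text = String(key);
  let hash = 2166136261;
  for (let i = 0; i < text.length; i += 1) {
    hash = Math.imul(hash ^ text.charCodeAt(i), 16777619);
  }
  return (hash >>> 0) % bucketCount;
}

// Grow to the next prime after 2x and re-insert every entry, because each
// entry's index depends on the number of buckets.
function rehash(buckets, size) {
  const newCapacity = nextPrime(Math.max(size, buckets.length) * 2);
  const newBuckets = Array.from({ length: newCapacity }, () => []);
  for (const bucket of buckets) {
    for (const entry of bucket) {
      newBuckets[bucketIndex(entry.key, newCapacity)].push(entry);
    }
  }
  return newBuckets;
}

console.log(nextPrime(38)); // 41
```

With the default capacity of 19, the first rehash therefore grows the array to 41 buckets.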
259275

@@ -264,7 +280,7 @@ include::{codedir}/data-structures/maps/hash-maps/hash-map.js[tag=rehash, indent
264280
----
265281

266282
In the
267-
https://github.com/amejiarosario/dsa.js/blob/master/src/data-structures/hash-maps/primes.js[prime.js] file you can find the implementation for finding the next prime. Also, you can see the full HashMap implementation on this file: https://github.com/amejiarosario/dsa.js/blob/master/src/data-structures/hash-maps/hashmap.js[hashmap.js]
283+
https://github.com/amejiarosario/dsa.js/blob/f69b744a1bddd3d99243ca64b3ad46f3f2dd7342/src/data-structures/maps/hash-maps/primes.js[primes.js] file you can find the implementation for finding the next prime. Also, you can see the full HashMap implementation in this file: https://github.com/amejiarosario/dsa.js/blob/f69b744a1bddd3d99243ca64b3ad46f3f2dd7342/src/data-structures/maps/hash-maps/hash-map.js#L1[hash-map.js]
268284

269285
== HashMap time complexity
270286

@@ -279,4 +295,4 @@ Hash Map it’s very optimal for searching values by key in constant time *O(1)*
279295
|===
280296
{empty}* = Amortized run time. E.g. rehashing might affect run time.
281297

282-
As you can notice we have amortized times since, in the unfortunate case of a rehash, it will take O(n) while it resizes. After that, it will be on average *O(1)*.
298+
As you can see, these are amortized times: in the unfortunate case of a rehash, the operation takes O(n) while it resizes. After that, it is back to *O(1)* on average.
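The amortized claim can be checked with a small simulation that counts how many entries rehashing copies while inserting `n` keys. The growth policy is assumed from the text above (rehash after an insert pushes the load factor past 0.75, growing to the next prime at or above twice the current size); `countRehashCopies` is a hypothetical helper:

```javascript
// Trial-division primality test; fine for bucket-sized numbers.
function isPrime(n) {
  if (n < 2) return false;
  for (let d = 2; d * d <= n; d += 1) {
    if (n % d === 0) return false;
  }
  return true;
}

// Smallest prime >= n (assumed behavior).
function nextPrime(n) {
  let candidate = Math.max(2, Math.ceil(n));
  while (!isPrime(candidate)) candidate += 1;
  return candidate;
}

// Simulate n insertions and count the entries copied by every rehash.
function countRehashCopies(n, initialCapacity = 19, loadFactor = 0.75) {
  let capacity = initialCapacity;
  let copies = 0;
  let rehashes = 0;
  for (let size = 1; size <= n; size += 1) {
    if (size / capacity > loadFactor) {
      copies += size; // every stored entry is re-inserted
      rehashes += 1;
      capacity = nextPrime(Math.max(size, capacity) * 2);
    }
  }
  return { copies, rehashes, capacity };
}

const n = 10000;
const { copies, rehashes } = countRehashCopies(n);
console.log(rehashes, copies, (copies / n).toFixed(2));
```

The capacity at least doubles each time, so the copy counts form a roughly geometric series. Their total stays proportional to `n`, which is why the occasional O(n) rehash averages out to a constant cost per insert.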

book/chapters/map-intro.adoc

Lines changed: 4 additions & 2 deletions
@@ -12,7 +12,9 @@ Many languages have maps already built-in. JavaScript/Node has `Map`:
1212
include::{codedir}/data-structures/maps/map.js[tag=snippet, indent=0]
1313
----
1414

15+
In short, you set a `key`/`value` pair, and then you can get the `value` using the `key`.
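A short example with the built-in `Map`, reusing the names from the `map.js` snippet in this commit. It shows that object keys are compared by reference, not by content:

```javascript
const myMap = new Map();
myMap.set('string', 'foo'); // string as key
myMap.set(1, 'bar'); // number as key

const obj1 = {};
myMap.set(obj1, 'test'); // object as key

console.log(myMap.get('string')); // foo
console.log(myMap.get(obj1)); // test
console.log(myMap.get({})); // undefined: a different object, even though it is empty too
console.log(myMap.get('1')); // undefined: the string '1' is not the number 1
console.log(myMap.size); // 3
```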
16+
1517
The attractive part of Maps is that they are very performant: usually *O(1)* or *O(log n)*, depending on the implementation. We can implement maps using two different techniques:
1618

17-
* *HashMap*: it’s a map implementation using an *array* and *hash function*. The job of the hash function is to convert the key into an index that contains the matching data. Optimized HashMap can have an average runtime of *O(1)*.
18-
* *TreeMap*: it’s a map implementation that uses a self-balanced Binary Search Tree (red-black tree). The BST nodes store the key, and the value and nodes are sorted by key guaranteeing an *O(log n)* look up.
19+
* *HashMap*: it’s a map implementation using an *array* and a *hash function*. The job of the hash function is to convert the `key` into an index that maps to the `value`. An optimized HashMap can have an average runtime of *O(1)*.
20+
* *TreeMap*: it’s a map implementation that uses a self-balanced Binary Search Tree (like <<AVL Tree>>). The BST nodes store the key and the value, and nodes are sorted by key, guaranteeing an *O(log n)* lookup.

book/chapters/map-treemap.adoc

Lines changed: 2 additions & 3 deletions
@@ -36,8 +36,7 @@ For inserting a value on a TreeMap, we first need to inialize the tree:
3636
include::{codedir}/data-structures/maps/tree-maps/tree-map.js[tag=constructor, indent=0]
3737
----
3838

39-
The tree can be an instance of any Binary Search Tree that we implemented so far. However, for better performance, it should be a self-balanced tree like a https://github.com/amejiarosario/dsa.js/blob/master/src/data-structures/trees/red-black-tree.js[Red-Black Tree] or https://github.com/amejiarosario/dsa.js/blob/master/src/data-structures/trees/avl-tree.js[AVL Tree].
40-
39+
The tree can be an instance of any Binary Search Tree that we implemented so far. However, for better performance, it should be a self-balanced tree like a https://github.com/amejiarosario/dsa.js/blob/f69b744a1bddd3d99243ca64b3ad46f3f2dd7342/src/data-structures/trees/red-black-tree.js[Red-Black Tree] or https://github.com/amejiarosario/dsa.js/blob/f69b744a1bddd3d99243ca64b3ad46f3f2dd7342/src/data-structures/trees/avl-tree.js[AVL Tree].
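To show the idea without the full BST classes, here is a minimal TreeMap sketch on an *unbalanced* binary search tree. `TreeMapSketch` and `TreeNode` are hypothetical names, keys are assumed to be comparable with `<` (all numbers or all strings), and because there is no self-balancing, sorted insertions can degrade lookups to O(n). That is exactly why the text recommends a Red-Black or AVL tree.

```javascript
// Minimal TreeMap sketch on an UNBALANCED binary search tree.
class TreeNode {
  constructor(key, value) {
    this.key = key;
    this.value = value;
    this.left = null;
    this.right = null;
  }
}

class TreeMapSketch {
  constructor() {
    this.root = null;
    this.size = 0;
  }

  set(key, value) {
    if (this.root === null) {
      this.root = new TreeNode(key, value);
      this.size += 1;
      return this;
    }
    let node = this.root;
    for (;;) {
      if (key === node.key) {
        node.value = value; // existing key: update in place
        return this;
      }
      const side = key < node.key ? 'left' : 'right';
      if (node[side] === null) {
        node[side] = new TreeNode(key, value);
        this.size += 1;
        return this;
      }
      node = node[side];
    }
  }

  get(key) {
    let node = this.root;
    while (node !== null) {
      if (key === node.key) return node.value;
      node = key < node.key ? node.left : node.right;
    }
    return undefined;
  }

  // In-order traversal visits keys in ascending order.
  * keys(node = this.root) {
    if (node === null) return;
    yield* this.keys(node.left);
    yield node.key;
    yield* this.keys(node.right);
  }
}

const treeMap = new TreeMapSketch();
[5, 2, 8, 1].forEach((key) => treeMap.set(key, `value${key}`));
console.log([...treeMap.keys()]); // [ 1, 2, 5, 8 ]
```

Sorted iteration is the main thing a TreeMap offers over a HashMap, which only preserves insertion order.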
4140

4241
Let's implement the method to add values to the tree.
4342

@@ -85,4 +84,4 @@ include::{codedir}/data-structures/maps/tree-maps/tree-map.js[tag=delete, indent
8584

8685
The BST implementation does all the heavy lifting.
8786

88-
That’s it! To see the full file in context, click here: https://github.com/amejiarosario/dsa.js/blob/master/src/data-structures/maps/tree-maps/tree-map.js[here]
87+
That’s it! To see the full file in context, click https://github.com/amejiarosario/dsa.js/blob/f69b744a1bddd3d99243ca64b3ad46f3f2dd7342/src/data-structures/maps/tree-maps/tree-map.js[here].

book/chapters/map.adoc

Lines changed: 0 additions & 411 deletions
This file was deleted.

book/chapters/output.adoc

Lines changed: 0 additions & 515 deletions
This file was deleted.

book/chapters/output.adoc.zip

-34.6 KB
Binary file not shown.

src/data-structures/maps/hash-maps/hash-map.js

Lines changed: 21 additions & 21 deletions
@@ -3,7 +3,7 @@ const LinkedList = require('../../linked-lists/linked-list');
33
const { nextPrime } = require('./primes');
44

55
/**
6-
* The Map object holds key-value pairs.
6+
* The map holds key-value pairs.
77
* Any value (both objects and primitive values) may be used as either a key or a value.
88
*
99
* Features:
@@ -16,9 +16,8 @@ class HashMap {
1616
// tag::constructorPartial[]
1717
/**
1818
* Initialize array that holds the values.
19-
* @param {number} initialCapacity initial size of the array (should be a prime)
20-
* @param {number} loadFactor if set, the Map will automatically
21-
* rehash when the load factor threshold is met
19+
* @param {number} initialCapacity initial size of the array (preferably a prime)
20+
* @param {number} loadFactor rehash is called when this threshold is met.
2221
*/
2322
constructor(initialCapacity = 19, loadFactor = 0.75) {
2423
this.initialCapacity = initialCapacity;
@@ -46,7 +45,7 @@ class HashMap {
4645
/**
4746
* Polynomial hash codes are used to hash String typed keys.
4847
* It uses the FNV-1a hashing algorithm for 32 bits
49-
* @see https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function
48+
* @see http://bit.ly/fvn-1a
5049
* @param {any} key
5150
* @return {integer} bucket index
5251
*/
@@ -70,11 +69,11 @@ class HashMap {
7069
* containing key/value objects.
7170
*
7271
* Avg. Runtime: O(1)
73-
* Usually O(1) but there are many collisions it could be O(n).
72+
* Usually O(1) but if there are many collisions it could be O(n).
7473
*
7574
* @param {any} key
76-
* @returns {object} object containing the bucket and
77-
* entry (LinkedList's node matching value)
75+
* @returns {object} object `{ bucket, entry }` containing the bucket
76+
* and entry (LinkedList's node matching value)
7877
*/
7978
getEntry(key) {
8079
const index = this.hashFunction(key); // <1>
@@ -97,25 +96,24 @@ class HashMap {
9796
/**
9897
* Insert a key/value pair into the hash map.
9998
* If the key is already there replaces its content.
100-
* Avg. Runtime: O(1)
101-
* In the case a rehash is needed O(n).
99+
* Avg. Runtime: O(1). If a rehash is needed, O(n).
102100
* @param {any} key
103101
* @param {any} value
104-
* @returns {HashMap} Return the Map object to allow chaining
102+
* @returns {HashMap} Return the map to allow chaining
105103
*/
106104
set(key, value) {
107105
const { entry: exists, bucket } = this.getEntry(key);
108106

109-
if (!exists) { // add key/value if it doesn't find the key
107+
if (!exists) { // key/value doesn't exist <1>
110108
bucket.push({ key, value, order: this.keysTrackerIndex });
111-
this.keysTrackerArray[this.keysTrackerIndex] = key;
109+
this.keysTrackerArray[this.keysTrackerIndex] = key; // <4>
112110
this.keysTrackerIndex += 1;
113111
this.size += 1;
114-
if (bucket.size > 1) { this.collisions += 1; }
112+
if (bucket.size > 1) { this.collisions += 1; } // <3>
115113
if (this.isBeyondloadFactor()) { this.rehash(); }
116114
} else {
117115
// update value if key already exists
118-
exists.value = value;
116+
exists.value = value; // <2>
119117
}
120118
return this;
121119
}
@@ -150,10 +148,10 @@ class HashMap {
150148

151149
// tag::delete[]
152150
/**
153-
* Removes the specified element from a Map object.
151+
* Removes the specified element from the map.
154152
* Avg. Runtime: O(1)
155153
* @param {*} key
156-
* @returns {boolean} true if an element in the Map object existed
154+
* @returns {boolean} true if an element in the map existed
157155
* and has been removed, or false if the element did not exist.
158156
*/
159157
delete(key) {
@@ -193,20 +191,22 @@ class HashMap {
193191
// tag::rehash[]
194192
/**
195193
* Rehash means to create a new Map with a new (higher)
196-
* capacity with the purpose of outgrow collisions.
194+
* capacity with the purpose of outgrowing collisions.
197195
* @param {integer} newBucketSize new bucket size by default
198196
* is twice the larger of the amount of data or the bucket size.
199197
*/
200198
rehash(newBucketSize = Math.max(this.size, this.buckets.length) * 2) {
201199
const newCapacity = nextPrime(newBucketSize);
202200
const newMap = new HashMap(newCapacity);
203201

202+
// copy all values to the new map
204203
for (const key of this.keys()) {
205204
newMap.set(key, this.get(key));
206205
}
207206

208207
const newArrayKeys = Array.from(newMap.keys());
209208

209+
// override this map with the newMap
210210
this.reset(
211211
newMap.buckets,
212212
newMap.size,
@@ -219,7 +219,7 @@ class HashMap {
219219

220220

221221
/**
222-
* Keys for each element in the Map object in insertion order.
222+
* Keys for each element in the map in insertion order.
223223
* @returns {Iterator} keys without holes (empty spaces of deleted keys)
224224
*/
225225
* keys() {
@@ -232,7 +232,7 @@ class HashMap {
232232
}
233233

234234
/**
235-
* Values for each element in the Map object in insertion order.
235+
* Values for each element in the map in insertion order.
236236
* @returns {Iterator} values without holes (empty spaces of deleted values)
237237
*/
238238
* values() {
@@ -242,7 +242,7 @@ class HashMap {
242242
}
243243

244244
/**
245-
* Contains the [key, value] pairs for each element in the Map object in insertion order.
245+
* Contains the [key, value] pairs for each element in the map in insertion order.
246246
* @returns {Iterator}
247247
*/
248248
* entries() {

src/data-structures/maps/hash-maps/hashing.js

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
11
// tag::naiveHashCode[]
22
/**
33
* Naïve implementation of a non-cryptographic hashing function
4-
* @param {any} key key to be converted to a positive integer
4+
* @param {any} key to be converted to a positive integer
55
* @returns {integer} hash code (numeric representation of the key)
66
*/
77
function hashCodeNaive(key) {
@@ -25,7 +25,7 @@ hashCode(10); //=> 97 ('1'=49 + '0'=48)
2525
/**
2626
* Calculates hash code that maps a key (value) to an integer (unbounded).
2727
* It uses a 20 bit offset to avoid Unicode value overlaps
28-
* @param {any} key key to be converted to a positive integer
28+
* @param {any} key to be converted to a positive integer
2929
* @returns {BigInt} returns big integer (unbounded) that maps to the key
3030
*/
3131
function hashCode(key) {

src/data-structures/maps/map.js

Lines changed: 3 additions & 3 deletions
@@ -3,9 +3,9 @@
33
const myMap = new Map();
44
55
// mapping values to keys
6-
myMap.set('string', 'foo');
7-
myMap.set(1, 'bar');
8-
myMap.set({}, 'baz');
6+
myMap.set('string', 'foo'); // string as key
7+
myMap.set(1, 'bar'); // number as key
8+
myMap.set({}, 'baz'); // object as key
99
const obj1 = {};
1010
myMap.set(obj1, 'test');
1111
