The attractive part of Maps is that they are very performant: usually *O(1)* or *O(log n)*, depending on the implementation. We can implement maps using two different techniques:

* *HashMap*: a map implementation using an *array* and a *hash function*. The job of the hash function is to convert the key into an index that points to the matching data. An optimized HashMap can have an average runtime of *O(1)*.
* *TreeMap*: a map implementation that uses a self-balancing Binary Search Tree (e.g. a red-black tree). The BST nodes store the key and the value, and nodes are sorted by key to guarantee an *O(log n)* lookup.

== HashMap vs TreeMap

Here are the key differences:

* HashMap is more time-efficient. A TreeMap is more space-efficient.
* TreeMap search complexity is *O(log n)*, while an optimized HashMap is *O(1)* on average.
* Depending on the implementation, HashMap’s keys are kept in insertion order (e.g. JavaScript’s built-in `Map`) or in no particular order. TreeMap’s keys are always sorted.
* TreeMap offers some statistical data for free, such as the minimum key, the maximum key, the median, and ranges of keys. HashMap doesn’t.
* TreeMap always guarantees *O(log n)*, while HashMap has an amortized time of *O(1)*; in the rare case of a rehash, a single operation takes *O(n)*.

== Learning how hash maps work

A HashMap is composed of two things: 1) a hash function and 2) a bucket array to store values. Before going into the implementation details, let’s give an overview of how it works. Let’s say we want to keep a tally of things:
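As a sketch of such a tally, here it is with JavaScript's built-in `Map` standing in for the HashMap we are about to build; the list of words is illustrative:

[source, javascript]
----
// A tally of words, using the built-in Map as a stand-in for our HashMap.
const tally = new Map();
const words = ['cat', 'dog', 'cat', 'rat', 'art', 'cat']; // illustrative input

for (const word of words) {
  // get() returns undefined for a new key, so start counting from 0.
  tally.set(word, (tally.get(word) || 0) + 1);
}

console.log(tally.get('cat')); //↪️ 3
console.log(tally.has('bird')); //↪️ false
----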

How are keys mapped to their values? Here’s an illustration:

.HashMap representation. Keys are mapped to values using a hash function.
image:image41.png[image,width=528,height=299]

.This is the main idea:
1. We use a *hash function* to transform the keys (e.g. dog, cat, rat, …) into an array index. This array is called the *bucket array*, and each of its slots is a *bucket*.
2. The bucket holds the values (using a linked list in case of collisions).

In the illustration, we have a bucket array of size 10. Bucket 0 has a collision: both the `cat` and `art` keys are mapped to the same bucket, even though their hash codes are different.

In a HashMap, a *collision* happens when different keys are mapped to the same index. Collisions are bad for performance, since they can degrade the search time from *O(1)* to *O(n)*.
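To make the cost of a collision concrete, here is a minimal sketch of *separate chaining*. It is an illustration under simplifying assumptions, not the chapter's final code: buckets are plain arrays of `[key, value]` pairs instead of linked lists, and the hash code is a deliberately naive sum of code points, under which `rat` and `art` collide:

[source, javascript]
----
// Minimal separate-chaining sketch (not the chapter's final code).
const BUCKET_SIZE = 10;
// Each bucket is an array of [key, value] pairs (standing in for a linked list).
const buckets = Array.from({ length: BUCKET_SIZE }, () => []);

// Deliberately naive hash code: the sum of the characters' code points.
function hashCode(key) {
  let sum = 0;
  for (const char of String(key)) sum += char.codePointAt(0);
  return sum;
}

function set(key, value) {
  const bucket = buckets[hashCode(key) % BUCKET_SIZE];
  const pair = bucket.find(([k]) => k === key);
  if (pair) {
    pair[1] = value; // existing key: update the value
  } else {
    bucket.push([key, value]); // new key: append to the bucket's chain
  }
}

function get(key) {
  const bucket = buckets[hashCode(key) % BUCKET_SIZE];
  // Linear scan of the chain: O(n) when many keys collide in one bucket.
  const pair = bucket.find(([k]) => k === key);
  return pair ? pair[1] : undefined;
}

set('rat', 7);
set('art', 8);
console.log(get('rat'), get('art')); //↪️ 7 8
console.log(buckets[7].length); //↪️ 2
----

Both keys end up in bucket 7, so looking either one up requires scanning the chain; with many colliding keys, that scan is what turns *O(1)* into *O(n)*.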

Having a big bucket array can avoid collisions, but it can also waste too much memory. We are going to build an _optimized_ HashMap that resizes itself when it is getting full. This avoids collisions without wasting too much memory upfront. Let’s start with the hash function.

=== Designing an optimized hash function

To minimize collisions, we need to create a great hash function.

A *perfect* hash function is one that assigns a unique array index to every different key.

It’s hard, and wasteful memory-wise, to have a perfect hash function, so we are going to shoot for a great hash function instead. To recap:

A hash function converts keys into array indices.

A hash function is composed of two parts:

1. *Hash code*: maps any key into an integer (unbounded).
2. *Compression function*: maps an arbitrary integer to an integer in the range [0 … BUCKET_SIZE - 1].

==== Analysing collisions on bad hash code functions

The goal of a hash code function is to convert any given value into a non-negative integer. A common way to accomplish this is by summing each character’s Unicode value.
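A minimal sketch of this naive hash code. Converting the key with `String()` is an assumption so that non-string keys also work, and iterating with `for...of` visits whole code points (an index-based loop with `codePointAt` would also visit the second half of each surrogate pair):

[source, javascript]
----
// Naive hash code (sketch): add up the Unicode code point of every character.
function hashCode(key) {
  let hash = 0;
  // for...of iterates by code point, so an emoji counts as one character.
  for (const char of String(key)) {
    hash += char.codePointAt(0);
  }
  return hash;
}

console.log(hashCode('a')); //↪️ 97
console.log(hashCode('😁')); //↪️ 128513
----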

This function uses `codePointAt` to get the Unicode value. E.g. `a` has a value of 97, `A` is 65, and even https://en.wikipedia.org/wiki/Emoji#Unicode_blocks[emojis have codes]: “[big]#😁#” is `128513`.

.JavaScript built-in `string.charCodeAt` and `string.codePointAt`
****
The `charCodeAt()` method returns an integer between `0` and `65535` representing the UTF-16 code unit at the given index. However, it doesn’t play nice with characters outside the Basic Multilingual Plane (like most emojis): they are stored as two code units (a surrogate pair), and `charCodeAt` returns only one of them. That’s why it’s better to use `codePointAt` instead.

The `codePointAt()` method returns a non-negative integer that is the Unicode code point value.
****

With this function, we can convert some keys to numbers as follows:
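The following self-contained sketch sums code points the same way and prints the hash codes of a few sample keys:

[source, javascript]
----
// Naive hash code: the sum of each character's code point.
function hashCode(key) {
  let hash = 0;
  for (const char of String(key)) hash += char.codePointAt(0);
  return hash;
}

for (const key of ['cat', 'dog', 'rat', 'art']) {
  console.log(key, hashCode(key));
}
//↪️ cat 312
//↪️ dog 314
//↪️ rat 327
//↪️ art 327
----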

Notice that `rat` and `art` have the same hash code! These are collisions that we need to solve.

This happened because we are just summing the characters’ codes, without taking their order or the key’s type into account. We can do better by offsetting each character’s value based on its position in the string and appending the type to the calculation.

.Hash code implementation that offsets each character’s value based on its position
Since Unicode code points fit in 21 bits (the largest one is U+10FFFF), we can offset each character by 21 bits times its position.
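Here is a sketch of such a position-aware hash code, not necessarily the author's exact implementation. It uses `BigInt` so the shifted values never overflow, shifts by 21 bits per position because code points go up to U+10FFFF, and, as an assumption about how to "append the type", concatenates `typeof key` to the key's string form:

[source, javascript]
----
// Position-aware hash code sketch using BigInt.
// Each code point is shifted left by 21 bits times its position, so every
// character occupies its own 21-bit "slot" and the order of characters matters.
// Appending typeof (an assumption) makes 1 and '1' hash differently.
function hashCode(key) {
  const str = `${String(key)}${typeof key}`;
  let hash = 0n;
  let position = 0n;
  for (const char of str) {
    hash += BigInt(char.codePointAt(0)) << (21n * position);
    position += 1n;
  }
  return hash;
}

console.log(hashCode('rat') === hashCode('art')); //↪️ false
console.log(hashCode(1) === hashCode('1')); //↪️ false
----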

.JavaScript built-in `BigInt`
****
`BigInt` lets you operate on integers beyond the maximum safe integer (`Number.MAX_SAFE_INTEGER` is 9,007,199,254,740,991). BigInt literals use the suffix `n`, e.g. `1n + 3n === 4n`.
****

As you can imagine, the output is a humongous number! We are using `BigInt`, so it doesn’t overflow.

We don’t get duplicates when keys have different content or a different type. However, we still need to turn these unbounded integers into array indices. We do that using a *compression function*, which can be as simple as `% BUCKET_SIZE`.
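A sketch of such a compression function for `BigInt` hash codes (the name `compress` is illustrative). Both operands of `%` must be BigInts, so the bucket size is converted first, and the result is turned back into a regular number to use as an array index:

[source, javascript]
----
// Compression function sketch: map an unbounded, non-negative BigInt hash code
// to an index in [0, bucketSize - 1] using the modulus operator.
function compress(hashCode, bucketSize) {
  // BigInt % BigInt, then back to a Number so it can index an array.
  return Number(hashCode % BigInt(bucketSize));
}

console.log(compress(10n, 4)); //↪️ 2
console.log(compress(2n ** 100n + 3n, 7)); //↪️ 5
----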

However, there’s an issue with the last implementation. No matter how big the number is (we are using BigInt), if we use the modulus at the end to get an array index, then the part of the number that truly matters is the last bits. Also, the modulus works much better when it is a prime number.

.Look at this example with a bucket size of 4.
[source, javascript]
----
10 % 4 //↪️ 2
20 % 4 //↪️ 0
30 % 4 //↪️ 2
40 % 4 //↪️ 0
50 % 4 //↪️ 2
----

We get many collisions. 😱

.Let’s see what happens if the bucket size is a prime number:
[source, javascript]
----
10 % 7 //↪️ 3
20 % 7 //↪️ 6
30 % 7 //↪️ 2
40 % 7 //↪️ 5
50 % 7 //↪️ 1
----

Now it’s more evenly distributed!! [big]#😎👍#

.So, to sum up:
* The bucket size should always be a *prime number*, so data is distributed more evenly and collisions are minimized.
* The hash code doesn’t have to be too big. In the end, what matters is the last few digits.

Let’s design a better hash function with what we learned.
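A sketch of a 32-bit FNV-1a hash code in JavaScript. The offset basis (2166136261) and prime (16777619) are the published 32-bit FNV-1a constants; XOR-ing whole code points instead of UTF-8 bytes is a simplification, so results match reference FNV-1a only for ASCII strings and may differ from the numbers shown in this chapter:

[source, javascript]
----
// FNV-1a (32-bit) hash code sketch.
const FNV_OFFSET_BASIS = 2166136261; // 0x811c9dc5
const FNV_PRIME = 16777619; // 0x01000193

function hashCode(key) {
  let hash = FNV_OFFSET_BASIS;
  for (const char of String(key)) {
    hash ^= char.codePointAt(0); // 1) XOR in the next character
    hash = Math.imul(hash, FNV_PRIME); // 2) multiply, keeping only the low 32 bits
  }
  return hash >>> 0; // reinterpret as an unsigned 32-bit integer
}

console.log(hashCode('a').toString(16)); //↪️ e40c292c
console.log(hashCode('cat') !== hashCode('cats')); //↪️ true
----

`Math.imul` performs a 32-bit integer multiplication, which keeps the intermediate hash from growing past 32 bits the way a regular `*` would, and `>>> 0` turns the signed result into an unsigned one.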

It is somewhat similar to what we did before, in the sense that we use each letter’s Unicode value to compute the hash. The differences are:

1. We are using the XOR bitwise operation (`^`) to produce an *avalanche effect*, where a small change between two strings produces completely different hash codes. E.g.

.Hash code example using FNV-1a
[source, javascript]
----
hashCode('cat') //↪️ 4201630708
hashCode('cats') //↪️ 3304940933
----

.Fowler/Noll/Vo (FNV) Hash
****
It is a non-cryptographic hash function designed to be fast while maintaining a low collision rate. The high dispersion of FNV hashes makes them well suited for hashing nearly identical strings such as URLs, keys, IP addresses, zip codes, and others.
****

2. We are using the FNV-1a prime numbers and offset to reduce collisions even further. Check the https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function[link] to see where these prime numbers and offsets come from.

This hash function is a good trade-off between speed and collision prevention.

Now that we have a good hash function, let’s move on to the rest of the HashMap implementation.

== Implementing a HashMap in JavaScript

Let’s start by creating a class and its constructor to initialize the hash map. We want an array called *buckets* to hold all the data.
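A minimal sketch of what that constructor could look like. The field names, the default capacity of 19 (a prime), and the `getLoadFactor` helper are illustrative assumptions:

[source, javascript]
----
// HashMap constructor sketch (field names are illustrative).
class HashMap {
  /**
   * @param {number} initialCapacity - initial number of buckets (ideally a prime)
   * @param {number} loadFactor - maximum fill ratio before a rehash is triggered
   */
  constructor(initialCapacity = 19, loadFactor = 0.75) {
    this.buckets = new Array(initialCapacity); // slots are filled lazily
    this.loadFactor = loadFactor;
    this.size = 0; // number of stored key/value pairs
    this.collisions = 0; // for benchmarking only
  }

  // Current fill ratio: stored entries divided by the number of buckets.
  getLoadFactor() {
    return this.size / this.buckets.length;
  }
}

const map = new HashMap();
console.log(map.buckets.length, map.getLoadFactor()); //↪️ 19 0
----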

Notice that we are also keeping track of collisions (just for benchmarking purposes) and a load factor. The *load factor* measures how full the hash map is. We don’t want it to be more than 75% full; beyond that point, we do something called a *rehash*.

=== Inserting elements in a HashMap

To insert values into a HashMap, we first convert the *key* into *an array index* using the hash function. Each bucket of the array has a linked list to hold the values.

There are multiple scenarios for inserting key/values in a HashMap:

1. The key doesn’t exist yet: we add the new key/value.
2. The key already exists: we update the value, and we are done.
3. The key doesn’t exist, but the bucket already has other data: this is a collision. Using the linked list, we push another element to it.
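A sketch of a `set` method covering the three scenarios, with a simple `getEntry` helper. To keep it short and self-contained, it makes a few assumptions that differ from the chapter's implementation: buckets are plain arrays of `{ key, value }` entries instead of linked lists, the hash function is an inlined FNV-1a variant, and there is no rehashing:

[source, javascript]
----
// set() sketch: array buckets instead of linked lists, no rehash.
class HashMap {
  constructor(capacity = 19) {
    this.buckets = new Array(capacity);
    this.size = 0;
    this.collisions = 0;
  }

  // FNV-1a style hash code, compressed with % into a bucket index.
  hashFunction(key) {
    let hash = 2166136261;
    for (const char of String(key)) {
      hash = Math.imul(hash ^ char.codePointAt(0), 16777619);
    }
    return (hash >>> 0) % this.buckets.length;
  }

  // Returns the bucket index, the bucket, and the matching entry (if any).
  getEntry(key) {
    const index = this.hashFunction(key);
    const bucket = this.buckets[index] || [];
    const entry = bucket.find((e) => e.key === key);
    return { index, bucket, entry };
  }

  set(key, value) {
    const { index, bucket, entry } = this.getEntry(key);
    if (entry) {
      entry.value = value; // scenario 2: key exists, update the value
      return this;
    }
    if (bucket.length > 0) this.collisions += 1; // scenario 3: collision
    bucket.push({ key, value }); // scenarios 1 and 3: add the new entry
    this.buckets[index] = bucket;
    this.size += 1;
    return this;
  }
}

const map = new HashMap();
map.set('cat', 2).set('cat', 3).set('dog', 1);
console.log(map.size, map.getEntry('cat').entry.value); //↪️ 2 3
----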

Notice that we are using a function called `getEntry` to check if the key already exists. We are going to implement that function later.

=== Rehashing the HashMap

The idea of rehashing is to double the size when the map is getting full, so that collisions are minimized. When we double the size, we look for the next prime, because keeping the bucket size a prime number helps minimize collisions.
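A sketch of rehashing under simplifying assumptions (array buckets of `{ key, value }` entries, an inlined FNV-1a-style `hashFunction`); `isPrime`, `nextPrime`, and `rehash` are hypothetical names, not the linked implementation:

[source, javascript]
----
// Trial division: good enough for bucket-array sizes.
function isPrime(n) {
  if (n < 2) return false;
  for (let i = 2; i * i <= n; i += 1) {
    if (n % i === 0) return false;
  }
  return true;
}

// Smallest prime greater than or equal to n.
function nextPrime(n) {
  let candidate = Math.max(2, Math.ceil(n));
  while (!isPrime(candidate)) candidate += 1;
  return candidate;
}

// FNV-1a style hash code compressed into [0, capacity - 1].
function hashFunction(key, capacity) {
  let hash = 2166136261;
  for (const char of String(key)) hash = Math.imul(hash ^ char.codePointAt(0), 16777619);
  return (hash >>> 0) % capacity;
}

// Move every entry into a new bucket array roughly twice as big.
function rehash(buckets) {
  const newCapacity = nextPrime(buckets.length * 2);
  const newBuckets = Array.from({ length: newCapacity }, () => []);
  for (const bucket of buckets) {
    for (const entry of bucket || []) {
      // The index depends on the capacity, so every key must be re-hashed.
      newBuckets[hashFunction(entry.key, newCapacity)].push(entry);
    }
  }
  return newBuckets;
}

const old = Array.from({ length: 19 }, () => []);
old[hashFunction('cat', 19)].push({ key: 'cat', value: 2 });
const resized = rehash(old);
console.log(resized.length); //↪️ 41
----

Every entry has to be re-hashed because its index depends on the number of buckets; this is why a single rehash costs *O(n)*.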

The algorithm for finding the next prime is implemented https://github.com/amejiarosario/algorithms.js/blob/master/src/data-structures/hash-maps/primes.js[here], and you can find the full HashMap implementation in this file: https://github.com/amejiarosario/algorithms.js/blob/master/src/data-structures/hash-maps/hashmap.js

=== Getting values out of a HashMap

For getting values out of the map, we do something similar to inserting: we convert the key into an index using the hash function.

Then we use the https://github.com/amejiarosario/algorithms.js/blob/master/src/data-structures/linked-lists/linked-list.js[find method] of the linked list to get the node with the matching key. With `getEntry`, we can also define the `get` and `has` methods.
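A sketch of `getEntry`, `get`, and `has` under the simplifying assumption that each bucket is an array, so `Array.prototype.find` plays the role of the linked list's `find`:

[source, javascript]
----
// getEntry/get/has sketch with array buckets.
class HashMap {
  constructor(capacity = 19) {
    this.buckets = Array.from({ length: capacity }, () => []);
  }

  hashFunction(key) {
    let hash = 2166136261; // FNV-1a offset basis
    for (const char of String(key)) hash = Math.imul(hash ^ char.codePointAt(0), 16777619);
    return (hash >>> 0) % this.buckets.length;
  }

  set(key, value) {
    const entry = this.getEntry(key);
    if (entry) entry.value = value;
    else this.buckets[this.hashFunction(key)].push({ key, value });
    return this;
  }

  // Find the entry whose key matches, or undefined.
  getEntry(key) {
    const bucket = this.buckets[this.hashFunction(key)];
    return bucket.find((entry) => entry.key === key);
  }

  get(key) {
    const entry = this.getEntry(key);
    return entry && entry.value;
  }

  // Check for the entry itself, so falsy values like 0 still count.
  has(key) {
    return this.getEntry(key) !== undefined;
  }
}

const map = new HashMap().set('cat', 0);
console.log(map.get('cat'), map.has('cat'), map.has('dog')); //↪️ 0 true false
----

Checking for the entry rather than the value keeps `has` correct for falsy values such as `0`.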

If the bucket doesn’t exist or is empty, we are done. If the value exists, we use the https://github.com/amejiarosario/algorithms.js/blob/master/src/data-structures/linked-lists/linked-list.js[remove method] from the linked list.
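A sketch of `delete` following those two steps. Buckets are again assumed to be arrays, with `splice` standing in for the linked list's `remove`, and the `true`/`false` return value mirrors the built-in `Map.prototype.delete`:

[source, javascript]
----
// delete() sketch with array buckets.
class HashMap {
  constructor(capacity = 19) {
    this.buckets = new Array(capacity);
    this.size = 0;
  }

  hashFunction(key) {
    let hash = 2166136261; // FNV-1a offset basis
    for (const char of String(key)) hash = Math.imul(hash ^ char.codePointAt(0), 16777619);
    return (hash >>> 0) % this.buckets.length;
  }

  set(key, value) {
    const index = this.hashFunction(key);
    const bucket = (this.buckets[index] = this.buckets[index] || []);
    const entry = bucket.find((e) => e.key === key);
    if (entry) {
      entry.value = value;
    } else {
      bucket.push({ key, value });
      this.size += 1;
    }
    return this;
  }

  // Returns true if the key was found and removed.
  delete(key) {
    const bucket = this.buckets[this.hashFunction(key)];
    if (!bucket || bucket.length === 0) return false; // nothing to delete
    const position = bucket.findIndex((e) => e.key === key);
    if (position === -1) return false; // key not in this bucket
    bucket.splice(position, 1);
    this.size -= 1;
    return true;
  }
}

const map = new HashMap().set('cat', 2).set('dog', 1);
console.log(map.delete('cat'), map.delete('cat'), map.size); //↪️ true false 1
----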

== HashMap time complexity

A HashMap is very efficient for searching values by key: *O(1)* (amortized). However, searching by value is no better than with an array, since we have to visit every value: *O(n)*.

.Time complexity for a HashMap
|===
.2+.^s| Data Structure 2+^s| Searching By .2+^.^s| Insert .2+^.^s| Delete .2+^.^s| Space Complexity
^|_Index/Key_ ^|_Value_
| Hash Map (optimized) ^|O(1)* ^|O(n) ^|O(1)* ^|O(1)* ^|O(n)
|===
{empty}* = Amortized run time. E.g. rehashing might affect run time.

As you may notice, the times are amortized: in the unfortunate case of a rehash, an operation takes *O(n)* while the map resizes. After that, it is *O(1)* on average again.

The full HashMap implementation with comments can be found at: https://github.com/amejiarosario/algorithms.js/blob/master/src/data-structures/hash-maps/hashmap.js

== Implementing a TreeMap

Implementing a Map with a tree (a TreeMap) has a couple of advantages over a HashMap:

* Keys are always sorted.
* Statistical data, like the median, highest, and lowest key, can be easily obtained.
* Collisions are not a concern, so even the worst case is still *O(log n)*.
* Trees are more space-efficient and don’t need to allocate memory beforehand (e.g. HashMap’s initial capacity), nor do they need a rehash when they are getting full.
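To illustrate the second point, here is a sketch of finding the lowest and highest keys in a plain (unbalanced) BST; the node shape `{ key, value, left, right }` and the helper names are assumptions. Both walks take *O(height)* steps, which is *O(log n)* when the tree is balanced:

[source, javascript]
----
// Insert into a plain BST (no balancing); equal keys update the value.
function insert(node, key, value) {
  if (!node) return { key, value, left: null, right: null };
  if (key < node.key) node.left = insert(node.left, key, value);
  else if (key > node.key) node.right = insert(node.right, key, value);
  else node.value = value;
  return node;
}

// The lowest key is the leftmost node.
function lowest(node) {
  while (node && node.left) node = node.left;
  return node && node.key;
}

// The highest key is the rightmost node.
function highest(node) {
  while (node && node.right) node = node.right;
  return node && node.key;
}

let root = null;
for (const key of ['dog', 'cat', 'rat', 'art']) root = insert(root, key, key.length);
console.log(lowest(root), highest(root)); //↪️ art rat
----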

OK, now that you know the advantages, let’s implement it! For a full comparison, read the link:#hashmap-vs-treemap[HashMap vs TreeMap] section again.

Let’s get started with the basic functions. They have the same interface as the HashMap (but obviously the implementation is different).

[source, javascript]
----
class TreeMap {
  constructor() {}
  set(key, value) {}
  get(key) {}
  has(key) {}
  delete(key) {}
}
----

=== Inserting values into a TreeMap

For inserting a value into a TreeMap, we first need to initialize the tree:

[source, javascript]
----
class TreeMap {
  constructor() {
    this.tree = new Tree();
  }
}
----

The tree can be an instance of any Binary Search Tree that we have implemented so far. However, for better performance, it should be a self-balancing tree like a https://github.com/amejiarosario/algorithms.js/blob/master/src/data-structures/trees/red-black-tree.js[Red-Black Tree] or https://github.com/amejiarosario/algorithms.js/blob/master/src/data-structures/trees/avl-tree.js[AVL Tree].

[source, javascript]
----
// TreeMap methods (excerpt)
set(key, value) {
  return this.tree.add(key).data(value);
}

get size() {
  return this.tree.size;
}
----

Adding values is very easy (once we have the implementation).

=== Getting values out of a TreeMap

We search by key, which takes *O(log n)* on balanced trees.

[source, javascript]
----
// TreeMap methods (excerpt)
get(key) {
  const node = this.tree.get(key) || undefined;
  return node && node.getData();
}

has(key) {
  // Check for the node itself: checking the value would fail for falsy values like 0.
  return !!this.tree.get(key);
}
----

One side effect of storing keys in a tree is that they can be retrieved in order.

[source, javascript]
----
// TreeMap methods (excerpt)
*[Symbol.iterator]() {
  yield* this.tree.inOrderTraversal();
}

*keys() {
  for (const node of this) {
    yield node.value; // node.value holds the key; the map's value is the node's data
  }
}
----

We can use the *in-order traversal* for a BST.
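A self-contained sketch of an in-order traversal as a generator over a plain, unbalanced BST (node shape and helper names are assumptions). Visiting the left subtree, then the node, then the right subtree yields the keys in ascending order:

[source, javascript]
----
// Insert into a plain BST; duplicate keys are ignored.
function insert(node, key) {
  if (!node) return { key, left: null, right: null };
  if (key < node.key) node.left = insert(node.left, key);
  else if (key > node.key) node.right = insert(node.right, key);
  return node;
}

// In-order traversal: left subtree, node, right subtree.
function* inOrderTraversal(node) {
  if (!node) return;
  yield* inOrderTraversal(node.left);
  yield node.key;
  yield* inOrderTraversal(node.right);
}

let root = null;
for (const key of [50, 30, 70, 20, 40, 60]) root = insert(root, key);
console.log([...inOrderTraversal(root)]); //↪️ [ 20, 30, 40, 50, 60, 70 ]
----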

=== Deleting values from a TreeMap

Removing elements from a TreeMap is simple.

[source, javascript]
----
// TreeMap method (excerpt)
delete(key) {
  return this.tree.remove(key);
}
----

The BST implementation does all the heavy lifting.
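For reference, here is a sketch of the work that `remove` typically does in a plain BST (node shape and names are assumptions; a self-balancing tree such as a red-black or AVL tree would also rebalance afterwards). It handles the three classic cases: a node with no children, one child, or two children:

[source, javascript]
----
// Insert into a plain BST; duplicate keys are ignored.
function insert(node, key) {
  if (!node) return { key, left: null, right: null };
  if (key < node.key) node.left = insert(node.left, key);
  else if (key > node.key) node.right = insert(node.right, key);
  return node;
}

// Returns the new root of the subtree after removing `key` (if present).
function remove(node, key) {
  if (!node) return null; // key not found
  if (key < node.key) {
    node.left = remove(node.left, key);
  } else if (key > node.key) {
    node.right = remove(node.right, key);
  } else if (!node.left) {
    return node.right; // no left child (covers leaves): replace with the right child
  } else if (!node.right) {
    return node.left; // only a left child
  } else {
    // Two children: copy the in-order successor (leftmost node of the right subtree), then remove it.
    let successor = node.right;
    while (successor.left) successor = successor.left;
    node.key = successor.key;
    node.right = remove(node.right, successor.key);
  }
  return node;
}

// Collect keys in order to check the result.
function keys(node, out = []) {
  if (node) {
    keys(node.left, out);
    out.push(node.key);
    keys(node.right, out);
  }
  return out;
}

let root = null;
for (const key of [50, 30, 70, 20, 40, 60, 80]) root = insert(root, key);
root = remove(root, 30); // node with two children
root = remove(root, 80); // leaf
console.log(keys(root)); //↪️ [ 20, 40, 50, 60, 70 ]
----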

That’s basically it! To see the full file in context, click here: https://github.com/amejiarosario/algorithms.js/blob/master/src/data-structures/maps/tree-maps/tree-map.js[https://github.com/amejiarosario/algorithms.js/blob/master/src/data-structures/maps/tree-maps/tree-map.js]

== TreeMap Time complexity vs HashMap

As we discussed so far, there are trade-offs between the two implementations.