.We need to validate that brackets are correctly opened and closed, following these rules:
- An opened bracket must be closed by the same type.
- Open brackets must be closed in the correct order.
We are facing a parsing problem, and usually, stacks are good candidates for them.
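Below is a minimal sketch of that idea, assuming the usual three bracket pairs (the function name and the pair map are illustrative):

[source, javascript]
----
function isValidBrackets(string) {
  const openingFor = { ')': '(', ']': '[', '}': '{' }; // closing -> opening
  const stack = [];

  for (const char of string) {
    if (char === '(' || char === '[' || char === '{') {
      stack.push(char); // remember every opening bracket
    } else if (stack.pop() !== openingFor[char]) {
      return false; // a closing bracket must match the latest opening one
    }
  }

  return stack.length === 0; // nothing left open
}

// isValidBrackets('()[]{}'); //↪️ true
// isValidBrackets('([)]');   //↪️ false
----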
*Algorithm*:
We can visit the tree using a Queue and keep track of when a level ends and the next one starts.
Since during BFS we dequeue one node and enqueue its two children (left and right), we might have nodes from two levels in the Queue (the current one and the next one). For this problem, we need to know what the last node on the current level is.
.There are several ways to solve this problem using BFS. Here are some ideas:
- *1 Queue + Sentinel node*: we can use a special character in the `Queue` like `'*'` or `null` to indicate a level change. So, we would start with something like this: `const queue = new Queue([root, '*']);`.
- *2 Queues*: using a "special" character might be seen as hacky, so you can also opt to keep two queues: one for the current level and another for the next level.
- *1 Queue + size tracking*: we track the Queue's `size` before the children are enqueued. That way, we know where the current level ends (see the sketch after this list).
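Here is a sketch of the last idea, *1 Queue + size tracking*, assuming a node shape of `{ value, left, right }` (the function name is illustrative, and a plain array stands in for the book's `Queue` class):

[source, javascript]
----
function lastNodePerLevel(root) {
  if (!root) return [];
  const result = [];
  const queue = [root];

  while (queue.length) {
    const levelSize = queue.length; // snapshot the size before enqueuing children
    for (let i = 0; i < levelSize; i++) {
      const node = queue.shift();
      if (i === levelSize - 1) result.push(node.value); // last node of this level
      if (node.left) queue.push(node.left);
      if (node.right) queue.push(node.right);
    }
  }

  return result;
}
----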
The complexity of any of the BFS methods or DFS is similar.
- Space: `O(n)`. For BFS, the worst-case space is given by the maximum *width*. That happens when the binary tree is complete, so the last level holds roughly half of the nodes, thus `O(n)`. For DFS, the space complexity is given by the tree's maximum *height*. In the worst case, the binary tree is skewed to the right, so we will have an implicit call stack of size `n`.
This simple problem can have many solutions; let's explore some.
_Brute force_
One brute force approach could be using two for loops: we sum two different numbers and check if they add up to the target. If they do, we return the indices; if not, we keep increasing the indices until we have checked every possible pair.
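A minimal sketch of this idea (the function name is illustrative):

[source, javascript]
----
function twoSumBruteForce(nums, target) {
  for (let i = 0; i < nums.length; i++) {
    for (let j = i + 1; j < nums.length; j++) {
      if (nums[i] + nums[j] === target) return [i, j]; // found a pair
    }
  }
  return []; // no pair adds up to the target
}
----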
This approach's time complexity is `O(n^2)` because, in the worst case, we check every possible pair of numbers, while the space complexity is `O(1)`.
Can we trade space for time? Yes!
_Map_
Based on `nums[i] + nums[j] === target`, we can say that `nums[j] === target - nums[i]`. We can do one pass and check if we have already seen a number equal to `target - nums[i]`. A map is perfect for this job. We could have a HashMap that maps `num` to `index`. Let's see the algorithm to make it work.
*Algorithm*:
* Visit every number once
** Calculate the complement `target - nums[i]`.
** If the complement exists, return its index and the current index.
** If not, save the current number and its index.
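Putting those steps together, a sketch could look like this (the function name is illustrative):

[source, javascript]
----
function twoSum(nums, target) {
  const seen = new Map(); // num -> index

  for (let index = 0; index < nums.length; index++) {
    const complement = target - nums[index];
    if (seen.has(complement)) return [seen.get(complement), index];
    seen.set(nums[index], index); // remember this number so later values can find it as their complement
  }

  return [];
}
----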
This solution's time complexity is `O(n^3)` because of the 3 nested loops.
How can we do better? Notice that the last for loop computes the sum repeatedly just to add one more number.
Let's fix that!
_Cumulative Sum_
For this solution, instead of computing the sum from `i` to `j` all the time, we can calculate a cumulative sum. Every time we see a new number, we add it to the aggregate.
Since we want all possible subarrays, we can increase `i` and get the sum for each:
Notice that when the array has a 0, the cumulative sum has a repeated number. If you subtract those numbers, it will give you zero. In the same way, if you take two other positions and subtract them (`sum[j] - sum[i]`), it will give you the sum of the numbers between them.
For instance, if we take the indices `2` and `0` (with cumulative sums 6 and 1) and subtract them, we get `6 - 1 = 5`. To verify, we can add the array values from index 1 to 2: `sum([2, 3]) === 5`.
With that intuition, we can use a Map to keep track of the aggregated sum and the number of times that sum has occurred.
*Algorithm*:
* Start sum at 0
* Visit every number in the array
** Compute the cumulative sum
** Check if `sum - k` exists; if so, it means that there's a subarray that adds up to `k`.
** Save the sum and the number of times that it has occurred.
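A sketch of this approach, assuming we want to count the subarrays that sum to `k` (the `subarraySum` name matches the test case below):

[source, javascript]
----
function subarraySum(nums, k) {
  const count = new Map([[0, 1]]); // a cumulative sum of 0 has been "seen" once (empty prefix)
  let sum = 0;
  let total = 0;

  for (const num of nums) {
    sum += num; // cumulative sum so far
    if (count.has(sum - k)) total += count.get(sum - k); // subarrays ending here that add up to k
    count.set(sum, (count.get(sum) || 0) + 1); // record this cumulative sum
  }

  return total;
}
----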
You might wonder why the map is initialized with `[0, 1]`. Consider this test case:
[source, javascript]
----
subarraySum([1], 1); // k = 1
----
The sum is 1; however, `sum - k` is `0`. If `0` doesn't exist in the map, we will get the wrong answer since that number adds up to `k`. We need to add an initial case to the map: `map.set(0, 1)`. If `nums[i] - k === 0`, then `nums[i] === k` and should be part of the solution.
*Complexity Analysis*:
- Time: `O(n)`. We visit every number once.
- Space: `O(n)`. In the worst case, the map grows to the same size as the original array.
1. We use a *hash function* to transform the keys (e.g., dog, cat, rat, …) into an array index. This _array_ is called the *bucket*.
2. The bucket holds the value (or a list of values in case of collisions).
In the illustration, we have a bucket size of 10. In bucket 0, we have a collision. Both `cat` and `art` keys map to the same bucket even though their hash codes are different.
In a HashMap, a *collision* happens when different keys lead to the same index. Collisions are nasty for performance since they can reduce the search time from *O(1)* to *O(n)*.
Having a big bucket size can avoid collisions but can also waste too much memory. We are going to build an _optimized_ HashMap that resizes itself when it is getting full. This auto-resizing avoids collisions and doesn't need to allocate too much memory upfront. Let's start with the *hash function*.
===== Designing an optimized hash function
To minimize collisions, we need to create an excellent hash function.
IMPORTANT: A *perfect* hash function is one that assigns a unique array index for every different key.
It's not practical (and memory-wise wasteful) to have a perfect hash function, so we will shoot for a cost-effective hash function instead.
.To recap:
- A hash function converts keys into array indices.
- A hash function is composed of two parts:
1. *Hash Code*: maps any key into an integer (unbounded).
2. *Compression function*: maps an arbitrary integer to an integer in the range `[0… BUCKET_SIZE - 1]`.
Before building an excellent hash function, let's see what a lousy one looks like. 😉
====== Analysing collisions on bad hash code functions
A hash code function's goal is to convert any given value into a positive integer. A common way to accomplish that is by summing each character's Unicode value.
.Naïve hashing function implementation
[source, javascript]
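----
// Illustrative sketch (assuming string-like keys): sum each character's
// Unicode code point to get a hash code. The exact listing may differ.
const hashCodeNaive = (key) => String(key)
  .split('')
  .reduce((sum, char) => sum + char.codePointAt(0), 0);
----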
****
The `charCodeAt()` method returns an integer between `0` and `65535` representing the UTF-16 code unit at the given index.
The `codePointAt()` method returns a non-negative integer that is the Unicode code point value.
****
With this function, we can convert some keys to numbers as follows:
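[source, javascript]
----
// Illustrative values using the naïve hash sketch above.
hashCodeNaive('cat'); //↪️ 312
hashCodeNaive('rat'); //↪️ 327
hashCodeNaive('art'); //↪️ 327 (same as 'rat'!)
hashCodeNaive(10);    //↪️ 97  ('1' + '0' ➡ 49 + 48)
hashCodeNaive('10');  //↪️ 97  (same as the number 10!)
----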
Notice that `rat` and `art` have the same hash code! These are collisions that we need to solve.
Collisions happened because we add the letters' Unicode values without taking the order or the type into account. We can do better by offsetting the character value based on its position in the string. We can also add the object type, so the number `10` produces a different output than the string `'10'`.
.Hashing function implementation that offsets the character value based on its position
[source, javascript]
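----
// Illustrative sketch: offset each character's code point by its position
// (20 bits per letter) and prepend the type, so 10 and '10' hash differently.
// BigInt is used so the growing number doesn't overflow. Details may differ
// from the actual listing.
const hashCodeOffset = (key) => `${typeof key}${key}`
  .split('')
  .reduce((hash, char, position) =>
    hash + (BigInt(char.codePointAt(0)) << BigInt(position * 20)), 0n);
----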
****
BigInt has no virtual limits (until you run out of physical memory). It uses the `n` suffix, e.g., `1n`.
****
As you can imagine, summing 20 bits per letter leads to a massive number! That's the case even for three-letter words. We are using `BigInt`, so it doesn't overflow.
We don't have duplicates anymore! If the keys have different content or type, they have distinct hash codes. However, we need to map these unbounded integers into finite buckets in an array. We do that using a *compression function*, which can be as simple as `% BUCKET_SIZE`.
However, there's an issue with the last implementation. It doesn't matter how enormous (and different) the hash code number is if, in the end, we use the modulus to get an array index. The part of the hash code that truly matters is the last few bits.
.Look at this example with a bucket size of 4.
[source, javascript]
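----
// illustrative hash codes (the exact example values may differ)
10 % 4 //↪️ 2
20 % 4 //↪️ 0
30 % 4 //↪️ 2
40 % 4 //↪️ 0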
50 % 4 //↪️ 2
----
All the hash codes are different, and still, we get many collisions! [big]#😱#
Based on the properties of numbers, using a prime number as the modulus produces fewer collisions.
.Let’s see what happens if the bucket size is a prime number:
[source, javascript]
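----
// the same illustrative hash codes, now with a prime bucket size (e.g., 7)
10 % 7 //↪️ 3
20 % 7 //↪️ 6
30 % 7 //↪️ 2
40 % 7 //↪️ 5
50 % 7 //↪️ 1
----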
Now it’s more evenly distributed!! [big]#😎👍#
.So, to sum up:
* Bucket size should always be a *prime number*, so data is distributed more evenly and collisions are minimized.
* Hash code doesn't have to be too big. In the end, what matters is the last few digits.
Let’s design a better HashMap with what we learned.
Take a look at the following function:
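A minimal FNV-1a-style sketch, assuming 32-bit arithmetic (the helper name and exact details are illustrative):

[source, javascript]
----
const hashCode = (key) => {
  const str = String(key);
  let hash = 2166136261; // FNV offset basis

  for (let index = 0; index < str.length; index++) {
    hash ^= str.codePointAt(index);          // XOR in each character (avalanche effect)
    hash = Math.imul(hash, 16777619) >>> 0;  // multiply by the FNV prime, keep 32 bits
  }

  return hash >>> 0;
};
----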
It is somewhat similar to what we did before, in the sense that we use each letter’s Unicode to compute the hash. The difference is:
1. We are using the XOR bitwise operation (`^`) to produce an *avalanche effect*, where a small change in the input produces a completely different hash code. E.g.
A one-letter change produces a very different output.
We are using the FNV-1a prime number (`16777619`) and the offset (`2166136261`) to reduce collisions even further. If you are curious where these numbers come from, check out this http://bit.ly/fvn-1a[link].
The FNV-1a hash function is a good trade-off between speed and collision prevention.
Notice that we are also keeping track of collisions (for benchmarking purposes) and a load factor. *The load factor* measures how full the hash map is. We don't want it to be fuller than 75%. If the HashMap is getting too full, we will fix it by doing a *rehash* (more on that later).
<1> Key doesn’t exist yet, so we create the new key/value pair.
<2> Key already exists, so we replace the value.
<3> Key doesn't exist, but the bucket already has other data: this is a collision! We push the new element onto the bucket.
<4> To keep insertion order, we keep track of the keys' order using `keysTrackerArray` and `keysTrackerIndex`.
Notice that we are using a function called `getEntry` to check if the key already exists. It gets the bucket's index corresponding to the key and then checks whether an entry with the given key exists. We are going to implement this function in a bit.
====== Getting values out of a HashMap
<3> Use the Linked list's <<part02-linear-data-structures#array-search-by-value>> method to find the value in the bucket.
<4> Return `bucket` and `entry` if found.
With the `getEntry` method, we can implement the `HashMap.get` and `HashMap.has` methods:
.HashMap's get method
[source, javascript]
===== Rehashing a HashMap
Rehashing is a technique to minimize collisions when a hash map is getting full. It doubles the map's size, recomputes all the hash codes, and inserts the data into the new buckets.
When we increase the map size, we try to find the next prime. We explained that keeping the bucket size a prime number is beneficial for minimizing collisions.
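A sketch of how the next bucket size could be picked (the helper names are illustrative):

[source, javascript]
----
const isPrime = (number) => {
  if (number < 2) return false;
  for (let divisor = 2; divisor * divisor <= number; divisor++) {
    if (number % divisor === 0) return false;
  }
  return true;
};

const nextPrime = (number) => {
  let candidate = number;
  while (!isPrime(candidate)) candidate++;
  return candidate;
};

// nextPrime(19 * 2); //↪️ 41 (double the current bucket size, then find the next prime)
----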
Hash Map is optimal for searching values by key in constant time *O(1)*. However, searching by value is not any better than an array since we have to visit every value *O(n)*.
(((Tables, Non-Linear DS, HashMap complexities)))
// tag::table[]
// end::table[]
indexterm:[Runtime, Linear]
As you can notice, we have amortized times: in the unfortunate case of a rehash, an operation will take O(n) while the map resizes. After that, it will be *O(1)*.
==== Practice Questions
(((Interview Questions, Hash Map)))
// tag::hashmap-q-two-sum[]
===== Fit 2 movies in a flight
*HM-1*) _You are working on an entertainment recommendation system for an airline. Given a flight duration (target) and an array of movie lengths, you need to recommend two movies that together fit exactly the length of the flight. Return an array with the indices of the two numbers that add up to the target. No duplicates are allowed. If it's not possible, return an empty array `[]`._
// end::hashmap-q-two-sum[]
// _Seen in interviews at: Amazon, Google, Apple._