Home »
Data Structure
Hashing | Open addressing for collision handling
Open addressing for collision handling: In this article are we are going to learn about the open addressing for collision handling which can be further divided into linear probing, quadratic probing, and double hashing.
Submitted by Radib Kar, on July 01, 2020
Prerequisite: Hashing data structure
Open addressing
In open addressing, all the keys will be stored in the hash table itself, not by using any additional memory or extending the index(linked list). This is also known as closed hashing and this is done mainly based on probing. Probing can be done based on either linear probing or quadratic probing.
In open addressing, we keep rehashing until we resolve.
Linear Probing
In linear probing, the rehashing process in linear. Say the location found at any step is n and n is occupied then the next attempt will be to hash at position (n+1). We wrap around from the last table location to first location if necessary.
rehashing(key) = (n+1) % table size
Example:
say the keys are: {123, 124, 333, 4679,983}
hash table size is 10
So,
Step 1: Inserting 123 in the hash map. So, location is 3 and we can map there since not occupied.
Step 1: Inserting 123 in the hash map. So, location is 3 and we can map there since not occupied.
Index Keys
0 Empty
1 Empty
2 Empty
3 123
4 Empty
5 Empty
6 Empty
7 Empty
8 Empty
9 Empty
Step 2: Inserting 124 in the hash map. So, location is 4 and we can map since not occupied.
Index Keys
0 Empty
1 Empty
2 Empty
3 123
4 124
5 Empty
6 Empty
7 Empty
8 Empty
9 Empty
Step 3:
Inserting 333 in the hash map. So, location is 3, but it's occupied
Next location is 4, that's occupied too.
Next location is 5 and we can map here since unoccupied. (linear probing)
Index Keys
0 Empty
1 Empty
2 Empty
3 123
4 124
5 333
6 Empty
7 Empty
8 Empty
9 Empty
Step 4: Inserting 4679 in the hash map. So, location is 9. We can map here since unoccupied.
Index Keys
0 Empty
1 Empty
2 Empty
3 123
4 124
5 333
6 Empty
7 Empty
8 Empty
9 4679
Step 5:
Inserting 983 in the hash map. So, location is 3.
But it's occupied and using linear probing we map it to 6.
Index Keys
0 Empty
1 Empty
2 Empty
3 123
4 124
5 333
6 983
7 Empty
8 Empty
9 4679
Advantage and disadvantages of linear probing:
- The advantage of linear probing is that it's easy to implement.
- The disadvantage is that it leads to clustering. Due to linear probing, the collided keys form a cluster in groups and increases the probing length which reduces the overall efficiency.
C++ implementation for linear probing
//linear probing
#include <bits/stdc++.h>
using namespace std;
void add_using_linear_probing(int hash[], int a)
{
//hash function h(x)=x%10
int k = a % 10;
//linear probing
while (true) {
if (hash[k] == -1) {
hash[k] = a;
break;
}
k = (k + 1) % 10; //linear increment of probe
}
}
int main()
{
//set of input numbers
vector<int> arr{ 123, 124, 333, 4679, 983 };
//initialize the hash table
//each entry of the hash table is a single entry
int hash[10]; //size of hashtable is 10
memset(hash, -1, sizeof(hash)); //initialize with empty initially
for (int a : arr) {
//hashing
add_using_linear_probing(hash, a);
}
cout << "---------using linear probing---------\n";
cout << "Hash table is:\n";
for (int i = 0; i < 10; i++) {
if (hash[i] == -1)
cout << i << "->"
<< "Empty" << endl;
else
cout << i << "->" << hash[i] << endl;
}
return 0;
}
Output:
---------using linear probing---------
Hash table is:
0->Empty
1->Empty
2->Empty
3->123
4->124
5->333
6->983
7->Empty
8->Empty
9->4679
Quadratic probing
The problem of clustering can be avoided by using quadratic probing. Here the rehashing is done like below,
rehashing(key) = (n+k2) % table size
where, k is 1,2,3, ...
We wrap around from the last table location to the first location if necessary.
Like say hashing location initially is 3 and 3 is occupied then we will go for 3+12=4, if it’s still occupied we will go for 4+22=8. So on
For the above set of keys that we used for linear probing we will use quadratic probing.
Step 1: Inserting 123 in the hash map. So, location is 3 and we can map there since not occupied.
Index Keys
0 Empty
1 Empty
2 Empty
3 123
4 Empty
5 Empty
6 Empty
7 Empty
8 Empty
9 Empty
Step 2: Inserting 124 in the hash map. So, location is 4 and we can map since not occupied.
Index Keys
0 Empty
1 Empty
2 Empty
3 123
4 124
5 Empty
6 Empty
7 Empty
8 Empty
9 Empty
Step 3:
Inserting 333 in the hash map. So, location is 3, but it’s occupied.
Next location is 3+12=4, that’s occupied too.
Next location is 4+22=8 and we can map here since unoccupied. (quadratic probing)
Index Keys
0 Empty
1 Empty
2 Empty
3 123
4 124
5 Empty
6 Empty
7 Empty
8 333
9 Empty
Step 4:
Inserting 4679 in the hash map. So, location is 9.
We can map here since unoccupied.
Index Keys
0 Empty
1 Empty
2 Empty
3 123
4 124
5 Empty
6 Empty
7 Empty
8 333
9 4679
Step 5:
Inserting 983 in the hash map. So, location is 3.
So, location is 3, but it's occupied
Next location is 3+12=4, that's occupied too.
Next location is 4+22=8 and that's occupied too.
Next location is 8+32==17%10=7
we can map here since unoccupied. (quadratic probing)
Index Keys
0 Empty
1 Empty
2 Empty
3 123
4 124
5 Empty
6 Empty
7 983
8 333
9 4679
Advantage and disadvantages of quadratic probing:
The main advantage is here we can overcome the problem of clustering which appeared in the case of linear probing. By quadratic chances of clustering is much less.
C++ implementation of quadratic probing
//quadratic probing
#include <bits/stdc++.h>
using namespace std;
void add_using_quadratic_probing(int hash[], int a)
{
//hash function h(x)=x%10
int k = a % 10;
//quadratic probing
int incr = 1;
while (true) {
if (hash[k] == -1) {
hash[k] = a;
break;
}
//quadratic increment of probe
k = (k + int(pow(incr, 2))) % 10;
incr++;
}
}
int main()
{
//set of input numbers
vector<int> arr{ 123, 124, 333, 4679, 983 };
//initialize the hash table
//each entry of the hash table is a single entry
int hash[10]; //size of hashtable is 10
memset(hash, -1, sizeof(hash)); //initialize with empty initially
for (int a : arr) {
//hashing
add_using_quadratic_probing(hash, a);
}
cout << "---------using quadratic probing---------\n";
cout << "Hash table is:\n";
for (int i = 0; i < 10; i++) {
if (hash[i] == -1)
cout << i << "->"
<< "Empty" << endl;
else
cout << i << "->" << hash[i] << endl;
}
return 0;
}
Output:
---------using quadratic probing---------
Hash table is:
0->Empty
1->Empty
2->Empty
3->123
4->124
5->Empty
6->Empty
7->983
8->333
9->4679
Double hashing
Double hashing is the best open addressing technique to overcome clustering chances. Here we increment the probing length based on another hash function.
Say the primary hash function is h1 and secondary hash function is h2 to increment probing length
Then f(key)=h1(key)+k*h2(key) where h2≠h1
Like, first we find h1(key). If it's occupied, we will go for h1(key)+h2(key) where h2(key) is the probing length. If it's still occupied then will go for h1(key) +2*h2(key), so on
We will wrap up from last of table table to beginning of table if necessary
If we do double hashing with our previous example, And take h1(key)=key%10 and h2(key)=number digits in key
Step 1: Inserting 123 in the hash map. So, location is 3 and we can map there since not occupied.
Index Keys
0 Empty
1 Empty
2 Empty
3 123
4 Empty
5 Empty
6 Empty
7 Empty
8 Empty
9 Empty
Step 2: Inserting 124 in the hash map. So, location is 4 and we can map since not occupied.
Index Keys
0 Empty
1 Empty
2 Empty
3 123
4 124
5 Empty
6 Empty
7 Empty
8 Empty
9 Empty
Step 3:
Inserting 333 in the hash map. So, location is 3, but it's occupied.
Next location is 3+digits(3) = 6, that's occupied too.
Next location is 4+22=8 and we can map here since unoccupied. (double hashing)
Index Keys
0 Empty
1 Empty
2 Empty
3 123
4 124
5 Empty
6 333
7 Empty
8 Empty
9 Empty
Step 4:
Inserting 4679 in the hash map. So, location is 9.
We can map here since unoccupied.
Index Keys
0 Empty
1 Empty
2 Empty
3 123
4 124
5 Empty
6 333
7 Empty
8 Empty
9 4679
Step 5:
Inserting 983 in the hash map. So, location is 3, but it's occupied.
Next location is 3+digit(983)=6, that's occupied too.
Next location is 3+2*digit(983)=9 and that's occupied too.
Next location is 3+3*digit(983)==12%10=2
We can map here since unoccupied. (double hashing)
Index Keys
0 Empty
1 Empty
2 983
3 123
4 124
5 Empty
6 333
7 Empty
8 Empty
9 4679
Advantages and disadvantages of double hashing:
The clustering chance is very less as we are using a separate hashing function, h2(key) to increment the probing length. A new hashing function is an overhead of course.
Double hashing uses few probes than quadratic or linear probing but takes more time than those two.
C++ implementation of Double hashing
//double hashing
#include <bits/stdc++.h>
using namespace std;
int digit(int a)
{
return to_string(a).length();
}
void add_using_double_hashing(int hash[], int a)
{
//hash function h1(x)=x%10
//hash function h2(x)=digit(x)
//for incrementing probing
int k = a % 10;
//double hashing
int count = 1;
while (true) {
if (hash[k] == -1) {
hash[k] = a;
break;
}
//double hashing for incrementing prob length
k = (k + count * digit(a)) % 10;
count++;
}
}
int main()
{
//set of input numbers
vector<int> arr{ 123, 124, 333, 4679, 983 };
//initialize the hash table
//each entry of the hash table is a single entry
int hash[10]; //size of hashtable is 10
//initialize with empty initially
memset(hash, -1, sizeof(hash));
for (int a : arr) {
//hashing
add_using_double_hashing(hash, a);
}
cout << "---------using double hashing---------\n";
cout << "Hash table is:\n";
for (int i = 0; i < 10; i++) {
if (hash[i] == -1)
cout << i << "->"
<< "Empty" << endl;
else
cout << i << "->" << hash[i] << endl;
}
return 0;
}
Output:
---------using double hashing---------
Hash table is:
0->Empty
1->Empty
2->983
3->123
4->124
5->Empty
6->333
7->Empty
8->Empty
9->4679