Home »
Data Structure »
Hashing Coding Problems
Sort elements by frequency
In this article, we are going to see how to sort an array based on frequency? This problem is a classic problem in hashing and a variety of interview problems is based on this core concept. In this article also, we are going to see the instance of sorting a map based on its value instead of a key.
Submitted by Radib Kar, on July 05, 2020
Prerequisite:
Problem statement:
Sort an array based on frequencies. The element having maximum frequency will come first. If two elements have the same frequency then the element coming first will appear first in the sorted array two.
Example:
Input array:
[1, 2, 3, 2, 3, 2, 1]
After sorting frequency wise:
[2, 2, 2, 1, 1, 3, 3]
2 has the maximum frequency
thus, it came first and since 1 and 3 have the same frequency,
so, 1 came first preserving the order in the input array.
Solution:
The idea is to create a hash table first which will store the frequency of unique elements. That can be created like below:
Hash function, h(x)=x but instead of storing key we will store the count
Initially, hash table is empty.
For each key in input array,
Hash[key]++
End for
If we create the hash table from the above input array, it will be,
Key Frequency
1 2
2 3
3 2
So now we need to sort the map based on the value instead of the key, but for the same frequencies, we need to preserve the key order(it’s order not key itself).
To preserve the key order we require another hash table that would store the first position of the key. We will store that like below:
Say this hash table is named first_occurrence
For each key in input array,
If first_occurence[key]==0: //not assigned
first_occurence[key]=key index+1
End for
So after processing the input the hash table first_occurrence would be:
Key First occurrence
1 1(0+1)
2 2(1+1
3 3(2+1)
So now rest of the task is to sort the hash table based on the map and based on first_occurrence hash table.
We will use the priority queue to sort this map based on our defined comparator where we will implement logic to sort by frequency value and first_occurrence hash table.
How to pass this in the comparator is our main challenge or area of our discussion?
You can observe a thing that we have designed a class for this problem and kept the first_occurrence hash table as a data member, not any local member to a function. Thing is not the thing I do in general. So, there must be something, that made me design a class and bring the OOP concept within it. This is because I need to feed the first_occurrence hash table into the comparator and the comparator is a lambda function. Like below,
auto comp=[](pair<int,int> a, pair<int,int> b){
//function body
}
But this time if you notice the implementation, we have
auto comp=[this](pair<int,int> a, pair<int,int> b){
//function body
}
This helps you to pass the data members to the lambda function or it's known as capturing of the lambda function.
That's why we designed the class and made first_occurrence as data members to feed into the lambda function. Now we can perform our logic similarly as we do in comparators.
The lambda comparator would be like below which will give us a sorted map based on our requirement via the priority_queue.
auto comp=[this](pair<int,int> a, pair<int,int> b){
if(a.second<b.second)
return true;
else if(a.second>b.second)
return false;
else{
if(first_occurence[a.first]<first_occurence[b.first])
return false;
else
return true;
}
};
So after pushing all pairs(a map element is pair, <key, element>) we will have our priority queue as,
Key Frequency
2 3
1 2
3 2
Now, pop each element from the priority queue and append the key as many times as the frequency,
So at iteration 1
We will pop <2,3>
So vector will be 2 2 2
So at iteration 2
We will pop <1,2>
So vector will be 2 2 2 1 1
So at iteration 3
We will pop <3,2>
So vector will be 2 2 2 1 1 3 3
Queue Is empty
So our sorted array based on frequency is
2 2 2 1 1 3 3
C++ implementation:
#include <bits/stdc++.h>
using namespace std;
class Includehelp {
public:
unordered_map<int, int> first_occurence;
void print(vector<int> arr)
{
for (auto i : arr) {
cout << i << " ";
}
cout << endl;
}
vector<int> sort_by_frequency(vector<int> arr)
{
//create the hashmap
unordered_map<int, int> mymap;
for (int i = 0; i < arr.size(); i++) {
mymap[arr[i]]++;
if (first_occurence[arr[i]] == 0)
first_occurence[arr[i]] = i + 1;
}
//now to sort based on frequency lets use priority queue
//comparator of priority queue using lambda function
auto comp = [this](pair<int, int> a, pair<int, int> b) {
if (a.second < b.second)
return true;
else if (a.second > b.second)
return false;
else {
if (first_occurence[a.first] < first_occurence[b.first])
return false;
else
return true;
}
};
priority_queue<pair<int, int>, vector<pair<int, int> >, decltype(comp)> pq(comp);
for (auto& ij : mymap) {
pq.push(ij);
}
//sorted outcome
vector<int> result;
while (!pq.empty()) {
pair<int, int> p = pq.top();
pq.pop();
int no = p.first;
int freq = p.second;
for (int i = 0; i < freq; i++)
result.push_back(no);
}
return result;
}
};
int main()
{
int n;
cout << "Enter number of elements\n";
cin >> n;
vector<int> arr(n, 0);
cout << "Enter the elements\n";
for (int i = 0; i < n; i++) {
cin >> arr[i];
}
Includehelp* obj = new Includehelp();
arr = obj->sort_by_frequency(arr);
cout << "Printing after sorting by frequency\n";
obj->print(arr);
return 0;
}
Output:
Enter number of elements
8
Enter the elements
1 2 3 1 2 3 4 5
Printing after sorting by frequency
1 1 2 2 3 3 4 5