Hashing in a Nutshell
- Maps data of arbitrary size to fixed-size values using hash functions.
- Supports fast lookup, insertion, and deletion, typically in O(1) average time.
- Commonly used in data structures like hash tables, sets, and maps.
What Is the Hashing Technique?
Hashing is a technique that converts a given input into a fixed-size value (called a hash code or hash value) using a function known as a hash function. The hash value, usually reduced modulo the table size, is typically used as an index to store the data in an array-like data structure (commonly a hash table).
Hashing allows us to perform operations like search, insert, and delete in O(1) average time, making it one of the most powerful techniques in data structure design and algorithmic problem solving.
How Hashing Works
- Use a hash function to convert the key into an index.
- Store the value at that index in a hash table.
- If two keys hash to the same index, resolve the collision using a method like chaining or open addressing.
Pseudocode
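// Basic hash table insertion using chaining
function insert(key, value):
    index = hashFunction(key)
    if hashTable[index] is empty:
        hashTable[index] = new list
    // Update in place if the key is already stored
    for entry in hashTable[index]:
        if entry.key == key:
            entry.value = value
            return
    hashTable[index].append((key, value))

// Search for a value
function search(key):
    index = hashFunction(key)
    if hashTable[index] is empty:
        return null
    for (k, v) in hashTable[index]:
        if k == key:
            return v
    return null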
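The pseudocode above can be sketched in runnable Python as follows. This is a minimal illustration rather than a production implementation: the class name `ChainedHashTable`, the fixed bucket count of 8, the `delete` method, and the use of Python's built-in `hash()` in place of `hashFunction` are all assumptions made for the example.

```python
class ChainedHashTable:
    """Minimal hash table using separate chaining (illustrative sketch)."""

    def __init__(self, num_buckets=8):
        # Assumption: a fixed number of buckets; no resizing is performed.
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # Built-in hash() stands in for hashFunction(key).
        return hash(key) % len(self.buckets)

    def insert(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite the existing entry
                return
        bucket.append((key, value))

    def search(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False


table = ChainedHashTable()
table.insert("apple", 3)
table.insert("banana", 5)
table.insert("apple", 7)       # updates the existing entry
print(table.search("apple"))   # 7
print(table.search("cherry"))  # None
```

In this sketch, `insert` overwrites the value when the key is already present, so each key appears at most once in its bucket.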
Hash Function
A hash function is used to map a large set of possible keys to a smaller range of indices. A good hash function should:
- Be fast to compute
- Distribute keys uniformly across the table
- Minimize collisions (cases where multiple keys hash to the same index)
Example: For strings, a simple hash function might be:
function hashFunction(key):
    hash = 0
    for char in key:
        hash = (hash * 31 + ASCII(char)) % TABLE_SIZE
    return hash
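As a rough Python rendering of the string hash above, the following sketch may help. The table size of 101 is an assumption chosen for illustration (a prime is a common choice), and `ord()` is used in place of `ASCII(char)`.

```python
TABLE_SIZE = 101  # assumed table size for this example


def hash_function(key: str, table_size: int = TABLE_SIZE) -> int:
    """Polynomial hash with multiplier 31, reduced modulo table_size."""
    h = 0
    for ch in key:
        # ord() returns the code point, which equals the ASCII value for ASCII text.
        h = (h * 31 + ord(ch)) % table_size
    return h


print(hash_function("ab"))  # 75, i.e. (97 * 31 + 98) % 101
print(hash_function("ba"))  # 4, i.e. (98 * 31 + 97) % 101
```

Because each character is multiplied by a different power of 31, the same letters in a different order generally produce a different index, which a plain sum of character codes would not.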
Collision Resolution Techniques
Since multiple keys can hash to the same index, collisions must be handled. Common strategies include:
1. Chaining
- Each bucket holds a linked list of key-value pairs.
- Insert new elements at the head or tail of the list.
2. Open Addressing
- When a collision occurs, search the table for the next free slot.
- Variants include:
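  - Linear Probing: Check index + 1, index + 2, and so on, wrapping around at the end of the table.
  - Quadratic Probing: Check index + 1², index + 2², index + 3², and so on (modulo the table size).
  - Double Hashing: Use a second hash function to compute the step size between probes.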
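Open addressing with linear probing might look like the sketch below. It assumes a fixed capacity with no resizing and no deletion support, and the class name `LinearProbingTable` is hypothetical. The usage lines rely on CPython hashing small integers to themselves.

```python
class LinearProbingTable:
    """Open-addressing hash table with linear probing (insert and search only)."""

    def __init__(self, capacity=8):
        # Assumption: fixed capacity, no resizing, no deletion support.
        self.slots = [None] * capacity

    def _probe(self, key):
        # Yield slot indices starting at the home slot, wrapping around once.
        start = hash(key) % len(self.slots)
        for step in range(len(self.slots)):
            yield (start + step) % len(self.slots)

    def insert(self, key, value):
        for i in self._probe(key):
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
        raise RuntimeError("table is full")

    def search(self, key):
        for i in self._probe(key):
            if self.slots[i] is None:
                return None  # an empty slot ends the probe sequence
            if self.slots[i][0] == key:
                return self.slots[i][1]
        return None


t = LinearProbingTable(capacity=4)
t.insert(1, "a")    # home slot is 1
t.insert(5, "b")    # 5 % 4 == 1 collides, so the pair lands in slot 2
print(t.slots)      # [None, (1, 'a'), (5, 'b'), None]
print(t.search(5))  # b
```

Quadratic probing and double hashing would change only the offset computed in `_probe`: `step ** 2` for the former, and `step` times a second hash of the key for the latter.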
Applications of Hashing
- Implementing dictionaries/maps (e.g., HashMap, HashSet)
- Checking for duplicates in an array
- Counting frequency of elements
- Detecting cycles in graphs (e.g., using visited sets)
- Caching with LRU (Least Recently Used) strategies
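Two of the applications listed above can be sketched briefly in Python. The function `has_duplicates` and the class `LRUCache` are hypothetical names introduced for illustration. The cache leans on the standard library's `collections.OrderedDict`, which is one convenient way to build an LRU cache, not the only one.

```python
from collections import OrderedDict


def has_duplicates(items):
    """Return True if any element appears more than once."""
    seen = set()
    for item in items:
        if item in seen:  # hash-based membership test, O(1) on average
            return True
        seen.add(item)
    return False


class LRUCache:
    """Small LRU cache backed by an ordered hash map."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry


print(has_duplicates([3, 1, 4, 1, 5]))  # True
print(has_duplicates([3, 1, 4]))        # False

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")           # "a" becomes the most recently used key
cache.put("c", 3)        # evicts "b"
print(list(cache.data))  # ['a', 'c']
```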
Example: Frequency Counter
Given an array of integers, count how many times each number appears:
function countFrequency(arr):
    freq = {}
    for num in arr:
        if num in freq:
            freq[num] += 1
        else:
            freq[num] = 1
    return freq
Time and Space Complexity
- Time Complexity: O(1) on average for insert, search, and delete, assuming a good hash function and a bounded load factor
- Worst Case: O(n), when all keys collide into the same bucket
- Space Complexity: O(n), where n is the number of stored elements
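To make the gap between the average and worst case concrete, the following sketch counts key comparisons during a lookup in a chained table. The helper `chain_lookup_cost`, the bucket count of 1024, and the two hash functions are assumptions chosen purely for illustration.

```python
def chain_lookup_cost(keys, target, hash_fn, num_buckets=1024):
    """Build a chained table from keys and count comparisons needed to find target."""
    buckets = [[] for _ in range(num_buckets)]
    for k in keys:
        buckets[hash_fn(k) % num_buckets].append(k)

    comparisons = 0
    for k in buckets[hash_fn(target) % num_buckets]:
        comparisons += 1
        if k == target:
            break
    return comparisons


keys = list(range(1000))

# Keys spread across buckets: the lookup inspects a single entry.
print(chain_lookup_cost(keys, 999, hash_fn=lambda k: k))  # 1

# Every key maps to bucket 0: the lookup scans the whole chain.
print(chain_lookup_cost(keys, 999, hash_fn=lambda k: 0))  # 1000
```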
When to Use Hashing
- You need fast lookup or membership tests
- You want to count frequencies or group values quickly
- You need to detect duplicates efficiently
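As one example of grouping values with a hash map, the sketch below groups words that are anagrams of each other. The function name `group_anagrams` and the choice of sorted letters as the grouping key are assumptions made for illustration.

```python
from collections import defaultdict


def group_anagrams(words):
    """Group words that contain exactly the same letters."""
    groups = defaultdict(list)
    for word in words:
        signature = "".join(sorted(word))  # anagrams share the same sorted letters
        groups[signature].append(word)
    return list(groups.values())


print(group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
```

Each word costs one hash lookup after its signature is computed, so no word is ever compared against the others directly.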
Advantages and Disadvantages of Hashing
Advantages
- Extremely Fast: Constant-time access in the average case
- Simple Implementation: Especially with built-in hash maps
- Flexible: Works with any key type that can be hashed
Disadvantages
- Not Sorted: Keys are not kept in sorted order, so ordered traversal and range queries need extra work
- Collisions: Require extra handling logic
- Depends on Good Hash Function: Poor design can degrade performance
Conclusion
Hashing is a powerful technique widely used in software engineering, data structures, and system design. Its ability to perform operations in constant time on average makes it well suited to performance-critical applications. However, careful attention must be given to collision handling and hash function design to maintain that efficiency.