DS Menu

Open addressing

Open addressing is a collision resolution technique used in hash tables. In open addressing, all elements are stored directly in the hash table itself. When a collision occurs (i.e., two items hash to the same slot), the method seeks to find another slot to accommodate one of the items using a probing sequence.

Linear probing

Linear probing is a type of open addressing where the probing sequence is linear.

How Linear Probing Works

Hash Function: Like any hash table, linear probing starts with a hash function that computes an initial index for a given key.

Insertion:

Compute the hash for the key to find the initial index.
If the slot at the computed index is empty, insert the item there.
If the slot is occupied, check the next slot (i.e., move linearly forward) until an empty slot is found.
If the end of the table is reached, wrap around to the beginning.

Algorithm 1

function INSERT (hash table, table length, key, hash_key)
    index ← hash_value 
    first_scan 	← true
    while isEmpty(hash_table, index) = false do
        index ← (index + 1) mod table_length
        if index = hash_value and first_scan = false then
            return false;
        first_scan = false;
    hash_table[index] ← key;
    return true;

Searching:

Compute the hash for the key to find the initial index.
If the slot is empty, the key is not in the table.
If the slot contains the key, return the item.
If the slot contains a different key, move linearly forward until the key is found or an empty slot is encountered.

Algorithm 2

function SEARCH (hash_table, table_length, key, hash_key)
    index ← hash_value 
    first_scan ← true
    while hash table(index]!= key do
        index ← (index + 1) mod table_length
        if index = hash value and first scan = false then
            return -1;
        first_scan = false;
    return index;

- Find the item using the search operation.
- Remove the item and mark the slot as deleted (a special marker different from empty and occupied).
- Note: Deletion complicates the search and insert operations because the search must continue past deleted items, and insert can use slots marked as deleted.

Advantages of Linear Probing

Memory Efficiency: All data is stored in the hash table, no extra space for pointers like in separate chaining.
Cache-Friendly: Linear memory access pattern is good for cache performance.

Disadvantages of Linear Probing

Clustering: One of the significant drawbacks of linear probing is clustering. Clusters of occupied slots tend to grow, increasing the average search time.
Load Factor: As the table becomes more filled, performance degrades. It’s essential to keep the load factor (ratio of items to table size) relatively low.
Deletion Complexity: Deleted slots must be marked specially and complicate the search process.

Quadratic probing

Quadratic probing is another method of open addressing used in hash tables to resolve collisions. Unlike linear probing, where the interval between probes is fixed, quadratic probing uses a quadratic function to calculate the interval between probes. This approach helps to reduce the clustering problem seen in linear probing.

How Quadratic Probing Works

Insertion using Quadratic Probing:
- Use the hash function hash(key) to calculate the initial index for the given key.
- Check if the slot at the calculated initial index is empty.
- If it's empty, place the key-value pair in that slot, and the insertion is complete.
- If the slot at the initial index is occupied (collision occurred), quadratic probing comes into play.
- Initialize a variable i to 1 (for the first iteration). Calculate the next probe index using the quadratic probing formula:
- next_index = (initial_index + (i²)) % table_size
- Check if the slot at the calculated next_index is empty.
- If it's empty, place the key-value pair in that slot, and the insertion is complete.
- If the slot is occupied, repeat the probing process by incrementing i and recalculating the next_index until an empty slot is found.
Algorithm
```
function insert(key, value):
    initial_index = hash(key)
    index = initial_index
    i = 1
    while table[index] is not empty:
        index = (initial_index + i^2) % table_size
        i = i + 1
    table[index] = (key, value)
```
Search using Quadratic Probing:
- Use the hash function hash(key) to calculate the initial index for the given key.
- Check if the slot at the calculated initial index contains the key you're searching for.
- If it does, return the associated value or indicate that the key is present.
- If the slot at the initial index does not contain the key (collision occurred), use quadratic probing.
- Initialize a variable i to 1 (for the first iteration). Calculate the next probe index using the quadratic probing formula:
- next_index = (initial_index + (i²)) % table_size
- Check if the slot at the next_index contains the key you're searching for.
- If it does, return the associated value or indicate that the key is present.
- If it doesn't, increment i and repeat the probing process until an empty slot is found or the key is located.
- If the entire probing process completes without finding the key (i.e., an empty slot is encountered), indicate that the key is not found in the table.
Algorithm
```
function search(key):
    initial_index = hash(key)
    index = initial_index
    i = 1
    while table[index] is not empty:
        if table[index].key == key:
            return table[index].value
        index = (initial_index + i^2) % table_size
        i = i + 1
    return "Key not found"
```

Advantages of Quadratic Probing

Reduces Clustering: Compared to linear probing, quadratic probing significantly reduces primary clustering, as the probe sequence spreads out more quickly.
Better Cache Performance: It can offer better cache performance than more randomized probing methods, although not as good as linear probing.
Utilizes Hash Table Efficiently: Quadratic probing tends to utilize the hash table more efficiently than linear probing before the performance degrades due to clustering.

Disadvantages of Quadratic Probing

Secondary Clustering: Although it solves primary clustering, quadratic probing can suffer from secondary clustering where different keys that hash to the same initial index follow the same probe sequence.
Complexity in Calculation: The probe sequence calculation is more complex than in linear probing.
Table Size Constraints: For quadratic probing to work correctly, the table size should be a prime number, and even then, it does not guarantee that all slots can be probed.
Load Factor Sensitivity: Like other open addressing methods, as the load factor increases, performance tends to degrade due to an increase in collisions.

Double hashing

Double hashing is a technique used in hash tables to resolve collisions through open addressing. Unlike linear or quadratic probing, double hashing uses a second hash function to calculate the probe sequence. This approach significantly reduces the clustering issues seen in other probing methods.

Definition of Double Hashing

Double hashing employs two hash functions. When a collision occurs (i.e., two keys hash to the same index), the second hash function is used to determine the interval between probes.

How Double Hashing Works

Initial Hash Function: The first hash function computes an initial index. Let's denote this as hash1(key).
Second Hash Function: A second, independent hash function computes another hash value. Let's denote this as hash2(key). It's crucial that hash2(key) never evaluates to zero to ensure that the probe sequence advances.
- Upon collision, the next index to probe is calculated using the formula: (hash1(key) + i * hash2(key)) % table_size, where i is the i^th probe (starting at 0).
- This process repeats, incrementing i, until an empty slot is found or the table is deemed full.

Advantages of Double Hashing

Reduces Clustering: Double hashing significantly reduces both primary and secondary clustering.
Efficient Table Utilization: It tends to utilize the hash table efficiently, ensuring that empty slots are found even when the table starts getting full.
Improved Performance: Offers better average-case performance for searching compared to linear and quadratic probing.

Disadvantages of Double Hashing

Complexity: Requires the computation of two hash functions, which might be more computationally intensive.
Design of Hash Functions: It's crucial to design both hash functions carefully to ensure they are independent and that the second hash function never produces zero.
Sensitivity to Hash Function Quality: The performance of double hashing is highly reliant on the quality of both hash functions.

Next Topic :Rehashing