DBMS Menu


Index data Structures




One way to organize data entries is to hash data entries on the sea.rch key. Another way to organize data entries is to build a tree-like data structure that directs a search for data entries.

Hash-Based Indexing

In hash-based indexing, a hash function is used to convert a key into a hash code. This hash code serves as an index where the value associated with that key is stored. The goal is to distribute the keys uniformly across an array, so that access time is, on average, constant.

Let's break down some of these elements to further understand how hash-based indexing works in practice:

Buckets

In hash-based indexing, the data space is divided into a fixed number of slots known as "buckets." A bucket usually contains a single page (also known as a block), but it may have additional pages linked in a chain if the primary page becomes full. This is known as overflow.

Hash Function

The hash function is a mapping function that takes the search key as an input and returns the bucket number where the record should be located. Hash functions aim to distribute records uniformly across buckets to minimize the number of collisions (two different keys hashing to the same bucket).

Disk I/O Efficiency

Hash-based indexing is particularly efficient when it comes to disk I/O operations. Given a search key, the hash function quickly identifies the bucket (and thereby the disk page) where the desired record is located. This often requires only one or two disk I/Os, making the retrieval process very fast.

Insert Operations

When a new record is inserted into the dataset, its search key is hashed to find the appropriate bucket. If the primary page of the bucket is full, an additional overflow page is allocated and linked to the primary page. The new record is then stored on this overflow page.

Search Operations

To find a record with a specific search key, the hash function is applied to the search key to identify the bucket. All pages (primary and overflow) in that bucket are then examined to find the desired record.

Limitations

Hash-based indexing is not suitable for range queries or when the search key is not known. In such cases, a full scan of all pages is required, which is resource-intensive.

Hash-Based Indexing Example

Let's consider a simple example using employee names as the search key.

Employee Records

| Name	  | Age	   | Salary
|-----------|----------|--------
| Alice	  | 28	   | 50000
| Bob	  | 35	   | 60000
| Carol	  | 40	   | 70000

Hash Function: H(x) = ASCII value of first letter of the name mod 3

  • Alice: 65 mod  3 = 2
  • Bob: 66 mod  3 = 0
  • Carol: 67 mod  3 = 1

Buckets:

Bucket 0: Bob
Bucket 1: Carol
Bucket 2: Alice

Pros of Hash-Based Indexing

  • Extremely fast for exact match queries.
  • Well-suited for equality comparisons.

Cons of Hash-Based Indexing

  • Not suitable for range queries (e.g., "SELECT * FROM table WHERE age BETWEEN 20 AND 30").
  • Performance can be severely affected by poor hash functions or a large number of collisions.

Tree-based Indexing

The most commonly used tree-based index structure is the B-Tree, and its variations like B+ Trees and B* Trees. In tree-based indexing, data is organized into a tree-like structure. Each node represents a range of key values, and leaf nodes contain the actual data or pointers to the data.

Why Tree-based Indexing?

Tree-based indexes like B-Trees offer a number of advantages:

  • Sorted Data: They maintain data in sorted order, making it easier to perform range queries.
  • Balanced Tree: B-Trees and their variants are balanced, meaning the path from the root node to any leaf node is of the same length. This balancing ensures that data retrieval times are consistently fast, even as the dataset grows.
  • Multi-level Index: Tree-based indexes can be multi-level, which helps to minimize the number of disk I/Os required to find an item.
  • Dynamic Nature: B-Trees are dynamic, meaning they're good at inserting and deleting records without requiring full reorganization.
  • Versatility: They are useful for both exact-match and range queries.

Tree-based Indexing Example

Continuing with the "Students" table:

ID	Name
1	Abhi
2	Bharath
3	Chinni
4	Devid

A simplified B-Tree index could look like this:


      [1, 3]
     /       \
  [1]         [3, 4]
 /   \       /      \
1     2    3         4

In the tree, navigating from the root to the leaf nodes will lead us to the desired data record.

Pros of Tree-based Indexing:

  • Efficient for range queries.
  • Good for both exact and partial matches.
  • Keeps data sorted.

Cons of Tree-based Indexing:

  • Slower than hash-based indexing for exact queries.
  • More complex to implement and maintain.

Next Topic :Comparison of File Organizations, Indexes and Performance Tuning