DS Menu


Introduction to Tries




A trie (pronounced "try") is a tree-based data structure for storing strings in order to support fast pattern matching. The main application for tries is in information retrieval. Indeed, the name "trie" comes from the word "retrieval". The trie uses the digits in the keys to organize and search the dictionary. Although, in practice, we can use any radix to decompose the keys into digits, in our examples, we shall choose our radixes so that the digits are natural entities such as decimal digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) and letters of the English alphabet (a-z, A-Z).

Trie's Definition

Definition
A trie, also known as a prefix tree, is a tree-like data structure that stores a dynamic set of strings, where keys are usually strings.

Characteristics of Trie's

  • Each node represents a single character of a string.
  • The root node represents the empty string.
  • Children nodes share the same prefix.
  • Paths from the root to a node represent a key.
  • Nodes may store additional values and have a flag to mark the end of a key.

Basic Operations of Trie's

1. Insertion in Tries

  • Start at the root node.
  • For each character in the string, move to the corresponding child node.
  • If the child node doesn’t exist, create it.
  • Mark the end of the string by setting an end-of-word marker at the last node.

2. Search in Tries

  • Begin at the root node.
  • Traverse the trie following the path defined by each character in the string.
  • If the path exists and ends in an end-of-word node, the string is in the trie.
  • If the path ends before the string is exhausted or the end-of-word marker is missing, the string isn’t in the trie.

3. Deletion in Tries

  • Similar to search but includes removing the end-of-word marker.
  • If a node becomes unnecessary (no children), it is removed.
  • Recursively remove nodes up the trie until a node cannot be deleted without affecting other keys.

Different Types of Tries

  1. Standard Trie: The most basic form, which often requires space proportional to the size of the alphabet for each key character.
  2. Compressed Trie: Optimizes space by compressing chains of single-child nodes into single edges.
  3. Suffix Trie: Contains all the suffixes of a given text, used for pattern matching problems.

Space and Time Complexity of Tries

  • Space: A standard trie can require more space than a hash table because of the storage of nodes and pointers, particularly for sparse datasets.
  • Search Time: Tries provide O(m) lookup time, where m is the length of the string. This is independent of the number of keys in the trie.

Advantages of Tries

  • Prefix Matching: Tries are optimized for prefix matching, which is a common requirement in many applications.
  • Sorting: Tries can return keys in lexicographic order by performing a pre-order traversal.

Disadvantages of Tries

  • Space Consumption: For a large set of strings with fewer common prefixes, a trie can consume more memory than necessary.
  • Memory Overhead: Each trie node can have memory overhead due to pointers (or references) it must hold for its child nodes.

Next Topic :Standard Trie