A prefix tree (also known as a trie) is a data structure that helps you organize a word list & quickly find words that start with a specific prefix.
For example, you can find all the words in your dictionary that start with the letters “ca”, such as “cat” or “cape”.
Look at this picture:
This is a prefix tree.
You can follow from the root (*) to a marked node (like e and t) to find words.
In this article you will learn how to implement your own prefix tree in Ruby & how to use it to solve problems!
Prefix Tree Implementation
To implement this in Ruby I decided to use a Node class with a few attributes:
The value (one character)
The “word” variable. A true / false value which tells you if this is a valid word
The “next” array. This stores all the characters (as Node objects) that come after this one in the tree
Here’s the code:
class Node
attr_reader :value, :next
attr_accessor :word
def initialize(value)
@value = value
@word = false
@next = []
end
end
Now we need a class to hold the root node & the methods for working with these nodes.
Let’s look at the Trie class:
class Trie
def initialize
@root = Node.new("*")
end
end
Inside this class we have the following methods:
def add_word(word)
letters = word.chars
base = @root
letters.each { |letter| base = add_character(letter, base.next) }
base.word = true
end
def find_word(word)
letters = word.chars
base = @root
word_found =
letters.all? { |letter| base = find_character(letter, base.next) }
yield word_found, base if block_given?
base
end
Both methods break down a given word into an array of characters (using the chars method).
Then we navigate the tree starting at the root & either find a character or add it.
Here are the supporting methods (also inside the Trie class):
def add_character(character, trie)
trie.find { |n| n.value == character } || add_node(character, trie)
end
def find_character(character, trie)
trie.find { |n| n.value == character }
end
def add_node(character, trie)
Node.new(character).tap { |new_node| trie << new_node }
end
To add a character we check if it already exists (using the find method). If it does then we return the node.
Otherwise we create it & return the new node.
Then we also have the include? method:
def include?(word)
find_word(word) { |found, base| return found && base.word }
end
Now we are ready to start using our new data structure and see what we can do with it 🙂
So in the code example I showed you before we have implemented the add & find operations.
But we also want a find_words_starting_with method.
We can do this using the "Depth First Search" (DFS) algorithm. We also need a way to keep track of the word we are looking at.
Remember that our nodes only have one character each, so we need to reconstruct the actual strings by walking over the tree.
Here's a method that does all of that:
def find_words_starting_with(prefix)
stack = []
words = []
prefix_stack = []
stack << find_word(prefix)
prefix_stack << prefix.chars.take(prefix.size-1)
return [] unless stack.first
until stack.empty?
node = stack.pop
prefix_stack.pop and next if node == :guard_node
prefix_stack << node.value
stack << :guard_node
words << prefix_stack.join if node.word
node.next.each { |n| stack << n }
end
words
end
We use two stacks here, one for keeping track of unvisited nodes (stack) & another to keep track of the current string (prefix_stack).
We loop until we have visited all the nodes & add the value of the node to the prefix_stack. Each node holds the value for only one character, so we need to collect these characters to form a word.
The :guard_node symbol is added to the prefix_stack so we know when we are backtracking. We need this to remove characters from our string buffer (prefix_stack) at the right time.
Then if node.word is true it means we found a full word & we add it to our list of words.
If no words are found you will get an empty array:
t.find_words_starting_with("b")
# []
This method can be used to implement an autocomplete feature.
Summary
You learned about prefix trees (also known as tries), a data structure used to organize a list of words into a tree. You can quickly query this tree to check if a word is valid & to find words that share the same prefix.
Don't forget to share this post so more people can learn!
Related
3 comments
joao ferreira says
5 years ago
def include?(word)
find_word(word) { |found, base| return found && base.word }
end
Nice article, and a handy use for trees. I actually did something very much like this, though I did it by shoving everything into a Hash (including functionality), but it aims to accomplish the same thing. This also has the added benefit of searching for “scrabble chains”, where you can take a word on a scrabble board, add a single letter to the front or the back, and it’ll still be a complete word.
https://pastebin.com/YJ6LiBdv – the script itself https://pastebin.com/raw/ejKQvqMa – the dictionary it’ll pull from (there are some extra strings in there right at the top to explicitly demonstrate the capabilities of the chain searching beyond what the actual dictionary would have yielded)