Learn to Implement & Use Prefix Trees in Ruby

A prefix tree (also known as a trie) is a data structure that helps you organize a word list & quickly find words that start with a specific prefix.

For example, you can find all the words in your dictionary that start with the letters “ca”, such as “cat” or “cape”.

Look at this picture:

ruby trie

This is a prefix tree.

You can follow from the root (*) to a marked node (like e and t) to find words.

In this article you will learn how to implement your own prefix tree in Ruby & how to use it to solve problems!

Prefix Tree Implementation

To implement this in Ruby I decided to use a Node class with a few attributes:

  • The value (one character)
  • The “word” variable. A true / false value which tells you if this is a valid word
  • The “next” array. This stores all the characters (as Node objects) that come after this one in the tree

Here’s the code:

class Node
  attr_reader   :value, :next
  attr_accessor :word

  def initialize(value)
    @value = value

    @word  = false
    @next  = []
  end
end

Now we need a class to hold the root node & the methods for working with these nodes.

Let’s look at the Trie class:

class Trie
  def initialize
    @root = Node.new("*")
  end
end

Inside this class we have the following methods:

def add_word(word)
  letters = word.chars
  base    = @root

  letters.each { |letter| base = add_character(letter, base.next) }

  base.word = true
end

def find_word(word)
  letters = word.chars
  base    = @root

  word_found =
  letters.all? { |letter| base = find_character(letter, base.next) }

  yield word_found, base if block_given?

  base
end

Both methods break down a given word into an array of characters (using the chars method).

Then we navigate the tree starting at the root & either find a character or add it.

Here are the supporting methods (also inside the Trie class):

def add_character(character, trie)
  trie.find { |n| n.value == character } || add_node(character, trie)
end

def find_character(character, trie)
  trie.find { |n| n.value == character }
end

def add_node(character, trie)
  Node.new(character).tap { |new_node| trie << new_node }
end

To add a character we check if it already exists (using the find method). If it does then we return the node.

Otherwise we create it & return the new node.

Then we also have the include? method:

def include?(word)
  find_word(word) { |found, base| return found && base.word }
end

Now we are ready to start using our new data structure and see what we can do with it 🙂

Trie Uses & Examples

Let's start by adding some words to our tree:

trie = Trie.new

trie.add_word("cat")
trie.add_word("cap")
trie.add_word("cape")
trie.add_word("camp")

You can check if a word is included in the tree like this:

p trie.include?("cape")
# true

p trie.include?("ca")
# false

So what are some uses for this data structure?

  • Solving word games
  • Spell checking
  • Autocomplete

You will need a good dictionary to load into your tree.

I found these which may be useful to you:

Finding Prefixed Words

So in the code example I showed you before we have implemented the add & find operations.

But we also want a find_words_starting_with method.

We can do this using the "Depth First Search" (DFS) algorithm. We also need a way to keep track of the word we are looking at.

Remember that our nodes only have one character each, so we need to reconstruct the actual strings by walking over the tree.

Here's a method that does all of that:

def find_words_starting_with(prefix)
  stack        = []
  words        = []
  prefix_stack = []

  stack        << find_word(prefix)
  prefix_stack << prefix.chars.take(prefix.size-1)

  return [] unless stack.first

  until stack.empty?
    node = stack.pop

    prefix_stack.pop and next if node == :guard_node

    prefix_stack << node.value
    stack        << :guard_node

    words << prefix_stack.join if node.word

    node.next.each { |n| stack << n }
  end

  words
end

We use two stacks here, one for keeping track of unvisited nodes (stack) & another to keep track of the current string (prefix_stack).

We loop until we have visited all the nodes & add the value of the node to the prefix_stack. Each node holds the value for only one character, so we need to collect these characters to form a word.

The :guard_node symbol is added to the prefix_stack so we know when we are backtracking. We need this to remove characters from our string buffer (prefix_stack) at the right time.

Then if node.word is true it means we found a full word & we add it to our list of words.

Example of using this method:

t.find_words_starting_with("cap")

# ["cap", "cape"]

If no words are found you will get an empty array:

t.find_words_starting_with("b")

# []

This method can be used to implement an autocomplete feature.

Summary

You learned about prefix trees (also known as tries), a data structure used to organize a list of words into a tree. You can quickly query this tree to check if a word is valid & to find words that share the same prefix.

Don't forget to share this post so more people can learn!

3 thoughts on “Learn to Implement & Use Prefix Trees in Ruby”

  1. def include?(word)
      find_word(word) { |found, base| return found && base.word }
    end
    

    return statement inside a block?

    • Yes, it works like in a regular method.

      Example:

      a = ->(n) { return 123 if n == 1 }
      
      a.call(1)
      # 123
      
      a.call(2)
      # nil
      
  2. Nice article, and a handy use for trees. I actually did something very much like this, though I did it by shoving everything into a Hash (including functionality), but it aims to accomplish the same thing. This also has the added benefit of searching for “scrabble chains”, where you can take a word on a scrabble board, add a single letter to the front or the back, and it’ll still be a complete word.

    https://pastebin.com/YJ6LiBdv – the script itself
    https://pastebin.com/raw/ejKQvqMa – the dictionary it’ll pull from (there are some extra strings in there right at the top to explicitly demonstrate the capabilities of the chain searching beyond what the actual dictionary would have yielded)

Comments are closed.