An Overview of Data Structures For Ruby Developers

What is a data structure?

A data structure is a specific way to organize & access data.

Examples include:

  • Arrays
  • Binary trees
  • Hashes

Different data structures excel at different tasks.

For example, hashes are great if you’re looking to store data that looks like a dictionary (word & definition), or a phone book (person name & number).

Knowing what data structures are available, and the characteristics of each of them, will make you a better Ruby developer.

That’s what you’ll learn in this article!

Understanding Arrays

The array is the first data structure that you learn about when you start reading about programming.

Arrays use a contiguous chunk of memory where objects are stored one after another without gaps between them.

Unlike in lower-level programming languages, like C, Ruby does all the hard work of managing your memory, expanding the maximum array size & compacting it when you delete elements.

Uses:

  • As a base for more advanced data structures
  • To gather results from running a loop
  • Collecting items

You’ll find arrays everywhere, like the split & chars methods, which break down a string into an array of characters.

Example:

out = []

10.times { |i| out << i }

out
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

The following table shows you how the different array operations behave as the size of the array increases.

If you aren't familiar with time complexity notation read this article.

Time complexity for arrays:

Operation Complexity
push O(1)
pop O(1)
access O(1)
find O(n)
delete O(n)

Why is this helpful?

Because it tells you the performance characteristics of arrays.

If you're doing a lot of find operations on a HUGE array that's going to be slow...

But if you know what indexes to use, then an array is going to be fast because of the O(1) time complexity.

Choose your data structure with this criteria:

  1. Performance characteristics => What are you doing with the data? How big is your dataset?
  2. Shape & form of your data => What kind of data are you working with? Could you re-organize your data so it fits a better data structure?

The Hash Data Structure

Do you have a mapping between country codes & country names?

Or maybe you just want to count stuff.

That's exactly what hashes are helpful for!

A hash is a data structure where every value has a key & this key can be anything, like a string, an integer, a symbol, etc.

Internally, a hash converts your key into a number (using the hash method in Ruby) & then uses that number as the index.

Uses:

  • Counting characters in a string
  • Mapping words to definitions, names to phone numbers, etc.
  • Find duplicates inside an array

Example:

"aaabcd"
  .each_char
  .with_object(Hash.new(0)) { |ch, hash| hash[ch] += 1 }

# {"a"=>3, "b"=>1, "c"=>1, "d"=>1}

Time complexity:

Operation Complexity
store O(1)
access O(1)
delete O(1)
find (value) O(n)

Hashes are one of the most helpful data structures when it comes to performance because of the constant O(1) store, delete & access time.

Find in the context of a hash means that you're looking for a specific value.

Stacks

A stack is like a stack of plates, you put one plate on top of another & you can only remove the plate on top.

This is more useful than it sounds at first!

Uses:

  • Replaces recursive methods with a regular loop
  • Keep track of work left to do, leaving the most recent on top
  • Reverse an array

Example:

stack = [1,2,3,4,5]

(1..stack.size).map { stack.pop }

# [5, 4, 3, 2, 1]

Yes, you can use reverse instead.

This is only an example to show you this particular characteristic of stacks.

Time complexity:

Operation Complexity
push O(1)
pop O(1)
find ---
access ---

Notice that stacks (and queues) only have two operations, insert & delete, or push & pop.

While it's possible to search inside a stack, it's very rare.

How to Use Binary Trees

Most Ruby developers have probably heard about binary trees but never used one.

Why is that?

First, we don't have a built-in binary tree implementation.

Second, a binary tree is not that helpful for everyday programming challenges, unlike arrays & hashes which you use ALL the time.

But binary trees are a very interesting data structure.

Binary Tree

In fact, there are many variations, like the Trie (covered in next section), multiway trees like the B-Tree (used in databases) & the Heap.

Uses:

Example:

# https://github.com/jamesconant/bstree

require 'bstree'

root = Bstree::Node.new(5)

root.insert(2)
root.insert(7)

root.search(3)
# nil

Time complexity:

Operation Complexity
insert O(log n)
delete O(log n)
find O(log n)
access ---

A balanced binary tree is when all nodes have two children & all leaves have the same level

If a tree becomes unbalanced, the performance degrades to O(n).

In a self-balanced binary tree (like the Red-Black Tree, or AVL tree), every operation takes time proportional to the height (or level) of the tree.

Notice how there is no access time because to access a node you first have to find it...

In that case, you'll have O(log n) for access.

But if keep a reference (as a variable) to a specific node, that would be O(1) access time.

The Trie Data Structure

A trie is a specialized tree-like data structure.

It's helpful for working with words, and then quickly searching for words that start with a prefix, or search for the full word.

Uses:

  • Word games
  • Spelling checker
  • Autocomplete suggestions

Example:

# https://github.com/gonzedge/rambling-trie

require 'rambling-trie'

trie = Rambling::Trie.create('words.txt')

trie.include?('chocolate')
# true
trie.include?('salmon')
# true

Time complexity:

Operation Complexity
add O(k)
include? O(k)
words O(k)

In this table, I use k to denote the size of the input string, while n is used to denote the size of the data structure itself.

So for the word apple, k would be 5.

Summary

You have learned about common data structures, their main uses & characteristics, and how to use them in Ruby.

Data Structures Mindmap

When you apply this new knowledge you'll be able to solve problems faster!

Could you share this post for me if you found it helpful?

Thank you 🙂

4 comments
Marcin says 7 months ago

There is an error in hash find complexity. Its O(1) on average.

    Jesus Castello says 7 months ago

    Hi, thanks for your comment!

    I just clarified what I mean by find.

    It’s the find method in Ruby, it iterates over all the hash keys, so that’s O(n).

Dave Aronson says 7 months ago

WRT array popping, it’s important to note that what Ruby calls pop() is removing the last element, so it’s essentially acting as a stack. Removing the first element, so that it would be treated as a queue, would be O(n), since it would have to shift all the other elements down by 1. (Unless it did something fancy under the hood like maintaining an offset, but the generic abstract data type array doesn’t do that.)

    Jesus Castello says 7 months ago

    Thanks for your comment 🙂

Comments are closed