Ruby Internals: Exploring the Memory Layout of Ruby Objects

Would you like a quick tour of Ruby internals?

Then you’re in for a treat.

Because we’re going to explore together how a Ruby object is laid out in memory & how you can manipulate internal data structures to do some cool stuff.

Fasten your seatbelts & get ready for a journey into the depths of the Ruby interpreter!

Memory Layout of Arrays

When you create an array, Ruby has to back that up with some system memory & a little bit of metadata.

Metadata includes:

  • Array size (item count)
  • Array capacity
  • Class
  • Object status (frozen or not)
  • Pointer to where data is stored in memory
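Most of this metadata is visible from plain Ruby, which makes for a quick sanity check before we dig into the C side. Here is a small sketch (capacity & the data pointer have no public Ruby method, so they are left out):

```ruby
# Inspect the array metadata that Ruby exposes through regular methods.
arr = [1, 2, 3].freeze

item_count = arr.size      # array size (item count)
klass      = arr.class     # class
frozen     = arr.frozen?   # object status (frozen or not)

puts "size: #{item_count}, class: #{klass}, frozen: #{frozen}"
# size: 3, class: Array, frozen: true
```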

Since the main Ruby interpreter (MRI) is written in C, there are no objects at that level.

But there is something else: structs.

A struct in C helps you store related data together, and this is used a lot in MRI’s source code to represent things like Array, String & other kinds of objects.

By looking at one of those structs we can infer the memory layout of an object.

So let’s look at the struct for Array, called RArray:

struct RArray {
  struct RBasic basic;

  union {
    struct {
      long len;

      union {
        long capa;
        VALUE shared;
      } aux;

      const VALUE *ptr;
    } heap;

    const VALUE ary[RARRAY_EMBED_LEN_MAX];
  } as;
};

I know this can look a bit intimidating if you are not familiar with C, but don’t worry! I will help you break this down into easy to digest bits 🙂

The first thing we have is this RBasic thing, which is also a struct:

struct RBasic {
  VALUE flags;
  VALUE klass;
};

This is something that most Ruby objects have & it contains a few things like the class for this object & some binary flags that say if this object is frozen or not (and other things like the ‘tainted’ attribute).

In other words:

RBasic contains the generic metadata for the object.

After that we have another struct, which contains the length of the array (len).

The union expression is saying that aux can be either capa (for capacity) or shared, but never both at the same time. This is mostly an optimization thing, which is explained in more detail in this excellent post by Pat Shaughnessy. In terms of memory allocation, a union is as big as its biggest member.
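To make the union sizing concrete, here is a rough sketch that does the same arithmetic with the size constants from Fiddle. It assumes VALUE is as wide as a pointer, and the variable names are made up for this example:

```ruby
require 'fiddle'

capa_size   = Fiddle::SIZEOF_LONG    # long capa
shared_size = Fiddle::SIZEOF_VOIDP   # VALUE shared (assumed pointer-sized)

# A union only needs room for its largest member.
aux_size = [capa_size, shared_size].max

puts "aux takes #{aux_size} bytes on this machine"
```

Compare that with a struct, where the sizes of both members would be added together.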

Then we have ptr, which contains the memory address where the actual Array data is stored.

Here’s a picture of what this looks like (every white/grey box is 4 bytes in a 32-bit system):

[Image: array memory layout]

You can see the memory size of an object using the ObjectSpace module (the number below comes from a 32-bit system, so expect a bigger one on 64-bit):

require 'objspace'

ObjectSpace.memsize_of([])
# 20
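The struct also hints at why small arrays are cheap: the ary member lets MRI keep a few items inside the object itself instead of allocating a separate buffer. Here is a quick sketch comparing an empty array with a big one (exact numbers vary by Ruby version & platform, so none are shown):

```ruby
require 'objspace'

small = []
big   = Array.new(1_000, 0)   # needs a separate buffer for its 1,000 items

small_size = ObjectSpace.memsize_of(small)
big_size   = ObjectSpace.memsize_of(big)

puts "empty array: #{small_size} bytes"
puts "1,000 items: #{big_size} bytes"
```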

Now we are ready to have some fun!

Fiddle: A Fun Experiment

RBasic is exactly 8 bytes in a 32-bit system & 16 bytes in a 64-bit system. Knowing this we can use the Fiddle module to access the raw memory bytes for an object & change them for some fun experiments.
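Those sizes follow from RBasic holding two VALUE fields. Here is a minimal sketch of the arithmetic, assuming VALUE is pointer-sized & that the compiler adds no padding between the fields (the variable names are invented for this example):

```ruby
require 'fiddle'

value_size   = Fiddle::SIZEOF_VOIDP  # assumed size of one VALUE
flags_offset = 0                     # RBasic.flags comes first
klass_offset = value_size            # RBasic.klass follows flags
rbasic_size  = 2 * value_size        # 8 on 32-bit, 16 on 64-bit

puts "flags at byte #{flags_offset}, klass at byte #{klass_offset}"
puts "type-specific data starts at byte #{rbasic_size}"
```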

For example:

We can change the frozen status by toggling a single bit.

This is in essence what the freeze method does, but notice how there is no unfreeze method.

Let’s implement it just for fun!

First, let’s require the Fiddle module (part of the Ruby Standard Library) & create a frozen string.

require 'fiddle'

str = 'water'.freeze
str.frozen?
# true

Next:

We need the memory address for our string. On Ruby versions before 2.7, where object_id is derived from the address, it can be obtained like this:

memory_address = str.object_id * 2
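If object_id doesn’t map to an address on your Ruby (that is the case from 2.7 on), a possible alternative is Fiddle.dlwrap, which as far as I know is MRI-specific & returns the object’s address as an Integer. The sketch below only reads the address & converts it back with Fiddle.dlunwrap, so nothing gets modified:

```ruby
require 'fiddle'

str = 'water'.freeze

# Address of the object as an Integer (MRI-specific behavior).
address = Fiddle.dlwrap(str)

# Converting the address back should hand us the very same object.
same_object = Fiddle.dlunwrap(address).equal?(str)

puts "address: 0x#{address.to_s(16)}, round trip ok: #{same_object}"
```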

Finally:

We flip the exact bit that Ruby checks to see if an object is frozen. We also check to see if this worked by calling the frozen? method.

Fiddle::Pointer.new(memory_address)[1] ^= 8

str.frozen?
# false

Notice that the index [1] refers to the 2nd byte of the flags value (which is composed of 4 bytes in total on a 32-bit system & 8 bytes on a 64-bit system).

Then we use ^= which is the “XOR” (Exclusive OR) operator to flip that bit.

We do this because different bits inside flags have different meanings & we don’t want to change something unrelated.
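If the byte index & the XOR feel a bit magical, the same arithmetic can be tried on a plain Integer without touching any memory. This sketch assumes a little-endian layout (like x86), and the sample flags value is made up:

```ruby
# Mask 8 inside byte 1 is the same as bit 11 of the whole flags value.
freeze_mask = 8 << 8

# Little-endian byte layout of the mask: only byte [1] is non-zero.
mask_bytes = [freeze_mask].pack('L<').bytes

flags    = 0b1000_0000_0101         # made-up flags with the frozen bit set
unfrozen = flags ^ freeze_mask      # XOR clears that bit, the rest stay put
refrozen = unfrozen ^ freeze_mask   # XOR again sets it back

puts mask_bytes.inspect   # [0, 8, 0, 0]
puts unfrozen.to_s(2)     # 101
```

Since XOR is a toggle, running the Fiddle line twice would freeze the string again.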

If you have read my Ruby tricks post you may have seen this before, but now you know how it works 🙂

Another thing you can try is to change the length of the array & print the array.

You will see how the array becomes shorter!

You can even change the class to make an Array think it’s a String.

Conclusion

You have learned a bit about how Ruby works under the hood: how memory for Ruby objects is laid out & how you can use the Fiddle module to play around with that.

You should probably not use Fiddle like this in a real app, but it’s fun to experiment with.

Don’t forget to share this post so more people can see it 🙂