Exploring MRI Source Code

If you have been using Ruby for a while, you may be curious about how some things work under the hood.

One way to dig deep into Ruby internals is by reading the source code that makes it work. Even if you don’t know C, you can still pick up some interesting things.

The source code can be found in Ruby’s official GitHub repository (ruby/ruby).

Ideally, you want to use a tool like CodeQuery, which makes it easy to find class and method names.

Exploring Core Classes

Most of your exploration will happen in the root folder. That’s where you will find the source code for all the core classes like Object in object.c or Array in array.c.

Let’s take a look at hash.c.

If you scroll down to the Init_Hash function near the end of the file (around line 4468 in the version used here), you will see some names that should be familiar.

Let’s start with this:

rb_cHash = rb_define_class("Hash", rb_cObject);

In this line, the Hash class is being defined by the rb_define_class function. The second argument (rb_cObject) is the superclass for this class.
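You can confirm the effect of that call from Ruby itself. A quick check using only core reflection methods:

```ruby
# The superclass passed to rb_define_class (rb_cObject) is what Ruby reports here.
puts Hash.superclass        # prints "Object"

# The ancestor chain also shows modules mixed into Hash, such as Enumerable.
puts Hash.ancestors.inspect
```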

If you want to learn how defining a class works, search for rb_define_class, which lives in class.c.

The first part of rb_define_class checks if the class has already been defined.

if (rb_const_defined(rb_cObject, id)) {
  // ...
}

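Inside that if block, Ruby does some sanity checks: it makes sure the existing constant really refers to a class, and that its superclass matches the one being passed in. If both checks pass, the existing class is simply returned.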
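The same kinds of checks surface as TypeError exceptions when you reopen a class from Ruby code. Keep in mind that the `class` keyword goes through the VM instead of calling rb_define_class directly, so treat this as an analogous check rather than the exact code path above. The Animal and Tools names are hypothetical:

```ruby
class Animal; end

# Reopening with the same (implicit) superclass is fine.
class Animal; end

# Reopening with a different superclass is rejected.
mismatch_error = begin
  class Animal < String; end
  nil
rescue TypeError => e
  e.message
end
puts mismatch_error # e.g. "superclass mismatch for class Animal"

module Tools; end

# A constant that refers to something other than a class can't be reopened as one.
not_a_class_error = begin
  class Tools; end
  nil
rescue TypeError => e
  e.message
end
puts not_a_class_error # message begins with "Tools is not a class"
```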

If the class is not defined yet, it gets created like this:

klass = rb_define_class_id(id, super);

st_add_direct(rb_class_tbl, id, klass);
rb_name_class(klass, id);
rb_const_set(rb_cObject, id, klass);
rb_class_inherited(super, klass);

return klass;

You could read the definitions of all these functions, but I think they are pretty self-explanatory.

In st_add_direct, the ‘st’ part means ‘symbol table’, which in MRI is just a general-purpose hash table (implemented in st.c). The rb_const_set function sets a constant on the Object class, which makes the class available everywhere.
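In Ruby terms, creating the class, naming it, and registering it as a constant roughly correspond to making an anonymous class and assigning it to a constant on Object. This is a sketch of the idea, not what MRI literally executes, and the Point name is hypothetical:

```ruby
# Create an anonymous class whose superclass is Object.
klass = Class.new(Object)
puts klass.name.inspect   # prints "nil": no constant refers to it yet

# Assign it to a constant on Object, similar in spirit to rb_const_set(rb_cObject, id, klass).
Object.const_set(:Point, klass)
puts klass.name           # prints "Point": the class gets its name on assignment
puts Point.equal?(klass)  # prints "true": the class is now reachable from anywhere
```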

Finally, rb_class_inherited calls the inherited method of the superclass; you can find its documentation under Class#inherited in the Ruby docs.
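Here is a small example of that hook in action. The Plugin registry below is a hypothetical illustration of a common pattern, not code from MRI:

```ruby
class Plugin
  @registry = []

  class << self
    attr_reader :registry
  end

  # Ruby calls this hook on the superclass whenever a subclass is created.
  def self.inherited(subclass)
    super
    Plugin.registry << subclass
  end
end

class CsvPlugin < Plugin; end
class JsonPlugin < Plugin; end

p Plugin.registry # prints [CsvPlugin, JsonPlugin]
```

Calling `super` inside the hook keeps any inherited behavior defined higher up the chain intact.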

The next section of the code is composed of method definitions. MRI uses rb_define_method to do that.

Here is an example:

rb_define_method(rb_cHash,"index", rb_hash_index, 1);
rb_define_method(rb_cHash,"size", rb_hash_size, 0);
rb_define_method(rb_cHash,"length", rb_hash_size, 0);
rb_define_method(rb_cHash,"empty?", rb_hash_empty_p, 0);

Arguments go like this:

The first argument is the class on which this method is being defined.
The second argument is the method name.
The third argument is the C function that actually implements this method.
The last argument is the number of arguments this Ruby method requires (negative values are used for variable arguments: with -1, the C function receives them as an argc/argv pair).
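You can see that last argument reflected in Ruby through Method#arity, and Method#source_location returns nil for methods implemented in C. A quick check, assuming these methods are still defined in C on your Ruby version:

```ruby
# Hash#size is registered with an argument count of 0, so Ruby reports arity 0.
puts({}.method(:size).arity)                  # prints 0

# Hash.[] is registered with -1 (variable arguments).
puts Hash.method(:[]).arity                   # prints -1

# C-implemented methods have no Ruby source file to point to.
p Hash.instance_method(:size).source_location # prints nil
```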

The rb_define_singleton_method function is used to define a class method.

rb_define_singleton_method(rb_cHash, "[]", rb_hash_s_create, -1);

The body of rb_define_singleton_method is just one line of code:

rb_define_method(singleton_class_of(obj), name, func, argc);
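You can observe the same thing from Ruby: class methods are simply instance methods of the class’s singleton class. The Greeter class below is hypothetical:

```ruby
# Hash.[] was registered with rb_define_singleton_method, so its owner
# is Hash's singleton class rather than Hash itself.
p Hash.method(:[]).owner == Hash.singleton_class # prints true

class Greeter
  def self.hello
    "hello"
  end
end

# A "class method" is just an instance method of the singleton class...
p Greeter.singleton_class.instance_methods(false) # prints [:hello]

# ...so defining a method there creates a new class method.
Greeter.singleton_class.define_method(:bye) { "bye" }
puts Greeter.bye # prints bye
```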

If you want to continue exploring, a good file to take a look at is object.c.

Exploring The Standard Library

Ok, that’s enough C for today!

How about reading some Ruby code?

Much of the Ruby Standard Library is written in Ruby itself, and you can find it under the /lib directory (libraries implemented as C extensions live under /ext).
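You can also read the standard library straight from your own Ruby installation. This sketch asks RbConfig where the library files were installed and uses source_location to find the file that defines a method. Exact paths depend on how Ruby was installed, and some newer Ruby versions may implement Set in C, in which case source_location returns nil:

```ruby
require "rbconfig"
require "set"

# Directory where the Ruby-written parts of the standard library are installed.
lib_dir = RbConfig::CONFIG["rubylibdir"]
puts "Standard library lives in: #{lib_dir}"

# source_location returns [file, line] for Ruby-defined methods, nil for C ones.
file, line = Set.instance_method(:add).source_location
if file
  puts "Set#add is defined at #{file}:#{line}"
else
  puts "Set#add has no Ruby source location on this Ruby (implemented in C)"
end
```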

The standard library contains things like OpenStruct, Base64 encoding and the Set data structure.

A set is similar to an array, but with the special property that all elements are unique. In other words, a set contains no duplicates.

How does that work? Are there any fancy algorithms behind this?

If you take a look at set.rb, you will quickly discover that Set is backed by a Hash object.

# Adds the given object to the set and returns self.  Use +merge+ to
# add many elements at once.
def add(o)
  @hash[o] = true
  self
end

alias << add

Since hash keys are unique, adding a duplicate element needs no explicit check: the assignment simply overwrites the existing entry for that key.
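To see how much work the Hash does, here is a stripped-down sketch of the same idea. TinySet is a hypothetical class, far simpler than the real Set:

```ruby
# A minimal set backed by a Hash, following the same approach as set.rb.
class TinySet
  include Enumerable

  def initialize(items = [])
    @hash = {}
    items.each { |item| add(item) }
  end

  # Hash keys are unique, so re-adding an element just overwrites its entry.
  def add(o)
    @hash[o] = true
    self
  end
  alias << add

  # Membership is a hash lookup instead of a linear scan.
  def include?(o)
    @hash.key?(o)
  end

  def size
    @hash.size
  end

  # Iterate over the keys; Enumerable builds map, select, to_a, etc. on top of this.
  def each(&block)
    return enum_for(:each) unless block
    @hash.each_key(&block)
    self
  end
end

s = TinySet.new([1, 2, 2, 3])
s << 3 << 4
p s.size        # prints 4
p s.to_a        # prints [1, 2, 3, 4]
p s.include?(2) # prints true
```

Because Ruby hashes preserve insertion order, the elements also come back in the order they were first added.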

Exploring Rubinius

Another way to explore Ruby's source code is by taking a look at alternate implementations like Rubinius.

Rubinius code is organized differently from MRI, so for this I like to use GitHub and its 'find file' feature.

For example, if you want to learn more about Enumerable, just type 'enumerable' and you will see all the related files.

Conclusion

As you have seen, you can learn how Ruby does things under the hood without much effort. Go ahead and explore on your own and let everyone know what you discovered!


5 thoughts on “Exploring MRI Source Code”

Tyler Green

February 18, 2016 at 5:38 am

Enjoyed the post, Jesus! It’s good to be prompted to dig into these things and even better to have a few starting points. Thanks!
- Jesus Castello
  
  February 18, 2016 at 2:50 pm
  
  Thanks for reading 🙂
senthil

February 18, 2016 at 4:31 pm

super, I like blackbytes!
- Jesus Castello
  
  February 18, 2016 at 5:05 pm
  
  Thank you!
tvinky

February 25, 2016 at 12:32 pm

Thanks for this post.
I was more scared of this Ruby implementation under the hood C code than I should. 🙂
