Static Analysis in Ruby

Let’s say that you want to parse your source code to find all your methods, where they are defined & what arguments they take.

How can you do this?

Your first idea might be to write a regexp for it…

But is there a better way?


Yes!

Static analysis is a technique you can use when you need to extract information from the source code itself.

This is done by parsing: the source code is split into tokens, and those tokens are then assembled into a tree that describes the structure of the program.
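To make the token step concrete, here is a minimal sketch that uses Ripper, the lexer and parser that ships with Ruby's standard library. The source string is made up for illustration.

```ruby
require 'ripper'

source = "def greet(name); end"

# Ripper.lex returns one entry per token; each entry starts with
# [line, column], followed by the token type and the token text.
tokens = Ripper.lex(source).map { |(_pos, type, text, _state)| [type, text] }

tokens.each { |type, text| puts "#{type.to_s.ljust(14)} #{text.inspect}" }
```

Each line of output pairs a token type (such as on_kw or on_ident) with the piece of source text it came from. Tokens alone do not tell you which identifier is a method name and which is an argument; that is what the tree built by the parser is for.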

Let’s get right into it!

Using the Parser Gem

Ruby ships with a parser in its standard library, called Ripper. Its output is hard to work with, so I prefer using the fantastic parser gem. RuboCop uses this gem to do its magic.
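To see what "hard to work with" means, here is a small sketch of Ripper's output for a one-line method. The def_names helper is hypothetical, written only for this example, and it assumes the layout Ripper uses for def nodes (the name sits in the second element).

```ruby
require 'ripper'

sexp = Ripper.sexp("def greet(name); end")
p sexp # nested arrays of symbols, strings and [line, column] pairs

# Hypothetical helper: walk the nested arrays and collect the names
# of every :def node it finds.
def def_names(node, found = [])
  return found unless node.is_a?(Array)

  found << node[1][1] if node[0] == :def
  node.each { |child| def_names(child, found) }
  found
end

p def_names(sexp)
```

Everything is positional: you have to know that the method name lives at node[1][1], and there are no node objects with helper methods to lean on.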

This gem also includes a command-line tool, ruby-parse, that you can use to parse some code directly and see the resulting parse tree.

Here is an example:

ruby-parse -e '%w(hello world).map { |c| c.upcase }'

The output looks like this:

(block
  (send
    (array
      (str "hello")
      (str "world")) :map)
  (args
    (arg :c))
  (send
    (lvar :c) :upcase))

This can be useful if you are trying to understand how Ruby parses some code. But if you want to create your own analysis tools, you will have to read the source file, parse it, and then traverse the generated tree.

Example:

require 'parser/current'

code = File.read('app.rb')
parsed_code = Parser::CurrentRuby.parse(code)

The parser will return an AST (Abstract Syntax Tree) of your code: a tree of nodes, where each node has a type (like :send or :def) and a list of children. Don’t get too intimidated by the name, it’s simpler than it sounds 🙂

Traversing The AST

Now that you have parsed your code using the parser gem you need to traverse the resulting AST.

You can do this by creating a class that inherits from AST::Processor.

Example:

class Processor < AST::Processor
end

Then you have to instantiate this class & call the .process method:

processor = Processor.new
processor.process(parsed_code)

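You need to define some on_ methods. These methods correspond to the node types in the AST: a def node is handled by on_def, a send node by on_send, and so on.

To discover what methods you need to define you can add the handler_missing method to your Processor class. It gets called for every node type that has no on_ method yet. You also need the on_begin method, because a sequence of several statements is wrapped in a begin node and each of its children has to be processed.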

class Processor < AST::Processor
  def on_begin(node)
    node.children.each { |c| process(c) }
  end

  def handler_missing(node)
    puts "missing #{node.type}"
  end
end

Here is where we are:

You have your Ruby AST and a basic processor. When you run this code, you will see the types of the nodes that don't have a handler yet.

Now:

You need to implement all the on_ methods that you want to use. For example, if I want all the instance method names along with their line numbers I can add this method to the Processor class:

def on_def(node)
  line_num    = node.loc.line
  method_name = node.children[0]

  puts "Found #{method_name} at line #{line_num}"
end

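When you run your program now, it should print the name and line number of every method definition it reaches. Keep in mind that this processor only descends into begin nodes, so methods defined inside a class or module are reached only if you also add on_class and on_module methods that process the body of those nodes.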

Conclusion

Building a Ruby static analysis tool is not as difficult as it may look. If you want a more complete example take a look at my class_indexer gem. Now it's your turn to make your own tools!
