Getting the AST from IronRuby
I’ve been looking into IronRuby and how to retrieve the Abstract Syntax Tree (AST) it builds when parsing Ruby code. In this post I’ll show a simple example of how you can do the same.
First, let’s look at what a compiler is and how it works. A compiler basically works in three phases: Lexical Analysis, Syntactic/Semantic Analysis and Code Generation. Given a piece of code, the compiler first tokenizes the input, splitting the code into its simplest parts; this is the Lexical Analysis phase. Examples of these parts are a variable name, a string or an assignment operator (such as = in C#). The output from the tokenizer is fed into a parser in the Syntactic/Semantic Analysis phase. The parser takes the individual parts (the tokens) and creates a generic representation of the code. This representation is the Abstract Syntax Tree (AST). In the last phase, Code Generation, the compiler uses the AST to generate code that can run on the target platform.
Usually each of these phases has multiple sub-phases that further validate and optimize the code.
If we take a modern language such as C#, the input is C# code in a .cs file and the output is IL (Intermediate Language) code that runs on the Common Language Runtime (CLR).
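To make the first phase concrete, here is a minimal tokenizer sketched in Ruby (the language IronRuby parses) for a one-line assignment like the a = 2 + 3 example below. The `tokenize` helper, the regex table and token names such as :identifier and :plus are hypothetical, chosen only for illustration; IronRuby’s own tokenizer covers the full Ruby grammar and is far more complete.

```ruby
# Hypothetical toy tokenizer (lexical analysis) for inputs like "a = 2 + 3".
# Token names and patterns are illustrative assumptions, not IronRuby's.
TOKEN_PATTERNS = [
  [:integer,    /\A\d+/],
  [:identifier, /\A[a-z_]\w*/],
  [:assign,     /\A=/],
  [:plus,       /\A\+/]
].freeze

def tokenize(source)
  tokens = []
  rest = source
  until rest.empty?
    # Skip whitespace between tokens
    if (ws = rest[/\A\s+/])
      rest = rest[ws.length..]
      next
    end
    # Try each pattern in order; the first one matching at the start wins
    type, pattern = TOKEN_PATTERNS.find { |_, re| rest.match?(re) }
    raise ArgumentError, "unexpected character: #{rest[0]}" unless type
    text = rest[pattern]
    tokens << [type, text]
    rest = rest[text.length..]
  end
  tokens
end

p tokenize("a = 2 + 3")
```

Running it prints [[:identifier, "a"], [:assign, "="], [:integer, "2"], [:plus, "+"], [:integer, "3"]], which is the flat token stream a parser would consume in the next phase.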
Now let’s take a look at what the AST might look like for the simple assignment in the following code segment:
a = 2 + 3
After being parsed, this code might have the following AST representation:
    =
   / \
  a   +
     / \
    2   3
The parts shown in the tree are called nodes, and there are five in total: a variable name (a), two integer literals (2 and 3), a plus sign (more specifically, a binary expression) and an assignment (the “=”).
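To tie the diagram to actual data, here is a small Ruby sketch that builds the same tree by hand, counts its nodes and runs a toy code-generation pass over it. The Struct-based node classes (Variable, IntegerLiteral, BinaryExpression, Assignment) and the push/add/store instruction names are my own hypothetical choices, not IronRuby’s node types or real IL opcodes.

```ruby
# Hypothetical AST node types for "a = 2 + 3" (not IronRuby's classes).
Variable         = Struct.new(:name)
IntegerLiteral   = Struct.new(:value)
BinaryExpression = Struct.new(:operator, :left, :right)
Assignment       = Struct.new(:target, :value)

# The tree from the diagram: "=" at the root, "+" as its right child.
tree = Assignment.new(
  Variable.new("a"),
  BinaryExpression.new(:+, IntegerLiteral.new(2), IntegerLiteral.new(3))
)

# Count a node plus, recursively, every member that is itself a node.
def count_nodes(node)
  children = node.to_a.select { |member| member.is_a?(Struct) }
  1 + children.sum { |child| count_nodes(child) }
end

# Toy code generation: emit instructions for a stack machine
# (loosely in the spirit of IL, which is also stack-based).
def emit(node, out = [])
  case node
  when IntegerLiteral
    out << "push #{node.value}"
  when BinaryExpression
    raise ArgumentError, "unsupported operator" unless node.operator == :+
    emit(node.left, out)   # operands first...
    emit(node.right, out)
    out << "add"           # ...then the operation
  when Assignment
    emit(node.value, out)
    out << "store #{node.target.name}"
  else
    raise ArgumentError, "unsupported node: #{node.class}"
  end
  out
end

puts count_nodes(tree)
puts emit(tree)
```

count_nodes returns 5, matching the five nodes listed above, and emit produces push 2, push 3, add, store a. Children are emitted before their parent, which is the post-order traversal a stack machine needs.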
Getting the AST is the first step toward analyzing the code yourself. Since creating a tokenizer and parser for a language such as Ruby is a non-trivial effort, it’s great if you can piggyback on other people’s work. And for looking at Ruby code on .NET, IronRuby is the obvious choice.
In the following code I’m using IronRuby 1.0 RC2, the latest release at the time of writing.
First we need to add references to a few assemblies, which you can find where you installed IronRuby (usually “c:\ironruby\bin\”). We need IronRuby.dll, IronRuby.Libraries.dll, Microsoft.Dynamic.dll, Microsoft.Scripting.dll, Microsoft.Scripting.Core.dll and Microsoft.Scripting.Helpers.dll.
Then import the following namespaces:
using System;
using IronRuby.Builtins;
using IronRuby.Compiler;
using IronRuby.Compiler.Ast;
using Microsoft.Scripting;
using Microsoft.Scripting.Hosting.Providers;
Next we set up an IronRuby runtime and engine; these are the core infrastructure we need to work with Ruby code. Finally, we create an instance of the ScriptSource class from a simple piece of Ruby code that calls “puts” with “hello” as its argument.
// Create the IronRuby runtime and get the Ruby engine from it
var runtime = IronRuby.Ruby.CreateRuntime();
var engine = runtime.GetEngine("rb");
// Wrap a snippet of Ruby code in a ScriptSource
var src = engine.CreateScriptSourceFromString(@"puts 'hello'");
The ScriptSource is actually executable: if you called its Execute() method, the Ruby expression would run. This is the basic pattern you’d follow to add scripting capabilities to your application.
Using the HostingHelpers class from the Dynamic Language Runtime (DLR) APIs, we can get the SourceUnit behind the ScriptSource. We then create a new IronRuby Parser and ask it to parse the code in that SourceUnit.
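// Get the DLR SourceUnit behind the ScriptSource
var srcUnit = HostingHelpers.GetSourceUnit(src);
// Parse it with IronRuby's parser; the result is the root of the AST
var parser = new Parser();
var srcTreeUnit = parser.Parse(srcUnit, new RubyCompilerOptions(), ErrorSink.Default);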
To get at the nodes of the AST we implement a Walker. The Walker class implements the Visitor design pattern, which lets us look at each of the nodes in the AST. A simple walker is shown below.
public class MyWalker : Walker
{
    // Called for method call nodes, e.g. the call to puts
    protected override void Walk(MethodCall node)
    {
        Console.WriteLine("Method call: " + node.MethodName);
        base.Walk(node); // let the base Walker continue into child nodes
    }

    // Called for string literal nodes, e.g. 'hello'
    protected override void Walk(StringLiteral node)
    {
        Console.WriteLine("String Literal: " + node.GetMutableString(RubyEncoding.Default).ToString());
        base.Walk(node);
    }
}
We kick-start the process by creating our walker and calling its Walk() method with the parsed tree (srcTreeUnit) as input.
var walker = new MyWalker();
walker.Walk(srcTreeUnit);
As the walker traverses the tree, it calls the Walk() overload that matches each node’s type, so by overriding an overload we can do whatever we want with that kind of node. In this example I print the visited nodes to the console.
The output of running this code is:
Method call: puts
String Literal: hello
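To show what a walker does under the hood, here is a plain-Ruby sketch of the same idea. This is not IronRuby’s Walker: the node Structs, the walk_method_call/walk_string_literal method names and the case-based dispatch (used here instead of the classic double-dispatch form of the Visitor pattern) are my own assumptions, chosen to mirror MyWalker.

```ruby
# Hypothetical node types mirroring the two nodes in "puts 'hello'".
MethodCall    = Struct.new(:method_name, :arguments)
StringLiteral = Struct.new(:value)

# Minimal base walker: walk(node) dispatches on the node's type, and the
# default handlers simply recurse into child nodes.
class Walker
  def walk(node)
    case node
    when MethodCall    then walk_method_call(node)
    when StringLiteral then walk_string_literal(node)
    end
  end

  def walk_method_call(node)
    node.arguments.each { |arg| walk(arg) }
  end

  def walk_string_literal(node)
    # Leaf node: no children to visit
  end
end

# Like MyWalker: override only the node types of interest and call super
# so the default traversal still reaches child nodes.
class MyWalker < Walker
  def walk_method_call(node)
    puts "Method call: #{node.method_name}"
    super
  end

  def walk_string_literal(node)
    puts "String Literal: #{node.value}"
    super
  end
end

# The tree for: puts 'hello'
ast = MethodCall.new("puts", [StringLiteral.new("hello")])
MyWalker.new.walk(ast)
```

Running it prints the same two lines as the C# version. Calling super from each override is what keeps the traversal going, just like base.Walk(node) in MyWalker; drop it from walk_method_call and the string literal is never visited.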
The next step is to override the remaining Walk() methods on the walker to handle the full set of AST node types. You’d probably also want to build up your own data structure for analysis.
This example showed how you can access the parsed Ruby code using IronRuby. The next step is to actually do something interesting with this knowledge.
I am self employed as an independent software development consultant at 

