Monday, February 6, 2012

Rich macros in ANTLR grammars

For the lastest extension to FTL, my declarative user interface language, I wanted to add some code sharing abilities: traits and file includes. These are purely syntactic AST transformations but tricky for tools like ANTLR because the transformations are non-local.

For example, I wanted to be able to write the following code in FTL:
trait ChildCount {
attributes {
var numChildren : int;
}
constraints {
loop boxes {
numChildren := fold 0 .. $-.numChildren + 1;
}
}
}


class HBox(ChildCount) : Node {
children {
boxes : [ Node ];
}
attributes {
var render : Scene;
}
constraints {
render := text(10, 10, numChildren, #FF0);
}
}


Desugaring would eliminate the traits through a global transformation. In this case, it yields:

class HBox : Node {
children {
boxes : [ Node ];
}
attributes {
var render : Scene;
}
constraints {
render := text(10, 10, numChildren, #FF0);
}

attributes {
var numChildren : int;
}
constraints {
loop boxes {
numChildren := fold 0 .. $-.numChildren + 1;
}
}
}


(I'm glossing over hygiene etc. for the purposes of this post).

I got a fairly good start based on these posts.

Eventually, I realized I had two problems:


  1. Given a usage of a trait (as a parse tree) and the trait's expansion (as a string), how do I make desugared parse tree? While it may be possible to do the expansion earlier (e.g., lexing), this is a cleaner step. For example, the error handling is better: we get clean parse errors for the unexpanded form (e.g., trait was not found or is in the wrong place) and for bad expansions (e.g., the trait references an identifier unavailable at its point of use).

  2. How do I incorporate the new parse into the semantics actions associated with the parser? For example, if the parser counts the number of nodes, it should be of the final desugared parse tree. Counting the original nodes would be wrong, as would the sum of the original + new.



The final solution was pretty simple at heart:

First, I told ANTLR to store the traits:

Trait :
'trait' id '{' body[false,null] '}'
{ traits.put($id.text, $body.text); }
;


Second, I used a tree rewrite rule that replaces uses of traits with subtrees that are programmatically constructed in a semantic action through invocations of sub-rule parser methods:

classTrait :
'class' c=id '(' tId=id ')' ':' typeId=id
{
FtlClass cls = new FtlClass($cName.text);
String traitBodyStr = traits.get($tId.text);
}
'{' body[true, cls] '}'
{
CharStream inputstream = null;
inputstream = new ANTLRStringStream(traitBodyStr);
ALEParser innerparser = new ALEParser(new CommonTokenStream(new ALELexer(inputstream)));
CommonTree traitBody = (CommonTree)(innerparser.body(true, cls).getTree());
transferStatePost(innerparser);
}
-> ^('class' $c ':' $typeId '{' body {traitBody} '}')
;


The idea is that for a class using a trait, we lookup the trait (traitBodyStr), create a new parse subtree from it (traitBody), and replace it with a new subtree (righthand side of ->) that incorporates a programmatically generated subtree (traitBody).

A subtle but important practical point is that, for the programmatic call to a sub-rule parser -- body(true,cls) -- innerparser is a new parser with its own state. We can pass parameters to it such as (true, cls) or initialize the parser state to mimic the current one (e.g., duplicate the symbol table), and get them back out. In this case, I needed some post-parsing parser state, so I added my custom method transferStatePost to handle it.

Bonus for any compiler DSL writers: could you synthesize the tree rewrite rule from my example? :)

No comments: