Building a Parser-Generator in Erlang, Part 5

Jun 21, 2009 by Sean Cribbs

I’ve realized I gave short-shrift to the parse_transform code that I posted yesterday. I wandered around the topic without much direction. So I’m going to revisit that code and show you the real meat of it: how I convert

rule(nonterminal) ->
  parser_fun();

into:

nonterminal(Input, Index) ->
  peg:p(Input, Index, nonterminal, fun(I,D) ->
                                      (parser_fun())(I,D)
                                      end,
                                   fun(Node) ->
                                      transform(nonterminal, Node)
                                    end).

First steps

The AST of an Erlang module is a list of tuples. Those tuples represent attributes (things like the module name, the exports, etc), and function definitions. The general strategy will be this:

Fold over the AST, looking for definitions for the rule function.
Map all of the function clauses for the rule function into their top-level equivalents.

This is the essence of the transform_rules function:

transform_rules(AST) ->
  transform_rules(AST, []).

transform_rules([], Accum) ->
  lists:reverse(Accum);
transform_rules([{function,_Line,rule,1,Clauses}|Rest], Accum) ->
  Rules = lists:reverse(build_rules(Clauses)),
  transform_rules(Rest, Rules++Accum);
transform_rules([H|Rest], Accum) ->
  transform_rules(Rest, [H|Accum]).

Like a typical iteration routine in Erlang, it pushes things on the head of a list and then reverses at the end. When it finds a matching function, it takes the clauses of that function, transforms them, and then pushes them on the head. Note that the second clause of transform_rules will only match a single-arity function (that’s what the number 1 is doing in there). Since I’m use lists:map() in the build_rules function as you will see below, I have to reverse them before I prepend them to the accumulator.

Mapping the rules to new functions

Now let’s look at build_rules.

build_rules(Clauses) ->
  Rules = lists:map(fun build_rule/1, Clauses),
  [generate_file_function(),generate_parse_function()|Rules].

This is pretty simple. I take the clauses, and map them into their transformation, then prepend the file and parse functions to the front. If you were evil and wanted to screw up this transform, you could provide multiple definitions (not simply clauses) of the rules function. But then it wouldn’t be very useful, would it?

Now we get to an individual rule. Since the logic is a little complicated, I broke it out into several functions. First, is build_rule.

build_rule({clause,Line,[{atom,_,Name}],_,Stmt}) ->
  ets:insert_new(peg_transform, {root, Name, Line}),
  Wrapped = wrap_reductions(Stmt, Name),
  {function,Line,Name,2,
   [{clause,Line,[{var,Line,'Input'},{var,Line,'Index'}],[],Wrapped}]}.

The very first line lets us record the first rule in the grammar/parser as the root rule. This is useful later in generate_parse_function. If the row with the key root is already in the ets table, it ignores the insert. Second, we wrap the “reductions” (body of the rule) in our peg:p function as we will see below. Lastly, we return a function definition with the rule-name as the function name, an arity of 2, and a single clause that contains our wrapped reductions.

Wrap it up

Remember the whole point of this exercise was to avoid typing peg:p() over and over. Here’s where we get to that.

wrap_reductions(Stmt,Name) ->
  Inner = hd(Stmt),
  Line = element(2, Inner), % [{call,Line,_,_}]
  Fun = wrap_fun(Inner, Line),
  Transform = semantic_fun(Name, Line),
  [{call,Line,
    {remote,Line,{atom,Line,peg},{atom,Line,p}},
    [
     {var,Line,'Input'},
     {var,Line,'Index'},
     {atom,Line,Name},
     Fun,
     Transform
    ]}].

wrap_reductions only allows a single statement (one top-level combinator). We extract the line number out of that statement, pass the statement to another wrapper, which will return an inline-function/lambda containing the parser statement. Then we generate a similar function that will take the parsed node and transform it. Last, we pop these two funs into our peg:p() call.

Before we look at wrap_fun, let’s look at semantic_fun, which is a little simpler:

semantic_fun(Rule, Line) ->
  {'fun',Line,
   {clauses,
    [{clause,Line,
      [{var,Line,'Node'}],
      [],
      [{call,Line,
        {atom,Line,transform},
        [{atom,Line,Rule},{var,Line,'Node'}]}]}]}}.

Interestingly, unlike function definitions, 'fun' definitions always use clauses, even if you only have one clause! This one’s really straightforward, just putting the rulename and the line number in the right places.

Now, our final wrapper, wrap_fun.

wrap_fun(Stmts, Line) when is_tuple(Stmts) ->
  {'fun',Line,
   {clauses,
    [{clause,Line,
      [{var,Line,'I'},{var,Line,'D'}],
      [],
      [{call,Line,
        Stmts,
        [{var,Line,'I'},{var,Line,'D'}]}]}]}};
wrap_fun(Stmts, Line) ->
  io:format("peg rules must be double-arity functions or statements and cannot be sequences of statements!~n"),
  throw({parse_error, not_tuple, Line, Stmts}).

I engaged in a little paranoia here — I wanted to make sure someone hand-writing a parser wouldn’t try to do something that would break the convention of a single top-level combinator in each rule. Maybe a little overboard.

For the sake of keeping things short and not colliding with variables in the same scope, I used I and D as local variable instead of Input and Index. What is really curious is how the call statement comes out. In regular Erlang, in order to call a fun that is the result of some other operation, you have to wrap it in parentheses, like so:

(some_fun())(Var)

However, in AST, it looks like any old call, except that instead of being an atom, variable, or remote, it is the substatement.

Back to the top

Now that we’ve dug into the depths of the transform, let’s look again at the top-level.

parse_transform(AST, _Options) ->
  ets:new(peg_transform, [named_table, set]),
  [{attribute, _, module, ModName}] = lists:filter(fun(Form)->
                                                       case Form of
                                                         {attribute,_,module,_} -> true;
                                                         _ -> false
                                                       end
                                                   end, AST),
  ets:insert(peg_transform, {module, ModName}),
  Result = transform_rules(AST),
  ets:delete(peg_transform),
  Result.

You’ll notice the call to transform_rules near the end of the function – where the action happens. The rest of the parse_transform function is devoted to:

Setting up an ets table that we saw used in previous snippets.
Capturing the module name in the ets table, which we use in generating the parse function.
Cleaning up the ets table when we’re done.

Conclusion

I hope that is a bit more clear and gives you a better idea of how I’m accomplishing the transformation.