Building a Parser-Generator in Erlang, Part 5
by Sean Cribbs
I’ve realized I gave short-shrift to the parse_transform code that I posted yesterday. I wandered around the topic without much direction. So I’m going to revisit that code and show you the real meat of it: how I convert
into:
First steps
The AST of an Erlang module is a list of tuples. Those tuples represent attributes (things like the module name, the exports, etc), and function definitions. The general strategy will be this:
- Fold over the AST, looking for definitions for the
rule
function. - Map all of the function clauses for the
rule
function into their top-level equivalents.
This is the essence of the transform_rules
function:
Like a typical iteration routine in Erlang, it pushes things on the head of a list and then reverses at the end. When it finds a matching function, it takes the clauses of that function, transforms them, and then pushes them on the head. Note that the second clause of transform_rules
will only match a single-arity function (that’s what the number 1 is doing in there). Since I’m use lists:map()
in the build_rules
function as you will see below, I have to reverse them before I prepend them to the accumulator.
Mapping the rules to new functions
Now let’s look at build_rules
.
This is pretty simple. I take the clauses, and map them into their transformation, then prepend the file
and parse
functions to the front. If you were evil and wanted to screw up this transform, you could provide multiple definitions (not simply clauses) of the rules function. But then it wouldn’t be very useful, would it?
Now we get to an individual rule. Since the logic is a little complicated, I broke it out into several functions. First, is build_rule
.
The very first line lets us record the first rule in the grammar/parser as the root rule. This is useful later in generate_parse_function
. If the row with the key root
is already in the ets table, it ignores the insert. Second, we wrap the “reductions” (body of the rule) in our peg:p
function as we will see below. Lastly, we return a function definition with the rule-name as the function name, an arity of 2, and a single clause that contains our wrapped reductions.
Wrap it up
Remember the whole point of this exercise was to avoid typing peg:p()
over and over. Here’s where we get to that.
wrap_reductions
only allows a single statement (one top-level combinator). We extract the line number out of that statement, pass the statement to another wrapper, which will return an inline-function/lambda containing the parser statement. Then we generate a similar function that will take the parsed node and transform it. Last, we pop these two funs into our peg:p()
call.
Before we look at wrap_fun
, let’s look at semantic_fun
, which is a little simpler:
Interestingly, unlike function
definitions, 'fun'
definitions always use clauses
, even if you only have one clause! This one’s really straightforward, just putting the rulename and the line number in the right places.
Now, our final wrapper, wrap_fun
.
I engaged in a little paranoia here — I wanted to make sure someone hand-writing a parser wouldn’t try to do something that would break the convention of a single top-level combinator in each rule. Maybe a little overboard.
For the sake of keeping things short and not colliding with variables in the same scope, I used I
and D
as local variable instead of Input
and Index
. What is really curious is how the call
statement comes out. In regular Erlang, in order to call a fun that is the result of some other operation, you have to wrap it in parentheses, like so:
However, in AST, it looks like any old call
, except that instead of being an atom, variable, or remote
, it is the substatement.
Back to the top
Now that we’ve dug into the depths of the transform, let’s look again at the top-level.
You’ll notice the call to transform_rules
near the end of the function – where the action happens. The rest of the parse_transform
function is devoted to:
- Setting up an ets table that we saw used in previous snippets.
- Capturing the module name in the ets table, which we use in generating the
parse
function. - Cleaning up the ets table when we’re done.
Conclusion
I hope that is a bit more clear and gives you a better idea of how I’m accomplishing the transformation.