Building parse trees
By default, a Canopy parser will generate a parse tree without you needing to
tell it how to do so. Every node has a text, an offset and a (possibly empty)
list of elements. But you can also tell Canopy to call functions that you
define, to build the tree yourself.
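As a rough illustration of that default node shape, the sketch below walks a tree and lists each node's offset and text. The tree object is written by hand for this example and is not real parser output, and dump is a hypothetical helper; only the text, offset and elements properties come from the description above.

```javascript
// Hand-built stand-in for a parse tree; a real tree would come from parse().
const tree = {
  text: '[1,2]',
  offset: 0,
  elements: [
    { text: '[', offset: 0, elements: [] },
    { text: '1', offset: 1, elements: [] },
    {
      text: ',2',
      offset: 2,
      elements: [
        { text: ',', offset: 2, elements: [] },
        { text: '2', offset: 3, elements: [] }
      ]
    },
    { text: ']', offset: 4, elements: [] }
  ]
}

// Hypothetical helper: returns one line per node, indented by depth.
function dump (node, depth = 0) {
  const line = `${'  '.repeat(depth)}${node.offset}: ${JSON.stringify(node.text)}`
  const childLines = node.elements.flatMap((child) => dump(child, depth + 1))
  return [line, ...childLines]
}

console.log(dump(tree).join('\n'))
```

Leaf nodes simply carry an empty elements list, so the same recursive function handles every node.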
Say we have a grammar that matches strings that represent a mapping from a name
to a list of numbers, like {'ints':[1,2,3]}:
maps.peg
grammar Maps
map <- "{" string ":" value "}"
string <- "'" [^']* "'"
value <- list / number
list <- "[" value ("," value)* "]"
number <- [0-9]+
To change the kinds of values the parser generates each time it matches a rule,
we can give the names of functions to call, as names prefixed with a % sign:
maps.peg
grammar Maps
map <- "{" string ":" value "}" %make_map
string <- "'" [^']* "'" %make_string
value <- list / number
list <- "[" value ("," value)* "]" %make_list
number <- [0-9]+ %make_number
These function names are called actions. Once you’ve compiled this parser, you can use it by passing in an object that implements the named actions. Each function is passed four arguments:
input: the complete text of the input document
start: the start offset of the text that matches the rule
end: the end offset of the text that matches the rule
elements: an array of the values generated by the rule’s sub-rules
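To make the argument list concrete, here is a small sketch that calls an action by hand, in the way the parser is described as calling it. The offsets are worked out manually for this input, and treating end as exclusive is an assumption, chosen because it matches how substring is used in the example later in this section.

```javascript
const input = "{'ints':[1,2,3]}"

// An action for the `number` rule: it reads its text straight from the input.
function make_number (input, start, end, elements) {
  return parseInt(input.substring(start, end), 10)
}

// Simulate the parser matching `number` on the "2" at offsets 11 to 12.
// `elements` is left empty because this action ignores it.
const value = make_number(input, 11, 12, [])
console.log(value)
```

Calling actions directly like this can also be a convenient way to unit test them without compiling a grammar.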
For example, let’s implement actions for the above parser that translate the input text into a JavaScript value representing the same structure:
const maps = require('./maps')
const actions = {
  make_map (input, start, end, elements) {
    let map = {}
    map[elements[1]] = elements[3]
    return map
  },

  make_string (input, start, end, elements) {
    return elements[1].text
  },

  make_list (input, start, end, elements) {
    let list = [elements[1]]
    elements[2].forEach((el) => list.push(el.value))
    return list
  },

  make_number (input, start, end, elements) {
    return parseInt(input.substring(start, end), 10)
  }
}
let result = maps.parse("{'ints':[1,2,3]}", { actions })
console.log(result)
This program prints
{ ints: [ 1, 2, 3 ] }
The parser calls these actions instead of building nodes itself. It passes the
(input, start, end) arguments rather than just the text of the match, because
this lets it skip spending time and memory on creating substrings when it
doesn’t need to; notice how most of the actions above don’t use these
arguments.
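One way to take advantage of this, sketched below with an action called by hand, is to record only the offsets and extract the text later, if at all. The names make_span and Span are hypothetical and do not appear in the grammar above.

```javascript
// Hypothetical value type: keeps a reference to the input plus two offsets,
// and only builds the substring when `text` is actually read.
class Span {
  constructor (input, start, end) {
    this.input = input
    this.start = start
    this.end = end
  }

  get text () {
    return this.input.substring(this.start, this.end)
  }
}

const actions = {
  // No substring is created when the action runs.
  make_span (input, start, end, elements) {
    return new Span(input, start, end)
  }
}

// Called by hand here; a parser would supply these arguments itself.
const span = actions.make_span("{'ints':[1,2,3]}", 1, 7, [])
console.log(span.start, span.end) // the offsets are available immediately
console.log(span.text)            // the substring is built on this access
```

This keeps the cost of a match independent of its length until something asks for the text.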
The % operator binds to sequence expressions. That is, in the following
grammar, the input abc will invoke make_alpha while the input 123 will invoke
make_numeric:
actions.peg
grammar Actions
root <- "a" "b" "c" %make_alpha / "1" "2" "3" %make_numeric
The % operator can only be used with expressions that create new nodes. It
cannot be used with expressions that simply pass through a node created by
another rule, such as the ?, /, & and ! operators, and cross-references. It can
be used with a sequence of two or more expressions that contains such an
expression, but not with those expressions on their own.
Action functions are called as the parser is running, so they let you execute code while the input is still being processed.
Adding methods to nodes
Instead of telling the parser how to build nodes, you can have it augment the
nodes it builds by default with your own methods. This is done by annotating
parsing expressions with types. A type is any valid JavaScript object name like
Foo.Bar surrounded with pointy brackets. When the input matches this
expression, the generated syntax node will gain the methods from the named type.
Let’s take a simple example: matching a string literal:
strings.peg
grammar Strings
root <- "hello" <HelloNode>
const strings = require('./strings')
const types = {
  HelloNode: {
    upcase () {
      return this.text.toUpperCase()
    }
  }
}
let tree = strings.parse('hello', { types })
console.log(tree.upcase())
The grammar says that a node matching hello is of type HelloNode. Then in our
JavaScript code, we pass in an object that contains the named types via the
types option, and use the parser to process a string. Because the string
matches our typed rule, the resulting node gains the methods from the HelloNode
object, and we can invoke those methods on the node.
Let’s run this script:
$ node strings_test.js
HELLO
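One way to picture what gaining methods means is a plain mixin, where the functions of the named type are copied onto the node object. The sketch below illustrates that idea only and is not Canopy's actual implementation; extend is a hypothetical helper and the node is built by hand.

```javascript
const types = {
  HelloNode: {
    upcase () {
      return this.text.toUpperCase()
    }
  }
}

// Hypothetical helper: copy every method of `type` onto `node`.
function extend (node, type) {
  for (const name of Object.keys(type)) {
    node[name] = type[name]
  }
  return node
}

// Hand-built stand-in for the node a parser would create for "hello".
const node = extend({ text: 'hello', offset: 0, elements: [] }, types.HelloNode)

console.log(node.upcase())
```

Because the methods are called on the node, this inside them refers to the node itself, which is why upcase can read this.text.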
In the grammar syntax, type annotations bind to sequences. That is, a type annotation may only appear at the end of a sequence expression, and binds tighter than choice expressions. Unlike action annotations, type annotations can be used on any kind of expression, not just those that produce new nodes.
For example, the following means that a node matching the sequence "foo" "bar"
will be augmented with the Extension methods.
words.peg
grammar Words
root <- first:"foo" second:"bar" <Extension>
The extension methods have access to the labelled nodes from the sequence.
const words = require('./words')
const types = {
  Extension: {
    convert () {
      return this.first.text + this.second.text.toUpperCase()
    }
  }
}

words.parse('foobar', { types }).convert()
// == 'fooBAR'
Because type annotations bind to sequences rather than to choices, the
following matches either the string "abc", which gains the Foo type, or "123",
which gains the Bar type:
sequences.peg
grammar Choice
root <- "a" "b" "c" <Foo> / "1" "2" "3" <Bar>
If you want all the branches of a choice to be augmented with the same type, you need to parenthesize the choice and place the type afterward.
choices.peg
grammar Choices
root <- (alpha / beta) <Extension>
alpha <- first:"a" second:"z"
beta <- first:"j" second:"c"
const choices = require('./choices')
const types = {
  Extension: {
    convert () {
      return this.first.text + this.second.text.toUpperCase()
    }
  }
}

choices.parse('az', { types }).convert()
// == 'aZ'

choices.parse('jc', { types }).convert()
// == 'jC'
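The shared type works for both branches because each one labels its parts first and second, which is all that convert relies on. The sketch below shows this with nodes built by hand, using Object.assign as a stand-in for whatever the parser does internally; alphaNode and betaNode are hypothetical names.

```javascript
const Extension = {
  convert () {
    return this.first.text + this.second.text.toUpperCase()
  }
}

// Hand-built stand-ins for the nodes produced by the `alpha` and `beta` rules.
const alphaNode = { text: 'az', first: { text: 'a' }, second: { text: 'z' } }
const betaNode = { text: 'jc', first: { text: 'j' }, second: { text: 'c' } }

// Object.assign copies the methods onto each node, standing in for the parser.
const results = [alphaNode, betaNode].map(
  (node) => Object.assign(node, Extension).convert()
)

console.log(results)
```

A branch that lacked one of these labels would make convert throw a TypeError, so a type shared across a choice should only use labels that every branch defines.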