Grammar syntax
Canopy grammar definitions are written using standard PEG notation and stored in files with the .peg extension. They specify only the static grammar of the language and do not contain inline processing code. However, you can add methods to parse trees by implementing parsing actions.
A grammar must have as its first line the word grammar, followed by a space and the name of the grammar. This is followed by a list of rules that define the grammar; a rule is a name, followed by <-, followed by the rule’s definition.
A rule name can be any valid ASCII JavaScript variable name. The first rule in
the grammar is the root; a document must match this rule to be parsed
correctly.
Lines starting with # are treated as comments.
The types of expressions you can use within grammar rules are covered in detail in the other articles in this documentation.
For example, here’s a simple grammar that matches any sequence of digits, including the empty one, since * means zero or more repetitions:
digits.peg
# A grammar file to be used with Canopy:
# https://www.npmjs.com/package/canopy
#
# Explanation and syntax reference: https://canopy.jcoglan.com/
#
# To build:
#
# $ npm install -g canopy
# $ canopy digits.peg --lang javascript (or java | python | ruby)
grammar Digits
digits <- [0-9]*
A grammar file is converted into a JavaScript module using the canopy command-line tool:
$ canopy digits.peg
Rules can contain references to other rules; this is what allows PEG parsers to process recursive syntaxes. For example, this grammar matches a number surrounded by any number of matched parentheses, which is not possible with regular expressions.
parens.peg
grammar Parens
value <- "(" value ")" / number
number <- [0-9]+
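To make the recursion concrete, the two rules can be mirrored by hand as a small recursive-descent parser. This is only an illustrative sketch and not the code Canopy generates: the function names parseNumber, parseValue and parseParens are hypothetical, the error message is made up, and the nodes simply borrow the text/offset/elements shape of Canopy's parse trees.

```javascript
// Hand-written mirror of the Parens grammar (illustrative; not Canopy output).

// number <- [0-9]+
function parseNumber(input, offset) {
  const elements = [];
  let pos = offset;
  while (pos < input.length && input[pos] >= '0' && input[pos] <= '9') {
    elements.push({ text: input[pos], offset: pos, elements: [] });
    pos += 1;
  }
  if (elements.length === 0) return null; // "+" requires at least one digit
  return { text: input.slice(offset, pos), offset, elements };
}

// value <- "(" value ")" / number
function parseValue(input, offset) {
  // First alternative: "(" value ")"
  if (input[offset] === '(') {
    const inner = parseValue(input, offset + 1); // the rule refers to itself
    if (inner !== null) {
      const close = inner.offset + inner.text.length;
      if (input[close] === ')') {
        return {
          text: input.slice(offset, close + 1),
          offset,
          elements: [
            { text: '(', offset, elements: [] },
            inner,
            { text: ')', offset: close, elements: [] },
          ],
        };
      }
    }
  }
  // Ordered choice: the second alternative is tried only if the first failed.
  return parseNumber(input, offset);
}

// The root rule has to match the whole document.
function parseParens(input) {
  const tree = parseValue(input, 0);
  if (tree === null || tree.text.length !== input.length) {
    throw new Error('input does not match the Parens grammar');
  }
  return tree;
}
```

Because parseValue calls itself for the inner value, each opening parenthesis must be paired with a closing one at the same nesting depth, which is exactly what the self-referencing rule expresses.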
The Parens grammar generates a parser that processes the language and throws an error on invalid input:
const parens = require('./parens')
parens.parse('94')
== { text: '94',
offset: 0,
elements:
[ { text: '9', offset: 0, elements: [] },
{ text: '4', offset: 1, elements: [] } ] }
parens.parse('(94)')
== { text: '(94)',
offset: 0,
elements:
[ { text: '(', offset: 0, elements: [] },
{ text: '94', offset: 1, elements: [...] },
{ text: ')', offset: 3, elements: [] } ] }
parens.parse('(((94)')
Error: Line 1: expected ")"
(((94)
^
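Once a tree like the ones above is available, plain JavaScript can walk it. The sketch below assumes the node shape shown in the output (text, offset, elements) and uses a hypothetical helper, unwrap, to strip the parentheses and return the number. The sample tree is written out by hand so the snippet runs without a generated parser; the children of its inner node are an assumption made by analogy with the '94' example.

```javascript
// Hand-written tree in the shape printed for parens.parse('(94)').
// The inner node's children are assumed, by analogy with the '94' output.
const sampleTree = {
  text: '(94)',
  offset: 0,
  elements: [
    { text: '(', offset: 0, elements: [] },
    {
      text: '94',
      offset: 1,
      elements: [
        { text: '9', offset: 1, elements: [] },
        { text: '4', offset: 2, elements: [] },
      ],
    },
    { text: ')', offset: 3, elements: [] },
  ],
};

// unwrap is a hypothetical helper, not part of Canopy.
// A parenthesised value has exactly three children: "(", value, ")".
// Anything else is treated as a number node.
function unwrap(node) {
  const children = node.elements;
  const isParenthesised =
    children.length === 3 &&
    children[0].text === '(' &&
    children[2].text === ')';
  return isParenthesised ? unwrap(children[1]) : Number(node.text);
}

console.log(unwrap(sampleTree)); // prints 94
```

With a parser generated from parens.peg, the same helper would presumably be called as unwrap(parens.parse('((94))')).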