Grammar syntax

Canopy grammar definitions are written using standard PEG notation and stored in files with the .peg extension. They only specify the static grammar of the language and do not contain inline processing code. However, you can add additional methods to parse trees by extending nodes.

A grammar must have as its first line the word grammar, followed by one space and the name of the grammar. This is followed by a list of rules that define the grammar; a rule is a name, followed by <-, followed by the rule’s definition. A rule name can be any valid ASCII JavaScript variable name. The first rule in the grammar is the root; a document must match this rule to be parsed correctly.

The details of types of expressions you can use within grammar rules are covered in detail in the other articles linked on the left.

For example, here’s a simple grammar that matches any sequence of digits:

grammar Digits
  digits  <-  [0-9]*

A grammar file is converted into a JavaScript module using the canopy command-line tool:

$ canopy digits.peg

Rules can contain references to other rules; this is what allows PEG parsers to process recursive syntaxes. For example, this grammar matches a number surrounded by any number of matched parentheses - this is not possible with regular expressions.

grammar Parens
  value   <-  "(" value ")" / number
  number  <-  [0-9]+

This generates a parser that processes the language, and throws an error on invalid input:

var parens = require('./parens')

   == { text: '94',
        offset: 0,
         [ { text: '9', offset: 0, elements: [] },
           { text: '4', offset: 1, elements: [] } ] }

   == { text: '(94)',
        offset: 0,
         [ { text: '(', offset: 0, elements: [] },
           { text: '94', offset: 1, elements: [...] },
           { text: ')', offset: 3, elements: [] } ] }

Error: Line 1: expected ")"

Read more about cross-references.