Canopy – Sequences

Sequences

A sequence is just what it sounds like: one or more nodes, listed one after the other, separated by at least one whitespace character. A sequence matches the input if the input contains matches for each of the sequence’s nodes, in order.

For example, here’s a grammar that matches an optional word followed by two more required words:

hamlet.peg

grammar Hamlet
  root  <-  "not "? "to be"

The sequence here is formed of two nodes: "not "? and "to be". Here are the resulting parses of the possible inputs:

require('./hamlet').parse('to be')
   == { text: 'to be',
        offset: 0,
        elements: [
          { text: '', offset: 0, elements: [] },
          { text: 'to be', offset: 0, elements: [] }
        ] }

require('./hamlet').parse('not to be')
   == { text: 'not to be',
        offset: 0,
        elements: [
          { text: 'not ', offset: 0, elements: [] },
          { text: 'to be', offset: 4, elements: [] }
        ] }

require('./hamlet').parse('or not to be')
SyntaxError: Line 1: expected one of:

    - "not " from Hamlet::root
    - "to be" from Hamlet::root

     1 | or not to be
         ^
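
To make the matching rule concrete, here is a minimal hand-written sketch of sequence matching. It is not Canopy's generated code: the helper names literal, optional, sequence and parse are hypothetical, and error reporting is reduced to returning null. The sketch also assumes that leftover input counts as a failure.

```javascript
// Hypothetical helpers for illustration only; not Canopy's generated parser.

// Match an exact string at `offset`, or return null on failure.
function literal(str) {
  return (input, offset) =>
    input.startsWith(str, offset)
      ? { text: str, offset, elements: [] }
      : null;
}

// Match `parser` if possible; otherwise succeed with an empty node.
function optional(parser) {
  return (input, offset) =>
    parser(input, offset) || { text: '', offset, elements: [] };
}

// Match each parser in order, each one starting where the previous one ended.
function sequence(...parsers) {
  return (input, offset) => {
    const elements = [];
    let position = offset;
    for (const parser of parsers) {
      const node = parser(input, position);
      if (node === null) return null; // one failed child fails the whole sequence
      elements.push(node);
      position += node.text.length;
    }
    return { text: input.slice(offset, position), offset, elements };
  };
}

// Stand-in for: root <- "not "? "to be"
const root = sequence(optional(literal('not ')), literal('to be'));

// Assumption of this sketch: the whole input must be consumed.
function parse(input) {
  const tree = root(input, 0);
  return tree !== null && tree.text.length === input.length ? tree : null;
}
```

Calling parse('not to be') here yields two child nodes at offsets 0 and 4, parse('to be') yields an empty first child, and parse('or not to be') returns null because neither "not " nor "to be" matches at offset 0.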

Labelled nodes

Sequences have a special property: their child nodes can be labelled. You can explicitly add a label to any item within a sequence, and cross-references are implicitly labelled with the name of the reference. For example, the following grammar matches documents that look like {'abc' => 123}:

hash.peg

grammar Hash
  object  <-  "{" string " => " number:[0-9]+ "}"
  string  <-  "'" [^']* "'"

The object rule is a sequence containing five children:

"{"
string
" => "
number:[0-9]+
"}"

The string node is a reference to another rule, and number:[0-9]+ is a labelled expression that matches one or more digits. These two children create labelled nodes in the output:

let tree = require('./hash').parse("{'foo' => 36}")

   == { text: "{'foo' => 36}",
        offset: 0,
        elements: [
          { text: '{', offset: 0, elements: [] },
          { text: "'foo'", offset: 1, elements: [...] },
          { text: ' => ', offset: 6, elements: [] },
          { text: '36', offset: 10, elements: [...] },
          { text: '}', offset: 12, elements: [] }
        ],
        string: { text: "'foo'", offset: 1, elements: [...] },
        number: { text: '36', offset: 10, elements: [...] } }

tree.string.text
   == "'foo'"

tree.number.text
   == "36"

Here we see that tree.string is the same as tree.elements[1], and tree.number is the same as tree.elements[3]. These labels make it much easier to navigate the tree, and they reduce how much of your code needs to change when elements are added to or removed from a sequence.
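
As a rough illustration of navigating by label, the sketch below builds a stand-in for the tree above by hand (the child nodes of the string and number are omitted) and reads the key and value through the labels only. The helper toPair is hypothetical and not part of Canopy.

```javascript
// Hand-built stand-in for the parse tree shown above.
// Child nodes of the string and number are omitted for brevity.
const stringNode = { text: "'foo'", offset: 1, elements: [] };
const numberNode = { text: '36', offset: 10, elements: [] };

const tree = {
  text: "{'foo' => 36}",
  offset: 0,
  elements: [
    { text: '{', offset: 0, elements: [] },
    stringNode,
    { text: ' => ', offset: 6, elements: [] },
    numberNode,
    { text: '}', offset: 12, elements: [] }
  ],
  // Labels point at the same node objects as the elements array.
  string: stringNode,
  number: numberNode
};

// Hypothetical helper: turn the tree into a [key, value] pair.
// It never touches tree.elements, so it does not depend on positions.
function toPair(node) {
  const key = node.string.text.slice(1, -1); // strip the surrounding quotes
  const value = parseInt(node.number.text, 10);
  return [key, value];
}

const pair = toPair(tree);
```

Because toPair reads node.string and node.number rather than fixed indexes, it would keep working if another item were inserted into the sequence ahead of them.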

Muting

To make it more convenient to work with parse trees, we can mark expressions in the grammar as muted, meaning they won’t generate nodes in the output. For example, in the grammar above, the "{", " => " and "}" items don’t contain meaningful information, and we can mute them by placing a @ symbol before them.

hash.peg

grammar Hash
  object  <-  @"{" string @" => " number:[0-9]+ @"}"
  string  <-  "'" [^']* "'"

Now, the nodes for those elements will be excluded from the parse tree.

require('./hash').parse("{'foo' => 36}")

   == { text: "{'foo' => 36}",
        offset: 0,
        elements: [
          { text: "'foo'", offset: 1, elements: [...] },
          { text: '36', offset: 10, elements: [...] }
        ],
        string: { text: "'foo'", offset: 1, elements: [...] },
        number: { text: '36', offset: 10, elements: [...] } }

Muted items are also excluded from the elements array that is passed to tree-building functions.
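
The sketch below suggests what this means for a tree-building function. The function makePair is hypothetical, and its argument list (input, start, end, elements) is an assumption made here for illustration, so check it against Canopy's documentation on actions before relying on it. The elements array is built by hand to mirror the muted tree above.

```javascript
// Hypothetical tree-building function. The (input, start, end, elements)
// argument list is an assumption, not confirmed by the text above.
function makePair(input, start, end, elements) {
  // With "{", " => " and "}" muted, only two elements remain.
  const [string, number] = elements;
  const key = string.text.slice(1, -1); // strip the surrounding quotes
  return { [key]: parseInt(number.text, 10) };
}

// Hand-built stand-in for the elements of the muted object rule.
const input = "{'foo' => 36}";
const mutedElements = [
  { text: "'foo'", offset: 1, elements: [] },
  { text: '36', offset: 10, elements: [] }
];

const result = makePair(input, 0, input.length, mutedElements);
```

Without muting, the same function would need to pick out elements[1] and elements[3] instead, so muting the punctuation keeps the function's indexing aligned with the meaningful items.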