Java
To get an overview of how to use Canopy with Java, consider this example of a simplified grammar for URLs:
url.peg
grammar URL
url <- scheme "://" host pathname search hash?
scheme <- "http" "s"?
host <- hostname port?
hostname <- segment ("." segment)*
segment <- [a-z0-9-]+
port <- ":" [0-9]+
pathname <- "/" [^ ?]*
search <- ("?" query:[^ #]*)?
hash <- "#" [^ ]*
We can compile this grammar into a Java package using canopy
:
$ canopy url.peg --lang java
This creates a package called url
that contains all the parser logic. The
package name is based on the path to the .peg
file when you run canopy
, for
example if you run:
$ canopy com/jcoglan/canopy/url.peg --lang java
then you will get a package named com.jcoglan.canopy.url
. The --output
option can be used to override this:
$ canopy com/jcoglan/canopy/url.peg --lang java --output some/dir/url
This will write the generated files into the directory some/dir/url
with the
package name some.dir.url
.
Let’s try out our parser:
import url.URL;
import url.TreeNode;
import url.ParseError;
public class Example {
public static void main(String[] args) throws ParseError {
TreeNode tree = URL.parse("http://example.com/search?q=hello#page=1");
for (TreeNode node : tree.elements) {
System.out.println(node.offset + ", " + node.text);
}
/* prints:
0, http
4, ://
7, example.com
18, /search
25, ?q=hello
33, #page=1 */
}
}
This little example shows a few important things:
You invoke the parser by calling the module’s parse()
function with a string.
The parse()
method returns a tree of nodes.
Each node has three properties:
String text
, the snippet of the input text that node representsint offset
, the number of characters into the input text the node appearsList<TreeNode> elements
, an array of nodes matching the sub-expressions
Walking the parse tree
You can use elements
to walk into the structure of the tree, or, you can use
the labels that Canopy generates, which can make your code clearer:
import url.URL;
import url.TreeNode;
import url.ParseError;
import url.Label;
public class Example {
public static void main(String[] args) throws ParseError {
TreeNode tree = URL.parse("http://example.com/search?q=hello#page=1");
System.out.println(tree.elements.get(4).elements.get(1).text);
// -> 'q=hello'
System.out.println(tree.get(Label.search).get(Label.query).text);
// -> 'q=hello'
}
}
Parsing errors
If you give the parser an input text that does not match the grammar, a
url.ParseError
is thrown. The error message will list any of the strings or
character classes the parser was expecting to find at the furthest position it
got to, along with the rule those expectations come from, and it will highlight
the line of the input where the syntax error occurs.
import url.URL;
import url.TreeNode;
import url.ParseError;
public class Example {
public static void main(String[] args) throws ParseError {
TreeNode tree = URL.parse("https://example.com./");
}
}
// url.ParseError: Line 1: expected one of:
//
// - [a-z0-9-] from URL::segment
//
// 1 | https://example.com./
// ^
Implementing actions
Say you have a grammar that uses action annotations, for example:
maps.peg
grammar Maps
map <- "{" string ":" value "}" %make_map
string <- "'" [^']* "'" %make_string
value <- list / number
list <- "[" value ("," value)* "]" %make_list
number <- [0-9]+ %make_number
In Java, compiling the above grammar produces a package called maps
that
contains classes called Maps
, TreeNode
and ParseError
, an enum called
Label
and an interface called Actions
. You supply the action functions to
the parser by implementing the Actions
interface, which has one method for
each action named in the grammar, each of which must return a TreeNode
.
TreeNode
has a no-argument constructor so making subclasses of it is
relatively easy.
The following example parses the input {'ints':[1,2,3]}
. It defines one
TreeNode
subclass for each kind of value in the tree:
Pair
wraps aMap<String, List<Integer>>
Text
wraps aString
Array
wraps aList<Integer>
Number
wraps anint
It then implements the Actions
interface to generate values of these types
from the parser matches.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import maps.Actions;
import maps.Label;
import maps.Maps;
import maps.ParseError;
import maps.TreeNode;
class Pair extends TreeNode {
Map<String, List<Integer>> pair;
Pair(String key, List<Integer> value) {
pair = new HashMap<String, List<Integer>>();
pair.put(key, value);
}
}
class Text extends TreeNode {
String string;
Text(String string) {
this.string = string;
}
}
class Array extends TreeNode {
List<Integer> list;
Array(List<Integer> list) {
this.list = list;
}
}
class Number extends TreeNode {
int number;
Number(int number) {
this.number = number;
}
}
class MapsActions implements Actions {
public Pair make_map(String input, int start, int end, List<TreeNode> elements) {
Text string = (Text)elements.get(1);
Array array = (Array)elements.get(3);
return new Pair(string.string, array.list);
}
public Text make_string(String input, int start, int end, List<TreeNode> elements) {
return new Text(elements.get(1).text);
}
public Array make_list(String input, int start, int end, List<TreeNode> elements) {
List<Integer> list = new ArrayList<Integer>();
list.add(((Number)elements.get(1)).number);
for (TreeNode el : elements.get(2)) {
Number number = (Number)el.get(Label.value);
list.add(number.number);
}
return new Array(list);
}
public Number make_number(String input, int start, int end, List<TreeNode> elements) {
return new Number(Integer.parseInt(input.substring(start, end), 10));
}
}
public class Example {
public static void main(String[] args) throws ParseError {
Pair result = (Pair)Maps.parse("{'ints':[1,2,3]}", new MapsActions());
System.out.println(result.pair);
// -> {ints=[1, 2, 3]}
}
}
Extended node types
Using the <Type>
grammar annotation is not supported in the Java version.