[boo]
Towards Extensible Parsing
[ bamboo ] 01:40, Friday, 9 May 2008

Almost four years ago the first feature request was entered into the boo issue tracker.

There should be a way to extend the parser so it could recognize custom measurement unit literals such as 1kg and 2cm.

Since then boo has improved a lot but with no solution for BOO-1 in the horizon.

I think I'm getting close to solve it in a interesting way using PEGs implemented as a graph of expression objects.

PEGs are very likable. Conceptually simple and composable.

Take the PEG that recognizes integer expressions involving the + and * operators:

grammar <- spaces addition eof
addition <- term ("+" spaces term)*
term <- factor ("*" spaces factor)*
factor <- [0-9]+ spaces
spaces <- (' ' / '\t')*
eof <- !.

where:

() means grouping.

* means zero or more matches.

+ means one or more matches.

/ is the prioritized choice operator. If the first expression succeeds, the whole expression succeeds. If the first expression fails, it backtracks and evaluates the second expression.

! is the not predicate operator which succeeds if its operand fails. It never consumes any input.

. matches any input.

The grammar can be translated to boo very simply using the peg macro from Boo.Pegs:

import Boo.Pegs

peg:
    grammar = spaces, addition, eof
    addition = term, --("+", spaces, term)
    term = factor, --("*", spaces, factor)
    factor = ++[0-9], spaces
    spaces = --(' ' / '\t')
    eof = not any()
    
assert grammar.Match(PegContext("  6*6 + 6 "))

I had to be a little creative in mapping the PEG operators to valid boo expressions because as it must be clear by now boo doesn't allow the introduction of completely new syntax and that's what the fuss is all about here.

I actually like the way it looks.

Implementing something more useful such as expression evaluation on top of that requires a few semantic actions operating a stack:

import Boo.Pegs
import System.Collections.Generic

stack = Stack[of int]()
push = stack.Push
pop = stack.Pop

peg:
    grammar = spaces, addition, eof
    addition = term, --("+", spaces, term, { push(pop() + pop()) })
    term = factor, --("*", spaces, factor, { push(pop() * pop()) })
    factor = ++[0-9], { push(int.Parse($text)) }, spaces
    spaces = --(' ' / '\t')
    eof = not any()
    
assert grammar.Match(PegContext("  6*6 + 6 "))
assert 42 == pop()

Semantic actions are just closures that get executed as matching succeeds. $text returns the text matched so far by the current rule.

Beautiful.

The underlying implementation based on a graph of expression objects really shines when one considers what it takes to extend the grammar above with support for hexadecimal literals:

peg:
    // rebind
    factor.Expression = hex_number / factor.Expression
    hex_number = "0x", ++hex_digit, { push(int.Parse($text[2:], NumberStyles.HexNumber)) }, spaces
    hex_digit = [0-9, a-f, A-F]
assert grammar.Match(PegContext("  0xa*2 + 11*0x02"))
assert 42 == pop()

It's not yet clear how this extensibility mechanism will be exposed at the boo language level but the simplicity at the peg level is encouraging.

One last feature worth pointing out is the ability to match based on a previously matched rule. For instance, the closing tag of a xml element must match the name in the starting tag:

import Boo.Pegs

peg:
    element = '<', tag, '>', content, '', @tag, '>'
    tag = ++(a-z)
    content = --(element / text)
    text = not "<", any()
    
assert element.Match(PegContext("<foo><bar>Hello</bar></foo>"))

I've found the idea for the last match operator @ first described in this article. Great idea.

I've been also greatly inspired by conversations I've had with Massi who's exploring similar territory and Jb during the last Mono Meeting in Barcelona Madrid and with Cedric over a beer in Paris. I think Massi will be pleased to know that I haven't given any thoughts to performance leaving all the fun to him.

Extensible parsing. Soon in a boo compiler close to you.



Comments

Fantastic stuff - great work!

--Stuart Carnie, May 9, 2008 03:07 AM

Wonderful!

--Onur Gumus, May 9, 2008 05:40 AM

Great

--网站推广, May 9, 2008 06:35 AM

Looks promising. I also love the extensive usage of the match macro in Boo.Pegs.
You keep reminding us how cool Boo can be in the right hands.

--Avish, May 9, 2008 01:42 PM

I lagged in my feeds and just stumbled upon this post!
I'm completely amazed!! So many possibilities :)


ps: wasn't Mono Summit in Madrid and not in Barcelona? ;)

--Cedric, May 13, 2008 04:07 PM

Whoa--when did PEGs become available on .NET? I only read about a Java implementation.

I've always wanted syntax extensibility in Boo. Fingers crossed...

--David Piepgrass, January 28, 2009 06:27 PM
Post a comment









Remember personal info?