Parsing expression grammar

Python implementation

PEG Parser

Implementation of PEG parser as described in paper “Parsing Expression Grammars: A Recognition-Based Syntactic Foundation” (Bryan Ford, 2004). PEG’s grammar itself can be defined by PEG grammar:

# Hierarchical syntax
Grammar    <- Spacing Definition+ EndOfFile
Definition <- Identifier LEFTARROW Expression
Expression <- Sequence (SLASH Sequence)*
Sequence   <- Prefix*
Prefix     <- (AND / NOT)? Suffix
Suffix     <- Primary (QUESTION / STAR / PLUS)?
Primary    <- Identifier !LEFTARROW
            / OPEN Expression CLOSE
            / Literal
            / Class
            / DOT

# Lexical syntax
Identifier <- IdentStart IdentCont* Spacing
IdentStart <- [a-zA-Z_]
IdentCont  <- IdentStart / [0-9]
Literal    <- ['] (!['] Char)* ['] Spacing
            / ["] (!["] Char)* ["] Spacing
Class      <- '[' (!']' Range)* ']' Spacing
Range      <- Char '-' Char / Char
Char       <- '\\' [nrt'"\[\]\\]
            / '\\' 'x' Hex Hex
            / !'\\' .
Hex        <- [0-9a-fA-F]

LEFTARROW  <- '<-' Spacing
SLASH      <- '/' Spacing
AND        <- '&' Spacing
NOT        <- '!' Spacing
QUESTION   <- '?' Spacing
STAR       <- '*' Spacing
PLUS       <- '+' Spacing
OPEN       <- '(' Spacing
CLOSE      <- ')' Spacing
DOT        <- '.' Spacing

Spacing    <- (Space / Comment)*
Comment    <- '#' (!EndOfLine .)* EndOfLine
Space      <- ' ' / '\t' / EndOfLine
EndOfLine  <- '\r\n' / '\n' / '\r'
EndOfFile  <- !.

Example usage of PEG parser:

import functools
import hat.peg

grammar = hat.peg.Grammar(r'''
    Expr    <- Sum
    Sum     <- Product (('+' / '-') Product)*
    Product <- Value (('*' / '/') Value)*
    Value   <- Spacing ([0-9]+ / '(' Expr ')') Spacing
    Spacing <- ' '*
''', 'Expr')

ast = grammar.parse('1 + 2 * 3 + (4 - 5) * 6 + 7')

result = hat.peg.walk_ast(ast, {
    'Expr': lambda n, c: c[0],
    'Sum': lambda n, c: functools.reduce(
        (lambda acc, x: acc + x[1] if x[0] == '+' else acc - x[1]),
        zip(c[1::2], c[2::2]),
        c[0]),
    'Product': lambda n, c: functools.reduce(
        (lambda acc, x: acc * x[1] if x[0] == '*' else acc / x[1]),
        zip(c[1::2], c[2::2]),
        c[0]),
    'Value': lambda n, c: (c[2] if c[1] == '(' else
                           int(''.join(c[1:-1])))})

assert result == 8
hat.peg.Data

Data

alias of Union[str, bytes]

class hat.peg.Node(name, value)

Bases: tuple

Abstract syntax tree node.

Node names are identifiers of parser’s definitions and values are other nodes or Data representing matched Literal, Class or Dot leafs.

Create new instance of Node(name, value)

name

node name (production definition identifier)

Type

str

value

subnodes or leafs

Type

List[Union[Node,Data]]

hat.peg.walk_ast(node, actions, default_action=None)

Simple depth-first abstract syntax tree parser.

Actions are key value pairs where keys represent node names and values are callables that should be called for appropriate node. Each callable receives matched node and list of results from recursively applying this function on child nodes. For nodes which name doesn’t match any action, default action is used. If default action is not defined and node name doesn’t match action, result is None and recursion is stopped.

Parameters
  • node (hat.peg.Node) –

  • actions (Dict[str, Callable[[hat.peg.Node, List], Any]]) –

  • default_action (Optional[Callable[[hat.peg.Node, List], Any]]) –

Return type

Any

class hat.peg.Sequence(expressions)

Bases: tuple

Create new instance of Sequence(expressions,)

expressions

List[Expression]

class hat.peg.Choice(expressions)

Bases: tuple

Create new instance of Choice(expressions,)

expressions

List[Expression]

class hat.peg.Not(expression)

Bases: tuple

Create new instance of Not(expression,)

expression

Expression

class hat.peg.And(expression)

Bases: tuple

Create new instance of And(expression,)

expression

Expression

class hat.peg.OneOrMore(expression)

Bases: tuple

Create new instance of OneOrMore(expression,)

expression

Expression

class hat.peg.ZeroOrMore(expression)

Bases: tuple

Create new instance of ZeroOrMore(expression,)

expression

Expression

class hat.peg.Optional(expression)

Bases: tuple

Create new instance of Optional(expression,)

expression

Expression

class hat.peg.Identifier(value)

Bases: tuple

Create new instance of Identifier(value,)

value

str

class hat.peg.Literal(value)

Bases: tuple

Create new instance of Literal(value,)

value

bytes

class hat.peg.Class(values)

Bases: tuple

Create new instance of Class(values,)

values

List[Union[int,Tuple[int,int]]]

class hat.peg.Dot

Bases: tuple

Create new instance of Dot()

class hat.peg.MatchResult(node, rest)

Bases: tuple

Create new instance of MatchResult(node, rest)

node

None if match failed

Type

Optional[Node]

rest

remaining input data

Type

Data

class hat.peg.MatchCallFrame(name, data)

Bases: tuple

Create new instance of MatchCallFrame(name, data)

data

input data

Type

Data

name

definition name

Type

str

class hat.peg.Grammar(definitions, starting)

Bases: object

PEG Grammar.

Parameters
  • definitions (Union[str, Dict[str, Union[hat.peg.Sequence, hat.peg.Choice, hat.peg.Not, hat.peg.And, hat.peg.OneOrMore, hat.peg.ZeroOrMore, hat.peg.Optional, hat.peg.Identifier, hat.peg.Literal, hat.peg.Class, hat.peg.Dot]]]) – grammar definitions

  • starting (str) – starting definition name

property definitions

Definitions

property starting

Starting definition name

parse(data, debug_cb=None)

Parse input data.

debug_cb is optional function which can be used for monitoring and debugging parse steps. It is called each time named definition is successfully or unsuccessfully matched. This function receives match result and match call stack.

Parameters
  • data (Union[str, bytes]) –

  • debug_cb (Optional[Callable[[hat.peg.MatchResult, Iterable[hat.peg.MatchCallFrame]], None]]) –

Return type

hat.peg.Node

hat.peg.console_debug_cb(result, call_stack)

Simple console debugger.

Parameters
  • result (hat.peg.MatchResult) –

  • call_stack (Iterable[hat.peg.MatchCallFrame]) –