corinthia-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From pmke...@apache.org
Subject incubator-corinthia git commit: Flat: Data structure for built-in PEG grammar
Date Wed, 08 Jul 2015 16:11:38 GMT
Repository: incubator-corinthia
Updated Branches:
  refs/heads/master 377421ecf -> e0b656433


Flat: Data structure for built-in PEG grammar

Flat (Formal LAanguage Transformation) is an experimental programming
language for parsing, transforming, and pretty-printing any data format
that can be expressed as a Parsing Expression Grammar (PEG) [1].

Its purpose is to provide an easy way of dealing with file formats like
Markdown, LaTeX, and org-mode which use a custom grammar, rather than a
standardized meta-format like XML or JSON.

Initially, the intention is to simply provide a parser interpreter
(similar to a parser generator of the bison/flex variety) such that we
can get an abstract syntax tree from a file, from which further
processing can be done. The plan is to later extend it to include a
domain-specific language for expressing transformations between
different grammars, similar to Stratego/XT. [2]

To understand PEGs, read the following papers:

"Parsing Expression Grammars: A Recognition-Based Syntactic Foundation".
Bryan Ford, POPL, January 2004. http://bford.info/pub/lang/peg.pdf

"Packrat Parsing: Simple, Powerful, Lazy, Linear Time". Bryan Ford.
ICFP, October 2002. http://bford.info/pub/lang/packrat-icfp02.pdf

[1] http://bford.info/packrat

[2] http://strategoxt.org


Project: http://git-wip-us.apache.org/repos/asf/incubator-corinthia/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-corinthia/commit/e0b65643
Tree: http://git-wip-us.apache.org/repos/asf/incubator-corinthia/tree/e0b65643
Diff: http://git-wip-us.apache.org/repos/asf/incubator-corinthia/diff/e0b65643

Branch: refs/heads/master
Commit: e0b6564333727970799059ffc3038cf6e66f7757
Parents: 377421e
Author: Peter Kelly <peter@uxproductivity.com>
Authored: Wed Jul 8 23:10:03 2015 +0700
Committer: Peter Kelly <peter@uxproductivity.com>
Committed: Wed Jul 8 23:10:03 2015 +0700

----------------------------------------------------------------------
 CMakeLists.txt                      |   1 +
 experiments/flat/README.txt         |  25 +++
 experiments/flat/grammars/peg.peg   |  43 +++++
 experiments/flat/src/Builtin.c      | 226 ++++++++++++++++++++++++++
 experiments/flat/src/Builtin.h      |  70 ++++++++
 experiments/flat/src/CMakeLists.txt |  50 ++++++
 experiments/flat/src/Expression.c   | 266 +++++++++++++++++++++++++++++++
 experiments/flat/src/Expression.h   | 102 ++++++++++++
 experiments/flat/src/Grammar.c      | 100 ++++++++++++
 experiments/flat/src/Grammar.h      |  28 ++++
 experiments/flat/src/flat.c         |  28 ++++
 11 files changed, 939 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-corinthia/blob/e0b65643/CMakeLists.txt
----------------------------------------------------------------------
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 4959dd9..a8ef086 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -63,3 +63,4 @@ add_subdirectory(consumers/dftest/src)
 add_subdirectory(consumers/dfconvert/src)
 add_subdirectory(consumers/dfutil/src)
 add_subdirectory(consumers/corinthia/src)
+add_subdirectory(experiments/flat/src)

http://git-wip-us.apache.org/repos/asf/incubator-corinthia/blob/e0b65643/experiments/flat/README.txt
----------------------------------------------------------------------
diff --git a/experiments/flat/README.txt b/experiments/flat/README.txt
new file mode 100644
index 0000000..436041a
--- /dev/null
+++ b/experiments/flat/README.txt
@@ -0,0 +1,25 @@
+Flat (Formal LAanguage Transformation) is an experimental programming language
+for parsing, transforming, and pretty-printing any data format that can be
+expressed as a Parsing Expression Grammar (PEG) [1].
+
+Its purpose is to provide an easy way of dealing with file formats like
+Markdown, LaTeX, and org-mode which use a custom grammar, rather than a
+standardized meta-format like XML or JSON.
+
+Initially, the intention is to simply provide a parser interpreter (similar to a
+parser generator of the bison/flex variety) such that we can get an abstract
+syntax tree from a file, from which further processing can be done. The plan is
+to later extend it to include a domain-specific language for expressing
+transformations between different grammars, similar to Stratego/XT. [2]
+
+To understand PEGs, read the following papers:
+
+"Parsing Expression Grammars: A Recognition-Based Syntactic Foundation". Bryan
+Ford, POPL, January 2004. http://bford.info/pub/lang/peg.pdf
+
+"Packrat Parsing: Simple, Powerful, Lazy, Linear Time". Bryan Ford. ICFP,
+October 2002. http://bford.info/pub/lang/packrat-icfp02.pdf
+
+[1] http://bford.info/packrat
+
+[2] http://strategoxt.org

http://git-wip-us.apache.org/repos/asf/incubator-corinthia/blob/e0b65643/experiments/flat/grammars/peg.peg
----------------------------------------------------------------------
diff --git a/experiments/flat/grammars/peg.peg b/experiments/flat/grammars/peg.peg
new file mode 100644
index 0000000..9de9394
--- /dev/null
+++ b/experiments/flat/grammars/peg.peg
@@ -0,0 +1,43 @@
+Grammar    <- Spacing Definition+ EndOfFile
+Definition <- Identifier LEFTARROW Expression
+Expression <- Sequence (SLASH Sequence)*
+Sequence   <- Prefix*
+Prefix     <- (AND / NOT)? Suffix
+Suffix     <- Primary (QUESTION / STAR / PLUS)?
+Primary    <- Identifier !LEFTARROW
+            / OPEN Expression CLOSE
+            / Literal
+            / Class
+            / DOT
+Identifier <- IdentStart IdentCont* Spacing
+IdentStart <- [a-zA-Z_]
+IdentCont  <- IdentStart
+            / [0-9]
+Literal    <- ['] (!['] Char)* ['] Spacing
+            / ["] (!["] Char)* ["] Spacing
+Class      <- '[' (!']' Range)* ']' Spacing
+Range      <- Char '-' Char
+            / Char
+Char       <- '\\' [nrt'"\[\]\\]
+            / '\\' [0-2] [0-7] [0-7]
+            / '\\' [0-7] [0-7]?
+            / !'\\' .
+LEFTARROW  <- '<-' Spacing
+SLASH      <- '/' Spacing
+AND        <- '&' Spacing
+NOT        <- '!' Spacing
+QUESTION   <- '?' Spacing
+STAR       <- '*' Spacing
+PLUS       <- '+' Spacing
+OPEN       <- '(' Spacing
+CLOSE      <- ')' Spacing
+DOT        <- '.' Spacing
+Spacing    <- (Space / Comment)*
+Comment    <- '#' (!EndOfLine .)* EndOfLine
+Space      <- ' '
+            / '\t'
+            / EndOfLine
+EndOfLine  <- '\r\n'
+            / '\n'
+            / '\r'
+EndOfFile  <- !.

http://git-wip-us.apache.org/repos/asf/incubator-corinthia/blob/e0b65643/experiments/flat/src/Builtin.c
----------------------------------------------------------------------
diff --git a/experiments/flat/src/Builtin.c b/experiments/flat/src/Builtin.c
new file mode 100644
index 0000000..3bcc475
--- /dev/null
+++ b/experiments/flat/src/Builtin.c
@@ -0,0 +1,226 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "Builtin.h"
+#include <stdarg.h>
+#include <stdlib.h>
+
+#define ref(name)       ExpressionNewValue(IdentExpr,name)
+#define seq(...)        makeExpression(SequenceExpr,__VA_ARGS__,NULL)
+#define choice(...)     makeExpression(ChoiceExpr,__VA_ARGS__,NULL)
+#define and(...)        makeExpression(AndExpr,__VA_ARGS__,NULL)
+#define not(...)        makeExpression(NotExpr,__VA_ARGS__,NULL)
+#define opt(...)        makeExpression(OptExpr,__VA_ARGS__,NULL)
+#define star(...)       makeExpression(StarExpr,__VA_ARGS__,NULL)
+#define plus(...)       makeExpression(PlusExpr,__VA_ARGS__,NULL)
+#define lit(value)      ExpressionNewValue(LitExpr,value)
+#define cls(...)        makeExpression(ClassExpr,__VA_ARGS__,NULL)
+#define dot()           makeExpression(DotExpr,NULL)
+#define range(lo,hi)    ExpressionNewRange(lo,hi)
+
+static Expression *makeExpression(ExprKind kind, ...)
+{
+    Expression *children[20];
+    int count = 0;
+    va_list ap;
+    va_start(ap,kind);
+    Expression *expr;
+    while ((count < 20) && ((expr = va_arg(ap,Expression *)) != NULL))
+        children[count++] = expr;
+    va_end(ap);
+    return ExpressionNew(kind,count,children);
+}
+
+Grammar *GrammarNewBuiltin(void)
+{
+    Grammar *gram = GrammarNew();
+
+    // Grammar    <- Spacing Definition+ EndOfFile
+    GrammarDefine(gram,"Grammar",
+                    seq(ref("Spacing"),
+                        plus(ref("Definition")),
+                        ref("EndOfFile")));
+
+    // Definition <- Identifier LEFTARROW Expression
+    GrammarDefine(gram,"Definition",
+                    seq(ref("Identifier"),
+                        ref("LEFTARROW"),
+                        ref("Expression")));
+    // Expression <- Sequence (SLASH Sequence)*
+    GrammarDefine(gram,"Expression",
+                    seq(ref("Sequence"),
+                        star(seq(ref("SLASH"),ref("Sequence")))));
+
+    // Sequence   <- Prefix*
+    GrammarDefine(gram,"Sequence",
+                    star(ref("Prefix")));
+
+    // Prefix     <- (AND / NOT)? Suffix
+    GrammarDefine(gram,"Prefix",
+                    seq(opt(choice(ref("AND"),
+                                   ref("NOT"))),
+                        ref("Suffix")));
+
+    // Suffix     <- Primary (QUESTION / STAR / PLUS)?
+    GrammarDefine(gram,"Suffix",
+                    seq(ref("Primary"),
+                        opt(choice(ref("QUESTION"),
+                                   ref("STAR"),
+                                   ref("PLUS")))));
+
+    // Primary    <- Identifier !LEFTARROW
+    //             / OPEN Expression CLOSE
+    //             / Literal
+    //             / Class
+    //             / DOT
+    GrammarDefine(gram,"Primary",
+                    choice(seq(ref("Identifier"),not(ref("LEFTARROW"))),
+                           seq(ref("OPEN"),ref("Expression"),ref("CLOSE")),
+                           ref("Literal"),
+                           ref("Class"),
+                           ref("DOT")));
+
+    // Identifier <- IdentStart IdentCont* Spacing
+    GrammarDefine(gram,"Identifier",
+                    seq(ref("IdentStart"),
+                        star(ref("IdentCont")),
+                        ref("Spacing")));
+
+    // IdentStart <- [a-zA-Z_]
+    GrammarDefine(gram,"IdentStart",
+                    cls(range('a','z'),
+                        range('A','Z'),
+                        range('_','_')));
+
+    // IdentCont  <- IdentStart
+    //             / [0-9]
+    GrammarDefine(gram,"IdentCont",
+                    choice(ref("IdentStart"),
+                        cls(range('0','9'))));
+
+    // Literal    <- ['] (!['] Char)* ['] Spacing
+    //             / ["] (!["] Char)* ["] Spacing
+    GrammarDefine(gram,"Literal",
+                    choice(seq(cls(range('\'','\'')),
+                               star(seq(not(cls(range('\'','\''))),
+                                        ref("Char"))),
+                               cls(range('\'','\'')),
+                               ref("Spacing")),
+                           seq(cls(range('"','"')),
+                               star(seq(not(cls(range('"','"'))),
+                                        ref("Char"))),
+                               cls(range('"','"')),
+                               ref("Spacing"))));
+
+    // Class      <- '[' (!']' Range)* ']' Spacing
+    GrammarDefine(gram,"Class",
+                    seq(lit("["),
+                        star(seq(not(lit("]")),
+                                 ref("Range"))),
+                        lit("]"),
+                        ref("Spacing")));
+    // Range      <- Char '-' Char
+    //             / Char
+    GrammarDefine(gram,"Range",
+                    choice(seq(ref("Char"),
+                               lit("-"),
+                               ref("Char")),
+                           ref("Char")));
+
+    // Char       <- '\\' [nrt'"\[\]\\]
+    //             / '\\' [0-2] [0-7] [0-7]
+    //             / '\\' [0-7] [0-7]?
+    //             / !'\\' .
+    GrammarDefine(gram,"Char",
+                    choice(seq(lit("\\"),
+                               cls(range('n','n'),
+                                   range('r','r'),
+                                   range('t','t'),
+                                   range('\'','\''),
+                                   range('"','"'),
+                                   range('[','['),
+                                   range(']',']'),
+                                   range('\\','\\'))),
+                           seq(lit("\\"),
+                               cls(range('0','2')),
+                               cls(range('0','7')),
+                               cls(range('0','7'))),
+                           seq(lit("\\"),
+                               cls(range('0','7')),
+                               opt(cls(range('0','7')))),
+                           seq(not(lit("\\")),dot())));
+
+    // LEFTARROW  <- '<-' Spacing
+    GrammarDefine(gram,"LEFTARROW",seq(lit("<-"),ref("Spacing")));
+
+    // SLASH      <- '/' Spacing
+    GrammarDefine(gram,"SLASH",seq(lit("/"),ref("Spacing")));
+
+    // AND        <- '&' Spacing
+    GrammarDefine(gram,"AND",seq(lit("&"),ref("Spacing")));
+
+    // NOT        <- '!' Spacing
+    GrammarDefine(gram,"NOT",seq(lit("!"),ref("Spacing")));
+
+    // QUESTION   <- '?' Spacing
+    GrammarDefine(gram,"QUESTION",seq(lit("?"),ref("Spacing")));
+
+    // STAR       <- '*' Spacing
+    GrammarDefine(gram,"STAR",seq(lit("*"),ref("Spacing")));
+
+    // PLUS       <- '+' Spacing
+    GrammarDefine(gram,"PLUS",seq(lit("+"),ref("Spacing")));
+
+    // OPEN       <- '(' Spacing
+    GrammarDefine(gram,"OPEN",seq(lit("("),ref("Spacing")));
+
+    // CLOSE      <- ')' Spacing
+    GrammarDefine(gram,"CLOSE",seq(lit(")"),ref("Spacing")));
+
+    // DOT        <- '.' Spacing
+    GrammarDefine(gram,"DOT",seq(lit("."),ref("Spacing")));
+
+    // Spacing    <- (Space / Comment)*
+    GrammarDefine(gram,"Spacing",star(choice(ref("Space"),ref("Comment"))));
+
+    // Comment    <- '#' (!EndOfLine .)* EndOfLine
+    GrammarDefine(gram,"Comment",
+                    seq(lit("#"),
+                        star(seq(not(ref("EndOfLine")),
+                                 dot())),
+                        ref("EndOfLine")));
+
+    // Space      <- ' '
+    //            / '\t'
+    //            / EndOfLine
+    GrammarDefine(gram,"Space",
+                    choice(lit(" "),
+                           lit("\t"),
+                           ref("EndOfLine")));
+    // EndOfLine  <- '\r\n'
+    //             / '\n'
+    //             / '\r'
+    GrammarDefine(gram,"EndOfLine",
+                    choice(lit("\r\n"),
+                           lit("\n"),
+                           lit("\r")));
+
+    // EndOfFile  <- !.
+    GrammarDefine(gram,"EndOfFile",not(dot()));
+
+    return gram;
+}

http://git-wip-us.apache.org/repos/asf/incubator-corinthia/blob/e0b65643/experiments/flat/src/Builtin.h
----------------------------------------------------------------------
diff --git a/experiments/flat/src/Builtin.h b/experiments/flat/src/Builtin.h
new file mode 100644
index 0000000..34ea272
--- /dev/null
+++ b/experiments/flat/src/Builtin.h
@@ -0,0 +1,70 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include "Grammar.h"
+
+/**
+ * The GrammarNewBuiltin() function creates the following Grammar, which comes
+ * from Ford's original PEG paper:
+ *
+ *     Grammar    <- Spacing Definition+ EndOfFile
+ *     Definition <- Identifier LEFTARROW Expression
+ *     Expression <- Sequence (SLASH Sequence)*
+ *     Sequence   <- Prefix*
+ *     Prefix     <- (AND / NOT)? Suffix
+ *     Suffix     <- Primary (QUESTION / STAR / PLUS)?
+ *     Primary    <- Identifier !LEFTARROW
+ *                 / OPEN Expression CLOSE
+ *                 / Literal
+ *                 / Class
+ *                 / DOT
+ *     Identifier <- IdentStart IdentCont* Spacing
+ *     IdentStart <- [a-zA-Z_]
+ *     IdentCont  <- IdentStart
+ *                 / [0-9]
+ *     Literal    <- ['] (!['] Char)* ['] Spacing
+ *                 / ["] (!["] Char)* ["] Spacing
+ *     Class      <- '[' (!']' Range)* ']' Spacing
+ *     Range      <- Char '-' Char
+ *                 / Char
+ *     Char       <- '\\' [nrt'"\[\]\\]
+ *                 / '\\' [0-2] [0-7] [0-7]
+ *                 / '\\' [0-7] [0-7]?
+ *                 / !'\\' .
+ *     LEFTARROW  <- '<-' Spacing
+ *     SLASH      <- '/' Spacing
+ *     AND        <- '&' Spacing
+ *     NOT        <- '!' Spacing
+ *     QUESTION   <- '?' Spacing
+ *     STAR       <- '*' Spacing
+ *     PLUS       <- '+' Spacing
+ *     OPEN       <- '(' Spacing
+ *     CLOSE      <- ')' Spacing
+ *     DOT        <- '.' Spacing
+ *     Spacing    <- (Space / Comment)*
+ *     Comment    <- '#' (!EndOfLine .)* EndOfLine
+ *     Space      <- ' '
+ *                 / '\t'
+ *                 / EndOfLine
+ *     EndOfLine  <- '\r\n'
+ *                 / '\n'
+ *                 / '\r'
+ *     EndOfFile  <- !.
+ */
+Grammar *GrammarNewBuiltin(void);

http://git-wip-us.apache.org/repos/asf/incubator-corinthia/blob/e0b65643/experiments/flat/src/CMakeLists.txt
----------------------------------------------------------------------
diff --git a/experiments/flat/src/CMakeLists.txt b/experiments/flat/src/CMakeLists.txt
new file mode 100644
index 0000000..a8e2877
--- /dev/null
+++ b/experiments/flat/src/CMakeLists.txt
@@ -0,0 +1,50 @@
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+
+###
+## global definitions
+###
+set_property(GLOBAL PROPERTY USE_FOLDERS ON)
+
+
+###
+## group source objects
+###
+set(SOURCES
+    Builtin.c
+    Builtin.h
+    Expression.c
+    Expression.h
+    Grammar.c
+    Grammar.h
+    flat.c)
+
+###
+# Common include for all platform files
+###
+include_directories()
+include_directories(SYSTEM ${INCLUDE_DIRS})
+include_directories(.)
+link_directories(${LIB_DIRS})
+
+
+
+###
+# executable (release artifact)
+###
+add_executable(flat ${SOURCES})
+#target_link_libraries(flat DocFormats ${LIBS})
+source_group(src FILES ${SOURCES})
+set_property(TARGET flat PROPERTY FOLDER experiments)

http://git-wip-us.apache.org/repos/asf/incubator-corinthia/blob/e0b65643/experiments/flat/src/Expression.c
----------------------------------------------------------------------
diff --git a/experiments/flat/src/Expression.c b/experiments/flat/src/Expression.c
new file mode 100644
index 0000000..7bd7368
--- /dev/null
+++ b/experiments/flat/src/Expression.c
@@ -0,0 +1,266 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "Expression.h"
+#include <stdlib.h>
+#include <string.h>
+#include <stdio.h>
+#include <assert.h>
+
+struct Expression {
+    ExprKind kind;
+    int count;
+    char *value;
+    int start;
+    int end;
+    Expression *children[];
+};
+
+Expression *ExpressionNew(ExprKind kind, int count, Expression **children)
+{
+    Expression *expr = (Expression *)calloc(1,sizeof(Expression)+count*sizeof(Expression
*));
+    expr->kind = kind;
+    expr->value = NULL;
+    expr->count = count;
+    memcpy(expr->children,children,count*sizeof(Expression *));
+    return expr;
+}
+
+Expression *ExpressionNewValue(ExprKind kind, const char *value)
+{
+    Expression *expr = (Expression *)calloc(1,sizeof(Expression));
+    expr->kind = kind;
+    expr->value = strdup(value);
+    expr->count = 0;
+    return expr;
+}
+
+Expression *ExpressionNewRange(int lo, int hi)
+{
+    Expression *expr = (Expression *)calloc(1,sizeof(Expression));
+    expr->kind = RangeExpr;
+    expr->start = lo;
+    expr->end = hi+1;
+    return expr;
+}
+
+void ExpressionFree(Expression *expr)
+{
+    free(expr->value);
+    for (int i = 0; i < expr->count; i++)
+        ExpressionFree(expr->children[i]);
+    free(expr);
+}
+
+ExprKind ExpressionKind(Expression *expr)
+{
+    return expr->kind;
+}
+
+Expression *ExpressionChild(Expression *expr, int index)
+{
+    if ((index < 0) || (index >= expr->count))
+        return NULL;
+    return expr->children[index];
+}
+
+int ExpressionCount(Expression *expr)
+{
+    return expr->count;
+}
+
+const char *ExpressionValue(Expression *expr)
+{
+    return expr->value;
+}
+
+int ExpressionStart(Expression *expr)
+{
+    return expr->start;
+}
+
+int ExpressionEnd(Expression *expr)
+{
+    return expr->end;
+}
+
+static int ExprKindPrecedence(ExprKind kind)
+{
+    switch (kind) {
+        case ChoiceExpr:
+            return 1;
+        case SequenceExpr:
+            return 2;
+        case AndExpr:
+        case NotExpr:
+            return 3;
+        case OptExpr:
+        case StarExpr:
+        case PlusExpr:
+            return 4;
+        case IdentExpr:
+        case LitExpr:
+        case ClassExpr:
+        case DotExpr:
+            return 5;
+        case RangeExpr:
+            return 6;
+    }
+    return 0;
+}
+
+static void printEscapedRangeChar(char c)
+{
+    switch (c) {
+        case '[':
+            printf("\\[");
+            break;
+        case ']':
+            printf("\\]");
+            break;
+        case '\\':
+            printf("\\\\");
+            break;
+        default:
+            printf("%c",c);
+            break;
+    }
+}
+
+static void printLiteral(const char *value)
+{
+    printf("'");
+    for (int i = 0; value[i] != '\0'; i++) {
+        switch (value[i]) {
+            case '\r':
+                printf("\\r");
+                break;
+            case '\n':
+                printf("\\n");
+                break;
+            case '\t':
+                printf("\\t");
+                break;
+            case '\'':
+                printf("\\'");
+                break;
+            case '\\':
+                printf("\\\\");
+                break;
+            default:
+                printf("%c",value[i]);
+        }
+    }
+    printf("'");
+}
+
+void ExpressionPrint(Expression *expr, int highestPrecedence, const char *indent)
+{
+    int exprPrecedence = ExprKindPrecedence(expr->kind);
+    int brackets = (highestPrecedence > exprPrecedence); // e.g. a choice inside a sequence
+    if (highestPrecedence < exprPrecedence)
+        highestPrecedence = exprPrecedence;
+
+    if ((expr->kind == ClassExpr) || (expr->kind == RangeExpr))
+        brackets = 0;
+
+    if (brackets) {
+        printf("(");
+        highestPrecedence = 1;
+    }
+    switch (expr->kind) {
+        case ChoiceExpr:
+            for (int i = 0; i < ExpressionCount(expr); i++) {
+                if (i > 0) {
+                    if (indent != NULL)
+                        printf("\n%s/ ",indent);
+                    else
+                        printf(" / ");
+                }
+                ExpressionPrint(ExpressionChild(expr,i),highestPrecedence,NULL);
+            }
+            break;
+        case SequenceExpr: {
+            for (int i = 0; i < ExpressionCount(expr); i++) {
+                if (i > 0) {
+                    printf(" ");
+                }
+                ExpressionPrint(ExpressionChild(expr,i),highestPrecedence,NULL);
+            }
+            break;
+        }
+        case AndExpr:
+            assert(ExpressionCount(expr) == 1);
+            printf("&");
+            ExpressionPrint(ExpressionChild(expr,0),highestPrecedence,NULL);
+            break;
+        case NotExpr:
+            assert(ExpressionCount(expr) == 1);
+            printf("!");
+            ExpressionPrint(ExpressionChild(expr,0),highestPrecedence,NULL);
+            break;
+        case OptExpr:
+            assert(ExpressionCount(expr) == 1);
+            ExpressionPrint(ExpressionChild(expr,0),highestPrecedence,NULL);
+            printf("?");
+            break;
+        case StarExpr:
+            assert(ExpressionCount(expr) == 1);
+            ExpressionPrint(ExpressionChild(expr,0),highestPrecedence,NULL);
+            printf("*");
+            break;
+        case PlusExpr:
+            assert(ExpressionCount(expr) == 1);
+            ExpressionPrint(ExpressionChild(expr,0),highestPrecedence,NULL);
+            printf("+");
+            break;
+        case IdentExpr:
+            assert(ExpressionValue(expr) != NULL);
+            printf("%s",ExpressionValue(expr));
+            break;
+        case LitExpr:
+            assert(ExpressionValue(expr) != NULL);
+            printLiteral(ExpressionValue(expr));
+            break;
+        case ClassExpr:
+            printf("[");
+            for (int i = 0; i < ExpressionCount(expr); i++) {
+                assert(ExpressionKind(ExpressionChild(expr,i)) == RangeExpr);
+                ExpressionPrint(ExpressionChild(expr,i),highestPrecedence,NULL);
+            }
+            printf("]");
+            break;
+        case DotExpr:
+            printf(".");
+            break;
+        case RangeExpr: {
+            int start = ExpressionStart(expr);
+            int end = ExpressionEnd(expr);
+            if (start+1 == end) {
+                printEscapedRangeChar(start);
+            }
+            else {
+                printEscapedRangeChar(start);
+                printf("-");
+                printEscapedRangeChar(end-1);
+            }
+            break;
+        }
+    }
+    if (brackets)
+        printf(")");
+}

http://git-wip-us.apache.org/repos/asf/incubator-corinthia/blob/e0b65643/experiments/flat/src/Expression.h
----------------------------------------------------------------------
diff --git a/experiments/flat/src/Expression.h b/experiments/flat/src/Expression.h
new file mode 100644
index 0000000..a6b1306
--- /dev/null
+++ b/experiments/flat/src/Expression.h
@@ -0,0 +1,102 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+/**
+ * A tree of Expression objects represents the abstract syntax tree that comprises part of
a
+ * PEG (Parsing Expression Grammar). Each grammar consists of one or more named rules, each
+ * of which has an expression tree associated with it. The format of the grammars is described
+ * in the following paper:
+ *
+ *     "Parsing Expression Grammars: A Recognition-Based Syntactic Foundation".
+ *     Bryan Ford, POPL, January 2004. http://bford.info/pub/lang/peg.pdf
+ *
+ * An Expression can be used by a code generator to produce the code for a parser, or alternatively,
+ * used directly to implement an interpreter to produce a parse tree based on the rules of
the
+ * grammar.
+ *
+ * The types are as follows:
+ *
+ * ChoiceExpr - matches exactly one of the child expressions against the input. Each expression
+ * is tested in order, until a match is found, with the result being that of the child parse.
+ * Evaluation succeeds if one of the children matched.
+ *
+ * SequenceExpr - matches each of the child expressions, in order, following consecutively
in the
+ * input. Evaluation succeeds if all children matched.
+ *
+ * AndExpr - Positive lookahead assertion. The single child expression is tested against
the input,
+ * and evaluation succeeds if there is a match. Afterwards, the current position moved back
to the
+ * location in the input where the lookahead test began.
+ *
+ * NotExpr - Negative lookahead assertion. The single child expression is tested against
the input,
+ * and evaluation succeeds if the child *fails* to match. Afterwards, the current position
moved
+ * back to the location in the input where the lookahead test began.
+ *
+ * OptExpr - The input is consumed if it matches, otherwise the position is left unchanged.
In
+ * either case, the evaluation succeeds.
+ *
+ * StarExpr - The child expression is repeatedly evaluated against the input until no more
matches
+ * occur. The evaluation always suceeds. The result is a list of 1 or more nodes.
+ *
+ * PlusExpr - The child expression is repeatedly evaluated against the input until no more
matches
+ * occur. The evaluation always suceeds if at least one match was made. The result is a list
of
+ * 1 or more nodes.
+ *
+ * IdentExpr - Reference to another rule defined in the grammar. Always has a value, representing
+ * the name of the rule, and when the grammar resolution algorithm (not yet implemented)
is run,
+ * will have a single child pointing to the expression associated with the referenced rule.
+ *
+ * LitExpr - Literal match against a sequence of characters. Always has a value, representing
+ * the string to be matched, and has no child nodes.
+ *
+ * Class - Character class expression. Matches against any characters given in the children,
+ * all of which are range expressions.
+ *
+ * RangeExpr - Matches again a character that is between a specified minimum and maximum.
Evaluation
+ * succeeds if the next input character c is such that start <= c < end.
+ */
+
+typedef enum {
+    ChoiceExpr,
+    SequenceExpr,
+    AndExpr,
+    NotExpr,
+    OptExpr,
+    StarExpr,
+    PlusExpr,
+    IdentExpr,
+    LitExpr,
+    ClassExpr,
+    DotExpr,
+    RangeExpr,
+} ExprKind;
+
+typedef struct Expression Expression;
+
+Expression *ExpressionNew(ExprKind kind, int count, Expression **children);
+Expression *ExpressionNewValue(ExprKind kind, const char *value);
+Expression *ExpressionNewRange(int lo, int hi);
+void ExpressionFree(Expression *expr);
+
+ExprKind ExpressionKind(Expression *expr);
+Expression *ExpressionChild(Expression *expr, int index);
+int ExpressionCount(Expression *expr);
+const char *ExpressionValue(Expression *expr);
+int ExpressionStart(Expression *expr);
+int ExpressionEnd(Expression *expr);
+void ExpressionPrint(Expression *expr, int highestPrecedence, const char *indent);

http://git-wip-us.apache.org/repos/asf/incubator-corinthia/blob/e0b65643/experiments/flat/src/Grammar.c
----------------------------------------------------------------------
diff --git a/experiments/flat/src/Grammar.c b/experiments/flat/src/Grammar.c
new file mode 100644
index 0000000..2187aed
--- /dev/null
+++ b/experiments/flat/src/Grammar.c
@@ -0,0 +1,100 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "Grammar.h"
+#include <stdlib.h>
+#include <string.h>
+#include <stdio.h>
+#include <assert.h>
+
+typedef struct Rule Rule;
+
+struct Rule {
+    char *name;
+    Expression *expr;
+    Rule *next;
+};
+
+struct Grammar {
+    Rule **nextDef;
+    Rule *defList;
+};
+
+Rule *RuleNew(const char *name, Expression *expr)
+{
+    Rule *def = (Rule *)calloc(1,sizeof(Rule));
+    def->name = strdup(name);
+    def->expr = expr;
+    return def;
+}
+
+void RuleFree(Rule *def)
+{
+    free(def->name);
+    ExpressionFree(def->expr);
+    free(def);
+}
+
+Grammar *GrammarNew(void)
+{
+    Grammar *gram = (Grammar *)calloc(1,sizeof(Grammar));
+    gram->nextDef = &gram->defList;
+    return gram;
+}
+
+void GrammarFree(Grammar *gram)
+{
+    Rule *next;
+    for (Rule *def = gram->defList; def != NULL; def = next) {
+        next = def->next;
+        RuleFree(def);
+    }
+    free(gram);
+}
+
+void GrammarDefine(Grammar *gram, const char *name, Expression *expr)
+{
+    Rule *def = RuleNew(name,expr);
+    *gram->nextDef = def;
+    gram->nextDef = &def->next;
+}
+
+void GrammarPrint(Grammar *gram)
+{
+    int maxNameLen = 0;
+    for (Rule *def = gram->defList; def != NULL; def = def->next) {
+        int nameLen = strlen(def->name);
+        if (maxNameLen < nameLen)
+            maxNameLen = nameLen;
+    }
+
+    char *prefix = malloc(maxNameLen+3);
+    memset(prefix,' ',maxNameLen+2);
+    prefix[maxNameLen+2] = '\0';
+
+    for (Rule *def = gram->defList; def != NULL; def = def->next) {
+        int nameLen = strlen(def->name);
+        printf("%s",def->name);
+        for (int i = nameLen; i < maxNameLen+1; i++)
+            printf(" ");
+        printf("<- ");
+        ExpressionPrint(def->expr,0,prefix);
+        printf("\n");
+    }
+
+    free(prefix);
+}

http://git-wip-us.apache.org/repos/asf/incubator-corinthia/blob/e0b65643/experiments/flat/src/Grammar.h
----------------------------------------------------------------------
diff --git a/experiments/flat/src/Grammar.h b/experiments/flat/src/Grammar.h
new file mode 100644
index 0000000..d289ee6
--- /dev/null
+++ b/experiments/flat/src/Grammar.h
@@ -0,0 +1,28 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+#include <sys/types.h>
+
+#include "Expression.h"
+
+typedef struct Grammar Grammar;
+
+Grammar *GrammarNew(void);
+void GrammarFree(Grammar *gram);
+void GrammarDefine(Grammar *gram, const char *name, Expression *expr);
+void GrammarPrint(Grammar *gram);

http://git-wip-us.apache.org/repos/asf/incubator-corinthia/blob/e0b65643/experiments/flat/src/flat.c
----------------------------------------------------------------------
diff --git a/experiments/flat/src/flat.c b/experiments/flat/src/flat.c
new file mode 100644
index 0000000..e34172c
--- /dev/null
+++ b/experiments/flat/src/flat.c
@@ -0,0 +1,28 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <stdio.h>
+#include "Builtin.h"
+
+int main(int argc, const char **argv)
+{
+    // Build and print out the built-in PEG grammar
+    Grammar *gram = GrammarNewBuiltin();
+    GrammarPrint(gram);
+    GrammarFree(gram);
+    return 0;
+}


Mime
View raw message