corinthia-dev mailing list archives

From: Peter Kelly <pmke...@apache.org>
Subject: My plans for January
Date: Sat, 03 Jan 2015 12:02:56 GMT
Inspired by Jan’s excellent idea of posting what we each plan to work on, I thought I’d
chip in with my intentions:

- Complete development of a generic parser library based on Parsing Expression Grammars [1,2],
which will serve as a basis for parsing non-XML based file formats like Markdown, AsciiDoc,
reStructuredText, and RTF. This is something I’ve been dabbling with on and off for about
a year now, and I recently rewrote it from scratch. I also foresee potential in extending
this into a high-level programming language for expressing transformations similar to XSLT
or Stratego/XT [3], but that’s something for a little further down the track.

I’ll put this code in a separate, experimental branch once it’s in a vaguely reasonable
state - Real Soon Now (TM).

- Implement parsers for XML and HTML. Theoretically this could be done with the PEG-based
parser above, but it will be quicker and easier to write them “manually”, as neither format is
very complicated. This will allow us to remove the external dependencies on libxml2, iconv,
and htmltidy. I’ll likely do this first, since it’s the easiest.

Note that since these dependencies will shortly be going away, I recommend against trying
to isolate them in platform: doing so will likely take more effort than writing the parsers
themselves, because the code that uses them depends on data structures in core (specifically
the DOM classes), which aren’t accessible from platform.

- Document more of the code base. This will include coding conventions - how things like error
handling, memory management, and string representation/manipulation are carried out by the
library. It will also cover the core classes and parts of the existing Word filter.
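For readers new to PEGs [1], the operators a PEG library builds on (sequence, ordered choice, greedy repetition, and the not-predicate) can be sketched as small parser combinators. This is a minimal illustration in Python, not the library described in the first item above; the names lit, seq, choice, star, not_ and heading are hypothetical, and each expression just returns the end position of a match, or None on failure.

```python
# Minimal PEG recognizer sketch (hypothetical names, for illustration only).
# Every parsing expression is a function (text, pos) -> new pos, or None on failure.

def lit(s):
    """Match the literal string s."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def any_char(text, pos):
    """PEG '.': match any single character."""
    return pos + 1 if pos < len(text) else None

def seq(*exprs):
    """e1 e2 ...: every expression must match, in order."""
    def parse(text, pos):
        for e in exprs:
            pos = e(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*exprs):
    """e1 / e2 / ...: ordered choice; the first alternative that succeeds is committed to."""
    def parse(text, pos):
        for e in exprs:
            result = e(text, pos)
            if result is not None:
                return result
        return None
    return parse

def star(e):
    """e*: greedy repetition; always succeeds, stops on failure or a zero-width match."""
    def parse(text, pos):
        while True:
            nxt = e(text, pos)
            if nxt is None or nxt == pos:
                return pos
            pos = nxt
    return parse

def not_(e):
    """!e: succeeds without consuming input if and only if e fails here."""
    def parse(text, pos):
        return pos if e(text, pos) is None else None
    return parse

# Hypothetical rule for a Markdown-style heading line:
#   Heading <- '#' '#'* ' ' (!'\n' .)*
heading = seq(lit("#"), star(lit("#")), lit(" "),
              star(seq(not_(lit("\n")), any_char)))

def matches(expr, text):
    """True if expr consumes the entire input."""
    return expr(text, 0) == len(text)
```

Ordered choice is what sets PEGs apart from context-free grammars: `seq(choice(lit("a"), lit("ab")), lit("c"))` fails on "abc", because the choice commits to "a" and is never retried, whereas swapping the two alternatives makes it succeed.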

For those of you interested in formal language theory and parsing techniques, I recommend
reading [4], which describes some of the history and recent developments such as packrat parsing.
These make practical, simpler parser implementations possible for a more general range of
languages than the LL/LR grammars of old could handle. Flex and Bison users in particular should
find this a relieving read :)
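The central trick of packrat parsing [2] is memoization: the result of each (rule, input position) pair is stored, so when an ordered choice fails in its first alternative and falls back to the next, the shared prefix is not parsed again. Below is a minimal sketch in Python, assuming a toy arithmetic grammar of the kind often used to introduce the technique; the class, its methods, and the evaluations counter are hypothetical and exist only to make the memo table visible.

```python
class ArithParser:
    """Packrat recognizer for a toy grammar (hypothetical, for illustration):

        Additive  <- Multitive '+' Additive / Multitive
        Multitive <- Primary '*' Multitive / Primary
        Primary   <- '(' Additive ')' / Digit
        Digit     <- [0-9]

    Each rule method returns the end position on success, or None on failure.
    """

    def __init__(self, text):
        self.text = text
        self.memo = {}        # (rule name, position) -> end position or None
        self.evaluations = 0  # number of times a rule body actually ran

    def _memoize(self, name, pos, body):
        # Evaluate each (rule, position) pair at most once; later calls reuse
        # the stored result instead of re-parsing.
        key = (name, pos)
        if key not in self.memo:
            self.evaluations += 1
            self.memo[key] = body(pos)
        return self.memo[key]

    def _char(self, pos, c):
        if pos < len(self.text) and self.text[pos] == c:
            return pos + 1
        return None

    def additive(self, pos):
        def body(p):
            left = self.multitive(p)
            if left is not None:
                plus = self._char(left, "+")
                if plus is not None:
                    right = self.additive(plus)
                    if right is not None:
                        return right
            return self.multitive(p)  # fallback alternative: a memo hit
        return self._memoize("additive", pos, body)

    def multitive(self, pos):
        def body(p):
            left = self.primary(p)
            if left is not None:
                star = self._char(left, "*")
                if star is not None:
                    right = self.multitive(star)
                    if right is not None:
                        return right
            return self.primary(p)  # fallback alternative: a memo hit
        return self._memoize("multitive", pos, body)

    def primary(self, pos):
        def body(p):
            opened = self._char(p, "(")
            if opened is not None:
                inner = self.additive(opened)
                if inner is not None:
                    closed = self._char(inner, ")")
                    if closed is not None:
                        return closed
            return self.digit(p)
        return self._memoize("primary", pos, body)

    def digit(self, pos):
        if pos < len(self.text) and "0" <= self.text[pos] <= "9":
            return pos + 1
        return None


def recognizes(text):
    """True if the whole input is an Additive expression."""
    return ArithParser(text).additive(0) == len(text)
```

For example, `ArithParser("2*(3+4)").additive(0)` returns 7, consuming the whole input. Since there are three memoized rules and each runs at most once per position, `evaluations` can never exceed three times the input length plus one; that bound is what gives packrat parsers their linear running time, at the cost of memory for the table.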

[1] Bryan Ford: Parsing expression grammars: a recognition-based syntactic foundation. POPL
2004: 111-122. http://bford.info/pub/lang/peg.pdf

[2] Bryan Ford: Packrat parsing: simple, powerful, lazy, linear time, functional pearl.
ICFP 2002: 36-47. http://bford.info/pub/lang/packrat-icfp02.pdf

[3] http://strategoxt.org

[4] Lennart C. L. Kats, Eelco Visser, Guido Wachsmuth: Pure and declarative syntax definition:
paradise lost and regained. OOPSLA 2010: 918-932. http://swerl.tudelft.nl/twiki/pub/Main/TechnicalReports/TUD-SERG-2010-019.pdf

—
Dr Peter M. Kelly
pmkelly@apache.org

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)

