corinthia-dev mailing list archives

From jan i <j...@apache.org>
Subject Re: My plans for January
Date Sat, 03 Jan 2015 18:40:44 GMT
On 3 January 2015 at 13:02, Peter Kelly <pmkelly@apache.org> wrote:

> Inspired by Jan’s excellent idea of posting what we each plan to work on,
> I thought I’d chip in with my intentions:
>
> - Complete development of a generic parser library based on Parsing
> Expression Grammars [1,2], which will serve as a basis for parsing non-XML
> based file formats like Markdown, AsciiDoc, reStructuredText, and RTF. This
> is something I’ve been dabbling with on and off for about a year now, and
> have recently done a complete rewrite of. I also foresee potential in
> extending this into a high-level programming language for expressing
> transformations similar to XSLT or Stratego/XT [3], but that’s something
> for a little further down the track.
>
I like the idea, especially after having read up on Stratego/XT. However, we
still need at some point to discuss how we store information internally,
and how filters can access this information.
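
For anyone on the list who has not met PEGs before, here is a minimal sketch
of the packrat idea from [2]. It is not Peter's library and says nothing about
its design; the toy grammar, the function names, and the choice of Python are
all assumptions made purely for illustration. Each rule maps an input position
to a result, and memoizing on the position is what keeps the backtracking
linear in the input length.

```python
from functools import lru_cache


def parse(text):
    """Evaluate `text` against a toy PEG; return the value, or None on failure.

    Hypothetical grammar, for illustration only:
        Expr   <- Term ('+' Term)*
        Term   <- Factor ('*' Factor)*
        Factor <- Number / '(' Expr ')'
        Number <- [0-9]+
    """

    # Every rule takes a position and returns (value, next_position) or None.
    # lru_cache memoizes each rule by position, which is the packrat part.
    @lru_cache(maxsize=None)
    def expr(pos):
        result = term(pos)
        if result is None:
            return None
        value, pos = result
        while pos < len(text) and text[pos] == "+":
            nxt = term(pos + 1)
            if nxt is None:
                break  # leave the '+' unconsumed; the repetition ends here
            value, pos = value + nxt[0], nxt[1]
        return value, pos

    @lru_cache(maxsize=None)
    def term(pos):
        result = factor(pos)
        if result is None:
            return None
        value, pos = result
        while pos < len(text) and text[pos] == "*":
            nxt = factor(pos + 1)
            if nxt is None:
                break
            value, pos = value * nxt[0], nxt[1]
        return value, pos

    @lru_cache(maxsize=None)
    def factor(pos):
        # Ordered choice: try Number first, then fall back to '(' Expr ')'.
        result = number(pos)
        if result is not None:
            return result
        if pos < len(text) and text[pos] == "(":
            inner = expr(pos + 1)
            if inner is not None and inner[1] < len(text) and text[inner[1]] == ")":
                return inner[0], inner[1] + 1
        return None

    @lru_cache(maxsize=None)
    def number(pos):
        end = pos
        while end < len(text) and "0" <= text[end] <= "9":
            end += 1
        if end == pos:
            return None
        return int(text[pos:end]), end

    result = expr(0)
    # A parse only counts as successful if it consumed the whole input.
    if result is None or result[1] != len(text):
        return None
    return result[0]
```

Calling parse("2+3*4") evaluates the expression while parsing it; a real
filter would build a tree instead of a number, which is where the question of
internal storage comes in.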


>
> I’ll put this code in a separate, experimental branch once it’s in a
> vaguely reasonable state - Real Soon Now (TM).
>
> - Implement parsers for XML and HTML. Theoretically this could be done
> with the PEG-based parser above, but will be quicker and easier to do
> “manually”, as neither is very complicated. This will allow us to
> remove the external dependencies on libxml2, iconv, and htmltidy. I’ll
> likely actually do this first, given that it’s the easiest.
>
+1, I would really like to see those go away.

>
> Note that given these dependencies will shortly be going away, I recommend
> against trying to isolate them in platform, as doing so will likely be more
> effort than writing the parsers themselves due to the dependencies on data
> structures used in core (specifically the DOM classes), which aren’t
> accessible from platform.
>
Agreed, not in my current plans anyhow.


>
> - Document more of the code base. This will include coding conventions -
> how things like error handling, memory management, and string
> representation/manipulation are carried out by the library. It will also
> cover the core classes and parts of the existing Word filter.
>
Coding conventions would be really nice to have as a policy web page. I am
working with Dorte on a couple of extensions to our website, so if you can
write the raw text, Dorte can convert drawings etc. into the responsive
design.


>
> For those of you interested in formal language theory and parsing
> techniques, I recommend reading [4] which describes some of the history and
> recent developments such as packrat parsing which make for practical and
> simpler implementations of parsers for a more general range of languages
> than handled by LL/LR grammars of old. Flex and Bison users in particular
> should find this a relieving read :)
>
> [1] Bryan Ford: Parsing expression grammars: a recognition-based syntactic
> foundation. POPL 2004: 111-122. http://bford.info/pub/lang/peg.pdf
>
> [2] Bryan Ford: Packrat parsing: simple, powerful, lazy, linear time,
> functional pearl. ICFP 2002: 36-47.
> http://bford.info/pub/lang/packrat-icfp02.pdf
>
> [3] http://strategoxt.org
>
> [4] Lennart C. L. Kats, Eelco Visser, Guido Wachsmuth: Pure and
> declarative syntax definition: paradise lost and regained. OOPSLA 2010:
> 918-932.
> http://swerl.tudelft.nl/twiki/pub/Main/TechnicalReports/TUD-SERG-2010-019.pdf
>

rgds
jan i.


>
> —
> Dr Peter M. Kelly
> pmkelly@apache.org
>
> PGP key: http://www.kellypmk.net/pgp-key
> (fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
>
>
