jackrabbit-dev mailing list archives

From "Thomas Mueller" <thomas.tom.muel...@gmail.com>
Subject Re: master plan for jsr 283 query implementation
Date Wed, 12 Sep 2007 08:24:21 GMT

Two more advantages of a hand-written parser:

- You can actually debug the parser. No chance with JavaCC or ANTLR.
- Better tool support (refactoring, autocomplete).

> sorry for my somewhat ironic statement about you being the only one
> wanting a hand-written parser,

To my surprise, it turns out I was wrong!

> Just curious, don't you use a separate tokenizing step in your
> hand-written parsers (I'm asking because of the literal "AND" above)?

Lexing (tokenizing, scanning) is done at a lower level. It can be
hand-written, or done with a tool (for example StringTokenizer, or
JFlex). The boundary between lexing and parsing is soft. In my
example, tokenizing is done in 'read()', which reads the next token.
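
To make the split concrete, here is a minimal sketch of that style. It is
not the code from my earlier example and not H2's parser: the class name
ConditionParser, the helpers readIf() and readAny(), and the tiny grammar
are all assumptions made up for illustration. The parser methods compare
against the literal "AND", while read() does the tokenizing underneath.

```java
/**
 * Hypothetical sketch of a hand-written recursive-descent parser.
 * Grammar (assumed for illustration only):
 *   condition  := comparison { "AND" comparison }
 *   comparison := operand "=" operand
 */
final class ConditionParser {
    private final String input;
    private int pos;
    private String token; // current token, or null at end of input

    ConditionParser(String input) {
        this.input = input;
        read(); // load the first token
    }

    /** Tokenizing step: skips whitespace and reads the next token. */
    private void read() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        if (pos >= input.length()) {
            token = null;
            return;
        }
        int start = pos;
        if (isWordChar(input.charAt(pos))) {
            while (pos < input.length() && isWordChar(input.charAt(pos))) {
                pos++;
            }
        } else {
            pos++; // anything else is a single-character operator
        }
        token = input.substring(start, pos);
    }

    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    /** Consumes the current token if it matches (case-insensitive). */
    private boolean readIf(String expected) {
        if (token != null && token.equalsIgnoreCase(expected)) {
            read();
            return true;
        }
        return false;
    }

    /** Consumes and returns the current token, whatever it is. */
    private String readAny() {
        if (token == null) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        String t = token;
        read();
        return t;
    }

    String parseCondition() {
        String left = parseComparison();
        while (readIf("AND")) { // the literal keyword lives in the parser
            left = "and(" + left + ", " + parseComparison() + ")";
        }
        if (token != null) {
            throw new IllegalArgumentException("Unexpected token: " + token);
        }
        return left;
    }

    private String parseComparison() {
        String left = readAny();
        if (!readIf("=")) {
            throw new IllegalArgumentException("Expected '=' but got: " + token);
        }
        return "eq(" + left + ", " + readAny() + ")";
    }

    /** Convenience entry point: parses the input into a prefix-notation string. */
    static String parse(String sql) {
        return new ConditionParser(sql).parseCondition();
    }

    public static void main(String[] args) {
        System.out.println(parse("a = 1 AND b = 2"));
    }
}
```

Because everything is plain Java, you can set a breakpoint in
parseCondition() and step through it, which is the debugging advantage
mentioned at the top.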

> I usually prefer a separate tokenizing step, if only to make testing
> easier.

Sure! Not sure how to do that in JavaCC or ANTLR, but it is probably
possible as well.

> context-sensitive tokenizing

I'm not sure what you are referring to. Keywords versus identifiers?
Example token types are: 'integer value', 'decimal value', 'text value',
'operator', 'quoted identifier', 'name'. The keywords are well defined
in Java, but for SQL, I wouldn't decide whether it's a keyword or an
identifier while tokenizing. Remarks are usually silently eaten by the
tokenizer (except for @deprecated in javac).
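
A rough sketch of such a tokenizer, to show what I mean. The class name
SqlTokenizer, the Type enum and the tokenize() method are hypothetical
names chosen for this example, and the rules are simplified assumptions
(single-character operators, no doubled-quote escapes, only "--" and
"/* */" remarks). The point is that SELECT and AND come out as plain
'name' tokens; the parser decides later whether they act as keywords.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a standalone SQL tokenizer (simplified rules). */
final class SqlTokenizer {
    enum Type { INTEGER, DECIMAL, TEXT, OPERATOR, QUOTED_IDENTIFIER, NAME }

    static final class Token {
        final Type type;
        final String text;

        Token(Type type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + ":" + text;
        }
    }

    static List<Token> tokenize(String s) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        int n = s.length();
        while (i < n) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            if (s.startsWith("--", i)) { // line remark: silently eaten
                while (i < n && s.charAt(i) != '\n') {
                    i++;
                }
                continue;
            }
            if (s.startsWith("/*", i)) { // block remark: silently eaten
                int end = s.indexOf("*/", i + 2);
                i = end < 0 ? n : end + 2;
                continue;
            }
            int start = i;
            if (Character.isDigit(c)) {
                while (i < n && Character.isDigit(s.charAt(i))) {
                    i++;
                }
                Type type = Type.INTEGER;
                if (i + 1 < n && s.charAt(i) == '.' && Character.isDigit(s.charAt(i + 1))) {
                    i++; // consume the decimal point
                    while (i < n && Character.isDigit(s.charAt(i))) {
                        i++;
                    }
                    type = Type.DECIMAL;
                }
                out.add(new Token(type, s.substring(start, i)));
            } else if (c == '\'' || c == '"') {
                int end = s.indexOf(c, i + 1);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated quote at " + i);
                }
                // 'single quotes' are text values, "double quotes" are identifiers
                Type type = c == '\'' ? Type.TEXT : Type.QUOTED_IDENTIFIER;
                out.add(new Token(type, s.substring(i + 1, end)));
                i = end + 1;
            } else if (Character.isLetter(c) || c == '_') {
                while (i < n && (Character.isLetterOrDigit(s.charAt(i)) || s.charAt(i) == '_')) {
                    i++;
                }
                // No keyword check here: SELECT, AND, etc. are just names.
                out.add(new Token(Type.NAME, s.substring(start, i)));
            } else {
                i++;
                out.add(new Token(Type.OPERATOR, s.substring(start, i)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("select AND from \"AND\" /* remark */ 42 1.5 'x' ="));
    }
}
```

Kept separate like this, the tokenizer can be unit tested on its own
with nothing more than strings in and token lists out.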

> The final answer to this question is probably "whoever implements it
> gets to decide". For me, the easiest way to understand a parser would
> be the unit tests which demonstrate its functionality, anyway.

I fully agree.

Some example parser code:

Derby JavaCC source file (313 KB):
(the generated .java files are 691 + 314 + 20 + 5 = 1030 KB)

H2 hand-written parser (161 KB):

