lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <>
Subject Re: QueryParser refactoring
Date Mon, 07 Mar 2005 21:36:19 GMT
Morus - thanks for your quick checks.  More below....

On Mar 7, 2005, at 3:32 PM, Morus Walter wrote:
> I had a quick look at the new QP. I didn't look at the code yet, but I
> redid my patch at the weekend for the current code, and I found it 
> quite
> ugly ;-) I didn't finish it completely, so I didn't upload it to
> bugzilla yet.
> Your changes look great in general, though I find some issues:
> 1) 'stop OR stop AND stop' where stop is a stopword gives a parse 
> error:
> Encountered "<EOF>" at line 1, column 0.
> Was expecting one of:
>     <NOT> ...
> ...

I think  you must have tried this in a transient state when I forgot to 
check in some JavaCC generated files.  Try again.  This one now returns 
an empty BooleanQuery.

> 2) Single term queries using +/- flags are parse to a query without 
> flag
> +a -> a

Hmmm.... this is a debatable one.  It's returning a TermQuery in this 
case for "a".  Is that appropriate?  Or should it return a BooleanQuery 
with a single TermQuery as required?

I think having it optimized to a TermQuery makes the most sense.  
Though, putting it in a BooleanQuery does make this next one simpler...

> -a -> a
> While this doesn't make a difference for +a it's a bit strange for -a,
> OTOH -a isn't a usable query anyway.

Oops... yeah, you're right.  If its a single clause right now it 
doesn't wrap in a BooleanQuery and thus does not take into account the 
modifier +/-/NOT.   But as you say, this is a bogus query anyway.  I 
guess the right thing to do is wrap both the +a query as above and the 
-a query into a BooleanQuery with the modifier set appropriately.

> 3) a OR NOT b parses to 'a -b' which is the same as 'a AND NOT b'
>    IMHO `a OR NOT b' should be `a OR (NOT b)' though lucene cannot 
> search
>    that. Maybe it should raise an error...

Actually it parses like this:

	a OR NOT b -> a -b
	a AND NOT b -> +a -b

So they are slightly different, though the effect will be the same.

>    a OR NOT b AND c (parsed to a -(+b +c)) should IMHO be parsed to `a 
> (-b +c)'

Ah, ok.... so NOT gets much higher precedence than I'm currently giving 
it.  That might take me a while to achieve, but I'll give it a shot.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message