lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: QueryParser Is Badly Broken
Date Fri, 13 Oct 2006 00:10:49 GMT

On Oct 12, 2006, at 7:11 PM, Renaud Waldura wrote:
> I'm developing an application used by scientists -- people who have  
> a pretty good idea of what logic is -- and they were shocked to  
> find out that neither of these queries return the same results:
>
> 1- banana AND apple OR orange
> 2- banana AND (apple OR orange)
> 3- (banana AND apple) OR orange
>
> I'd expect (1) to be either (2) or (3), but it turns out it's  
> parsed as "+banana apple orange". I was rather, uh, dismayed by  
> this find, as it doesn't seem to make sense.

It's not news to the die hard Luceners that QueryParser is mangled.   
It's a kitchen sink syntax with more bells and whistles than most  
applications need.  I've yet to come across  a project that has used  
QueryParser as-is, not because it's "broken", but because every  
application has been unique in how queries are expressed by users.

>    a- queries which mix boolean operators require strict  
> parenthesizing to work right
>
>    b- "+" isn't shorthand for "AND"; using it with "AND"/"OR"/"NOT"  
> and the default operator "" rarely does what you expect

AND/OR are oddly named in terms of how they map to the underlying  
BooleanQuery they create.  AND really means to make both clauses  
MUST, and OR means to make them SHOULD.  And, as you've painfully  
experienced, the precedence is not "logical".

>    c- the stock QueryParser doesn't work well in these cases
>
>    d- there's a new PrecedenceQueryParser at http://svn.apache.org/ 
> repos/asf/lucene/java/trunk/contrib/miscellaneous that solves  
> *some* of the issues but creates others

What issues does PQP create?  Perhaps we can get those fixed and  
replace QueryParser with it.

> While we are also developing a query-building UI, users must be  
> able to enter text queries as well. What do other folks do? I mean,  
> this is pretty bad. I can hardly go back to my scientists and tell  
> them Lucene is unable to handle 2 boolean operators, that they  
> should parenthesize everything by hand. I mean, that's just cheesy.

It really boils down to user interface, from my perspective.  Do the  
users need to type in all of that kind of logic?  Or could they be  
presented with a simpler syntax with just +/- in front of terms to  
indicate MUST/NOT (and SHOULD with no prefix)?  Perhaps they could be  
presented with two text boxes, one for required terms, and another  
for optional terms (and maybe another for prohibited terms)?

We are all certainly very open to improving QueryParser, or PQP.

	Erik



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message