lucene-general mailing list archives

From Mike Klaas <mike.kl...@gmail.com>
Subject Re: operator precedence and confusing result
Date Fri, 13 Mar 2009 17:51:59 GMT

On 11-Mar-09, at 7:13 PM, Jenny Brown wrote:

> I use the boolean logic heavily in a production app, because it's the
> grammar that my users understand (and they put together complex
> boolean queries in other apps too).  Also, we're not using relevance
> ranking.  A document either "matches the query" and gets returned, or
> "doesn't match" and doesn't get returned.  We only want yes/no
> answers.
>
> I haven't had time to really figure out what the earlier commenter
> meant with the + operators syntax conversion.  I still thought it
> would have meant the same thing as the query I had posted, ie, article
> has to match all terms in the AND clauses, and at least one of the
> terms in the OR list.  I guess I'm still missing what his explanation
> was trying to demonstrate.
>
> Anyway, just a note to say that boolean matching is important to me
> and my users; it'd be good if it worked the way it looks like it
> would.  If it doesn't, I need to understand better what the current
> limitations are.

Well, this is precisely why I am suggesting that we remove the boolean
operators (in some future version of Lucene).  Lucene doesn't have a
hierarchical boolean query model that works the way people expect, and
bug reports about discrepancies between how the boolean operators
behave and intuition are rejected.  We are left with something that is
convenient only if you understand how it works, and anyone who does
understand it can just as easily write the query in the alternate
(+/-) syntax.

Lucene's query model is based on REQUIRED, OPTIONAL, and EXCLUDED
clauses.  A clause with no annotation is always OPTIONAL, and doesn't
affect matching unless there are only OPTIONAL clauses on that level,
in which case at least one of them must match.  Brackets () create a
subclause (note that this is OPTIONAL by default!).  AND terms are
translated into REQUIRED clauses, and AND NOTs are translated into
EXCLUDED clauses.  REQUIRED clauses are annotated with +, EXCLUDED
clauses with -.
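
To make the matching rule above concrete, here is a minimal Python
sketch of it.  This is not Lucene code: the names Clause, Group and
matches are hypothetical, a "document" is assumed to be just a set of
terms, and scoring is ignored entirely (only yes/no matching is
modelled).

```python
from dataclasses import dataclass
from typing import List, Set, Union

# Annotations as they appear in the query syntax (illustrative constants).
REQUIRED, OPTIONAL, EXCLUDED = "+", "", "-"


@dataclass
class Clause:
    occur: str                   # REQUIRED, OPTIONAL or EXCLUDED
    query: Union[str, "Group"]   # a bare term, or a bracketed subclause


@dataclass
class Group:
    clauses: List[Clause]        # all clauses on one level


def matches(query: Union[str, Group], doc: Set[str]) -> bool:
    """Yes/no matching only; a document is modelled as a set of terms."""
    if isinstance(query, str):
        return query in doc
    hits = {
        occur: [matches(c.query, doc) for c in query.clauses if c.occur == occur]
        for occur in (REQUIRED, OPTIONAL, EXCLUDED)
    }
    if any(hits[EXCLUDED]):
        return False                  # an EXCLUDED clause matched
    if hits[REQUIRED]:
        return all(hits[REQUIRED])    # OPTIONAL clauses are ignored here
    return any(hits[OPTIONAL])        # only OPTIONAL clauses: need one


# +X +Y Z : Z is OPTIONAL next to REQUIRED clauses on the same level
flat = Group([Clause(REQUIRED, "X"), Clause(REQUIRED, "Y"), Clause(OPTIONAL, "Z")])

# +X +(Y Z) : the bracketed subclause is itself REQUIRED
nested = Group([
    Clause(REQUIRED, "X"),
    Clause(REQUIRED, Group([Clause(OPTIONAL, "Y"), Clause(OPTIONAL, "Z")])),
])

print(matches(flat, {"X", "Y"}))    # True: Z is not needed
print(matches(flat, {"X", "Z"}))    # False: Y is REQUIRED
print(matches(nested, {"X", "Z"}))  # True: X plus one of Y, Z
```

In this model the second branch is the one that surprises people: as
soon as a level has any REQUIRED clause, its OPTIONAL clauses are never
consulted.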

A AND B OR C OR D OR E OR F
-> +A +B C D E F
-> find documents that match clause A and clause B (other clauses  
don't affect matching)

C OR D OR E OR F
-> C D E F
-> find documents matching at least one of these clauses

A AND (B OR C OR D OR E OR F)
-> +A +(B C D E F)
-> find documents that match A, and match one of B, C, D, E, or F

(A AND B) OR C OR D OR E OR F
-> (+A +B) C D E F
-> find documents that match at least one of C, D, E, F, or both of A  
and B
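
The translations above can be reproduced with a short Python sketch.
It is a simplified illustration of the rules described in this message,
not Lucene's actual QueryParser; the function name to_prefix is
hypothetical, and it assumes whitespace-separated terms, upper-case
AND/OR/NOT, balanced brackets, and OR as the default operator.

```python
import re


def to_prefix(query: str) -> str:
    """Rewrite an AND/OR/NOT query into +/- clause syntax (simplified)."""
    tokens = re.findall(r"\(|\)|[^\s()]+", query)
    pos = 0

    def parse_level() -> str:
        nonlocal pos
        clauses = []          # [annotation, text] pairs for this level
        after_and = False     # the previous operator was AND
        negate = False        # a NOT precedes the next clause
        while pos < len(tokens) and tokens[pos] != ")":
            tok = tokens[pos]
            pos += 1
            if tok == "AND":
                after_and = True
                # AND also makes the clause before it REQUIRED
                if clauses and clauses[-1][0] == "":
                    clauses[-1][0] = "+"
            elif tok == "OR":
                after_and = False
            elif tok == "NOT":
                negate = True
            else:
                if tok == "(":
                    text = "(" + parse_level() + ")"
                    pos += 1  # skip the closing bracket
                else:
                    text = tok
                annotation = "-" if negate else ("+" if after_and else "")
                clauses.append([annotation, text])
                after_and = negate = False
        return " ".join(annotation + text for annotation, text in clauses)

    return parse_level()


print(to_prefix("A AND B OR C OR D OR E OR F"))    # +A +B C D E F
print(to_prefix("A AND (B OR C OR D OR E OR F)"))  # +A +(B C D E F)
print(to_prefix("(A AND B) OR C OR D OR E OR F"))  # (+A +B) C D E F
```

Note that OR does nothing in this sketch except cancel a pending AND;
an un-annotated clause is simply left OPTIONAL.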

The key takeaway: once you have an AND in a grouped set of clauses,
the ORs in that group are completely irrelevant for matching.

-Mike


