lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tate Avery <tate.av...@nstein.com>
Subject Understanding Boolean Queries
Date Thu, 29 Apr 2004 16:12:19 GMT
Hello,

I have been reviewing some of the code related to boolean queries and I
wanted to see if my understanding is approximately correct regarding how
they are handled and, more importantly, the limitations.


Here is what I have come to understand so far:

1) The QueryParser code generated from javacc will parse my boolean query
and determine for each clause whether or not is 'required' (based on a few
conditions, but, in short, whether or not it was introduced or followed by
'AND') or 'prohibited' (based, in short, on it being preceded by 'NOT').

2) As my BooleanQuery is being constructed, it will throw a
BooleanQuery.TooManyClauses exception if I exceed
BooleanQuery.maxClauseCount (which defaults to 1024).

3) The maxClauseCount threshold appears not to care whether or not my
clauses are 'required' or 'prohibited'... only how many of them there are in
total.

4) My BooleanQuery will prepare its own Scorer instance (i.e.
BooleanScorer).  And, during this step, it will identify to the scorer which
clauses are 'required' or 'prohibited'.  And, if more than 32 fall into this
category, a IndexOutOfBoundsException ("More than 32 required/prohibited
clauses in query.") is thrown.
That's as far as I got.
Now, I am a bit confused at this point.  Does this mean I can make a boolean
query consisting of up to 1024 clauses as long as no more than 32 of them
are required or prohibited?  This doesn't seem right.  So, am I missing
something in the way I am understanding this.
I am (as you may have guessed) generating large boolean queries.  And, in
some rare cases, I am receiving the exception identified in #4 (above).  So,
I am trying to figure out whether or not I need to change/filter my queries
in a special way in order to avoid this exception.  And, in order to do
this, I want to understand how these queries are being handled.
Finally, is there something related to the query syntax that could be my
mistake?  For example, what is the difference between:
	"A B" AND "C D" AND "D E"
... and...
	("A B") AND ("C D") AND ("D E")
... could that be the crux of it?

Thank you for your time,
Tate Avery


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Mime
View raw message