lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephane James Vaucher <vauch...@cirano.qc.ca>
Subject Re: Understanding Boolean Queries
Date Thu, 29 Apr 2004 17:10:01 GMT
On Thu, 29 Apr 2004, Tate Avery wrote:

> Hello,
>
> I have been reviewing some of the code related to boolean queries and I
> wanted to see if my understanding is approximately correct regarding how
> they are handled and, more importantly, the limitations.

You can always submit requests for enhancements in bugzilla, so as to keep
track this issue.

> Here is what I have come to understand so far:
>
> 1) The QueryParser code generated from javacc will parse my boolean query
> and determine for each clause whether or not is 'required' (based on a few
> conditions, but, in short, whether or not it was introduced or followed by
> 'AND') or 'prohibited' (based, in short, on it being preceded by 'NOT').

Your usage seems pretty particular, why are you using the javacc
QueryParser?

> 2) As my BooleanQuery is being constructed, it will throw a
> BooleanQuery.TooManyClauses exception if I exceed
> BooleanQuery.maxClauseCount (which defaults to 1024).

It's configurable through sys properties or by
BooleanQuery.setMaxClauseCount(int maxClauseCount)
>
> 3) The maxClauseCount threshold appears not to care whether or not my
> clauses are 'required' or 'prohibited'... only how many of them there are in
> total.
>
> 4) My BooleanQuery will prepare its own Scorer instance (i.e.
> BooleanScorer).  And, during this step, it will identify to the scorer which
> clauses are 'required' or 'prohibited'.  And, if more than 32 fall into this
> category, a IndexOutOfBoundsException ("More than 32 required/prohibited
> clauses in query.") is thrown.
> That's as far as I got.
> Now, I am a bit confused at this point.  Does this mean I can make a boolean
> query consisting of up to 1024 clauses as long as no more than 32 of them
> are required or prohibited?  This doesn't seem right.  So, am I missing
> something in the way I am understanding this.
> I am (as you may have guessed) generating large boolean queries.  And, in
> some rare cases, I am receiving the exception identified in #4 (above).  So,
> I am trying to figure out whether or not I need to change/filter my queries
> in a special way in order to avoid this exception.  And, in order to do
> this, I want to understand how these queries are being handled.
> Finally, is there something related to the query syntax that could be my
> mistake?  For example, what is the difference between:
> 	"A B" AND "C D" AND "D E"
> ... and...
> 	("A B") AND ("C D") AND ("D E")
> ... could that be the crux of it?

I can't help you here, and the doc seems rather thin (or nonexistent for
this class). I don't know the relation between the query and how the
scorer will process it.

Sorry I can't be of assistance,
sv

> Thank you for your time,
> Tate Avery
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Mime
View raw message