lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephane James Vaucher <vauch...@cirano.qc.ca>
Subject RE: Understanding Boolean Queries
Date Thu, 29 Apr 2004 17:40:18 GMT
Hi Tate,

Forgot to ask, what version of Lucene? (IIRC, <= 1.2, means no
maxClauseCount)

sv

On Thu, 29 Apr 2004, Tate Avery wrote:

> Thank you for the response.
>
> I am not using the QueryParser directly... it was just part of my overall
> understanding of how this exception is coming about.  Same thing,
> essentially, with the maxClauseCount.
>
>
> Here is some code to illustrate what is confusing me and what I am trying to
> ascertain:
>
> 	int _numClauses = XXX;
> 	boolean _required = XXX;  // 3 examples of these var settings below
>
> 	BooleanQuery _query = new BooleanQuery();
>
> 	for (int _i = 0; _i < _numClauses; _i++)
> 	{
> 		_query.add(
> 			new BooleanClause(
> 				new TermQuery(new Term("body", "term" + _i)),
> 				_required,
> 				false));
> 	}
>
> 	Hits _hits = new IndexSearcher(INDEX_DIR).search(_query);
>
>
> 1) With _numClauses=9999 and _required=false (for example), I have no
> problems.
> (This is confusing since 9999 is more than maxClauseCount... but I won't
> complain).
>
> 2) With _numClauses=32 and _required=true, I also have no problems.
>
> 3) With _numClauses=33 and _required=true, I get
> "java.lang.IndexOutOfBoundsException: More than 32 required/prohibited
> clauses in query." as a runtime exception.
>
>
> So, I guess I am trying to ask the following:
>
> Is a query like (T1 AND T2 AND ... AND T32 AND T33) just completely illegal
> for Lucene?
> OR is there some way to extend this limit?
> OR am I missing something that is clouding my understanding?
>
>
>
> Thanks,
> Tate
>
>
>
> -----Original Message-----
> From: Stephane James Vaucher [mailto:vauchers@cirano.qc.ca]
> Sent: Thursday, April 29, 2004 1:10 PM
> To: Lucene Users List; tate.avery@nstein.com
> Cc: lucene-dev@jakarta.apache.org
> Subject: Re: Understanding Boolean Queries
>
>
> On Thu, 29 Apr 2004, Tate Avery wrote:
>
> > Hello,
> >
> > I have been reviewing some of the code related to boolean queries and I
> > wanted to see if my understanding is approximately correct regarding how
> > they are handled and, more importantly, the limitations.
>
> You can always submit requests for enhancements in bugzilla, so as to keep
> track this issue.
>
> > Here is what I have come to understand so far:
> >
> > 1) The QueryParser code generated from javacc will parse my boolean query
> > and determine for each clause whether or not is 'required' (based on a few
> > conditions, but, in short, whether or not it was introduced or followed by
> > 'AND') or 'prohibited' (based, in short, on it being preceded by 'NOT').
>
> Your usage seems pretty particular, why are you using the javacc
> QueryParser?
>
> > 2) As my BooleanQuery is being constructed, it will throw a
> > BooleanQuery.TooManyClauses exception if I exceed
> > BooleanQuery.maxClauseCount (which defaults to 1024).
>
> It's configurable through sys properties or by
> BooleanQuery.setMaxClauseCount(int maxClauseCount)
> >
> > 3) The maxClauseCount threshold appears not to care whether or not my
> > clauses are 'required' or 'prohibited'... only how many of them there are
> in
> > total.
> >
> > 4) My BooleanQuery will prepare its own Scorer instance (i.e.
> > BooleanScorer).  And, during this step, it will identify to the scorer
> which
> > clauses are 'required' or 'prohibited'.  And, if more than 32 fall into
> this
> > category, a IndexOutOfBoundsException ("More than 32 required/prohibited
> > clauses in query.") is thrown.
> > That's as far as I got.
> > Now, I am a bit confused at this point.  Does this mean I can make a
> boolean
> > query consisting of up to 1024 clauses as long as no more than 32 of them
> > are required or prohibited?  This doesn't seem right.  So, am I missing
> > something in the way I am understanding this.
> > I am (as you may have guessed) generating large boolean queries.  And, in
> > some rare cases, I am receiving the exception identified in #4 (above).
> So,
> > I am trying to figure out whether or not I need to change/filter my
> queries
> > in a special way in order to avoid this exception.  And, in order to do
> > this, I want to understand how these queries are being handled.
> > Finally, is there something related to the query syntax that could be my
> > mistake?  For example, what is the difference between:
> > 	"A B" AND "C D" AND "D E"
> > ... and...
> > 	("A B") AND ("C D") AND ("D E")
> > ... could that be the crux of it?
>
> I can't help you here, and the doc seems rather thin (or nonexistent for
> this class). I don't know the relation between the query and how the
> scorer will process it.
>
> Sorry I can't be of assistance,
> sv
>
> > Thank you for your time,
> > Tate Avery
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Mime
View raw message