lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tate Avery <tate.av...@nstein.com>
Subject RE: Understanding Boolean Queries
Date Thu, 29 Apr 2004 17:42:07 GMT

Sorry, I retract this statement...

>> 1) With _numClauses=9999 and _required=false (for example), I have no
problems.
>> (This is confusing since 9999 is more than maxClauseCount... but I won't
complain).

My little test app was using lucene-1.3-rc1.jar.  But, my REAL app is using
lucene-1.3-final.jar.

So, with _numClauses=1024 and _required=false, I have no problems.
And, with _numClauses=1025 and _required=false, I get the TooManyClauses
exception.

All of that is fine and good.  My main concern is the 32/33 threshold when
_required=true (see details below).


Tate



-----Original Message-----
From: Tate Avery [mailto:tate.avery@nstein.com]
Sent: Thursday, April 29, 2004 1:30 PM
To: 'Lucene Users List'
Cc: lucene-dev@jakarta.apache.org
Subject: RE: Understanding Boolean Queries


Thank you for the response.

I am not using the QueryParser directly... it was just part of my overall
understanding of how this exception is coming about.  Same thing,
essentially, with the maxClauseCount.


Here is some code to illustrate what is confusing me and what I am trying to
ascertain:

	int _numClauses = XXX;
	boolean _required = XXX;  // 3 examples of these var settings below

	BooleanQuery _query = new BooleanQuery();

	for (int _i = 0; _i < _numClauses; _i++)
	{
		_query.add(
			new BooleanClause(
				new TermQuery(new Term("body", "term" + _i)),
				_required,
				false));
	}

	Hits _hits = new IndexSearcher(INDEX_DIR).search(_query);


1) With _numClauses=9999 and _required=false (for example), I have no
problems.
(This is confusing since 9999 is more than maxClauseCount... but I won't
complain).

2) With _numClauses=32 and _required=true, I also have no problems.

3) With _numClauses=33 and _required=true, I get
"java.lang.IndexOutOfBoundsException: More than 32 required/prohibited
clauses in query." as a runtime exception.


So, I guess I am trying to ask the following:

Is a query like (T1 AND T2 AND ... AND T32 AND T33) just completely illegal
for Lucene?
OR is there some way to extend this limit?
OR am I missing something that is clouding my understanding?



Thanks,
Tate



-----Original Message-----
From: Stephane James Vaucher [mailto:vauchers@cirano.qc.ca]
Sent: Thursday, April 29, 2004 1:10 PM
To: Lucene Users List; tate.avery@nstein.com
Cc: lucene-dev@jakarta.apache.org
Subject: Re: Understanding Boolean Queries


On Thu, 29 Apr 2004, Tate Avery wrote:

> Hello,
>
> I have been reviewing some of the code related to boolean queries and I
> wanted to see if my understanding is approximately correct regarding how
> they are handled and, more importantly, the limitations.

You can always submit requests for enhancements in bugzilla, so as to keep
track this issue.

> Here is what I have come to understand so far:
>
> 1) The QueryParser code generated from javacc will parse my boolean query
> and determine for each clause whether or not is 'required' (based on a few
> conditions, but, in short, whether or not it was introduced or followed by
> 'AND') or 'prohibited' (based, in short, on it being preceded by 'NOT').

Your usage seems pretty particular, why are you using the javacc
QueryParser?

> 2) As my BooleanQuery is being constructed, it will throw a
> BooleanQuery.TooManyClauses exception if I exceed
> BooleanQuery.maxClauseCount (which defaults to 1024).

It's configurable through sys properties or by
BooleanQuery.setMaxClauseCount(int maxClauseCount)
>
> 3) The maxClauseCount threshold appears not to care whether or not my
> clauses are 'required' or 'prohibited'... only how many of them there are
in
> total.
>
> 4) My BooleanQuery will prepare its own Scorer instance (i.e.
> BooleanScorer).  And, during this step, it will identify to the scorer
which
> clauses are 'required' or 'prohibited'.  And, if more than 32 fall into
this
> category, a IndexOutOfBoundsException ("More than 32 required/prohibited
> clauses in query.") is thrown.
> That's as far as I got.
> Now, I am a bit confused at this point.  Does this mean I can make a
boolean
> query consisting of up to 1024 clauses as long as no more than 32 of them
> are required or prohibited?  This doesn't seem right.  So, am I missing
> something in the way I am understanding this.
> I am (as you may have guessed) generating large boolean queries.  And, in
> some rare cases, I am receiving the exception identified in #4 (above).
So,
> I am trying to figure out whether or not I need to change/filter my
queries
> in a special way in order to avoid this exception.  And, in order to do
> this, I want to understand how these queries are being handled.
> Finally, is there something related to the query syntax that could be my
> mistake?  For example, what is the difference between:
> 	"A B" AND "C D" AND "D E"
> ... and...
> 	("A B") AND ("C D") AND ("D E")
> ... could that be the crux of it?

I can't help you here, and the doc seems rather thin (or nonexistent for
this class). I don't know the relation between the query and how the
scorer will process it.

Sorry I can't be of assistance,
sv

> Thank you for your time,
> Tate Avery
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Mime
View raw message