Date: Thu, 29 Apr 2004 13:30:10 -0400
From: Tate Avery
Subject: RE: Understanding Boolean Queries
To: 'Lucene Users List'
Cc: lucene-dev@jakarta.apache.org
Reply-to: tate.avery@nstein.com

Thank you for the response.

I am not using the QueryParser directly... it was just part of my overall understanding of how this exception is coming about. Same thing, essentially, with the maxClauseCount.

Here is some code to illustrate what is confusing me and what I am trying to ascertain:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.*;

    int _numClauses = XXX;
    boolean _required = XXX; // 3 examples of these var settings below

    BooleanQuery _query = new BooleanQuery();
    for (int _i = 0; _i < _numClauses; _i++) {
        _query.add(
            new BooleanClause(
                new TermQuery(new Term("body", "term" + _i)),
                _required,  // required?
                false));    // prohibited?
    }
    Hits _hits = new IndexSearcher(INDEX_DIR).search(_query);

1) With _numClauses=9999 and _required=false (for example), I have no problems. (This is confusing, since 9999 is more than maxClauseCount... but I won't complain.)

2) With _numClauses=32 and _required=true, I also have no problems.

3) With _numClauses=33 and _required=true, I get "java.lang.IndexOutOfBoundsException: More than 32 required/prohibited clauses in query." as a runtime exception.

So, I guess I am trying to ask the following:

Is a query like (T1 AND T2 AND ... AND T32 AND T33) just completely illegal for Lucene?
OR is there some way to extend this limit?
OR am I missing something that is clouding my understanding?

Thanks,
Tate

-----Original Message-----
From: Stephane James Vaucher [mailto:vauchers@cirano.qc.ca]
Sent: Thursday, April 29, 2004 1:10 PM
To: Lucene Users List; tate.avery@nstein.com
Cc: lucene-dev@jakarta.apache.org
Subject: Re: Understanding Boolean Queries

On Thu, 29 Apr 2004, Tate Avery wrote:

> Hello,
>
> I have been reviewing some of the code related to boolean queries and I
> wanted to see if my understanding is approximately correct regarding how
> they are handled and, more importantly, the limitations.
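Tate's three cases are consistent with a scorer that reserves one bit of a 32-bit Java int for each required or prohibited clause: optional clauses take no bit (case 1), 32 required clauses use bits 0 through 31 (case 2), and a 33rd finds no bit left (case 3). The class below is a hypothetical, stand-alone model of that assumption, written only to make the arithmetic concrete. `ClauseMaskModel`, `add`, `matches` and `tryClauses` are illustrative names, not Lucene's BooleanScorer API or source, and the model ignores maxClauseCount entirely.

```java
/**
 * Hypothetical model (not Lucene's BooleanScorer source): every required or
 * prohibited clause is given its own bit in a 32-bit int.
 */
public class ClauseMaskModel {
    private int nextMask = 1;       // bit for the next required/prohibited clause
    private int requiredMask = 0;   // bits of all required clauses
    private int prohibitedMask = 0; // bits of all prohibited clauses

    /** Registers one clause; optional clauses need no bit at all. */
    public void add(boolean required, boolean prohibited) {
        if (!required && !prohibited) {
            return; // optional: no bit consumed, so their number is unbounded here
        }
        if (nextMask == 0) {
            // After 32 left shifts the single set bit has fallen off the int.
            throw new IndexOutOfBoundsException(
                "More than 32 required/prohibited clauses in query.");
        }
        if (prohibited) {
            prohibitedMask |= nextMask;
        } else {
            requiredMask |= nextMask;
        }
        nextMask <<= 1;
    }

    /** A document matches if it hit every required bit and no prohibited bit. */
    public boolean matches(int bitsHitByDocument) {
        return (bitsHitByDocument & requiredMask) == requiredMask
            && (bitsHitByDocument & prohibitedMask) == 0;
    }

    /** Replays Tate's loop: numClauses clauses, all required or all optional. */
    public static String tryClauses(int numClauses, boolean required) {
        ClauseMaskModel model = new ClauseMaskModel();
        try {
            for (int i = 0; i < numClauses; i++) {
                model.add(required, false);
            }
            return "ok";
        } catch (IndexOutOfBoundsException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryClauses(9999, false)); // ok
        System.out.println(tryClauses(32, true));    // ok
        System.out.println(tryClauses(33, true));    // More than 32 required/prohibited clauses in query.
    }
}
```

If the real scorer does work this way, the appeal is that the per-document check reduces to two bitwise ANDs; the price is a hard ceiling of 32 required/prohibited clauses in any single BooleanQuery, no matter what maxClauseCount is set to.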
You can always submit requests for enhancements in Bugzilla, so as to keep track of this issue.

> Here is what I have come to understand so far:
>
> 1) The QueryParser code generated from javacc will parse my boolean query
> and determine for each clause whether or not it is 'required' (based on a few
> conditions, but, in short, whether or not it was introduced or followed by
> 'AND') or 'prohibited' (based, in short, on it being preceded by 'NOT').

Your usage seems pretty particular; why are you using the javacc QueryParser?

> 2) As my BooleanQuery is being constructed, it will throw a
> BooleanQuery.TooManyClauses exception if I exceed
> BooleanQuery.maxClauseCount (which defaults to 1024).

It's configurable through system properties or by
BooleanQuery.setMaxClauseCount(int maxClauseCount).

>
> 3) The maxClauseCount threshold appears not to care whether or not my
> clauses are 'required' or 'prohibited'... only how many of them there are in
> total.
>
> 4) My BooleanQuery will prepare its own Scorer instance (i.e.
> BooleanScorer). And, during this step, it will identify to the scorer which
> clauses are 'required' or 'prohibited'. And, if more than 32 fall into this
> category, an IndexOutOfBoundsException ("More than 32 required/prohibited
> clauses in query.") is thrown.
>
> That's as far as I got.
>
> Now, I am a bit confused at this point. Does this mean I can make a boolean
> query consisting of up to 1024 clauses as long as no more than 32 of them
> are required or prohibited? This doesn't seem right. So, am I missing
> something in the way I am understanding this?
>
> I am (as you may have guessed) generating large boolean queries. And, in
> some rare cases, I am receiving the exception identified in #4 (above). So,
> I am trying to figure out whether or not I need to change/filter my queries
> in a special way in order to avoid this exception. And, in order to do
> this, I want to understand how these queries are being handled.
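One way to stay under the 32 required-clause ceiling while keeping the same matching logic (T1 AND ... AND T33) is to nest: split the required terms into groups of at most 32, wrap each group in its own BooleanQuery, and add each group to its parent as a single required clause. This assumes the 32-clause check applies per BooleanQuery, as the exception from a single flat query suggests. Because AND is associative, a document should match the nested query exactly when it contains every term, though relevance scores may differ somewhat because of the extra level. The sketch below is a hypothetical helper (`NestedConjunction.build` is made up for illustration) that only computes the grouping and renders it in Lucene query syntax so the nesting is visible; when building queries programmatically as in Tate's code, each parenthesized group would become an inner BooleanQuery added to its parent with `new BooleanClause(inner, true, false)`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: regroups an all-required conjunction so that no
 * BooleanQuery level holds more than maxRequired required clauses.
 * Output is Lucene query syntax, used here only to make the nesting visible.
 */
public class NestedConjunction {

    public static String build(List<String> terms, int maxRequired) {
        if (maxRequired < 2) {
            // with a fan-out of 1 the grouping loop below would never shrink
            throw new IllegalArgumentException("maxRequired must be at least 2");
        }
        if (terms.isEmpty()) {
            throw new IllegalArgumentException("need at least one term");
        }
        // Leaf clauses, using the "body" field from Tate's example.
        List<String> level = new ArrayList<>();
        for (String term : terms) {
            level.add("body:" + term);
        }
        // Wrap chunks of at most maxRequired clauses in parentheses (one inner
        // BooleanQuery each) until the top level itself fits under the limit.
        while (level.size() > maxRequired) {
            List<String> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += maxRequired) {
                List<String> chunk =
                    level.subList(i, Math.min(i + maxRequired, level.size()));
                // A one-clause group adds nothing, so pass it through unwrapped.
                next.add(chunk.size() == 1 ? chunk.get(0) : "(" + joinRequired(chunk) + ")");
            }
            level = next;
        }
        return joinRequired(level);
    }

    /** Prefixes every clause with '+' (required) and joins with spaces. */
    private static String joinRequired(List<String> clauses) {
        StringBuilder sb = new StringBuilder();
        for (String clause : clauses) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('+').append(clause);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> small = new ArrayList<>();
        small.add("a");
        small.add("b");
        small.add("c");
        // prints: +(+body:a +body:b) +body:c
        System.out.println(build(small, 2));

        List<String> terms = new ArrayList<>();
        for (int i = 0; i < 33; i++) {
            terms.add("term" + i);
        }
        // 33 terms become one group of 32 plus term32: 2 required clauses on top
        System.out.println(build(terms, 32));
    }
}
```

With Tate's 33 terms, the outer query ends up with two required clauses (one group of 32 and the lone term32), and the inner group has exactly 32, so neither level would exceed the limit under the stated assumption. Larger inputs simply gain more levels.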
>
> Finally, is there something related to the query syntax that could be my
> mistake? For example, what is the difference between:
>
> "A B" AND "C D" AND "D E"
>
> ... and...
>
> ("A B") AND ("C D") AND ("D E")
>
> ... could that be the crux of it?

I can't help you here, and the doc seems rather thin (or nonexistent for this class). I don't know the relation between the query and how the scorer will process it.

Sorry I can't be of assistance,
sv

> Thank you for your time,
> Tate Avery
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org