From: Sanyi
Date: Thu, 11 Nov 2004 11:57:32 -0800 (PST)
To: Lucene Users List
Subject: RE: Bug in the BooleanQuery optimizer? ..TooManyClauses
Message-ID: <20041111195732.70523.qmail@web21325.mail.yahoo.com>
In-Reply-To: <63434C14F9A6F74CB36B85033E4C30CA5BFC58@hermes.corp.cyveillance.com>

Yes, I understand all of this, but I don't want to set it to MaxInt, since that can
easily lead to (even accidental) DoS attacks.

What I'm saying is that there is no reason for the optimizer to expand wild* to more
than 1024 variations when I search for "somerareword AND wild*". Since somerareword
is only present in, let's say, 100 documents, wild* should only expand to the words
beginning with "wild" that occur in those 100 documents, and then the query should
work fine with the default 1024-clause limit. But it doesn't, so I'm left choosing
between unusable queries and accidental DoS attacks.

--- Will Allen wrote:
> Any wildcard search will automatically expand your query into one clause for each
> term it finds in the index that matches the wildcard.
>
> For example:
>
> wild* would become wild OR wilderness OR wildman etc., one clause for each of the
> terms that exist in your index.
>
> This is why you quickly reach the 1024-clause limit. I always set it to the
> maximum int value with the following line:
>
> BooleanQuery.setMaxClauseCount(Integer.MAX_VALUE);
>
>
> -----Original Message-----
> From: Sanyi [mailto:need4sid@yahoo.com]
> Sent: Thursday, November 11, 2004 6:46 AM
> To: lucene-user@jakarta.apache.org
> Subject: Bug in the BooleanQuery optimizer? ..TooManyClauses
>
>
> Hi!
>
> First of all, I've read about BooleanQuery$TooManyClauses, so I know that it has a
> 1024-clause limit by default, which is good enough for me, but I still think it
> behaves strangely.
>
> Example:
> I have an index with about 20 million documents.
> Let's say that there are about 3000 variants of the word mask cab* in the entire
> document set.
> Let's say that about 500 documents contain the word spectrum.
> Now, when I search for "cab* AND spectrum", I don't expect it to throw an exception.
> It should first restrict the search to the 500 documents containing the word
> "spectrum", then collect the variants of "cab*" within those documents, which
> would turn up only two or three variants (cable, cables, maybe a few more), and
> the search should return, let's say, 10 documents.
>
> Similar example: when I search for "cab* AND nonexistingword", it still throws a
> TooManyClauses exception instead of saying "No results". Since "nonexistingword"
> does not occur in my document set, it doesn't even have to start collecting the
> variations of "cab*".
>
> Is there any patch for this issue?
> Thank you for your time!
>
> Sanyi
> (I'm using: lucene 1.4.2)
>
> p.s.: Sorry for re-sending this message; I first sent it by accident as a reply
> to the wrong thread.
>
> __________________________________
> Do you Yahoo!?
> Check out the new Yahoo! Front Page.
> www.yahoo.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
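A small, self-contained Java sketch may help contrast the two behaviors discussed in this thread: expanding cab* against every term in the index (the rewrite Will describes, which trips the clause limit) versus Sanyi's suggestion of expanding it only within the documents that contain the required term. It assumes a toy in-memory index (a sorted map from term to document ids). It is not Lucene's implementation, and WildcardExpansionSketch, TooManyClausesException and the other names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of the two strategies discussed in this thread. This is NOT Lucene code:
// the "index" is just a sorted map from term to the ids of documents containing it,
// and every class and method name here is hypothetical.
public class WildcardExpansionSketch {

    // Hypothetical stand-in for BooleanQuery.TooManyClauses.
    static class TooManyClausesException extends RuntimeException {
        TooManyClausesException(int count, int max) {
            super("expansion produced " + count + " clauses; limit is " + max);
        }
    }

    // Every index term starting with the prefix, in sorted order. Terms sharing a
    // prefix are contiguous in a sorted map, so we can stop at the first non-match.
    static List<String> termsWithPrefix(NavigableMap<String, Set<Integer>> index, String prefix) {
        List<String> terms = new ArrayList<>();
        for (String term : index.tailMap(prefix, true).keySet()) {
            if (!term.startsWith(prefix)) {
                break;
            }
            terms.add(term);
        }
        return terms;
    }

    // Expansion as Will describes it: one OR clause per matching term in the
    // whole index, regardless of the other required terms in the query.
    static List<String> expandGlobally(NavigableMap<String, Set<Integer>> index,
                                       String prefix, int maxClauses) {
        List<String> clauses = termsWithPrefix(index, prefix);
        if (clauses.size() > maxClauses) {
            throw new TooManyClausesException(clauses.size(), maxClauses);
        }
        return clauses;
    }

    // Sanyi's proposal: keep only the prefix terms that occur in at least one
    // document that also contains the required term.
    static List<String> expandWithinRequired(NavigableMap<String, Set<Integer>> index,
                                             String requiredTerm, String prefix, int maxClauses) {
        Set<Integer> candidates = index.getOrDefault(requiredTerm, Collections.<Integer>emptySet());
        List<String> clauses = new ArrayList<>();
        if (candidates.isEmpty()) {
            return clauses; // required term absent: no results, nothing to expand
        }
        for (String term : termsWithPrefix(index, prefix)) {
            if (!Collections.disjoint(index.get(term), candidates)) {
                clauses.add(term);
            }
        }
        if (clauses.size() > maxClauses) {
            throw new TooManyClausesException(clauses.size(), maxClauses);
        }
        return clauses;
    }

    // Toy data shaped like the example: 3000 "cab*" variants, each in its own
    // document, and "spectrum" in just three documents.
    static NavigableMap<String, Set<Integer>> buildToyIndex() {
        NavigableMap<String, Set<Integer>> index = new TreeMap<>();
        for (int doc = 0; doc < 3000; doc++) {
            index.put("cab" + doc, new TreeSet<>(Collections.singleton(doc)));
        }
        index.put("spectrum", new TreeSet<>(Arrays.asList(5, 17, 42)));
        return index;
    }

    public static void main(String[] args) {
        NavigableMap<String, Set<Integer>> index = buildToyIndex();
        try {
            expandGlobally(index, "cab", 1024);
        } catch (TooManyClausesException e) {
            System.out.println("global: " + e.getMessage());
        }
        System.out.println("within spectrum: " + expandWithinRequired(index, "spectrum", "cab", 1024));
        System.out.println("within nonexistingword: " + expandWithinRequired(index, "nonexistingword", "cab", 1024));
    }
}
```

Running main should print the global expansion failing with 3000 clauses against a limit of 1024, the restricted expansion yielding [cab17, cab42, cab5], and an empty list for nonexistingword. Note that the restricted version still walks all 3000 cab* terms and checks each one's documents, so in this sketch it trades the clause limit for extra per-term work rather than avoiding the term scan altogether.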