From: "Erick Erickson"
To: java-user@lucene.apache.org
Subject: Re: BooleanQuery TooManyClauses in wildcard search
Date: Fri, 30 Nov 2007 15:11:30 -0500
Message-ID: <359a92830711301211g17a07b68k115df38dcce2822e@mail.gmail.com>
In-Reply-To: <47505F66.40506@propylon.com>

John's answer is spot-on. The user list archives are searchable and hold
a wealth of discussion about ways of providing this functionality. One
thread, titled "I just don't get wildcards at all", is where the folks
who know generously helped me out. Once you can find that thread, you'll
know you're in the right place.

Here's the searchable archive:

http://www.gossamer-threads.com/lists/engine?do=search;search_forum=forum_2;;list=lucene

Make sure you select "java user" from the top drop-down labeled "Search".

Best
Erick

On Nov 30, 2007 2:07 PM, John Byrne wrote:

> Hi,
>
> Your problem is that when you do a wildcard search, Lucene expands the
> wildcard term into every matching term. So, searching for "stat*"
> produces a list of terms like "state", "states", "stating" etc. (It
> only uses terms that actually occur in your index, however.) These
> terms are all added as OR clauses of a boolean query.
>
> The thing is, by default, there is a limit of 1024 clauses for a
> boolean query. If your wildcard term expands into more than this
> (which happens very easily), you get the exception you described. You
> can solve the issue by setting the maximum clause count yourself,
> using
>
> BooleanQuery.setMaxClauseCount(int maxClauseCount)
>
> See
>
> http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/core/index.html
>
> for more info.
>
> Bear in mind that putting a wildcard near the start of the term
> results in a large number of boolean clauses, which increases memory
> usage. This is the reason for the default limit. This limit will also
> affect fuzzy queries, because they are expanded in the same way.
>
> Regards,
> JB
>
> Ruchi Thakur wrote:
> >
> > Hi there.
> > I am a new Lucene user and I have been searching the group archives,
> > but couldn't solve the problem. I have just joined a project that
> > uses Lucene. We use the StandardAnalyzer for indexing our documents,
> > and our query is as follows when we issue a search string of t*, for
> > example:
> >
> > +t* +cont_type:pa
> >
> > When we issue some of our wildcard text searches, we get the
> > following exception:
> >
> > org.apache.lucene.search.BooleanQuery$TooManyClauses Exception : Max
> > clause if 1024
> >
> > Please suggest.
> >
> > Regards,
> > Ruchi
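
The expansion John describes can be sketched in a few lines of plain Java. This is only a toy model and does not use Lucene: `WildcardExpansionSketch`, `expandPrefix` and `TooManyClausesException` are hypothetical names made up for illustration, and a `TreeSet` stands in for the index's sorted term dictionary. It assumes a simple trailing-wildcard query such as `stat*` or `t*`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy model of prefix-wildcard expansion. Not Lucene code. */
class WildcardExpansionSketch {

    /** Hypothetical stand-in for Lucene's BooleanQuery.TooManyClauses. */
    static class TooManyClausesException extends RuntimeException {
        TooManyClausesException(int max) {
            super("expansion needs more than " + max + " clauses");
        }
    }

    /**
     * Collects every indexed term that starts with the prefix, one OR
     * clause per term, and fails once the clause limit would be exceeded.
     */
    static List<String> expandPrefix(SortedSet<String> indexedTerms,
                                     String prefix, int maxClauseCount) {
        List<String> clauses = new ArrayList<>();
        // Terms are sorted, so all matches form one contiguous run
        // beginning at the first term >= prefix.
        for (String term : indexedTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // past the end of the matching run
            }
            if (clauses.size() == maxClauseCount) {
                throw new TooManyClausesException(maxClauseCount);
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        SortedSet<String> index = new TreeSet<>();
        index.add("state");
        index.add("states");
        index.add("stating");
        index.add("table");
        for (int i = 0; i < 2000; i++) {
            index.add("t" + i); // pad the index with many "t..." terms
        }

        // A selective prefix expands to a handful of clauses.
        System.out.println(expandPrefix(index, "stat", 1024)); // [state, states, stating]

        // A one-letter prefix matches 2001 terms and exceeds the limit of 1024.
        try {
            expandPrefix(index, "t", 1024);
        } catch (TooManyClausesException e) {
            System.out.println("t* failed: " + e.getMessage());
        }

        // Raising the limit lets the same query through, at the cost of more clauses.
        System.out.println(expandPrefix(index, "t", 4096).size()); // 2001
    }
}
```

In this model a short prefix such as `t*` fails because it matches far more indexed terms than `stat*`, not because of the query string's length. Raising the limit (in Lucene, via the `BooleanQuery.setMaxClauseCount` call John mentions) makes the query run, but every extra matching term still becomes a clause that has to be held in memory.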