From: Doron Cohen
To: java-user@lucene.apache.org
Subject: Re: wildcards and spans
Date: Fri, 4 Aug 2006 19:33:13 -0700
In-Reply-To: <359a92830608021117i184e4807mbdc187414e8dfc4e@mail.gmail.com>

A thought: would you (or the project lead ;-) consider limiting the
'wildcard expansion'?

Assume a query like:

    ( uni* near(5) science )

I.e. match docs with any word with prefix "uni" that occurs within a
span of 5 from the word "science".

Assuming the current lexicon has M (say 1200) words starting with "uni",
you could (manually) select the first N (say 50) words from this list.
You could also select the 'best' N words, defining 'best' by, for
instance, each word's DF (document frequency).

This has a few issues:
- What N value is good enough?
- How to select the N words?
- You would have to interfere with the code that expands the wildcard.
- Search recall may degrade, because words selected as 'best' might not
  pass the 'span test' while some of the words that were not selected
  would have passed it.

But it might be practical, and perhaps, hopefully, satisfactory.

Regards,
Doron

"Erick Erickson" wrote on 02/08/2006 11:17:19:

> I'm almost entirely certain that any value I choose for
> setMaxClauseCount is going to be wrong, but I might give it a try.
>
> Erick
>
> On 8/2/06, Paul Elschot wrote:
> >
> > On Wednesday 02 August 2006 17:29, Erick Erickson wrote:
> > > I'm back, with another flavor of wildcards.
> > > What direction would you point a poor boy whose project lead
> > > wants wildcard queries and spans? Here's the problem....
> > >
> > > I cannot use any of the classes that throw a "TooManyClauses"
> > > exception (e.g. SpanRegexQuery or SpanNearQuery with, say,
> > > WildcardQuery). The corpus is big enough that this is guaranteed
> > > to be thrown. So, currently I'm using a filter for wildcard
> > > queries, populating it via WildcardTermEnum and TermDocs... Works
> > > like a champ. But I don't see how to combine this with spans...
> >
> > You can try BooleanQuery.setMaxClauseCount() to increase the maximum
> > number of clauses to 100000 or so and see what happens when
> > searching. With enough RAM it should work nicely.
> >
> > You could also use the surround query language. This allows setting
> > the maximum number of clauses for a whole query instead of per
> > BooleanQuery.
> >
> > Regards,
> > Paul Elschot

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
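
The 'best N by DF' selection suggested in this message can be sketched
roughly as follows. This is only an illustration in plain Java with no
Lucene dependency: the class TopNPrefixExpansion, the method expand, and
the sample lexicon with its document frequencies are all hypothetical.
In a real index the terms and their DFs would presumably come from a
term enumeration such as the WildcardTermEnum mentioned in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: limits a prefix expansion
// to the N terms with the highest document frequency (DF).
class TopNPrefixExpansion {

    // lexicon maps each term to its DF; a TreeMap keeps terms sorted,
    // so all terms sharing a prefix are adjacent.
    static List<String> expand(TreeMap<String, Integer> lexicon, String prefix, int n) {
        List<String> candidates = new ArrayList<>();
        // Start at the first term >= prefix and stop at the first non-match.
        for (String term : lexicon.tailMap(prefix, true).keySet()) {
            if (!term.startsWith(prefix)) {
                break;
            }
            candidates.add(term);
        }
        // Highest DF first; ties are broken alphabetically.
        candidates.sort((a, b) -> {
            int byDf = Integer.compare(lexicon.get(b), lexicon.get(a));
            return byDf != 0 ? byDf : a.compareTo(b);
        });
        // Keep at most n terms.
        return new ArrayList<>(candidates.subList(0, Math.min(n, candidates.size())));
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        TreeMap<String, Integer> lexicon = new TreeMap<>();
        lexicon.put("science", 90);
        lexicon.put("under", 10);
        lexicon.put("unicorn", 3);
        lexicon.put("union", 40);
        lexicon.put("unit", 25);
        lexicon.put("university", 120);

        System.out.println(expand(lexicon, "uni", 2)); // prints [university, union]
    }
}
```

The selected terms would then stand in for the wildcard inside the span
query, which keeps the clause count bounded by N. As noted above, recall
can suffer: a low-DF term dropped here might have been the one that
actually occurs near "science".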