From: Mark Miller
Date: Fri, 01 Sep 2006 07:59:00 -0400
To: java-user@lucene.apache.org
Subject: Re: SpanRegex speed

Erick Erickson wrote:
> Let me chime in here on a different note.... before you get happy with
> wildcard queries, take a look at the thread "I just don't get wildcards
> at all". There is lots of good info that Erik, Chris and Otis provided me.
>
> The danger with PrefixQuery and WildcardQuery is that they will throw
> TooManyClauses exceptions when you start matching a number of terms (the
> default is 1024, although you can make this much bigger if memory allows).
> If you're aware of this and it is and will be OK in your app, ignore this.
> But if your index is going to grow significantly, this is a real problem.
> I went with implementing filters with WildcardTermEnum (you could also use
> RegexTermEnum) for the wildcard portions of my query. This has interesting
> implications for spans; we elected to say spans didn't work with wildcards.
>
> Anyway, as I said, if you're aware of the TooManyClauses issue and are
> sure it doesn't matter, ignore me. After all, everybody else does .....
>
> Best
> Erick
>
> On 8/30/06, Mark Miller wrote:
>>
>> Ignore that last question. I see that you said prefix wildcard query and
>> not wildcard query. A quick look at the code seems to show it grabbing a
>> prefix as well.
>>
>> Do you think one would be any faster than the other? Should I use
>> WildcardQuery outside of span queries and the regex query inside span
>> queries, or use regex in both places?
>>
>> - Mark
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org

Thanks a lot for the info, Erick. Good stuff to know for sure. I guess the
real question I have been trying to spit out is this: Is a span version of
any of these searches--fuzzy, wildcard, etc--inherently slower than its
non-span brother? If they have the same limitations and speeds, then that
is all I am looking for.

P.S. I realize I have been screwing up the threading by replying when
starting a new topic. I have been alerted and will stop this pernicious
activity.

- Mark
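
For anyone reading this thread later, the TooManyClauses issue Erick describes can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use Lucene at all, and every name in it (WildcardExpansionSketch, expandToClauses, filterBits) is hypothetical. It models a wildcard being expanded into one clause per matching term, which fails once the clause limit is reached, next to a filter-style walk over the matching terms that marks documents in a bit set and so has no clause limit.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration only; none of these names are Lucene classes.
final class WildcardExpansionSketch {

    // Mirrors the default clause limit of 1024 mentioned in the thread.
    static final int DEFAULT_MAX_CLAUSES = 1024;

    // Convert a simple wildcard ('*' = any run of chars, '?' = one char) into a regex.
    static Pattern toPattern(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') sb.append(".*");
            else if (c == '?') sb.append('.');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    // Rewrite-style expansion: one clause per matching term, failing once the limit is hit.
    static List<String> expandToClauses(List<String> terms, String wildcard, int maxClauses) {
        Pattern p = toPattern(wildcard);
        List<String> clauses = new ArrayList<>();
        for (String term : terms) {
            if (p.matcher(term).matches()) {
                if (clauses.size() >= maxClauses) {
                    throw new IllegalStateException("too many clauses: limit is " + maxClauses);
                }
                clauses.add(term);
            }
        }
        return clauses;
    }

    // Filter-style evaluation: walk the matching terms and mark their documents in a bit set.
    // postings.get(i) holds the ids of the documents that contain terms.get(i).
    static BitSet filterBits(List<String> terms, List<int[]> postings, String wildcard, int maxDoc) {
        Pattern p = toPattern(wildcard);
        BitSet bits = new BitSet(maxDoc);
        for (int i = 0; i < terms.size(); i++) {
            if (p.matcher(terms.get(i)).matches()) {
                for (int doc : postings.get(i)) bits.set(doc);
            }
        }
        return bits;
    }
}
```

The filter version costs memory proportional to the number of documents rather than the number of matching terms, which is why it keeps working as the index grows. It says nothing about span behavior, which is the open question above.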