From: Rob Young
Date: Tue, 25 Oct 2005 17:56:48 +0100
To: java-user@lucene.apache.org
Subject: Re: Funny results with Fuzzy

Rob Young wrote:
> mark harwood wrote:
>
>> I'd be more inclined to guess that kylie->klyie falls
>> below the 0.5f similarity threshold you pass.
>>
>> Try printing out the results of
>> fuzzyQuery.rewrite(indexReader).toString();
>>
>> This will rewrite the fuzzyQuery to a BooleanQuery
>> which explicitly lists the TermQuery objects that the
>> fuzzyQuery has found potential matches for in your
>> index.
>>
> Hey, thanks for the fuzzyQuery.rewrite tip, I'll try that out to see
> what's going on. Regarding the theory about falling below the 0.5f
> threshold, that's not the case because new FuzzyQuery( new Term( ...
> ), 0.5f ) on its own matches. I'll see what I can find out with your
> rewrite tip though :)

Ahahahaha!! Thank you, you were right after all. I didn't realize that
once you set the fuzzy prefix length, the threshold only applies to the
_remainder_ of the string. That, of course, means that a candidate whose
first letter matches the search string scores lower once the fuzzy prefix
length is applied than it does by default, because the matching letter no
longer counts towards the similarity. I must say, this isn't explained
particularly well in the docs (not that I've explained it much better
above).

Well, thanks all. My fuzzy results are still a little funny, but at least
I have the prefix headache sorted. One thing I was thinking of doing was
checking the character frequency and scoring on that somehow as well.
I.e. klyie has one k, one l, one y etc., as does kylie, but katie (another
one which matches on Levenshtein alone) doesn't, so klyie would rank
higher. Has this been done before? Would it be possible? If so,
whereabouts should I look in "Lucene in Action" or on the net?

Many thanks
Rob
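To make the prefix effect and the character-frequency idea concrete, here is a minimal sketch in plain Java. It does not use Lucene. It assumes the similarity is 1 - editDistance / (length of the shorter remainder), computed only on the text after the prefix, which is how the behaviour described above appears to work; the exact formula may differ between Lucene versions. The class and method names (FuzzyPrefixSketch, similarity, sharedCharacters) are hypothetical and exist only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: not Lucene code, all names here are made up.
final class FuzzyPrefixSketch {

    /** Plain Levenshtein distance (insert, delete, substitute; no transpositions). */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Assumed similarity: the first prefixLength characters must match exactly,
     * and only the remainder of each string is scored.
     */
    static float similarity(String query, String candidate, int prefixLength) {
        if (query.length() < prefixLength
                || !candidate.startsWith(query.substring(0, prefixLength))) {
            return 0f;
        }
        String q = query.substring(prefixLength);
        String c = candidate.substring(prefixLength);
        int shorter = Math.min(q.length(), c.length());
        if (shorter == 0) {
            return q.length() == c.length() ? 1f : 0f;
        }
        return 1.0f - (float) editDistance(q, c) / shorter;
    }

    /** Number of characters the two strings share, counting repeats. */
    static int sharedCharacters(String a, String b) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char ch : a.toCharArray()) {
            counts.merge(ch, 1, Integer::sum);
        }
        int shared = 0;
        for (char ch : b.toCharArray()) {
            Integer left = counts.get(ch);
            if (left != null && left > 0) {
                counts.put(ch, left - 1);
                shared++;
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        System.out.println(similarity("kylie", "klyie", 0));
        System.out.println(similarity("kylie", "klyie", 1));
        System.out.println(sharedCharacters("kylie", "klyie"));
        System.out.println(sharedCharacters("kylie", "katie"));
    }
}
```

Under these assumptions, kylie vs klyie has an edit distance of 2, so it scores about 0.6 with no prefix but exactly 0.5 with a prefix length of 1, which a strict "greater than 0.5" test rejects. The shared-character count is 5 for klyie and 3 for katie, so it could serve as a tie-breaker between candidates with the same edit distance; how to feed that into Lucene's scoring is left open here.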