From: Mark Harwood
Date: Thu, 27 Aug 2009 14:25:10 +0000 (GMT)
To: java-dev@lucene.apache.org
Subject: Re: FuzzyLikeThis query and exact matches

I think those boosts shown are reflecting the edit distance. What we can't see from this is that the Similarity class used in execution is using the same IDF for all terms. The other factors at play will be the term frequency in the doc, its length and any doc boost.
I don't have access to the code right now but that is how I remember it working. There may be an option to turn term frequency off too.
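The boosts Adam posts further down can be reproduced with a small standalone Java sketch. It assumes that each variant's boost is its fuzzy similarity (edit distance normalised by the shorter term's length), rescaled so that the minimum similarity maps to 0 and an exact match to 1, then squared and multiplied by one IDF shared by all variants. It also assumes a minimum similarity of 0.6, which the thread does not state. The class and method names are hypothetical; this is not the contrib FuzzyLikeThisQuery source.

```java
import java.util.List;

/**
 * Standalone sketch (no Lucene dependency) of how FuzzyLikeThis-style variant
 * boosts could be derived. ASSUMPTIONS: boost = (rescaled similarity)^2 * sharedIdf,
 * minimum similarity 0.6, no prefix length. All names are hypothetical.
 */
public class FuzzyBoostSketch {

    /** Classic Levenshtein distance: insert, delete and substitute each cost 1. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity: 1 minus edit distance divided by the shorter term's length. */
    static double similarity(String source, String variant) {
        int shorter = Math.min(source.length(), variant.length());
        return 1.0 - (double) editDistance(source, variant) / shorter;
    }

    /** ASSUMED boost: rescale so minSimilarity -> 0 and exact -> 1, square, times the shared IDF. */
    static double boost(String source, String variant, double minSimilarity, double sharedIdf) {
        double rescaled = (similarity(source, variant) - minSimilarity) / (1.0 - minSimilarity);
        return rescaled * rescaled * sharedIdf;
    }

    public static void main(String[] args) {
        // Shared IDF read off the exact-match clause in the thread (text:desy^8.218443).
        double sharedIdf = 8.218443;
        double minSimilarity = 0.6; // assumption: not stated in the thread
        for (String variant : List.of("dey", "des", "dest", "deny", "defy", "desy")) {
            System.out.printf("text:%s^%.7f%n", variant, boost("desy", variant, minSimilarity, sharedIdf));
        }
    }
}
```

Under these assumptions the sketch gives about 0.22829 for dey and des, about 1.15572 for the one-edit four-letter variants and 8.218443 for desy, matching the posted rewritten query, so the edit distance does enter the score through the boost. The exact term's boost is only about 7.1 times that of a one-edit variant, which leaves room for term frequency, field length and document boosts to lift variant documents above exact matches.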
On 27 Aug 2009, at 14:25, Berkes Adam wrote:

After searching for the term "desy", which has a lot of variants in our index, the rewritten (sub)query looks like this:

(text:dey^0.22828968 text:des^0.22828968 text:dest^1.1557184 text:desk^1.1557184 text:desi^1.1557184 text:desf^1.1557184 text:desc^1.1557184 text:deny^1.1557184 text:defy^1.1557184 text:desy^8.218443)

But what I would like to achieve is to have all exact matches on top (or as high as possible), even if the rankings "validly" send them to the end of the matches, while letting the variants follow them according to their relevancy.

Maybe I understand it wrongly, but the edit distance is not a factor in that query type: the index is searched for terms within a certain edit distance, IDF is eliminated (with the factors above), and then a coordination-less boolean query is created. I might play around with (post-)modifying the scoring for the exact-match subterm, but I'm not sure that is a working solution.

Best regards,
Adam

Despite making IDF a constant, the edit distance should remain a factor in the rankings, so I would have thought this would give you what you need.

Can you supply a more detailed example? Either print the rewritten query or use the explain function.

Cheers
Mark

On 27 Aug 2009, at 13:22, Berkes Adam wrote:

Hi,

In our Java project we use a (slightly modified) version of the FuzzyLikeThis query, which works as follows:

"For each source term the fuzzy variants are held in a BooleanQuery with no coord factor (because we are not looking for matches on multiple variants in any one doc). Additionally, a specialized TermQuery is used for variants and does not use that variant term's IDF because this would favour rarer terms eg misspellings. Instead, all variants use the same IDF ranking (the one for the source query term) and this is factored into the variant's boost. If the source query term does not exist in the index the average IDF of the variants is used."

In most cases it performs well, but if there is a short query term with (as usual) a large number of variants, the exact matches stay spread among the others, which is not very useful: the results should be "sorted" (or the exact matches forcibly made more relevant) so that exact matches come first and variant matches follow according to relevancy.
Is there any simple solution, or an already implemented contrib query class, for this problem?

Best regards,
Adam Berkes,
Intland Software

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
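The ordering Adam asks for (exact matches first, variants after them by relevancy) can be expressed as a two-key sort over the hits. The sketch below is standalone Java with a hypothetical Hit class, not a Lucene type; how the exactMatch flag is obtained (for example from a separate query on the unmodified term) is an assumption left open, since the thread does not specify it. The scores in main are illustrative only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ExactFirstRanking {

    /** Hypothetical search hit; not a Lucene class. */
    static final class Hit {
        final String docId;
        final double score;       // score from the fuzzy query
        final boolean exactMatch; // assumed flag: doc contains the unmodified query term

        Hit(String docId, double score, boolean exactMatch) {
            this.docId = docId;
            this.score = score;
            this.exactMatch = exactMatch;
        }

        @Override
        public String toString() {
            return docId + (exactMatch ? " [exact] " : " ") + score;
        }
    }

    /** Exact matches sort before variant-only matches; ties are broken by descending score. */
    static final Comparator<Hit> EXACT_FIRST = (a, b) -> {
        if (a.exactMatch != b.exactMatch) {
            return a.exactMatch ? -1 : 1;
        }
        return Double.compare(b.score, a.score);
    };

    /** Returns a new list in exact-first order; the input list is left untouched. */
    static List<Hit> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(EXACT_FIRST);
        return sorted;
    }

    public static void main(String[] args) {
        // Illustrative scores only, not taken from the thread.
        List<Hit> hits = List.of(
            new Hit("doc1", 0.91, false),
            new Hit("doc2", 0.40, true),
            new Hit("doc3", 0.75, false),
            new Hit("doc4", 0.55, true));
        rank(hits).forEach(System.out::println);
    }
}
```

A two-key sort guarantees the order regardless of score magnitudes, whereas raising the exact term's boost inside the query only makes exact matches more likely to rise, since term frequency, length and document boosts still contribute to the score.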